Python: The Optimization Ladder

(cemrehancavdar.com)

349 points | Twirrim | 15d ago | 147 comments

    CPython 3.13 went further with an experimental copy-and-patch JIT compiler -- a lightweight JIT that stitches together pre-compiled machine code templates instead of generating code from scratch. It's not a full optimizing JIT like V8's TurboFan or a tracing JIT like PyPy's;

Good news. Python 3.15 adopts PyPy's tracing approach for its JIT, and there are real performance gains now:

https://github.com/python/cpython/issues/139109

https://doesjitgobrrr.com/?goals=5,10

josalhor12d ago

While this is great, I expected faster CPython to eventually culminate into what YJIT for Ruby is. I'm not sure the current approaches they are trying will get the ecosystem there.

kenjin409611d ago

I implemented most of the tracing JIT frontend in Python 3.15, with help from Mark to clean up and fix my code. I also coordinated some of the community JIT optimizer effort in Python 3.15 (note: NOT the code generator/DSL/infra, that's Mark, Diego, Brandt and Savannah). So I think I'm able to answer this.

I can't speak for everyone on the team, but I did try the lazy basic block versioning from YJIT in a fork of CPython. The main problem is that the copy-and-patch backend we currently have in CPython is not very amenable to self-modifying machine code. This makes inter-block jumps/fallthroughs very inefficient. It can be done, it's just a little strange. Also, for security reasons, we tried not to have self-modifying code in the original JIT and we're hoping to stick to that. Everything has its tradeoffs; design is hard! It's not too difficult to go from tracing to lazy basic blocks. Conceptually they're somewhat similar, as the original paper points out. The main thing we lack is the compact per-block type information that something like YJIT/Higgs has.

I guess while I'm here I might as well make the distinction:

- Tracing is the JIT frontend (region selection).

- Copy and Patch is the JIT backend (code generation).

We currently use both. PyPy uses meta-tracing: it traces the runtime (the interpreter) itself, rather than the user's code as CPython's tracing JIT does. I did take a look at PyPy's code, and a lot of ideas in the improved JIT are actually imported from PyPy directly. So I have to thank them for their great ideas. I also talk to some of the PyPy devs.
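
To make the frontend/backend split concrete, here is a toy sketch of region selection by tracing. It is not CPython's implementation: the two-instruction VM, the `HOT_THRESHOLD` value, and every name below are made up for illustration. The interpreter counts backward jumps, and once a loop head looks hot it records the instructions that actually execute until control returns to that head. A backend (copy-and-patch in CPython's case) would then turn the recorded trace into machine code; that part is not shown.

```python
HOT_THRESHOLD = 3  # hypothetical: back-edges seen before a loop counts as hot


def run(program, regs):
    """Interpret `program`, recording one trace per hot loop head."""
    counters = {}     # loop head pc -> number of backward jumps taken
    traces = {}       # loop head pc -> list of recorded instructions
    recording = None  # (loop head pc, ops recorded so far) while tracing
    pc = 0
    while True:
        op = program[pc]
        if recording is not None:
            head, ops = recording
            if pc == head and ops:
                # Execution came back to the loop head: the trace is closed.
                traces[head] = ops
                recording = None
            else:
                ops.append(op)
        kind = op[0]
        if kind == "halt":
            return traces
        if kind == "add":
            _, dst, src = op
            regs[dst] += regs[src]
            pc += 1
        elif kind == "jump_if_lt":
            _, a, b, target = op
            if regs[a] < regs[b]:
                # A backward jump marks a loop; count it until it becomes hot.
                if target <= pc and target not in traces and recording is None:
                    counters[target] = counters.get(target, 0) + 1
                    if counters[target] == HOT_THRESHOLD:
                        recording = (target, [])
                pc = target
            else:
                pc += 1
        else:
            raise ValueError(f"unknown instruction: {op!r}")


# Sum 1..10 with a loop: i += one; total += i; loop while i < n.
program = [
    ("add", "i", "one"),
    ("add", "total", "i"),
    ("jump_if_lt", "i", "n", 0),
    ("halt",),
]
regs = {"i": 0, "one": 1, "total": 0, "n": 10}
traces = run(program, regs)
```

After the run, `traces` holds the three loop-body instructions keyed by the loop head at pc 0, which is the "region" a backend would compile.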

Ending off: the team is extremely lean right now. Only 2 people were generously employed by ARM to work on this full time (thanks a lot to ARM too!). The rest of us are mostly volunteers, or have some bosses that like open source contributions and allow some free time. As for me, I'm unemployed at the moment and this is basically my passion project. I'm just happy the JIT is finally working now after spending 2-3 years of my life on it :). If you go to Savannah's website [1], the JIT is around 100% faster for toy programs like Richards, and even for big programs like tomli parsing, it's 28% faster on macOS AArch64. The JIT is very much a community effort right now.

[1]: https://doesjitgobrrr.com/?goals=5,10

PS: If you want to see how the work has progressed, click "all time" in that website, it's pretty cool to see (lower is faster). I have a blog explaining how we made the JIT faster here https://fidget-spinner.github.io/posts/faster-jit-plan.html.

pjmlp11d ago

Now this is great to know.

__mharrison__12d ago

Great writeup.

I've been in the pandas (and now polars) world for the past 15 years. Staying in the sandbox gets most folks good enough performance. (That's why Python is the language of data science and ML).

I generally teach my clients to reach for numba first. Potentially lots of bang for little buck.

One overlooked area in the article is running on GPUs. Some numpy and pandas (and polars) code can get a big speedup by using GPUs (same code with import change).

bloaf11d ago

Taichi, benchmarked in the article, claims to be able to outperform CUDA at some GPU tasks, although their benchmarks look to be a few years old:

https://github.com/taichi-dev/taichi_benchmark

pjmlp11d ago

And doesn't account for cuTile, NVIDIA's new API infrastructure that supports writing CUDA directly in Python via a JIT that is based on MLIR.

redgridtactical11d ago

In practice the ladder has two rungs for me. Write it in Python with numpy/scipy doing the heavy lifting, and if that's not enough, rewrite the hot path in C. The middle steps always felt like they added complexity without fully solving the problem.

The JIT work kenjin4096 describes is really promising though. If the tracing JIT in 3.15 actually sticks, a lot of this ladder just goes away for common workloads.

bee_rider11d ago

Jax seems quite interesting even from this point of view… numpy has the same problem as blas basically, right? The limited interface. Eventually this leads to heresies like daxpby, and where does the madness stop once you’ve allowed that sort of thing? Better to create some sort of array language.

redgridtactical11d ago

Jax basically gives you the array language without leaving Python, and the XLA backend means you're not hand-tuning C for the GPU path. The numpy interface limitation is real though and once you need something that doesn't map cleanly to vectorized ops, you're either fighting the abstraction or dropping down anyway.

The daxpby example is a good one. Every time BLAS adds another special-case routine it's basically admitting the interface wasn't general enough. At some point you're just writing C with extra steps.

mathisfun12311d ago

this is a pointless (valueless) reductive take

seanwilson12d ago

> The real story is that Python is designed to be maximally dynamic -- you can monkey-patch methods at runtime, replace builtins, change a class's inheritance chain while instances exist -- and that design makes it fundamentally hard to optimize. ...

> 4 bytes of number, 24 bytes of machinery to support dynamism. a + b means: dereference two heap pointers, look up type slots, dispatch to int.__add__, allocate a new PyObject for the result (unless it hits the small-integer cache), update reference counts.

Would Python be a lot less useful without being maximally dynamic everywhere? Are there domains/frameworks/packages that benefit from this where this is a good trade-off?

I can't think of cases in strong statically typed languages where I've wanted something like monkey patching, and when I see monkey patching elsewhere there's often some reasonable alternative or it only needs to be used very rarely.
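
A minimal sketch of the dynamism the quoted passage describes, using only the standard library. The class names are invented for illustration. It patches a method after an instance exists, swaps a class's base while an instance exists, and prints the size of a small `int` object (the exact number depends on the CPython build, so it is not asserted here).

```python
import sys


class Greeter:
    def greet(self):
        return "hello"


class Loud:
    def shout(self):
        return "HEY"


class Quiet:
    def shout(self):
        return "hey"


class Person(Loud):
    pass


g = Greeter()
p = Person()

# Monkey-patch a method after the instance was created.
Greeter.greet = lambda self: "patched"
patched = g.greet()

# Change the inheritance chain while an instance exists.
before = p.shout()
Person.__bases__ = (Quiet,)
after = p.shout()

# Size in bytes of the object that wraps a small integer.
int_size = sys.getsizeof(1)
print(patched, before, after, int_size)
```

Because any of these rebinding steps can happen at any time, an optimizer cannot assume that `g.greet` or `p.shout` still refers to the function it saw a moment ago without some form of guard.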

adamzwasserman11d ago

The dynamism exists to support the object model. That's the actual dependency. Monkey-patching, runtime class mutation, vtable dispatch. These aren't language features people asked for. They're consequences of building everything on mutable objects with identity.

Strip the object model. Keep Python.

You get most of the speed back without touching a compiler, and your code gets easier to read as a side effect.

I built a demo: Dishonest code mutates state behind your back; Honest code takes data in and returns data out. Classes vs pure functions in 11 languages, same calculation. Honest Python beats compiled C++ and Swift on the same problem. Not because Python is fast, but because the object model's pointer-chasing costs more than the Python VM overhead.

Don't take my word for it. It's dockerized and on GitHub. Run it yourself: honestcode.software, hit the Surprise! button.

adamzwasserman10d ago

Correction. I copied some incorrect values from my test harness. So Honest Python does NOT beat Dishonest Swift.

But it does beat the pants off of JS/TS on V8 which is quite the surprise.

Also in the surprise category is that Honest Java is more than 2x faster than dishonest c++.

bloaf11d ago

I've always thought the flexibility should allow python to consume things like gRPC proto files or OpenAPI docs and auto-generate the classes/methods at runtime as opposed to using codegen tools. But as far as I know, there aren't any libraries out there actually doing that.
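
As a rough sketch of what runtime generation could look like, the snippet below builds classes from a schema using `dataclasses.make_dataclass`. The `SCHEMA` dict and `build_models` helper are hypothetical stand-ins; parsing a real OpenAPI document or `.proto` file is not attempted here.

```python
from dataclasses import asdict, make_dataclass

# Hypothetical, hand-written schema standing in for a parsed API description.
SCHEMA = {
    "User": {"id": int, "name": str},
    "Order": {"order_id": int, "user_id": int, "total": float},
}


def build_models(schema):
    """Create one dataclass per schema entry at runtime."""
    return {
        name: make_dataclass(name, list(fields.items()))
        for name, fields in schema.items()
    }


models = build_models(SCHEMA)
User = models["User"]
user = User(id=1, name="Ada")
user_dict = asdict(user)
```

The generated classes behave like hand-written dataclasses, but nothing exists on disk for an editor or type checker to read, which is the introspection concern raised in the reply below.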

haimez11d ago

Generating code at runtime is often an anti-goal because you can’t easily introspect it. “Build-time” generation gives you that, but people often choose to go further and check the generated code into source control to be able to see the change history.

skeledrew11d ago

But it's a fairly easy build if you want any of that.

NeutralForest11d ago

There are some use cases for very dynamic code, like ORMs; with descriptors you can add attributes + behavior at runtime and it's quite useful. Anyways, breaking metaprogramming and more dynamic features would mean python 4 and we know how 2 -> 3 went. I also don't think it's where the core developers are going. Also also, there are other things I'd change before going after monkey patching like some scoping rules, mutable defaults in function attributes, better async ergonomics, etc.
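
A minimal sketch of the descriptor pattern mentioned above, in the style of an ORM column. The `Column` and `User` names are illustrative and not taken from any particular ORM. Note that `__set_name__` is only called automatically during class creation, so a column attached later has to be named by hand.

```python
class Column:
    """Data descriptor that type-checks assignments."""

    def __init__(self, type_):
        self.type_ = type_
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not an instance
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        if not isinstance(value, self.type_):
            raise TypeError(f"{self.name} must be {self.type_.__name__}")
        obj.__dict__[self.name] = value


class User:
    id = Column(int)
    name = Column(str)


u = User()
u.id = 1
u.name = "Ada"

# Attach a new column at runtime, after instances already exist.
email = Column(str)
email.__set_name__(User, "email")
User.email = email
u.email = "ada@example.com"

try:
    u.id = "not an int"
    rejected = False
except TypeError:
    rejected = True
```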

LtWorf12d ago

I've used a library that patches the zipfile module to add support for zstd compression in zipfiles.

In python3.14 the support is there, but 2 years ago you could just import this library and it would just work normally.

repple11d ago

Significant AI smell in this write up. As a result, my current reflex is to immediately stop reading. Not judgement on the actual analysis and human effort which went in. It’s just that the other context is missing.

huseyinkeles11d ago

The author is from Turkey (where I’m also originally from).

Believe it or not, when you write a blog post in a different language, it really helps to use an LLM, even just to fix your grammar mistakes etc.

I assume that’s most likely what happened here too.

shepherdjerred11d ago

IMO it would make sense to add a disclaimer then, e.g. “I wrote this myself but had AI edit”

I have no problem with people using AI, especially to close a language gap.

If you disclose your usage I have a _lot_ more trust that effort has been put into the writing despite the usage

retsibsi11d ago

I do believe it, but for whatever it's worth (maybe not much!):

If the author is willing and able to write understandable English, I'd prefer to read their version (even if it's very imperfect) than the LLM-polished version.

Alternatively, I'll happily read an article that was written in the author's native language and then translated directly to English.

This one bothered me because it's pretty clearly neither of those things, and so it reads just like any other LLM-written/LLM-polished piece.

[edit: just realised 'willing and able' might sound snarky in some way! All I meant was to acknowledge that even if you can write in a second (or third, etc.) language, you might not want to]

repple11d ago

I believe it

butterNaN11d ago

Honestly I'd rather read imperfect english

canjobear11d ago

Here's what gave it away for me

> The remaining difference is noise, not a fundamental language gap. The real Rust advantage isn't raw speed -- it's pipeline ownership.

repple11d ago

There’s an unmistakable rhythm beginning with first paragraph. The trigger was “Same problems, same Apple M4 Pro, real numbers.” in third for me.

I’m scarred to detect these things by my own AI usage.

https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing

jb_hn11d ago

I didn't notice any signs of AI writing until seeing this comment and re-reading (though I did notice it on the second pass).

That said, I think this article demonstrates that focusing on whether or not an article used AI might be focusing on the wrong “problem.” I appreciate being sensitive to the "smell" (the number of low-effort, AI posts flying around these days has made me sensitive too), but personally, I found this article both (1) easy to read and (2) insightful. I think the amount of AI-written content lacking (2) is the problem.

repple11d ago

Your initial focus is to prioritize which content to consume.

markisus11d ago

I also seem to be developing an immune response to several slopisms. But the actual content is useful for outlining tradeoffs if you’re needing to make your Python code go faster.

alisonatwork11d ago

I have the same issue now. It's especially annoying when it happens while reading a "serious" publication like a newspaper or long form magazine. Whether it was because an AI wrote it or "real" writers have spent so much time reading AI slop they've picked up the same style is kinda by the by. It all reads to me like SEO, which was the slop template that LLMs took their inspiration from, apparently. It just flattens language into the most exhausting version of it, where you need to try to subconsciously blank out all the unnecessary flourishes and weird hype phrases to try to figure out what is actually trying to be said. I guess humans who learn to ignore it might do better in this brave new world, but it's definitely annoying that humans are being forced to adapt to machines instead of the other way around.

sirfz11d ago

Fwiw, I thought the article is full of great information and well researched. I think your reflex is holding you back.

genxy11d ago

So much "is real". It is ok to check your grammar, but this is slopabetes inducing.

sutib11d ago

I had the same reflex. I just can't read AI redacted stuff. I don't know why it just reads very hollow and 'icky'

MonkeyClub11d ago

I got the same sense, but nowadays I can't be sure whether a text is AI or the writer's style has absorbed LLM tropes.

Terretta11d ago

“The numbers are real.” But the voice is not.

pjmlp11d ago

If we only applied the same reflex to software, even when 100% human programmed.

cycomanic11d ago

What is the point of your post? I find it increasingly tedious to read comments about alleged AI use under almost every post. It's like complaining that you didn't want to read the submission because you didn't like their font or website design.

I think almost everyone here agrees they don't want to read AI slop, but this submission clearly wasn't that as you admit yourself.

FusionX11d ago

I don't think it should be conflated with auto generated AI slop. I see a lot of snippets which were clearly manually written. I'm assuming the author used AI in a supervised manner, to smooth out the writing process and improve coherency.

intoXbox11d ago

Great write up and recognisable performance. For a pipeline with many (~50) build dependencies unfortunately switching interpreter or experimenting with free threading is not an easy route as long as packages are not available (which is completely understandable).

I’m not one of these rewrite-in-Rust types, but some isolated jobs are just so well suited to full-control systems programming that the Rust delegation is worth the investment imo.

Another part worth investigating for IO-bound pipelines is different multiprocessing techniques. We recently got a boost from using ThreadPoolExecutor over standard multiprocessing, plus careful profiling to identify which tasks are left hanging and are best allocated their own worker. The price you pay, though, is shared memory with no built-in thread safety, which only works if your pipeline can be staggered.
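
A small sketch of the thread-pool approach for IO-bound work, using only the standard library. The `fetch` function is a made-up stand-in that sleeps to simulate waiting on a network or disk call; the worker count and timings are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item):
    """Stand-in for an IO-bound task (hypothetical): wait, then return."""
    time.sleep(0.05)  # the GIL is released while a thread sleeps or waits on IO
    return item * 2


items = list(range(8))

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    # map() preserves input order even though tasks finish concurrently.
    results = list(pool.map(fetch, items))
elapsed = time.perf_counter() - start

print(results, f"{elapsed:.2f}s")
```

Each task here returns its own result instead of writing to a shared structure, which sidesteps the thread-safety caveat; anything that mutates shared state from several workers would need a lock or a queue.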

rusakov-field12d ago

Python is perfect as a "glue" language. "Inner Loops" that have to run efficiently is not where it shines, and I would write them in C or C++ and patch them with Python for access to the huge library base.

This is the "two language problem". (I would like to hear from people who have used Julia extensively, by the way. It claims to solve this problem; does it really?)

jakobnissen11d ago

I have used Julia for my main language for years. Yes, it really does solve the two language problem. It really is as fast as C and as expressive as Python.

It then gives you a bunch of new problems. First and foremost that you now work in a niche language with fewer packages and fewer people who can maintain the code. Then you get the huge JIT latency. And deployment issues. And lack of static tooling which Rust and Python have.

For me, as a research software engineer writing performance sensitive code, those tradeoffs are worth it. For most people, it probably isn’t. But if you’re the kind of person who cares about the Python optimization ladder, you should look into Julia. It’s how I got hooked.

FacelessJim11d ago

As a sibling comment mentions, yes it does. Just don’t expect to have code that runs as fast as C without some effort put into it. You still need to write your program in a static enough way to obtain those speeds. It’s not the easiest thing in the world, since the tooling is improving, yes, but is still not there yet.

If you then want to access fully trimmed small executables then you have to start writing Julia similarly to how you write rust.

To me the fact that this is even possible blows my mind and I have tons of fun coding in it. Except when precompiling things. That is something that really needs to be addressed.

pjmlp11d ago

This problem has been solved already by Lisp, Scheme, Java, .NET, Eiffel, among others, with their pick and choose mix of JIT and AOT compiler toolchains and runtimes.

eigenspace10d ago

No, those languages have not solved it. None of the languages you list there are actually as fast as C for tight inner loops, they sometimes get close under certain circumstances, but they're still very much 2nd class languages in terms of performance.

They're only "fast" compared to slow interpreted languages like Python.

hrmtst9383711d ago

If you're patching hot paths with C and praying the interface layer doesn't explode, you can spend almost as much time chasing ABI boundary bugs as you save on perf. Type hints in Python are still docs for humans and maybe your LSP. Julia does address the two language problem in theory, but getting your whole stack and your deps to exist there is its own weird pain, and people underplay how much library inertia matters once you leave numerics.

kristianp11d ago

          nbody spectral-norm
    C     2100ms    400ms
    Graal  211ms    212ms
    PyPy    98ms   1065ms

Seeing Graal and Pypy beat the gcc C versions suggests to me there's something wrong with the C version. Perhaps they need a -march=native or there's something else wrong. The C version would be a different implementation in the benchmark game, but usually they are highly optimised.

Edit: looking at [1] the top C version uses x86 intrinsics, perhaps the article's writer had to find a slower implementation to have it running natively on his M4 Pro? It would be good to know which C version he used, there's a few at [1]. The N-body benchmark is one where they specify that the same algorithm must be used for all implementations.

[1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

kristianp10d ago

I've checked some run times on my M1 mac mini, and have realised the C run time of 2.1s for n-body is the figure from the benchmark game itself, which is 50 million iterations on a very old i5. It would have made sense if they'd run the C version on their M4 pro and used the same number of iterations to get a true comparison there.

Obviously the main point of the article is to compare different python optimisations, however "rewrite it in C/C++/rust/Go" is an option that should be considered, and none of his optimisations on his M4 Pro beat the C option on my 6-year old M1 mac mini.

The rest of the numbers in the blog post use 500k iterations for the nbody simulation. Here are my numbers on the M1 Mac using the default Clang installed with Xcode:

    Clang 17.0.0    0.06s
    python 3.12.11  1.59s
    pypy 3.11.13    0.23s

I used the fastest C code that doesn't use intrinsics at [1] and compiled with

    clang -O3 -march=native nbody-gcc-6.c -o nbody.clang6

Used the python version from [2].

[1] http://benchmarksgame-team.pages.debian.net/benchmarksgame/p...

[2] https://github.com/cemrehancavdar/faster-python-bench/tree/m...

igouy9d ago

Here are a few naive un-optimised single-thread #8 programs transliterated line-by-line literal style into different programming languages from the same original.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

mkoubaa11d ago

The language itself is not the issue, the implementations are wildly different in other ways

kristianp11d ago

The nbody sim at least is forced to use the same algorithm. It seems unlikely that an optimised pypy (non-BLAS) implementation beats an optimised C imp by 20x.

blt11d ago

Surprised Python is only 21x slower than C for tree traversal stuff. In my experience that's one of the most painful places to use Python. But maybe that's because I use numpy automatically when simple arrays are involved, and there's no easy path for trees.

tweakimp11d ago

Be careful with that, numpy arrays can be slower than Python tuples for some operations. The creation is always slower and the overhead has to be worth it.

__rito__11d ago

Yeah. Many seem to forget it. For one-off computation tasks, NumPy, PyTorch, and JAX have non-trivial overhead and might even be slower than vanilla Python. JAX or NumPy is only worth it once repetition, loops, etc. come into the picture, which is common in many people’s workflows.

AlotOfReading11d ago

You can turn trees into numpy-style matrix operations because graphs and matrices are two sides of the same coin. I don't see the code for the binary-tree benchmark in the repo to see how it's written, but there are libraries like graphblas that use the equivalence for optimization.

pjmlp11d ago

Kudos for going through all the existing JIT approaches, instead of reaching for rewrite into X straight away.

However if Rust with PyO3 is part of the alternatives, then Boost.Python, cppyy, and pybind11 should also be accounted for, given their use in HPC and HFT integrations.

gregjm10d ago

> I don't know JAX well enough to explain exactly why it's 3x faster than NumPy on the same matrix multiplications.

JAX is basically a frontend for the XLA compiler, as you note. The secret sauce is two insights - 1) if you have enough control, you can modify the layout of tensor computations and permute them so they don’t have to match that of the input program but have a more favorable memory access pattern; 2) most things are memory bound, so XLA creates fusion kernels that combine many computations together between memory accesses. I don’t know if the Apple BLAS library has fused kernels with GEMM + some output layer, but XLA is capable of writing GEMM fusions and might pick them if they autotune faster on given input/output shapes.

> But I haven't verified that in detail. Might be time to learn.

If you set the environment variable XLA_FLAGS=--xla_dump_to=$DIRECTORY then you’ll find out! There will be a “custom-call” op if it’s dispatching to BLAS, otherwise it will have a “dot” op in the post-optimization XLA HLO for the module. See the docs:

https://openxla.org/xla/hlo_dumps

Mawr11d ago

Shockingly good article — correct identification of the root cause of performance issues being excessive dynamism and ranking of the solutions based on the value/effort ratio. Excellent taste. Will keep this in my back pocket as a quick Python optimization reference.

It's just somewhat unfortunate that I have to question every number and fact presented since the writing was clearly at least somewhat AI-assisted with the author seemingly not being upfront about that at all.

threethirtytwo11d ago

Being upfront about AI-assistance or no AI-assistance doesn't mean shit. Whether AI was involved is independent of what they state and there's no real way to fully prove otherwise.

superlopuh11d ago

Missing Muna[0][1], I'm curious how it would compare on these benchmarks.

[0]: https://www.muna.ai/

[1]: https://docs.muna.ai/predictors/create

gcanyon11d ago

People here on HN have in the past suggested that TypeScript is the superior-in-all-ways, just-as-easy/fun-to-code-in language and should replace Python in pretty much all use cases.

Anyone have an opinion on how TS would fare in this comparison?

pjmlp11d ago

Typescript is basically a JavaScript linter.

The benefit is that JavaScript JIT compilers have a few decades of research behind them, all the way back to Smalltalk and SELF.

So in many cases you can still stay with V8 or JavaScript Core, instead of rewriting into something else, regardless of the whole rewriting into Rust that is now fashionable.

0xpgm11d ago

As a mostly Python programmer and partly TypeScript programmer, my subjective thought is that there is a bit more 'noise' with TypeScript than Python.

Just a little more to parse with my eyes and a little more to type with TypeScript.

But hey, with all these cool kids with their AI coding agents, reading and handwriting code may soon be obsolete!

mwkaufma11d ago

All the approaches beyond PyPy are to either use a different lang that's superficially similar to python or to write a native extension for python in a different language, which is at odds with the stated premise.

igouy9d ago

> "The Benchmarks Game problems are pure compute: tight loops, no I/O, no data structures beyond arrays."

iirc reverse-complement reads and writes a GB, fasta and mandelbrot write, regex-redux reads, k-nucleotide reads and uses a hash table.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

Trickery583711d ago

It's missing the easiest of the choices: core performance-sensitive code in C, interface it to python with pybind11, build app in python. Small stack, huge gains, best of both worlds.

blitzar11d ago

It's missing the easiest of the choices: grab a coffee and come back when it has run.

markisus11d ago

I wish there were more details on this part.

> Missing @cython.cdivision(True) inserts a zero-division check before every floating-point divide in the inner loop. Millions of branches that are never taken.

I thought never taken branches were essentially free. Does this mean something in the loop is messing with the branch predictor?

pavpanchekha11d ago

They're cheap but not free, especially at the front end of the CPU where it's just a lot more instructions to churn through. What the branch predictor gets you is it turns branches, which would normally cause a pipeline bubble, to be executed like straightline code if they're predicted right. It's a bit like a tracing jit. But you will still have a bunch of extra instructions to, like, compute the branch predicate.

beng-nl11d ago

Worse, IMO, is the never taken branch taking up space in branch prediction buffers. Which will cause worse predictions elsewhere (when this branch ip collides with a legitimate ip). Unless I missed a subtlety and never taken branches don’t get assigned any resources until they are taken (which would be pretty smart actually).

gsnedders11d ago

From when I was working on optimizing one or two things with Cython years ago, it wasn’t per-se the branch cost that hurt: it was impeding the compiler from various loop optimisations, potentially being the impediment from going all the way to auto-vectorisation.
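
For context on what that directive changes: with `cdivision` off, Cython keeps Python's division semantics for C-typed operands, which means a zero check plus a sign adjustment on integer quotient and remainder. The plain-Python sketch below only illustrates the two sets of semantics; `python_style` and `c_style` are illustrative helpers, not Cython APIs, and no Cython is involved.

```python
def python_style(a, b):
    """Python semantics: explicit zero check, quotient floors toward -inf."""
    if b == 0:
        raise ZeroDivisionError("integer division or modulo by zero")
    return a // b, a % b


def c_style(a, b):
    """C semantics: quotient truncates toward zero, remainder follows the
    dividend's sign. In C, b == 0 is undefined behaviour, so there is no check."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q, a - q * b


py_result = python_style(-7, 2)
c_result = c_style(-7, 2)
print(py_result, c_result)
```

The two agree whenever both operands have the same sign, so the extra work in the Python-compatible path only matters for correctness in the mixed-sign and zero-divisor cases, yet it is emitted on every division.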

adsharma11d ago

Missing: write static python and transpile to rust pyO3 which is at the top of the ladder.

Some nuance: try transpiling to a garbage collected rust like language with fast compilation until you have millions of users.

Also use a combination of neural and deterministic methods to transpile depending on the complexity.

zahlman11d ago

> a garbage collected rust like language with fast compilation

I don't know what languages you might have in mind. "Rust-like" in what sense?

pjmlp11d ago

Probably OCaml, Standard ML, Haskell, MLton, F#, Scala,....

If you're going to complain about some of those being slow, remember that they have various options between interpreter, bytecode, REPL, JIT and AOT.

gf00011d ago

Not parent, but basically every ML? OCaml, but also Scala/Kotlin, and to a certain degree Java and C#, are all good choices.

adsharma11d ago

It's not a popular thing to say on social media.

V-lang is the one I'm tinkering with. It's like rust in terms of pattern matching as an expression, sum types, ?T instead of exceptions.

Like golang, it has shorter compile times.

I try to keep my argument abstract (that you need to lower python to something intermediate before rust) for that reason.

tda11d ago

One thing with python is that usually I will use one of the many c based libraries to get reasonable speed and well thought out abstractions from the start. I architect around numpy, scipy, shapely, pandas/polars or whatever. So my code runs at reasonable speed from the start. But transpiling to rust then effectively means a complete redesign of the code, data structures, algorithms etc. And I have seen the AI tools really struggle to get it right, as my intent gets lost somewhere.

So what I do now (since Claude Code) is write a really bare bones (and slow) pure Python implementation (like I used to do for numba, pypy or cython ready code), with minimal dependencies. Then I use the REPL, notebooks and nice plotting tools to get a real understanding of the problem space and the intricacies of my algorithm/problem at hand. When done, I let Claude add tests and I ask it to transpile to equivalent Rust and boom! A flawless 1000x speed upgrade in minutes.

The great thing is I don't need to do the mental gymnastics to vectorize code in a write only mode like I've had to do since my Matlab days. Instead I can write simple to read for loops that follow my intent much better, and result in much more legible code. So refreshing!

And with pyO3 i can still expose the Rust lib to python, and continue to use Python for glue and plotting

adsharma11d ago

Cython and all the libs you mention use the c-api, which is the #1 thing python needs to lose to be competitive.

I wish someone would write a stdlib without using it. My attempt from a few months ago is in a repo under the py2many org.

LarsDu8811d ago

I love how in an article about making python faster, the fastest option is to simply write Rust, lol

pjmlp11d ago

That has been a thing forever, many "Python" libraries, are actually bindings to C, C++ and Fortran.

The culture of calling them "Python" is one reason why JITs are so hard to gain adoption in Python, the problem isn't the dynamism (see Smalltalk, SELF, Ruby,...), rather the culture to rewrite code in C, C++ and Fortran code and still call it Python.

falcor8411d ago

There's no surprise that Rust is faster to run, but I don't think there are many who would claim that Rust is faster to write.

scuff3d11d ago

Go and Java/C# (if you forgo all the OOP nonsense) aren't much harder to write than Python, and you get far better performance. Not all the way to Rust level, but close enough for most things with far less complexity.

orochimaaru11d ago

Maybe with LLM/Code Assistance this effort reduces? Since we're mostly talking mathematics here, you have well defined algorithms that don't need to be "vibed". The codegen, hopefully, is consistent.

alihawili11d ago

When dealing with JSON in CPython, I always use msgspec; the performance gains are huge.

superbatfish9d ago

When I read an article about Python optimizations, I typically expect to have significant objections. But this one was great, actually.

IshKebab11d ago

Instead of just using a language that isn't dog slow, why not jump through these 5 different hoops? It's much easier!

atomicnumber311d ago

Because for 99% of cases python is fast enough and it's fast as fuck to code. And for the 1% that aren't, you have 50 different flavors of making it faster. And the final of which is "slap pybind on a c module to do the hot path in C" which then lets you minimize the suffering of C into a single high value location. And the rest of the code still gets to be Python.

IshKebab11d ago

> it's fast as fuck to code

In my experience it's no faster than other better languages like Go, Rust or Kotlin.

> And for the 1% that aren't, you have 50 different flavors of making it faster.

Only for numerical code. You can't use something like Numpy to make Django or Mercurial faster.

And even when you could feasibly do the thing that everyone says to do - move part of your code to a faster language - the FFI is so painful (it always is) that you are much better off just doing everything in that faster language from the start.

All of the effort you have to go through to make Python not slow is far more work than just "don't use Python". You can write Rust without thinking about performance and it will automatically be 20-200x faster than Python.

I actually did rewrite a Python project 1:1 in Rust once and it was approximately 50x faster. I put no effort into optimising the Rust code.

threethirtytwo11d ago

>The usual suspects are the GIL, interpretation, and dynamic typing. All three matter, but none of them is the real story. The real story is that Python is designed to be maximally dynamic -- you can monkey-patch methods at runtime, replace builtins, change a class's inheritance chain while instances exist -- and that design makes it fundamentally hard to optimize.

ok I guess the harder question is. Why isn't python as fast as javascript?
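For readers who have not seen the dynamism from the quoted passage in action, here is a minimal sketch. The class names (`Base`, `Other`, `Child`) are made up for illustration and do not come from the article. Each of the three steps invalidates something an optimizer would like to assume stays fixed.

```python
import builtins


class Base:
    def greet(self):
        return "base"


class Other:
    def greet(self):
        return "other"


class Child(Base):
    pass


obj = Child()
results = [obj.greet()]

# 1. Monkey-patch a method on a class while an instance already exists.
Base.greet = lambda self: "patched"
results.append(obj.greet())

# 2. Swap the inheritance chain of a live class.
Child.__bases__ = (Other,)
results.append(obj.greet())

# 3. Replace a builtin, then restore it so the rest of the program is unaffected.
original_len = builtins.len
builtins.len = lambda _: 42
try:
    results.append(len([1, 2, 3]))
finally:
    builtins.len = original_len

print(results)
```

The same `obj.greet()` call site resolves to three different functions over the life of the program, so a compiler has to guard every cached lookup.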

gsnedders11d ago

Beyond the economic arguments, there’s a lot in JS that actually makes it a lot easier: almost all of the operators can only return a subset of the types and cannot be overridden (e.g., the binary + operator in JS can only return a string or a number primitive), the existence of string and number primitives dramatically reduces the amount of dynamic behaviour they can have, only proxy objects can exhibit the same amount of dynamism as arbitrary Python ones (and thus only they pay the performance cost)…
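To illustrate the operator point from the Python side, a small sketch with a hypothetical `Weird` class: in Python, `+` dispatches to user-defined `__add__`/`__radd__`, which may return any object at all, so the result type of `a + b` cannot be narrowed without knowing both operand types.

```python
class Weird:
    """Hypothetical class whose + returns unrelated types."""

    def __add__(self, other):
        # Called for `Weird() + x`; returns a dict rather than a number or string.
        return {"surprise": other}

    def __radd__(self, other):
        # Called for `x + Weird()` after type(x).__add__ returns NotImplemented.
        return None


w = Weird()
left = w + 1
right = 1 + w

print(type(left).__name__, type(right).__name__)
```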

gf00011d ago

The more real answer is that python's primary usage is as a glue language. It has to be able to interface with various C libraries, and to make the interfaces even more ergonomic, they exposed several internal details on how code is evaluated that libraries make use of (e.g. you can increment/decrement a ref counted python object's counter from C).

This pretty much makes it impossible to change many of the internal details, and to significantly optimize it.

If we remove this requirement, we get the alternative runtimes and if you check e.g. GraalPy, it has the same order of performance as JS, so your intuition is right. It's just that you have to drop supporting a good chunk of what people use Python for which is obviously a no go for most applications. (Note: GraalPy can actually also run some C libraries and in this case can cross-optimize across python and C!)
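The reference count mentioned above is observable even without writing C. A minimal sketch, assuming CPython (other runtimes may not keep reference counts at all, and the absolute numbers vary between CPython versions, so only the difference is meaningful):

```python
import sys

obj = object()

# getrefcount itself holds a temporary reference, so compare before/after
# values instead of relying on the absolute number.
before = sys.getrefcount(obj)
alias = obj  # one more name bound to the same object
after = sys.getrefcount(obj)

print(after - before)
```

C extensions manipulate the same counter directly through `Py_INCREF` and `Py_DECREF`, which is why a runtime that wants to drop reference counting also has to give up or emulate that part of the C API.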

12_throw_away11d ago

> ok I guess the harder question is. Why isn't python as fast as javascript?

Actually there is a pretty easy answer: worldwide, the amount of javascript being evaluated every day is many orders of magnitude higher than the amount of python. The amount of money available for optimizing it has thus been many orders of magnitude higher as well.

saagarjha11d ago

I don’t think the answer is that easy. Python is typically run on the server and JavaScript is client-side, which means that the incentives are aligned to optimize Python rather than JavaScript. I think investment in each follows and the difference is more that JavaScript runs in an isolated environment with a more flexible runtime.

threethirtytwo11d ago

I mean the technical reasons. Because every reason stated in my quote is applicable to JavaScript as well.

retsibsi12d ago

A personal opinion: I would much prefer to read the rough, human version of this article than this AI-polished version. I'm interested in the content and the author clearly put thought and effort into it, but I'm constantly thrown out of it by the LLM smell. (I'm also a bit mad that `--` is now on the em dash treadmill and will soon be unusable.)

I'm not just saying this to vent. I honestly wonder if we could eventually move to a norm where people publish two versions of their writing and allow the reader to choose between them. Even when the original is just a set of notes, I would personally choose to make my own way through them.

jaharios11d ago

json.loads is something you don't want to use in a loop if you care about performance at all. Simply using orjson can give you a 3x speedup without the need to change anything else.
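A rough way to check this on your own data, as a sketch: it assumes orjson is installed and falls back to the standard library if it is not. The payload and the `parse_all` helper are made up for illustration, and the speedup you measure will depend on your machine and documents.

```python
import json
import timeit

try:
    import orjson  # third-party; orjson.loads accepts str or bytes
    fast_loads = orjson.loads
except ImportError:
    # Fallback so the sketch still runs; both timings will then be similar.
    fast_loads = json.loads

# Hypothetical workload: many small JSON documents parsed one by one.
payload = json.dumps({"id": 1, "tags": ["a", "b"], "score": 0.5})
lines = [payload] * 10_000


def parse_all(loads):
    return [loads(line) for line in lines]


stdlib_time = timeit.timeit(lambda: parse_all(json.loads), number=5)
fast_time = timeit.timeit(lambda: parse_all(fast_loads), number=5)

print(f"json.loads: {stdlib_time:.3f}s, fast_loads: {fast_time:.3f}s")
```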

kelvinjps1012d ago

Great post, saved it for when I need to optimize my Python code

viktorcode11d ago

I was hoping for Mojo to appear as an optimisation strategy

zahlman11d ago

The replacement of emdashes with double hyphens here is almost insulting. A look through the blog history suggests that the author has no issue writing in English normally, and nothing seems really off about the actual findings here (or even the speculation about causes etc.), so I really can't understand the motivation for LLM-generated prose. (The author's usual writing style appears to have some arguable LLM-isms, but they make a lot more sense in context and of course those patterns had to come from somewhere. The overall effect is quite different.)

Edit: it's strange to get downvoted while also getting replies that agree with me and don't seem to object.

(Also, I thought it wasn't supposed to be possible to edit after getting a reply?)

hydrolox11d ago

Yea while reading, I just didn't understand how you end up LLM writing the article? Clearly, the data and writeup are real. But, was it "edited" with an LLM? It looks closer to ~the entire thing being LLM written. I finished reading because the topic is interesting, but the LLM writing style is difficult to bear.. and I agree with your point that trying to fool us that it's human with `--` is just absurd

adammarples11d ago

Same problems, same Apple M4 Pro, real numbers.

arlattimore11d ago

What a great article!

skeledrew11d ago

I must admit that I'm amused by the people who find the writeup useful but are turned off by the AI "smell". And look forward to the day when all valued content reeks of said "smell"; let's see what detractors-for-no-good-reason do then (yes I'm a bit ticked by the attitude).

achierius11d ago

Isn't this a depressing thought? Regardless of AI, to think that everything we read would come in the same literary style, conveying little of the author, giving no window through which to learn about who they are -- that would be a real loss.

repple11d ago

Ultimately it’s up to the author to make that explicit choice. I think that AI does and will enhance writing and depth and breadth of analysis one could perform. But, to be trustworthy, people will need to either lay out all cards on the table and/or work on other ways to gain trust over time. Maybe people need to provide some context to communicate what model was used and in which ways. What % of final output is AI vs author. I mean, if I see 100% composed by human author stated somewhere then there’s my cue to at the very least learn a little about the author. Certainly more complexity and discernment for readers. Depressing? In some ways maybe; but I’m kind of optimistic. Imagine what Tolkien could worldbuild armed with AI.. but then it wouldn’t be Tolkien.

skeledrew9d ago

Not at all depressing. If authors/creators want readers/consumers to know who they are, then they'll do so in whatever way they consider acceptable to them.

zahlman11d ago

Why is it amusing?

How can you suppose that this is not a good reason to object, especially days after https://news.ycombinator.com/item?id=47340079 ?

I find the style so reflexively grating that it's honestly hard for me to imagine others not being bothered by it, let alone being bothered by others being bothered.

Especially since I looked at previous posts on the blog and they didn't have the same problem.

skeledrew9d ago

It's amusing because it's essentially "much ado about nothing". It's also contradictory, as the claim is AI "slop" destroys value, yet here is a case - and I'm pretty sure there are others - where value isn't destroyed, which destroys said claim. The problem isn't the use of AI for generating/fixing content; it's the publishing of bad content, which AI magnifies, as it magnifies other things.

shepherdjerred11d ago

The smell makes me suspicious because I don’t know how the author used AI.

If the author wrote a detailed rough draft, had AI edit, reviewed the output thoroughly, and has the domain knowledge to know if the AI is correct, then this could be a useful piece.

I suspect most authors _don’t_ fall in that bucket.

skeledrew9d ago

It's already established that the piece was found useful by several readers. There's no detracting from that. Why does the process matter so much as soon as AI comes into the picture?

retsibsi11d ago

> detractors-for-no-good-reason

It's partly just a matter of taste; we can disagree on whether that's a good reason, but I'd be surprised if there were no writing styles that you personally find offputting.

The LLM smell is also a signal of low effort, and a signal that we as readers can't rely on our usual heuristics for judging credibility. The whole thing with LLMs is that they're great at producing polished, plausible-looking outputs, but they're still prone to bullshitting and making errors that don't match the usual human patterns. (And of course they're a great tool for churning out human-initiated disinformation.) If you don't have any kind of immune response against the LLM smell, I reckon you're probably absorbing more bs than you realise.

skeledrew9d ago

> signal of low effort

Is low effort really a valuable signal though? Or is it what's actually in the content that's valuable? Like here readers are literally saying that they found the content valuable "but AI smell". Why is there a "but"? Would there be a similar issue if the author had contracted a human assistant to do X? Definitely not, and I see no reason why the treatment should be different for AI.

pjmlp11d ago

Yeah, while posting how they are using Claude to do something really amazing.

0coCeo5d ago

test

tpoacher11d ago

"I totally get a kick out of the peeps who find the writeup super helpful yet are totally put off by that distinct "AI smell"—it’s like they can't even! Just imagine when everything we value is woven into a tapestry of that same "smell"—where will all the naysayers retreat to then? It’s a little frustrating, honestly, and I’m just like, come on! Let’s delve into this new era of content and embrace the chaos!"

There, FTFY :D

perching_aix11d ago

> language slow

> looks inside

> the reference implementation of language is slow

Despite its content, this blogpost also pushes this exact "language slow" thinking in its preamble. I don't think nearly enough people read past introductions for that to be a responsible choice or a good idea.

The only thing worse than this is when Python specifically is outright taught (!) as an "interpreted language", as if an implementation detail like that was somehow a language property. So grating.

zahlman11d ago

While I sympathize (and have said similar in the past), language design can (and in Python's case certainly does) hinder optimization quite a bit. The techniques that are purely "use a better implementation" get you not much further than PyPy. Further benefits come from cross-compilation that requires restricting access to language features (and a system that can statically be convinced that those features weren't used!), or indeed straight up using code written in a different language through an FFI.

But yes, the very terminology "interpreted language" was designed for a different era and is somewhere between misleading and incomprehensible in context. (Not unlike "pass by value".)

perching_aix11d ago

Absolutely, no doubt about that. I just find it a terrible way to approach it in general, as well as specifically in this case: swapping out CPython with PyPy, GraalPy, Taichi, etc. - as per the post - requires no code changes, yet results in leaps and bounds faster performance.

If switching runtimes yields, say, 10x perf, and switching languages yields, say, 100x, then the language on its own was "just" a 10x penalty. Yet the presentation is "language is 100x slower". That's my gripe. And these are apparently conservative estimates as per the tables in the OP.

Not that metering "language performance" with numbers would be a super meaningful exercise to begin with, but still. The fact that most people just go with CPython does not escape me either. I do wonder though if people would shop for alternative runtimes more if the common culture was more explicitly and dominantly concerned with the performance of implementations, rather than of languages.
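The arithmetic behind that argument, spelled out as a sketch using the comment's own round illustrative figures (10x and 100x are not measurements):

```python
# Illustrative figures from the comment above, not benchmark results.
runtime_switch_speedup = 10    # CPython -> a faster Python runtime
language_switch_speedup = 100  # CPython -> a different language

# Speedups compose multiplicatively, so the part of the gap that remains
# after switching runtimes is the ratio of the two.
residual_language_penalty = language_switch_speedup / runtime_switch_speedup

print(residual_language_penalty)
```

On those figures, only a factor of 10 is attributable to the language itself; the other factor of 10 belongs to the reference implementation.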
