State of Python 3.13 performance: Free-threading (opens in new tab)

(codspeed.io)

196 pointsart0491y ago190 comments

190 comments

I don't really have a dog in this race as I don't use Python much, but this sort of thing always seemed to be of questionable utility to me.

Python is never really going to be 'fast' no matter what is done to it because its semantics make most important optimizations impossible, so high performance "python" is actually going to always rely on restricted subsets of the language that don't actually match language's "real" semantics.

On the other hand, a lot of these changes to try and speed up the base language are going to be highly disruptive. E.g. disabling the GIL will break tonnes of code, lots of compilation projects involve changes to the ABI, etc.

I guess getting loops in Python to run 5-10x faster will still save some people time, but it's also never going to be a replacement for the zoo of specialized python-like compilers because it'll never get to actual high performance territory, and it's not clear that it's worth all the ecosystem churn it might cause.

andai1y ago

There was a discussion the other day about how Python devs apparently don't care enough for backwards compatibility. I pointed out that I've often gotten Python 2 code running on Python 3 by just changing print to print().

But then a few hours later, I tried running a very small project I wrote last year and it turned out that a bunch of my dependencies had changed their APIs. I've had similar (and much worse) experiences trying to get older code with dependencies running.

My meaning with this comment is, that if the average developer's reality is that backwards compatibility isn't really a thing anyway, then we are already paying for that downside so we might as well get some upside there, is my reasoning.

adamc1y ago

It's hard to comment on this without knowing more about the dependencies and when/how they changed their APIs. I would say if it was a major version change, that isn't too shocking. For a minor version change, it should be.

Stuff that is actually included with Python tends to be more stable than random Pypi packages, though.

NPM packages also sometimes change. That's the world.

2 more replies

klysm1y ago

So pin your deps? Language backwards compatibility and an API from some random package changing are completely distinct.

3 more replies

saurik1y ago

The Python 2 to 3 thing was worse when they started: people who made the mistake of falling for the rhetoric to port to python3 early on had a much more difficult time as basic things like u"" were broken under an argument that they weren't needed anymore; over time the porting process got better as they acquiesced and unified the two languages a bit.

I thereby kind of feel like this might have happened in the other direction: a ton of developers seem to have become demoralized by python3 and threw up their hands in defeat of "backwards compatibility isn't going to happen anyway", and now we live in a world with frozen dependencies running in virtual environments tied to specific copies of Python.

1 more reply

dataflow1y ago

> Python 2 code running on Python 3 by just changing print to print().

This was very much the opposite of my experience. Consider yourself lucky.

2 more replies

salomonk_mur1y ago

What APIs were broken? They couldn't be in the standard library.

If the dependency was in external modules and you didn't have pinned versions, then it is to be expected (in almost any active language) that some APIs will break.

2 more replies

kwertzzz1y ago

Sadly, several python projects do not use semantic versioning, for example xarray [0] and dask. Numpy can make backward incompatible changes after a warning for two releases[1]. In general, the python packaging docs do not really read as an endorsement of semantic versioning [2]:

> A majority of Python projects use a scheme that resembles semantic versioning. However, most projects, especially larger ones, do not strictly adhere to semantic versioning, since many changes are technically breaking changes but affect only a small fraction of users...

[0] https://github.com/pydata/xarray/issues/6176

[1] https://numpy.org/doc/stable/dev/depending_on_numpy.html

[2] https://packaging.python.org/en/latest/discussions/versionin...

musicale1y ago

Even after it finished burning a billion lines of python 2 code (largely unnecessarily imho) python 3 seems to retain an unhealthy contempt for backward compatibility. I have had similar experiences where python 3 projects require a particular version of python 3 in order to run.

I like python (and swift for that matter) but I don't like the feeling that I am building on quicksand. Java, C++, and vanilla javascript seem more durable.

almostgotcaught1y ago

> I pointed out that I've often gotten Python 2 code running on Python 3 by just changing print to print().

...

> I wrote last year and it turned out that a bunch of my dependencies had changed their APIs

these two things have absolutely nothing to do with each other - couldn't be a more apples to oranges comparison if you tried

1 more reply

LtWorf1y ago

I'd drop libraries that do like that.

rfoo1y ago

> Python is never really going to be 'fast' no matter what is done to it because its semantics make most important optimizations impossible

Scientific computing community have a bunch of code calling numpy or whatever stuff. They are pretty fast because, well, numpy isn't written in Python. However, there is a scalability issue: they can only drive so many threads (not 1, but not many) in a process due to GIL.

Okay, you may ask, why not just use a lot of processes and message-passing? That's how historically people work around the GIL issue. However, you need to either swallow the cost of serializing data over and over again (pickle is quite slow, even it's not, it's wasting precious memory bandwidth), or do very complicated dance with shared memory.

It's not for web app bois, who may just write TypeScript.

willseth1y ago

This is misleading. Most of the compute intensive work in Numpy releases the GIL, and you can use traditional multithreading. That is the case for many other compute intensive compiled extensions as well.

2 more replies

eigenspace1y ago

Numpy is not fast enough for actual performance sensitive scientific computing. Yes threading can help, but at the end of the day the single threaded perf isn't where it needs to be, and is held back too much by the python glue between Numpy calls. This makes interproceedural optimizations impossible.

Accellerated sub-languages like Numba, Jax, Pytorch, etc. or just whole new languages are really the only way forward here unless massive semantic changes are made to Python.

1 more reply

wormlord1y ago

> On the other hand, a lot of these changes to try and speed up the base language are going to be highly disruptive. E.g. disabling the GIL will break tonnes of code, lots of compilation projects involve changes to the ABI, etc.

Kind of related, the other day I was cursing like a sailor because I was having issues with some code I wrote that uses StrEnum not working with older versions of Python, and wondering why I did that, and trying to find the combination of packages that would work for the version of Python I needed-- wondering why there was so much goddamn churn in this stupid [expletive] scripting language.

But then I took a step back and realized that, actually, I should be glad about the churn because it means that there is a community of developers who care enough about the language to add new features and maintain this language so that I can just pipe PyQt and Numpy into each other and get paid.

I don't have any argument, just trying to give an optimistic perspective.

d0mine1y ago

At least bugfix versions could have kept Enum behavior the same. Postponing breaking changes until the next minor version. Some Enum features work differently (incompatible) in Python 3.11.x versions.

2 more replies

lmm1y ago

Meh. Anyone can improve a language if they don't care about keeping backward compatibility, just like new programming languages and new codebases always look gloriously clean. The part that requires actual discipline and expertise is improving it while keeping the breakages under control.

6gvONxR4sf7o1y ago

> so high performance "python" is actually going to always rely on restricted subsets of the language that don't actually match language's "real" semantics.

I don't even understand what this means. If I write `def foo(x):` versus `def foo(x: int) -> float:`, one is a restricted subset of the other, but both are the language's "real" semantics. Restricted subsets of languages are wildly popular in programming languages, and for very varied reasons. Why should that be a barrier here?

Personally, if I have to annotate some of my code that run with C style semantics, but in return that part runs with C speed, for example, then I just don't really mind it. Different tools for different jobs.

almostgotcaught1y ago

> If I write `def foo(x):` versus `def foo(x: int) -> float:`, one is a restricted subset of the other, but both are the language's "real" semantics.

You either are performing some wordplay here or you don't understand but type hints are not part of the semantics at all: since they are not processed at all they do not affect the behavior of the function (that's what semantics means).

EDIT: according to the language spec and current implementation

`def foo(x: int) -> float`

and

`def foo(x: float) -> int`

are the same exact function

1 more reply

the__alchemist1y ago

This is a good question, and I think about it as well. My best guess for a simple explanation: Python is very popular; it makes sense to improve performance for python users, given many do not wish to learn to use a more performant language, or to use a more performant Python implementation. Becoming proficient in a range of tools so you can use the right one for the right job is high enough friction that it is not the path chosen by many.

devjab1y ago

You should really add that Python is also a very good tool for people who know more performant languages. I think one of the sides which often gets forgotten is that a lot of software will never actually need to be very performant and often you’re not going to know the bottlenecks beforehand. If you even get to the bottlenecks it means you’ve succeeded enough to get to the bottlenecks. Somewhere you might not have gotten if you over engineered things before you needed it.

What makes Python brilliant is that it’s easy to deliver on business needs. It’s easy to include people who aren’t actually software engineers but can write Python to do their stuff. It’s easy to make that Wild West code sane. Most importantly, however, it’s extremely easy to replace parts of your Python code with something like C (or Zig).

So even if you know performant languages, you can still use Python for most things and then as glue for heavy computation.

Now I may have made it sound like I think Python is brilliant so I’d like to add that I actually think it’s absolute trash. Loveable trash.

2 more replies

eigenspace1y ago

Oh yeah, I totally get the motivation behind it. It's always very tempting to want to make things faster. But I can't help but wondering if these attempts to make it faster might end up just making it worse.

On the other hand though, Python is so big and there's so many corps using it with so much cash that maybe they can get away with just breaking shit every few releases and people will just go adapt packages to the changes.

2 more replies

sneed_chucker1y ago

If JavaScript (V8) and PyPy can be fast, then CPython can be fast too.

It's just that the CPython developers and much of the Python community sat on their hands for 15 years and said stuff like "performance isn't a primary goal" and "speed doesn't really matter since most workloads are IO-bound anyway".

jerf1y ago

In this context, V8 and PyPy aren't fast. Or at least, not generally; they may actually do well on this task because pure number tasks are the only things they can sometimes, as long as you don't mess them up, get to compiled language-like performance. But they don't in general to compiled language performance, despite common belief to the contrary.

3 more replies

jillesvangurp1y ago

Why does python have to be slow? Improvements over the last few releases have made it quite a bit faster. So that kind of counters that a bit. Apparently it didn't need to be quite as slow all along. Other languages can be fast. So, why not python?

I think with the GIL some people are overreacting: most python code is single threaded because of the GIL. So removing it doesn't actually break anything. The GIL was just making the use of threads kind of pointless. Removing it and making a lot of code thread safe benefits people who do want to use threads.

It's very simple. Either you did not care about performance anyway and nothing really changes for you. You'd need to add threading to your project to see any changes. Unless you do that, there's no practical reason to disable the GIL for you. Or to re-enable that once disabled becomes the default. If your python project doesn't spawn threads now, it won't matter to you either way. Your code won't have deadlocking threads because it has only 1 thread and there was never anything to do for the GIL anyway. For code like that compatibility issues would be fairly minimal.

If it does use threads, against most popular advise of that being quite pointless in python (because of the GIL), you might see some benefits and you might have to deal with some threading issues.

I don't see why a lot of packages would break. At best some of them would be not thread safe and it's probably a good idea to mark the ones that are thread safe as such in some way. Some nice package management challenge there. And probably you'd want to know which packages you can safely use.

eigenspace1y ago

> Why does python have to be slow?

Because the language's semantics promise that a bunch of insane stuff can happen at any time during the running of a program, including but not limited to the fields of classes changing at any time. Furthermore, they promise that their integers are aribtrary precision which are fundamentally slower to do operations with than fixed precision machine integers, etc.

The list of stuff like this goes on and on and on. You fundamentally just cannot compile most python programs to efficient machine code without making (sometimes subtle) changes to its semantics.

_________

> I don't see why a lot of packages would break. At best some of them would be not thread safe and it's probably a good idea to mark the ones that are thread safe as such in some way. Some nice package management challenge there. And probably you'd want to know which packages you can safely use.

They're not thread safe because it was semantically guaranteed to them that it was okay to write code that's not thread safe.

3 more replies

willvarfar1y ago

(As a happy pypy user in previous jobs, I want to chime in and say python _can_ be fast.

It can be so fast that it completely mooted the discussions that often happen when wanting to move from a python prototype to 'fast enough for production'.)

eigenspace1y ago

PyPy is still slow compared to actual fast languages. It's just fast compared to Python, and it achieves that speed by not being compatible with most of the Python ecosystem.

Seems like a lose-lose to me. (which is presumably why it never caught on)

1 more reply

yunohn1y ago

> I guess getting loops in Python to run 5-10x faster will still save some people time

I would recommend being less reductively dismissive, after claiming you “don’t really have a dog in this race”.

Edit: Lots of recent changes have done way more than just loop unrolling JIT stuff.

Capricorn24811y ago

I don't really get this. They have already made Python faster in the past while maintaining the same semantics. Seems like a good goal to me.

ggm1y ago

You may be right. I personally think this work is net beneficial, and although I never expected to be in MP or threads, I now find doing a lot of DNS (processing end of day logs of 300m records per day, trying to farm them out over public DNS resolvers doing multiple RR checks per FQDN) that the MP efficiency is lower than threads, because of this serialisation cost. So, improving threading has shown me I could be 4-5x faster in this solution space, IFF I learn how to use the thread.lock to gatekeep updates on the shared structures.

My alternative is to serialise in heavy processes and then incur a post process unification pass, because the cost of serialise send/receive deserialise to unify this stuff is too much. If somebody showed me how to use shm models to do this so it came back to the cost of threading.lock I'd do the IPC over a shared memory dict, but I can't find examples and now suspect multiprocessing in Python3 just doesn't do that (happy, delighted even to be proved wrong)

nickpsecurity1y ago

I spent some time looking into it. I believe it could be done with a source-to-source transpiler with zero-cost abstractions and some term rewriting. It’s a lot of work.

The real barrier my thought experiment hit were the extensions. Many uses of Python are glue around C extensions designed for the CPython interpreter. Accelerating “Python” might actually be accelerating Python, C, and hybrid code that’s CPython-specific. Every solution seemed like more trouble than just rewriting those libraries to not be CPython-specific. Or maybe to work with the accelerators better.

Most people are just using high-level C++ and Rust in the areas I was considering. If using Python, the slowdown of Python doesn’t impact them much anyway since their execution time is mostly in the C code. I’m not sure if much will change.

rmbyrro1y ago

Python 3.12 will be officially supported until October 2028, so there's plenty of time to migrate to no-GIL if anyone wants to.

zurfer1y ago

Python 3.13 is not removing the GIL. You just have an option to run without it.

Someone1y ago

I think the reasoning is like this:

- People choose Python to get ease of programming, knowing that they give up performance.

- With multi-core machines now the norm, they’re relatively giving up more performance to get the same amount of ease of programming.

- so, basically, the price of ease of programming has gone up.

- economics 101 is that rising prices will decrease demand, in this case demand for programming in Python.

- that may be problematic for the long-term survival of Python, especially with new other languages aiming to provide python’s ease of use while supporting multi-processing.

So, Python must get even easier to use and/or it must get faster.

PaulHoule1y ago

As a Java dev I think people don’t always appreciate the flexiblity of threads for manage parallelism, concurrency, and memory. In particular with threads you can have a large number of thread share a large data structure. Say you have an ML inference model that takes 1GB of RAM. No way can you let 25 Celery workers each have a copy of that model, so if you are using Python you have to introduce a special worker class. It’s one more thing to worry about and more parameters to tune. With Java all your “workers” could be in one process, even the same process as your web server and that will be the case in Python.

Threads break down so many bottlenecks of CPU resources, memory, data serialization, waiting for communications, etc. I have a 16-core computer on my desk and getting a 12x speed-up is possible for many jobs and often worth worse less efficient use of the individual CPU. Java has many primitives for threads that include tools for specialized communications (e.g. barrier synchronization) that are necessary to really get those speed-ups and I hope Python gets those too.

Netcob1y ago

I was experimenting with Python for some personal data-based project. My goal was to play around with some statistics functions, maybe train some models and so on. Unfortunately most of my data needs a lot of preprocessing before it will be of any use to numpy&co. Mostly just picking out the right objects from a stream of compressed JSON data and turning them into time series. All of that is easy to do in Python, which is what I'd expect for the language used this much in data science (or so I've heard).

But then I do have a lot of data, there's a lot of trial&error there for me so I'm not doing these tasks only once, and I would have appreciated a speedup of up to 16x. I don't know about "high performance", but that's the difference between a short coffee break and going to lunch.

And if I was a working on an actual data-oriented workstation, I would be using only one of possibly 100+ cores.

That just seems silly to me.

ogrisel1y ago

The IPC overhead of process-based parallelism in Python is a pain to deal with in general, even when the underlying computational bottleneck are already written CPU optimized (calls to compiled extensions written in Cython/C/C++/Rust, call to CPU optimized operations written with CPU architecture-specific intrinsics/assembly from OpenBLAS via NumPy/SciPy, calls to e.g. GPU CUDA kernels via PyTorch/Triton, ...).

Sometimes the optimal level of parallelism lies in an outer loop written in Python instead of just relying on the parallelism opportunities of the inner calls written using hardware specific native libraries. Free-threading Python makes it possible to choose which level of parallelism is best for a given workload without having to rewrite everything in a low-level programming language.

emgeee1y ago

One area where this absolutely makes a difference is when embedding python. Like it or not Python is extreme popular in data/AI/ML so if you want to build an experience where users can deploy custom functions, removing the GIL allows you to more efficiently scale these workloads.

kevincox1y ago

Even if something is slow there is utility to have it run faster. Sure, Python will never be with for the most demanding performance requirements but that doesn't mean we should deny people performance.

There are lots of use case where Python's performance is acceptable but a 10x speed boost would be much appreciated. Or where Python's performance is not acceptable but it would be if I could fully utilize my 32 cores.

For example look at Instagram. They run tons of Python but need to run many processes on each machine wasting memory. I'm sure they would love to save that memory, the environment would too. Sure, they could rewrite their service in C but that is most likely not the best tradeoff.

runeblaze1y ago

For machine learning (un)fortunately, lots of the stack runs on Python. Lots of ceremony was done to circumvent GIL (e.g. PyTorch data loader). “Reasonable performance Python” I imagine is actually something in huge demand for lots of ML shops

pjmlp1y ago

Since Python has become the new Lisp, the minimum is to have the performance tooling Common Lisp has had for several decades, in native code generation and multithreading (yes I know that in CL this is implementation specific).

graemep1y ago

I do use Python and I am not that bothered about speed.

Very little of what I use it for has performance bottlenecks in the Python. Its the database or the network or IO or whatever.

On the few occasions when it does I can rewrite critical bits of code.

I definitely care more about backward compatibility than I do about performance.

It feels like Python development is being driven by the needs of one particular group (people who use ML heavily, possibly because they have deep pockets) and I wonder whether this, and a few other things will make it less attractive a language for me and others.

winrid1y ago

Your DB is probably faster than you think. I rewrote an API in Python to Java and it is around 6x faster with just same dumb N+1 queries, and the new API also includes all the frontend calculations that Python wasn't doing before.

DeathArrow1y ago

>Python is never really going to be 'fast' no matter what is done to it because its semantics make most important optimizations impossible, so high performance "python" is actually going to always rely on restricted subsets of the language that don't actually match language's "real" semantics.

Have you heard of Mojo? It is a very performant superset of Python. https://www.modular.com/mojo

pansa21y ago

Mojo isn’t anywhere near a superset of Python. It doesn’t even support classes!

otabdeveloper41y ago

Parallelism != speed.

If you have 128 cores you can compromise on being a bit slow. You can't compromise on being single-threaded though.

lanstin1y ago

And for the code running on the 128 core machine, which is not really rare these days, for that code to be pythonic, it should be dead simple to use all the cores in the obvious correct way. We have language features that enable simple multiple semantics but haven't yet got the easy way to do it.

MP has the stupid serialization issues and anything involving the Python coder doing locking or mutexes is not Pyrhinic as they will struggle.

zelphirkalt1y ago

I am always wondering who writes code so badly unaware of concurrency issues, to rely on the GIL. Wondering how many libraries and programs will actually break. But probably the number is way higher than even I imagine.

ogrisel1y ago

The race condition bugs are typically hidden by different software layers. For instance, we found one that involves OpenBLAS's pthreads-based thread pool management and maybe its scipy bindings:

- https://github.com/scipy/scipy/issues/21479

it might be the same as this one that further involves OpenMP code generated by Cython:

- https://github.com/scikit-learn/scikit-learn/issues/30151

We haven't managed to write minimal reproducers for either of those but as you can observe, those race conditions can only be triggered when composing many independently developed components.

stevofolife1y ago

Can you help me understand, if libraries like pandas and numpy also applies to your comment? Or are they truely optimized and you’re just referring to the standard Python language?

lijok1y ago

Given the scale at which python is run, how much energy are we saving by improving its performance by 1%?

cma1y ago

How about when there are 128-256 core consumer CPUs?

lanstin1y ago

And people are running things that take minutes to run when a good multi-cpu framework would make it seconds.

Maybe add a dataflow analyzer to Python and do it for people.

Decabytes1y ago

I'm glad the Python community is focusing more on CPython's performance. Getting speed ups on existing code for free feels great. As much as I hate how slow Python is, I do think its popularity indicates it made the correct tradeoffs in regards to developer ease vs being fast enough.

Learning it has only continued to be a huge benefit to my career, as it's used everywhere which underlies how important popularity of a language can be for developers when evaluating languages for career choices

ijl1y ago

Performance for python3.14t alpha 1 is more like 3.11 in what I've tested. Not good enough if Python doesn't meet your needs, but this comes after 3.12 and 3.13 have both performed worse for me.

3.13t doesn't seem to have been meant for any serious use. Bugs in gc and so on are reported, and not all fixes will be backported apparently. And 3.14t still has unavoidable crashes. Just too early.

bastawhiz1y ago

> 3.13t doesn't seem to have been meant for any serious use.

I don't think anyone would suggest using it in production. The point was to put something usable out into the world so package maintainers could kick the tires and start working on building compatible versions. Now is exactly the time for weird bug reports! It's a thirty year old runtime and one of its oldest constraints is being removed!

kristianp1y ago

> 3.12 and 3.13 have both performed worse for me

That's interesting, I wouldn't have expected performance regressions coming from those releases. How can that be?

runjake1y ago

If it were ever open sourced, I could see Mojo filling the performance niche for Python programmers. I'm hopeful because Lattner certainly has the track record, if he doesn't move on beforehand.

https://en.wikipedia.org/wiki/Mojo_(programming_language)

qaq1y ago

https://github.com/modularml/mojo/blob/main/LICENSE

kaanyalova1y ago

The source code for the compiler hasn't released yet

misswaterfairy1y ago

If not, Nim is probably the closest most 'Python-like' language that is almost as fast as C, and is released under the MIT licence.

https://nim-lang.org/

https://en.wikipedia.org/wiki/Nim_(programming_language)

the5avage1y ago

Can someone share insight into what was technically done to enable this? What replaced the global lock? Is the GC stopping all threads during collection or an other locking mechanism?

throwaway3133731y ago

The most interesting idea in my opinion is biased reference counting [0].

An oversimplified explanation (and maybe wrong) of it goes like this:

problem:

- each object needs a reference counter, because of how memory management in Python works

- we cannot modify ref counters concurrently because it will lead to incorrect results

- we cannot make each ref counter atomic because atomic operations have too large performance overhead

therefore, we need GIL.

Solution, proposed in [0]:

- let's have two ref counters for each object, one is normal, another one is atomic

- normal ref counter counts references created from the same thread where the object was originally created, atomic counts references from other threads

- because of an empirical observation that objects are mostly accessed from the same thread that created them, it allows us to avoid paying atomic operations penalty most of the time

Anyway, that's what I understood from the articles/papers. See my other comment [1] for the links to write-ups by people who actually know what they're talking about.

[0] https://dl.acm.org/doi/10.1145/3243176.3243195

[1] https://news.ycombinator.com/item?id=42059605

throwaway3133731y ago

AFAIK the initial prototype called nogil was developed by a person named Sam Gross who also wrote a detailed article [0] about it.

He also had a meeting with Python core. Notes from this meeting [1] by Łukasz Langa provide more high-level overview, so I think that they are a good starting point.

[0] https://docs.google.com/document/u/0/d/18CXhDb1ygxg-YXNBJNzf...

[1] https://lukasz.langa.pl/5d044f91-49c1-4170-aed1-62b6763e6ad0...

nas1y ago

The key enabling tech is thread safe reference counting. There are many other problems that Sam Gross solved in order to make it happen but the reference counting was one of the major blockers.

the5avage1y ago

Is this implemented with lockless programming? Is it a reason for the performance drop in single thread code?

Does it eliminate the need for a GC pause completely?

1 more reply

tightbookkeeper1y ago

Lots of little locks littered all over the place.

biglost1y ago

I'm not smart nor have any university title butmy opinion is this it's very good, but efforts should also go into remove features, not just python, i get it, it would breake anything.

0xDEADFED51y ago

Nice benchmarks. Hopefully some benevolent soul with more spare time than I can pitch in on threadsafe CFFI

santiagobasulto1y ago

With these new additions it might make sense to have a synchronized block as in Java?

aitchnyu1y ago

Are there web frameworks taking advantage of subinterpreters and free threading yet?

j / k navigate · click thread line to collapse

190 comments

eigenspace1y ago

I don't really have a dog in this race as I don't use Python much, but this sort of thing always seemed to be of questionable utility to me.

andai1y ago

adamc1y ago

Stuff that is actually included with Python tends to be more stable than random Pypi packages, though.

NPM packages also sometimes change. That's the world.

2 more replies

klysm1y ago

So pin your deps? Language backwards compatibility and an API from some random package changing are completely distinct.

3 more replies

saurik1y ago

1 more reply

dataflow1y ago

> Python 2 code running on Python 3 by just changing print to print().

This was very much the opposite of my experience. Consider yourself lucky.

2 more replies

salomonk_mur1y ago

What APIs were broken? They couldn't be in the standard library.

If the dependency was in external modules and you didn't have pinned versions, then it is to be expected (in almost any active language) that some APIs will break.

2 more replies

kwertzzz1y ago

[0] https://github.com/pydata/xarray/issues/6176

[1] https://numpy.org/doc/stable/dev/depending_on_numpy.html

[2] https://packaging.python.org/en/latest/discussions/versionin...

musicale1y ago

I like python (and swift for that matter) but I don't like the feeling that I am building on quicksand. Java, C++, and vanilla javascript seem more durable.

almostgotcaught1y ago

> I pointed out that I've often gotten Python 2 code running on Python 3 by just changing print to print().

...

> I wrote last year and it turned out that a bunch of my dependencies had changed their APIs

these two things have absolutely nothing to do with each other - couldn't be a more apples to oranges comparison if you tried

1 more reply

LtWorf1y ago

I'd drop libraries that do like that.

rfoo1y ago

> Python is never really going to be 'fast' no matter what is done to it because its semantics make most important optimizations impossible

It's not for web app bois, who may just write TypeScript.

willseth1y ago

2 more replies

eigenspace1y ago

Accellerated sub-languages like Numba, Jax, Pytorch, etc. or just whole new languages are really the only way forward here unless massive semantic changes are made to Python.

1 more reply

wormlord1y ago

I don't have any argument, just trying to give an optimistic perspective.

d0mine1y ago

2 more replies

lmm1y ago

6gvONxR4sf7o1y ago

> so high performance "python" is actually going to always rely on restricted subsets of the language that don't actually match language's "real" semantics.

almostgotcaught1y ago

> If I write `def foo(x):` versus `def foo(x: int) -> float:`, one is a restricted subset of the other, but both are the language's "real" semantics.

EDIT: according to the language spec and current implementation

`def foo(x: int) -> float`

and

`def foo(x: float) -> int`

are the same exact function

1 more reply

the__alchemist1y ago

devjab1y ago

So even if you know performant languages, you can still use Python for most things and then as glue for heavy computation.

Now I may have made it sound like I think Python is brilliant so I’d like to add that I actually think it’s absolute trash. Loveable trash.

2 more replies

eigenspace1y ago

2 more replies

sneed_chucker1y ago

If JavaScript (V8) and PyPy can be fast, then CPython can be fast too.

jerf1y ago

3 more replies

jillesvangurp1y ago

If it does use threads, against most popular advise of that being quite pointless in python (because of the GIL), you might see some benefits and you might have to deal with some threading issues.

eigenspace1y ago

> Why does python have to be slow?

The list of stuff like this goes on and on and on. You fundamentally just cannot compile most python programs to efficient machine code without making (sometimes subtle) changes to its semantics.

_________

They're not thread safe because it was semantically guaranteed to them that it was okay to write code that's not thread safe.

3 more replies

willvarfar1y ago

(As a happy pypy user in previous jobs, I want to chime in and say python _can_ be fast.

It can be so fast that it completely mooted the discussions that often happen when wanting to move from a python prototype to 'fast enough for production'.)

eigenspace1y ago

PyPy is still slow compared to actual fast languages. It's just fast compared to Python, and it achieves that speed by not being compatible with most of the Python ecosystem.

Seems like a lose-lose to me. (which is presumably why it never caught on)

1 more reply

yunohn1y ago

> I guess getting loops in Python to run 5-10x faster will still save some people time

I would recommend being less reductively dismissive, after claiming you “don’t really have a dog in this race”.

Edit: Lots of recent changes have done way more than just loop unrolling JIT stuff.

Capricorn24811y ago

I don't really get this. They have already made Python faster in the past while maintaining the same semantics. Seems like a good goal to me.

ggm1y ago

nickpsecurity1y ago

I spent some time looking into it. I believe it could be done with a source-to-source transpiler with zero-cost abstractions and some term rewriting. It’s a lot of work.

rmbyrro1y ago

Python 3.12 will be officially supported until October 2028, so there's plenty of time to migrate to no-GIL if anyone wants to.

zurfer1y ago

Python 3.13 is not removing the GIL. You just have an option to run without it.

Someone1y ago

I think the reasoning is like this:

- People choose Python to get ease of programming, knowing that they give up performance.

- With multi-core machines now the norm, they’re relatively giving up more performance to get the same amount of ease of programming.

- so, basically, the price of ease of programming has gone up.

- economics 101 is that rising prices will decrease demand, in this case demand for programming in Python.

- that may be problematic for the long-term survival of Python, especially with new other languages aiming to provide python’s ease of use while supporting multi-processing.

So, Python must get even easier to use and/or it must get faster.

PaulHoule1y ago

Netcob1y ago

And if I was a working on an actual data-oriented workstation, I would be using only one of possibly 100+ cores.

That just seems silly to me.

ogrisel1y ago

emgeee1y ago

kevincox1y ago

runeblaze1y ago

pjmlp1y ago

graemep1y ago

I do use Python and I am not that bothered about speed.

Very little of what I use it for has performance bottlenecks in the Python. Its the database or the network or IO or whatever.

On the few occasions when it does I can rewrite critical bits of code.

I definitely care more about backward compatibility than I do about performance.

winrid1y ago

DeathArrow1y ago

Have you heard of Mojo? It is a very performant superset of Python. https://www.modular.com/mojo

pansa21y ago

Mojo isn’t anywhere near a superset of Python. It doesn’t even support classes!

otabdeveloper41y ago

Parallelism != speed.

If you have 128 cores you can compromise on being a bit slow. You can't compromise on being single-threaded though.

lanstin1y ago

MP has the stupid serialization issues and anything involving the Python coder doing locking or mutexes is not Pyrhinic as they will struggle.

zelphirkalt1y ago

ogrisel1y ago

The race condition bugs are typically hidden by different software layers. For instance, we found one that involves OpenBLAS's pthreads-based thread pool management and maybe its scipy bindings:

- https://github.com/scipy/scipy/issues/21479

it might be the same as this one that further involves OpenMP code generated by Cython:

- https://github.com/scikit-learn/scikit-learn/issues/30151

We haven't managed to write minimal reproducers for either of those but as you can observe, those race conditions can only be triggered when composing many independently developed components.

stevofolife1y ago

Can you help me understand, if libraries like pandas and numpy also applies to your comment? Or are they truely optimized and you’re just referring to the standard Python language?

lijok1y ago

Given the scale at which python is run, how much energy are we saving by improving its performance by 1%?

cma1y ago

How about when there are 128-256 core consumer CPUs?

lanstin1y ago

And people are running things that take minutes to run when a good multi-cpu framework would make it seconds.

Maybe add a dataflow analyzer to Python and do it for people.

Decabytes1y ago

ijl1y ago

Performance for python3.14t alpha 1 is more like 3.11 in what I've tested. Not good enough if Python doesn't meet your needs, but this comes after 3.12 and 3.13 have both performed worse for me.

3.13t doesn't seem to have been meant for any serious use. Bugs in gc and so on are reported, and not all fixes will be backported apparently. And 3.14t still has unavoidable crashes. Just too early.

bastawhiz1y ago

> 3.13t doesn't seem to have been meant for any serious use.

kristianp1y ago

> 3.12 and 3.13 have both performed worse for me

That's interesting, I wouldn't have expected performance regressions coming from those releases. How can that be?

runjake1y ago

If it were ever open sourced, I could see Mojo filling the performance niche for Python programmers. I'm hopeful because Lattner certainly has the track record, if he doesn't move on beforehand.

https://en.wikipedia.org/wiki/Mojo_(programming_language)

qaq1y ago

https://github.com/modularml/mojo/blob/main/LICENSE

kaanyalova1y ago

The source code for the compiler hasn't released yet

misswaterfairy1y ago

If not, Nim is probably the closest most 'Python-like' language that is almost as fast as C, and is released under the MIT licence.

https://nim-lang.org/

https://en.wikipedia.org/wiki/Nim_(programming_language)

the5avage1y ago

Can someone share insight into what was technically done to enable this? What replaced the global lock? Is the GC stopping all threads during collection or an other locking mechanism?

throwaway3133731y ago

The most interesting idea in my opinion is biased reference counting [0].

An oversimplified explanation (and maybe wrong) of it goes like this:

problem:

- each object needs a reference counter, because of how memory management in Python works

- we cannot modify ref counters concurrently because it will lead to incorrect results

- we cannot make each ref counter atomic because atomic operations have too large performance overhead

therefore, we need GIL.

Solution, proposed in [0]:

- let's have two ref counters for each object, one is normal, another one is atomic

- normal ref counter counts references created from the same thread where the object was originally created, atomic counts references from other threads

- because of an empirical observation that objects are mostly accessed from the same thread that created them, it allows us to avoid paying atomic operations penalty most of the time

Anyway, that's what I understood from the articles/papers. See my other comment [1] for the links to write-ups by people who actually know what they're talking about.

[0] https://dl.acm.org/doi/10.1145/3243176.3243195

[1] https://news.ycombinator.com/item?id=42059605

throwaway3133731y ago

AFAIK the initial prototype called nogil was developed by a person named Sam Gross who also wrote a detailed article [0] about it.

He also had a meeting with Python core. Notes from this meeting [1] by Łukasz Langa provide more high-level overview, so I think that they are a good starting point.

[0] https://docs.google.com/document/u/0/d/18CXhDb1ygxg-YXNBJNzf...

[1] https://lukasz.langa.pl/5d044f91-49c1-4170-aed1-62b6763e6ad0...

nas1y ago

The key enabling tech is thread safe reference counting. There are many other problems that Sam Gross solved in order to make it happen but the reference counting was one of the major blockers.

the5avage1y ago

Is this implemented with lockless programming? Is it a reason for the performance drop in single thread code?

Does it eliminate the need for a GC pause completely?

1 more reply

tightbookkeeper1y ago

Lots of little locks littered all over the place.

biglost1y ago

I'm not smart nor have any university title butmy opinion is this it's very good, but efforts should also go into remove features, not just python, i get it, it would breake anything.

0xDEADFED51y ago

Nice benchmarks. Hopefully some benevolent soul with more spare time than I can pitch in on threadsafe CFFI

santiagobasulto1y ago

With these new additions it might make sense to have a synchronized block as in Java?

aitchnyu1y ago

Are there web frameworks taking advantage of subinterpreters and free threading yet?

j / k navigate · click thread line to collapse