The only way to speed it up would be to change the language.
First of all, that's only true when it manages to JIT the code; secondly, only until you try to do any of those slow things. For instance, their C ABI emulation both fails to support all of CPython and wrecks performance. The same is true if you try to do fancy things with sys._getframe, which a lot of code in the wild does (e.g. all of logging).
In addition PyPy has to do a lot of special casing for all the crazy things CPython does. I recommend looking into the amount of engineering that went into it.
https://youtu.be/qCGofLIzX6g https://youtu.be/IeSu_odkI5I
I wish these were the things Python 3 addressed, rather than Unicode. I guess it's much more obvious in hindsight than back when Python 3 was designed.
Python's still a great language for the things it was being designed for back in the 2000s. But adding decent Unicode support is a big part of what helped it become an attractive language for use cases where I wish it performed better or had better support for parallelism. Natural language processing, for example.
A point made in the video that seems to highlight the issue:
> Just adding two numbers requires 400 lines of code.
In compiled languages, this is one instruction! Think about the cache thrashing and memory loading involved in this one operation too. How can this possibly be fixed?
Python is a great language, but I don't know if it can ever be high performance on its own.
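To give a sense of why "just adding two numbers" is so expensive, here is a pure-Python sketch (my own simplification, nowhere near CPython's actual C code) of the dispatch a dynamic `+` has to perform before any machine-level addition can happen:

```python
def binary_add(a, b):
    # Simplified sketch of the protocol behind BINARY_ADD: the interpreter
    # cannot know the operand types in advance, so every addition starts
    # with type inspection and method resolution.
    ta, tb = type(a), type(b)
    # A subclass's reflected method gets first shot.
    if tb is not ta and issubclass(tb, ta) and hasattr(tb, "__radd__"):
        result = b.__radd__(a)
        if result is not NotImplemented:
            return result
    if hasattr(ta, "__add__"):
        result = a.__add__(b)
        if result is not NotImplemented:
            return result
    if hasattr(tb, "__radd__"):
        result = b.__radd__(a)
        if result is not NotImplemented:
            return result
    raise TypeError(f"unsupported operand type(s) for +: "
                    f"{ta.__name__!r} and {tb.__name__!r}")

print(binary_add(2, 3))        # 5
print(binary_add("py", "py"))  # pypy
```

The real implementation also has to handle operator slots, `+=` in-place variants, and string/sequence concatenation special cases, which is where the line count goes.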
Never the best tool if you have strict performance requirements, but so damn versatile it will never go away.
Cython does need better docs though, the steep learning curve means it is under-utilized.
For some that glue is Forth. :D
> A guy named Jean-Claude Wippler is considering using Forth as a super glue language to bind Python, Perl, and Tcl together in a project called Minotaur (http://www.equi4.com/minotaur/minotaur.html).
> Forth is an ideal intermediary language, precisely because it's so agile. Otherwise, it wouldn't have been chosen for OpenFirmware, which when you think about it, is a Forth system that must interface to a potentially wide variety of programming language environments.
Also see PyPy, which manages to squeeze a lot more performance out of Python for many use cases without changing the language.
> This is a common view but I've never heard it from someone who has tried to optimize Python. Personally I think that Python is as much more dynamic than JavaScript as JavaScript is than C.
Ultimately JS can be reduced to a very tight engine. This is not possible with Python; it's just too dynamic.
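Two of the dynamic features in question, sketched in plain Python (a minimal illustration, not an exhaustive list): any code can rebind a method on a live class, and any function can inspect its caller's frame, so a tight engine has to guard against both at every call site:

```python
import sys

class Greeter:
    def greet(self):
        return "hello"

g = Greeter()

# Any code, anywhere, can rewrite the class after instances exist;
# method lookup happens at call time, so existing objects change behaviour.
Greeter.greet = lambda self: "goodbye"
print(g.greet())  # "goodbye"

# And any function can walk up the stack, as the logging module does
# to report the caller's name:
def who_called_me():
    return sys._getframe(1).f_code.co_name

def caller():
    return who_called_me()

print(caller())  # "caller"
```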
For general use cases the performance is fine, but only thanks to the hard work of C/CPython/Cython programmers who give up Python's rich expressiveness to gain this performance. It seems like you simply have to use another language to get anything running fast.
Having said all that, Pyc seems interesting as it apparently compiles Python. Has anyone had any experience of this?
What aspects of the language are you convinced cannot be optimised? There's tons of research in this area.
That said, for a lot of other projects which haven't yet looked, there may be some low-hanging fruit. For example, I was looking at this recently on a highly pluggable workspace build tool called colcon [1], and found that of 5+ seconds of startup time, I could save about 1 second with "business logic" changes (adding caching to a recursive operation), another 1 second by switching some filesystem operations to use multiprocessing, and about 1.5 seconds by making some big imports (requests, httpx, sanic) happen lazily on first use.
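The lazy-import trick mentioned above can be done with the stdlib alone; this is a minimal sketch using `importlib.util.LazyLoader` (the module name `json` is just a stand-in for a heavy dependency like requests):

```python
import importlib.util
import sys

def lazy_import(name):
    # Register a module whose real execution is deferred until the first
    # attribute access, following the LazyLoader recipe from the
    # importlib documentation.
    spec = importlib.util.find_spec(name)
    spec.loader = importlib.util.LazyLoader(spec.loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)  # defers the module body, doesn't run it
    return module

json = lazy_import("json")      # near-instant: module body not executed yet
print(json.dumps({"a": 1}))     # first attribute access triggers the import
```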
Most of these uses are very rare, but the tail is incredibly long for Python, and the problem is that you can't even compile a "likely normal" version and a "here be dragons" version and switch only when needed - you need to constantly verify. The same is not true, AFAIK, with Common Lisp, it being a Lisp-2 with stronger lexical scoping than Python has.
Shedskin is a Python-to-C++ compiler that mostly requires the commonly honoured constraint that a variable is only assigned a single type throughout its lifetime (and that you don't modify classes after creation, that you don't need integers longer than machine precision, and so on). While many programs seem to satisfy these requirements on superficial inspection, it turns out that almost all programs violate them in some way (directly or through a library).
The probability that Shedskin will manage to compile a program that was not written with Shedskin in mind is almost zero.
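A minimal sketch of the single-type constraint (the function names are mine, not Shedskin's):

```python
def type_stable(n):
    # Fine for a Shedskin-style compiler: total is always an int,
    # so the loop can compile to a plain C++ integer loop.
    total = 0
    for i in range(n):
        total += i
    return total

def type_unstable(flag):
    # Perfectly legal Python, but x flips from int to str depending on
    # runtime data. This single reassignment is enough to break the
    # "one type per variable" assumption an ahead-of-time compiler needs.
    x = 0
    if flag:
        x = "zero"
    return x

print(type_stable(5))        # 10
print(type_unstable(True))   # zero
```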
Nuitka was started with the idea that, unlike Shedskin, it would begin by compiling the bytecode to an interpreter-equivalent execution (which it does, quite well) to get a minor speedup, and then gradually compile "provably simple enough" things to fast C++; a decade or so later, that's not working out as well as hoped, AFAIK because everything depends on something that violates simplicity.
There's research towards solving all of these problems.
> The only way to speed it up would be to change the language.
Maybe we just haven't worked out how yet? Nothing you've mentioned is known to be impossible to make fast.
> The only way to speed it up would be to change the language.
What specifically? Most of your points are not related to the language. And even current Smalltalk engines are much faster than CPython (see https://github.com/OpenSmalltalk/opensmalltalk-vm).
Each VM op for Python or Ruby ends up being bigger and having more branches. For Ruby this is quite painful on the numeric types. Branching, boxing and unboxing are far slower than just testing and adding floats in the LuaJIT VM.
Due to assignment being an expression and things like x = foo(x, x+=1), Ruby, Python and JS all need to copy x into a new VM register when it's used. LuaJIT can assume locals aren't reassigned mid-statement and doesn't need copies.
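You can see the per-use loads in CPython's own disassembler; a small sketch (the callees `foo` and `bump` are hypothetical and never called, since `dis` only inspects bytecode):

```python
import dis
import io

def f(x):
    # Each use of x below is pushed onto the VM stack separately,
    # because the interpreter cannot assume x keeps its value
    # across the inner call.
    return foo(x, bump(x))

# Capture the disassembly listing as a string.
buf = io.StringIO()
dis.dis(f, file=buf)
listing = buf.getvalue()
print(listing)
```

The exact opcode names vary by CPython version, but the listing shows the local `x` loaded once per use rather than being kept in a reusable register.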
Oh, wait...
I'm pretty sure both Guido for Python and Larry for Perl were explicitly aware of the impossibility of designing for processors that wouldn't exist for 20 years, though digging up quotes to that effect would be quite difficult.
A mantra of that era is "There are no slow languages, only slow implementations." I, for one, consider this mantra to be effectively refuted. Even if there is a hypothetical Python interpreter/compiler/runtime/whatever that can run effectively as fast as C with no significant overhead (excepting perhaps some reasonable increase in compile time), there is no longer any reason to believe that mere mortal humans are capable of producing it, after all the effort that has been poured into trying, as documented by the original link. Whatever may be true for God or superhuman AIs, for human beings there are slow languages that build intrinsically slow operations into their base semantics.
Why should this make python slow?
https://www.youtube.com/watch?v=qCGofLIzX6g&t=31m44s
PyPy is faster for pure Python code, but that comes at the expense of having a far slower interface with C code. There's an entire ecosystem built around the fact that while Python itself is slow, it can very easily interface with native code (Numpy, Scipy, OpenCV) with very little overhead.
So sure, you can make Python much faster, if you're willing to piss off the very Python users who care the most about performance in the first place (the data science / ML people and anyone else using native extensions).
It's looking like HPy is going to (hopefully) solve this. But finishing HPy and getting it adopted is likely to be a pretty massive undertaking.
I love the idea of a typed base language used to implement a higher-level, more flexible language, while still being able to drop down for correctness and speed. Gradually dynamically typed ;)
Another thing to look at is https://chocopy.org/ a typed subset of Python for teaching compilers courses. Might be worthwhile pinging Chocopy students and enticing them towards epython.
What is the semantic union and intersection between EPython and Chocopy?
I think the approach where a typed subset of Python is used to compile a fast extension module is the way forward for Python. This would leave us with a slow but dynamic high-level variant (CPython) and a typed lower-level variant (EPython, mypyc & co) to compile performant extension modules, which you can easily import into your CPython code.
The most prominent of such projects I know of is mypyc [0], which is already used to improve performance for mypy itself and the black [1] code formatter. I think it would be interesting to see how EPython compares to mypyc.
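For a rough idea of what such a typed-subset module looks like, here is a sketch in plain annotated Python; nothing in it is mypyc- or EPython-specific, which is the point - the same file runs unmodified under CPython, while the annotations give a compiler enough information to unbox the floats and emit a tight native loop:

```python
# Hypothetical module to be compiled by a mypyc/EPython-style tool.
# Under plain CPython it runs unchanged, just slower.

def dot(xs: list[float], ys: list[float]) -> float:
    # With the annotations a typed-subset compiler can keep `total`
    # as an unboxed C double instead of a heap-allocated PyFloat.
    total: float = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```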
The C API is what prevents PyPy or other Python runtimes from being able to compete and interoperate. The community could fix this by rebasing Python modules with native code onto cffi so that they can run in all Pythons. The C API is neither good nor necessary, and only serves to gatekeep CPython's access to the rest of the Python user community.
It's a bit like the Python 2 to Python 3 transition: it's been so slow and painful because you cannot just "drop all your code" (I'm simplifying the issue).
Doesn't look like that from over here.
Many times the difference between failure and the magic spell working is one more late-night iteration. In this specific case you are working against some difficult constraints that are deep in the language. That said, there is almost always a way to side-step a problem altogether. You may find that one workaround is to amortize the startup concern over time - i.e. reorient the problem domain so you only have to start the Python process once a day. Or find a way to defer loading of required components until the runtime actually needs them.
However, idiomatic Python shortcuts to expose everything at the top level (star imports or imports of everything in the top-level __init__.py) cause everything to be imported everywhere. __all__ is all but forgotten, so importing things like flask, sqlalchemy, requests and similar will take anywhere from 100-500ms each, even if you just need a single function from a submodule.
Worst offenders are things which embed their own copy of requests (likely for reproducible builds) taking upwards of 800ms just to import even if your project already imported requests directly.
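A crude way to see these costs for yourself (a minimal sketch; `python -X importtime` gives a proper per-module breakdown):

```python
import time

def timed_import(name):
    # Time a top-level import. `json` below is just a stand-in for a
    # heavy package like requests or sqlalchemy; already-cached modules
    # will of course report near-zero.
    t0 = time.perf_counter()
    module = __import__(name)
    elapsed_ms = (time.perf_counter() - t0) * 1000
    return module, elapsed_ms

mod, ms = timed_import("json")
print(f"import json took {ms:.2f} ms")
```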
I don't think it has anything to do with search paths, but simply with loading and executing hundreds of files. If you need those modules, Python will read them. Perhaps moving your venv to a "ramdisk" might help?
python -s [-S]

Now that's how you title a thesis paper.
Yes, there was a Python 3.1: https://www.python.org/download/releases/3.1/
[1]: https://github.com/microsoft/Pyjion#how-do-this-compare-to-
This also means that one could implement an alternative JIT using Rust or OCaml.
https://github.com/microsoft/Pyjion#what-are-the-goals-of-th...
Not all of these were designed for speed. For example, Jython was also intended for Java/Python interoperability.
Some of the interpreters on the list haven't seen updates in a while, or don't support Python 3.x.
Highly recommend it for anyone doing scientific computing
I think for Python to get decent speedups, the semantics of the code being optimized need to be highly constrained.
Optimizing full, in-the-wild Python code is a huge task. Optimizing operations over constant-type arrays is much easier.
Yes this doesn't speed up the call or the allocation rate, but start with some easy stuff or nothing will improve.
For example, the "JIT compile a single function" feature is gold when you need to pass a function callback pointer into a C library. This is how pygraphblas compiles Python functions into semiring operators that are then passed to the library, which has no idea that the pointer is to a JIT-compiled Python function:
https://github.com/michelp/pygraphblas/blob/master/tests/tes...
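The same "hand C a pointer to Python" idea can be sketched with nothing but the stdlib; this example is mine, not from pygraphblas (which uses numba's JIT under the hood), and it passes a Python comparator into libc's qsort, assuming a Unix-style libc can be located:

```python
import ctypes
import ctypes.util

# Locate libc (assumes a Unix-like system; "libc.so.6" is a glibc fallback).
libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")

# qsort's comparator signature: int (*)(const void *, const void *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

@CMPFUNC
def py_cmp(a, b):
    # Ordinary Python code, invoked from inside C's qsort; C just sees
    # a bare function pointer.
    return a[0] - b[0]

arr = (ctypes.c_int * 5)(5, 1, 4, 2, 3)
libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_int), py_cmp)
print(list(arr))  # [1, 2, 3, 4, 5]
```

The ctypes trampoline adds real per-call overhead, which is exactly what a JIT-compiled callback avoids.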
When I reach for Python it's not for speed. It's because it's fairly easy to write and has some good libraries.
Either it's done in a few seconds, or I can wait a few hours as it runs as a background Slurm task.
I feel like there is a group that wants Python to be the ideal language for all things. Maybe it's because I'm not in love with the syntax, but I'm OK with having multiple languages.
Eventually I found Nim and never looked back. Python is simply not built for speed but for productivity. Nim is built for both from the start. It's certainly lacking the ecosystem of Python, but for my use cases that doesn't matter.
To make it more concrete, here is an experimental DSL for embedded high-performance computing that uses static analysis and source-to-source (Python-to-C, actually) code transformation: https://github.com/zanellia/prometeo.
There were a couple of GIL-less variations, but they were either incredibly slow, or suffered serious compatibility problems (and often both).
Also, some relevant old post: