Pybind11 is also great, but quite different in its aims - I feel it's more of a project for C++ programmers who want to expose functionality to Python.
I already wrote a few patches for pysfml, which is written in Cython. It was a bit awkward, and now I'm asking myself whether Cython is really the right tool for writing bindings, compared to the raw CPython C API, for example.
It's very fast to write for - that's the main benefit. Use it together with profiling and just pick off the slowest parts first.
Some packages can use it for higher performance, but most of the time it'll be slower, because you need to parse extra information if you want to reuse the code in pure Python.
If you really care about the performance of code called from Python, consider something like NVIDIA Warp (preview). Warp JITs and runs your code on CUDA or CPU. Although Warp targets physics simulation, geometry processing, and procedural animation, it can be used for other tasks as well. https://github.com/NVIDIA/warp
Google Jax is another option, jitting and vectorizing code for TPU, GPU or CPU. https://github.com/google/jax
Why would you recommend that? It's all way more effort than just writing Cython, especially in a Jupyter Notebook. And Cython code can be just as fast as C/C++ code unless you're doing something really fancy. It's a bunch of work for no benefit.
>Warp jits and runs your code on CUDA or CPU
If someone's writing Cython it's probably because they found something that couldn't be done efficiently in Numpy because it was sequential, not easily vectorisable. Such code is going to get zero benefit from CUDA or running on the GPU.
In general, jitted code is not going to be as fast as code compiled with an ahead-of-time compiler like the C compiler that Cython uses. Moreover, if you use a JIT, it makes your code a pain in the ass to embed in a C/C++ application, unlike Cython code.
nanobind/pybind11 (co-)author here. The space of python bindings is extremely diverse and on the whole probably looks very different from your use case. nanobind/pybind11 target the 'really fancy' case you mention specifically for codebases that are "at home" in C++, but which want natural Pythonic bindings. There is near-zero overlap with Cython.
Agreed - if you have badly performing spaghetti Python code, none of those tools is going to help. In that case I would rather rewrite it all in C/C++ than fiddle with Cython.
I believe there was a time very early on (like 2003) when there was discussion about maybe including Pyrex in CPython proper to get a more Common-Lisp like gradually typed system. (I mostly recall some comment of Greg's along the lines of being intimidated by such. I'm not sure how seriously the idea was entertained by PyCore.)
CPython is in the hands of not really productive bigcorp representatives who care about large legacy code bases. My guess is that CPython will be largely the same in 10 years, with the usual widely hyped initiatives that go nowhere ("need for speed etc.").
It's clear that Python's main strength is its vast ecosystem of libraries, so priority number one is not breaking them. If it were possible to speed up Python without breaking changes, I'd be surprised it hasn't happened already - precisely because, with so many large codebases, speed and efficiency would translate directly into money.
The Microsoft funded project is different, they're merging things. I don't think they've started on a JIT translator yet, though, last time I looked they were busy picking lower-hanging fruit. From watching their communications, I think they might get there at some point.
It's not as simple as just emitting machine code, though. To get something in the same order of magnitude as typical C code, you need to deduce types and peel away the boxing and unboxing layers.
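To make the boxing point concrete, here's a minimal Cython sketch (the function name and body are invented for illustration). With untyped variables, every iteration would allocate and unwrap a Python int object; the `cdef` declarations let the compiler emit a plain C loop instead.

```cython
# Hypothetical example: sum of 0..n-1 with C-typed locals.
# Without the cdef declarations, `total += i` would box/unbox a
# Python int on every iteration; with them, Cython emits a C loop.
def triangular(int n):
    cdef long total = 0
    cdef int i
    for i in range(n):
        total += i
    return total
```

This is exactly the kind of type deduction a JIT has to do automatically, which is why it's harder than "just emitting machine code".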
That is another thing that is nice about Cython: you don't have to learn all of Cython to be productive. Take your existing Python function and just add some type annotations and you'll see real performance gains. Then you can profile your code, see what the next bottleneck is, fix that, and so on.
So, yes, Cython gives you the power to manually control the GIL and the Python API calls, and to manage your own memory and data layout for those corner cases where that's what you need. Most of the time you can happily ignore all of that and still get almost all of the available speedup.
The other place it shines is if you ever need to loop over an array of data that cannot easily be represented as numpy arrays, like strings or more complex structs. Here you can get significant speedups compared to pure Python.
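As an illustration of that kind of loop, here's a sketch (the function is invented) that scans a `bytes` object through a typed memoryview. In pure Python, every index access would create a new int object; here the access compiles to raw buffer indexing.

```cython
# Hypothetical example: count occurrences of one byte in a bytes object.
def count_byte(bytes data, unsigned char target):
    cdef const unsigned char[:] buf = data  # zero-copy view of the bytes
    cdef Py_ssize_t i, count = 0
    for i in range(buf.shape[0]):
        if buf[i] == target:
            count += 1
    return count
```

The same pattern works for arrays of structs via typed memoryviews over any buffer-protocol object.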
The third use of Cython I really like is with C and C++ interop. Sure there are lots of ways of calling C code from Python, but to me Cython is probably the quickest and cleanest.
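For instance, wrapping a C library function is only a few lines - a sketch like this, using `math.h`'s `sqrt` as a stand-in for whatever library you're actually binding:

```cython
# Declare the C function from its header, then expose a thin wrapper.
cdef extern from "math.h":
    double sqrt(double x)

def py_sqrt(double x):
    """Python-callable wrapper around C's sqrt."""
    return sqrt(x)
```

Cython handles the argument conversion and error checking boilerplate that you'd otherwise write by hand against the C API.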
Compared to Numba, it's harder to say. Numba, when it works, is easily as fast as Cython. However I find Numba hard to reason about, and it's still a bit of a black box as to when and why it does and doesn't work. The nice thing about Cython is that it is pretty simple, so you can easily reason about what it will do to your code and how it will perform. It's been a long time since Cython 'surprised' me by performing much better or worse than I expected.
If you want to see Cython in action, take a look at the source code of scikit-image or scikit-learn. They implement many of their core algorithms in Cython.
Numba is a JIT, and only covers some of Numpy. I'd say it's amazing at how well it works, but it "only" covers certain aspects of the language. It's also a bit of an all-or-nothing - if it doesn't cover a certain class of syntax, it just won't JIT.
Cython is ahead-of-time compiled, and much more comprehensive. It turns Python, effectively, into C, and compiles it as a Python extension. The possible scope is thus much greater, and although Cython comes with built-in support for Numpy, it is much more broad in principle.
So... it's a very different set of trade-offs. Like with Numba, out of the box, with no changes, you will typically see a significant improvement (what's significant? From experience about 2x). You have much more scope for tweaking your code to speed things up - move some of the execution to C, disable bounds checking, outright call C libraries, etc. It comes with a suite of tools for analysing performance bottlenecks. It used to come with a lot of special syntax, which nowadays is done with annotations and decorators - much neater IMO. And of course, no run-time compilation delay, it's moved to, well, compilation time.
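The "disable bounds checking" tweak mentioned above looks roughly like this (a sketch - `dot` is an invented example, and the directives shown are Cython's standard compiler directives):

```cython
cimport cython

@cython.boundscheck(False)  # skip index range checks in the hot loop
@cython.wraparound(False)   # disallow negative indexing, removing a branch
def dot(double[:] a, double[:] b):
    cdef Py_ssize_t i
    cdef double total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * b[i]
    return total
```

The trade-off is the usual one: out-of-range access becomes undefined behavior instead of an `IndexError`, so you only turn these off once the loop logic is trusted.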
However, (I think) cython is superior when:
- you want to distribute your code (e.g. as a PyPI package)
- you want to interface with C/C++ code libs
I found out I almost never have to do this and did not touch cython since I started using numba.
Is there a way to convert Cython modules to C++, or at least a .o file? They are so dang close.
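Cython can in fact emit C++ directly: the `--cplus` flag makes it generate a `.cpp` file instead of `.c`, which you can then compile to a `.o` like any other translation unit. A rough sketch of the manual route (filenames invented):

```shell
# Generate C++ instead of C from a .pyx file:
cython --cplus -3 mymodule.pyx        # writes mymodule.cpp
# Compile it to an object file like any other C++ source:
g++ -c -fPIC mymodule.cpp -o mymodule.o \
    -I"$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["include"])')"
```

Note the resulting object still depends on the Python headers and, at runtime, on an embedded interpreter - which is the drawback raised below.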
The only drawback is that a Cython module still loads the CPython interpreter, so I personally prefer writing performance-critical code in Rust instead. Writing in Julia has the same drawback of not being embeddable that writing in Cython does.
Julia has multiple dispatch and may seem more appealing but at scale it is a very slow language to develop in. And for scripts it takes FOREVER (try loading Plots, CSV, DataFrames, Makie etc every time you restart. It’s genuinely insane that that’s the norm.)
If the whole Python ecosystem was in Cython (i.e. numpy, scipy, etc) I’d never use another backend language again.
I guess Cython is not really made for writing bindings - but is it easier to write bindings with Cython or with the raw CPython C API?
Writing bindings in Cython is much, much faster in terms of development time. It fits nicely and unintrusively in an already python packaged library. You can gently add some C functions or call C libraries in minutes.
You won't have full control over what's happening, though. Just have a look at the generated code and you'll see the mess of indirections it produces.
Cython bindings become limited when you have to build more complex stuff though, going deeper than just calling some C functions. The typical case is when you have to actually handle the lifetime and borrowing of C native objects.
At that point, the CPython C API will be the way to go, but it's much more code, and very error-prone: you have to manually keep track of reference counting.
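The reference counts that the C API makes you manage by hand (via `Py_INCREF`/`Py_DECREF`) are visible even from Python. A small illustration using `sys.getrefcount` - note that the count it reports is inflated by one for the temporary reference the call itself holds:

```python
import sys

obj = object()
before = sys.getrefcount(obj)   # includes the temporary argument reference

alias = obj                     # binding another name adds one reference
after = sys.getrefcount(obj)

print(after - before)           # -> 1: exactly one new reference was created
```

In C-API binding code, every such reference must be paired with a manual decref; forgetting one leaks the object, and one too many crashes the interpreter.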
With a bit of care (and benchmarking) you can get very respectable speed. The main drawback is that the further you go, the more C knowledge you need in order to not blast your own feet off.
If you're just after a bit more performance in general, a drop-in solution like PyPy might be enough.
I'm curious - most of the big libraries are already just CUDA calls anyway, but I'm always interested in anything that speeds up the full process.
If you successfully use Numba, probably nothing you couldn't already do.
If you want something that lives much closer to C, it's perfect.