Pybind11 is also great, but quite different in its aims - I feel it's more of a project for C++ programmers who want to expose functionality to Python.
I already wrote a few patches for pysfml, which is written in Cython. It was a bit awkward, and now I'm asking myself whether Cython is really the right tool for writing bindings, compared to the raw CPython C API, for example.
It's very fast to write for - that's the main benefit. Use it together with profiling and just pick off the slowest parts first.
Some packages can use it for higher performance, but most of the time it'll be slower, because you need to parse extra information if you want to reuse the code in pure Python.
If you really care about the performance of code called from Python, consider something like NVIDIA Warp (preview). Warp JITs and runs your code on CUDA or CPU. Although Warp targets physics simulation, geometry processing, and procedural animation, it can be used for other tasks as well. https://github.com/NVIDIA/warp
Google Jax is another option, jitting and vectorizing code for TPU, GPU or CPU. https://github.com/google/jax
Why would you recommend that? It's all way more effort than just writing Cython, especially in a Jupyter Notebook. And Cython code can be just as fast as C/C++ code unless you're doing something really fancy. It's a bunch of work for no benefit.
>Warp jits and runs your code on CUDA or CPU
If someone's writing Cython it's probably because they found something that couldn't be done efficiently in Numpy because it was sequential, not easily vectorisable. Such code is going to get zero benefit from CUDA or running on the GPU.
In general, jitted code is not going to be as fast as code compiled with an ahead-of-time compiler like the C compiler that Cython uses. Moreover, if you use a JIT, it makes your code a pain in the ass to embed in a C/C++ application, unlike Cython code.
nanobind/pybind11 (co-)author here. The space of python bindings is extremely diverse and on the whole probably looks very different from your use case. nanobind/pybind11 target the 'really fancy' case you mention specifically for codebases that are "at home" in C++, but which want natural Pythonic bindings. There is near-zero overlap with Cython.
Agreed - if you have badly performing spaghetti Python code, none of those tools is going to help. In that case I would rather rewrite it all in C/C++ than fiddle with Cython.
I believe there was a time very early on (like 2003) when there was discussion about maybe including Pyrex in CPython proper to get a more Common-Lisp like gradually typed system. (I mostly recall some comment of Greg's along the lines of being intimidated by such. I'm not sure how seriously the idea was entertained by PyCore.)
CPython is in the hands of not really productive bigcorp representatives who care about large legacy code bases. My guess is that CPython will be largely the same in 10 years, with the usual widely hyped initiatives that go nowhere ("need for speed etc.").
It's clear that Python's main strength is its vast ecosystem of libraries, so priority number one is not breaking them. If it were possible to speed up Python without breaking changes, I'd be surprised it hasn't happened already - precisely because, with so many large codebases, speed and efficiency would translate directly into money.
The Microsoft funded project is different, they're merging things. I don't think they've started on a JIT translator yet, though, last time I looked they were busy picking lower-hanging fruit. From watching their communications, I think they might get there at some point.
It's not as simple as just emitting machine code, though. To get something in the same order of magnitude as typical C code, you need to deduce types and peel away the boxing and unboxing layers.
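To make the boxing point concrete, here's a minimal Cython sketch (the function name and body are invented for illustration). With untyped variables, every iteration would allocate and unwrap a Python int object; the `cdef` declarations let the compiler emit a plain C loop instead.

```cython
# Hypothetical example: sum of 0..n-1 with C-typed locals.
# Without the cdef declarations, `total += i` would box/unbox a
# Python int on every iteration; with them, Cython emits a C loop.
def triangular(int n):
    cdef long total = 0
    cdef int i
    for i in range(n):
        total += i
    return total
```

This is exactly the kind of type deduction a JIT has to do automatically, which is why it's harder than "just emitting machine code".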
That is another thing that is nice about Cython: you don't have to learn all of Cython to be productive. Take your existing Python function and just add some type annotations and you'll see real performance gains. Then you can profile your code, see what the next bottleneck is, fix that, and so on.
So, yes, Cython gives you the power to manually control the GIL and the Python API calls, and to manage your own memory and data layout for those corner cases where that's what you need. Most of the time you can happily ignore all of that and still get almost all of the available speedup.
The other place it shines is if you ever need to loop over an array of data that cannot easily be represented as numpy arrays, like strings or more complex structs. Here you can get significant speedups compared to pure Python.
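As an illustration of that kind of loop, here's a sketch (the function is invented) that scans a `bytes` object through a typed memoryview. In pure Python, every index access would create a new int object; here the access compiles to raw buffer indexing.

```cython
# Hypothetical example: count occurrences of one byte in a bytes object.
def count_byte(bytes data, unsigned char target):
    cdef const unsigned char[:] buf = data  # zero-copy view of the bytes
    cdef Py_ssize_t i, count = 0
    for i in range(buf.shape[0]):
        if buf[i] == target:
            count += 1
    return count
```

The same pattern works for arrays of structs via typed memoryviews over any buffer-protocol object.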
The third use of Cython I really like is with C and C++ interop. Sure there are lots of ways of calling C code from Python, but to me Cython is probably the quickest and cleanest.
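For instance, wrapping a C library function is only a few lines - a sketch like this, using `math.h`'s `sqrt` as a stand-in for whatever library you're actually binding:

```cython
# Declare the C function from its header, then expose a thin wrapper.
cdef extern from "math.h":
    double sqrt(double x)

def py_sqrt(double x):
    """Python-callable wrapper around C's sqrt."""
    return sqrt(x)
```

Cython handles the argument conversion and error checking boilerplate that you'd otherwise write by hand against the C API.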
Compared to Numba, it's harder to say. Numba, when it works, is easily as fast as Cython. However I find Numba hard to reason about, and it's still a bit of a black box as to when and why it does and doesn't work. The nice thing about Cython is that it is pretty simple, so you can easily reason about what it will do to your code and how it will perform. It's been a long time since Cython 'surprised' me by performing much better or worse than I expected.
If you want to see Cython in action, take a look at the source code of scikit-image or scikit-learn. They implement many of their core algorithms in Cython.
Numba is a JIT, and only covers some of Numpy. I'd say it's amazing at how well it works, but it "only" covers certain aspects of the language. It's also a bit of an all-or-nothing - if it doesn't cover a certain class of syntax, it just won't JIT.
Cython is ahead-of-time compiled, and much more comprehensive. It turns Python, effectively, into C, and compiles it as a Python extension. The possible scope is thus much greater, and although Cython comes with built-in support for Numpy, it is much more broad in principle.
So... it's a very different set of trade-offs. Like with Numba, out of the box, with no changes, you will typically see a significant improvement (what's significant? From experience about 2x). You have much more scope for tweaking your code to speed things up - move some of the execution to C, disable bounds checking, outright call C libraries, etc. It comes with a suite of tools for analysing performance bottlenecks. It used to come with a lot of special syntax, which nowadays is done with annotations and decorators - much neater IMO. And of course, no run-time compilation delay, it's moved to, well, compilation time.
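The "disable bounds checking" tweak mentioned above looks roughly like this (a sketch - `dot` is an invented example, and the directives shown are Cython's standard compiler directives):

```cython
cimport cython

@cython.boundscheck(False)  # skip index range checks in the hot loop
@cython.wraparound(False)   # disallow negative indexing, removing a branch
def dot(double[:] a, double[:] b):
    cdef Py_ssize_t i
    cdef double total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * b[i]
    return total
```

The trade-off is the usual one: out-of-range access becomes undefined behavior instead of an `IndexError`, so you only turn these off once the loop logic is trusted.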
However, (I think) cython is superior when:
- you want to distribute your code (e.g. as a PyPI package)
- you want to interface with C/C++ code libs
I found out I almost never have to do this and did not touch cython since I started using numba.
Is there a way to convert Cython modules to C++, or at least a .o file? They are so dang close.
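Cython can in fact emit C++ directly: the `--cplus` flag makes it generate a `.cpp` file instead of `.c`, which you can then compile to a `.o` like any other translation unit. A rough sketch of the manual route (filenames invented):

```shell
# Generate C++ instead of C from a .pyx file:
cython --cplus -3 mymodule.pyx        # writes mymodule.cpp
# Compile it to an object file like any other C++ source:
g++ -c -fPIC mymodule.cpp -o mymodule.o \
    -I"$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["include"])')"
```

Note the resulting object still depends on the Python headers and, at runtime, on an embedded interpreter - which is the drawback raised below.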
The only drawback is that a Cython module still loads the CPython interpreter, so I personally prefer writing performance-critical code in Rust instead. Writing in Julia has the same drawback of not being embeddable that writing in Cython does.
Julia has multiple dispatch and may seem more appealing but at scale it is a very slow language to develop in. And for scripts it takes FOREVER (try loading Plots, CSV, DataFrames, Makie etc every time you restart. It’s genuinely insane that that’s the norm.)
If the whole Python ecosystem was in Cython (i.e. numpy, scipy, etc) I’d never use another backend language again.
I guess Cython is not really made for writing bindings - but is it easier to write bindings with Cython or with the raw CPython C API?
Writing bindings in Cython is much, much faster in terms of development time. It fits nicely and unintrusively in an already python packaged library. You can gently add some C functions or call C libraries in minutes.
You won't have full control over what's happening, though. Just have a look at the generated code and you'll see the mess of indirections it produces.
Cython bindings become limited when you have to build more complex stuff though, going deeper than just calling some C functions. The typical case is when you have to actually handle the lifetime and borrowing of C native objects.
At that point, the CPython C API will be the way to go, but it's much more code, and very error-prone: you have to manually keep track of reference counting.
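The reference counts that the C API makes you manage by hand (via `Py_INCREF`/`Py_DECREF`) are visible even from Python. A small illustration using `sys.getrefcount` - note that the count it reports is inflated by one for the temporary reference the call itself holds:

```python
import sys

obj = object()
before = sys.getrefcount(obj)   # includes the temporary argument reference

alias = obj                     # binding another name adds one reference
after = sys.getrefcount(obj)

print(after - before)           # -> 1: exactly one new reference was created
```

In C-API binding code, every such reference must be paired with a manual decref; forgetting one leaks the object, and one too many crashes the interpreter.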
With a bit of care (and benchmarking) you can get very respectable speed. The main drawback is that the further you go, the more C knowledge you need in order to not blast your own feet off.
If you're just after a bit more performance in general, a drop-in solution like PyPy might be enough.
I'm curious - most of the big libraries are already just CUDA calls anyway, but I'm always interested in anything that speeds up the full process.
If you successfully use Numba, probably nothing you couldn't already do.
If you want something that lives much closer to C, it's perfect.