The problem, in general, isn't that Python and languages like it don't have a compiler, it's that the semantics of the language are hostile to good performance by traditional means of compilation. To do what the programmer requests requires doing things at runtime that are hard to make fast. That's why things like tracing JITs are being used for things like JavaScript.
The speedup you get from actually compiling Python programs is because the CPython interpreter is pretty awful, not because compilation is a magic solution to performance problems. The IronPython guy gave a nice explanation of this at OOPSLA 2007's Dynamic Languages Symposium - maybe things have changed in CPython since then.
You are correct, but this approach (using libpython) is probably as good as you can do for static compilation. I did my PhD on a very similar compiler, just for PHP (phc - http://phpcompiler.org). Something like 90% of variables had known static types (and that excludes results of arithmetic which could be either reals or ints).
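Python's arithmetic has the same ambiguity: the result type depends on the operator and the operands, not on anything you can always pin down statically. A small illustration (Python 3 semantics, not from the phc work itself):

```python
# The type of an arithmetic result is not fixed statically:
a = 7 // 2    # floor division of two ints stays int
b = 7 / 2     # true division of two ints yields float
c = 2 ** 100  # exponentiation of ints stays int, at arbitrary precision
print(type(a).__name__, type(b).__name__, type(c).__name__)  # int float int
```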
The best approach would be a hybrid. Throw as much static stuff at it as you can, then encode it for a JIT to use later. That's what I'm planning when I (eventually) get round to writing my language.
I guess they emit calls to everything that would be called for each variable reference in CPython (including checks like "is this a string or a number or..."), and the savings come more or less just from having those calls encoded one after another, plus maybe some internal arguments needed for that, but not from knowing the types.
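That generic dispatch is visible in CPython's own bytecode; here's a quick illustration using the stdlib `dis` module (opcode names vary by version: `BINARY_ADD` before 3.11, `BINARY_OP` from 3.11 on):

```python
# CPython compiles "a + b" to a single generic binary-add opcode; the
# type checks ("is this a string or a number or...") and the dispatch
# to the right C routine happen inside the interpreter loop at runtime,
# on every execution, regardless of what a and b actually are.
import dis

def add(a, b):
    return a + b

ops = [instr.opname for instr in dis.Bytecode(add)]
print(ops)  # includes BINARY_ADD (or BINARY_OP on CPython 3.11+)
```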
The generated C++ source contains the following comment:
// This code is in part copyright Kay Hayen, license GPLv3. This has the
// consequence that you must either obtain a commercial license or also
// publish your original source code under the same license, unless you
// don't distribute this source or its binary.
This is all way out of my area of expertise, so take it with a grain of salt.
Did anybody else notice the large number of compilers/interpreters/tools built for Python compared to many other languages out there? I think it might partly be the advantage of having an easy-to-parse language with well-defined semantics.
Either that, or the combination of a popular language and poor performance.
import math

num_primes = 0
for i in xrange(2, 500000):
    if all(i % j for j in xrange(2, int(math.sqrt(i)) + 1)):
        num_primes += 1
print num_primes
Here's the code above translated to C++ by Nuitka: http://pastebin.com/41ueyTEB

# CPython 2.6.6
$ time python hello.py
41538
real 0m6.377s
user 0m6.350s
sys 0m0.020s

# Nuitka & g++-4.5
$ time ./hello.exe
41538
real 0m4.573s
user 0m4.270s
sys 0m0.300s

Python:
real 0m12.775s
user 0m12.636s
sys 0m0.037s
Nuitka:
real 0m7.096s
user 0m6.930s
sys 0m0.093s
Lua:
real 0m2.641s
user 0m2.410s
sys 0m0.010s
LuaJit:
real 0m0.613s
user 0m0.600s
sys 0m0.000s
From experience experimenting with a toy scripting language that I tried to make as minimal as possible: essentially every operation was a function call, so the interpreter just figured out what the right function was and called it directly via a C++ function pointer. In the end it was slightly faster than LuaJit at doing some math 100,000 times. (The test was a file with the same operation pasted 100,000 times, so it mostly tested parsing speed... anyway.)

TL;DR: If you want to know why Python and Nuitka are so much slower, run the test through callgrind or something else that reports the number of function calls being made. You will find Python (and possibly Nuitka as well) making billions of function calls and allocations, while Lua's count is maybe a couple hundred million at most.
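Short of callgrind, you can get a rough call count from within Python itself with `sys.setprofile`; here's a sketch against a scaled-down version of the benchmark (the limit is lowered to 2000 so it finishes quickly):

```python
# Count Python-level ("call") and built-in ("c_call") invocations made
# by the trial-division loop; every sqrt(), all(), and generator
# expression shows up here, which is exactly the kind of overhead
# callgrind would report.
import math
import sys

counts = {"call": 0, "c_call": 0}

def profiler(frame, event, arg):
    if event in counts:
        counts[event] += 1

def count_primes(limit):
    num_primes = 0
    for i in range(2, limit):
        if all(i % j for j in range(2, int(math.sqrt(i)) + 1)):
            num_primes += 1
    return num_primes

sys.setprofile(profiler)
result = count_primes(2000)
sys.setprofile(None)
print(result, counts)  # 303 primes below 2000, plus the call tallies
```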
Also, I tested my Lua code converted to Python, but it shaved less than a second off the fastest Python time, so no real difference.
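The converted version wasn't posted; a hypothetical Python equivalent of the Lua test below could look like this (my reconstruction, not the commenter's actual code, wrapped in a function for convenience):

```python
import math

sqrt = math.sqrt  # mirror the "local sqrt" caching in the Lua version

def count_primes(limit):
    num_primes = 0
    for i in range(2, limit):
        n = 1
        for j in range(2, int(sqrt(i)) + 1):
            if i % j == 0:
                n = 0
                break
        num_primes += n
    return num_primes

print(count_primes(500000))  # prints 41538, matching the benchmark output
```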
test.lua:

local sqrt = math.sqrt

num_primes = 0
for i = 2, 500000 do
    n = 1
    for j = 2, sqrt(i) do
        if (i % j) == 0 then
            n = 0
            break
        end
    end
    num_primes = num_primes + n
end
print (num_primes)

"Psyco is a Python extension module which can greatly speed up the execution of any Python code."
The PyPy team is already seeing quite nice results for computation-heavy benchmarks with their (tracing) JIT: http://speed.pypy.org/comparison/
Hopefully Unladen Swallow will make even more progress over the rest of the year.
Research like this is very important. I just don't think it's wise to be viewing it as a silver bullet for use in production.