The problem, in general, isn't that Python and languages like it don't have a compiler, it's that the semantics of the language are hostile to good performance by traditional means of compilation. To do what the programmer requests requires doing things at runtime that are hard to make fast. That's why things like tracing JITs are being used for things like JavaScript.
The speedup you get from actually compiling Python programs is because the CPython interpreter is pretty awful, not because compilation is a magic solution to performance problems. The IronPython guy gave a nice explanation of this at OOPSLA 2007's Dynamic Languages Symposium - maybe things have changed in CPython since then.
You are correct, but this approach (using libpython) is probably as good as you can do for static compilation. I did my PhD on a very similar compiler, just for PHP (phc - http://phpcompiler.org). Something like 90% of variables had known static types (and that excludes results of arithmetic which could be either reals or ints).
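Python's arithmetic has the same ambiguity: the result type depends on the operator and the operands, not on anything you can always pin down statically. A small illustration (Python 3 semantics, not from the phc work itself):

```python
# The type of an arithmetic result is not fixed statically:
a = 7 // 2    # floor division of two ints stays int
b = 7 / 2     # true division of two ints yields float
c = 2 ** 100  # exponentiation of ints stays int, at arbitrary precision
print(type(a).__name__, type(b).__name__, type(c).__name__)  # int float int
```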
The best approach would be a hybrid. Throw as much static stuff at it as you can, then encode it for a JIT to use later. That's what I'm planning when I (eventually) get round to writing my language.
I guess they emit calls to everything that would be called for each variable reference in CPython (including checks like "is this a string or a number or..."), and the savings come more or less just from having those calls encoded one after another, plus maybe some internal arguments needed for that, but not from knowing the types.
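That generic dispatch is visible in CPython's own bytecode; here's a quick illustration using the stdlib `dis` module (opcode names vary by version: `BINARY_ADD` before 3.11, `BINARY_OP` from 3.11 on):

```python
# CPython compiles "a + b" to a single generic binary-add opcode; the
# type checks ("is this a string or a number or...") and the dispatch
# to the right C routine happen inside the interpreter loop at runtime,
# on every execution, regardless of what a and b actually are.
import dis

def add(a, b):
    return a + b

ops = [instr.opname for instr in dis.Bytecode(add)]
print(ops)  # includes BINARY_ADD (or BINARY_OP on CPython 3.11+)
```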
The generated C++ source contains the following comment:
// This code is in part copyright Kay Hayen, license GPLv3. This has the
// consequence that you must either obtain a commercial license or also
// publish your original source code under the same license, unless you
// don't distribute this source or its binary.
This is all way out of my area of expertise, so take it with a grain of salt.
Did anybody else notice the large number of compilers/interpreters/tools built for Python compared to many other languages out there? I think it might partly be the advantage of having an easy-to-parse language with well-defined semantics.
Either that, or the combination of a popular language and poor performance.
import math

num_primes = 0
for i in xrange(2, 500000):
    if all(i % j for j in xrange(2, int(math.sqrt(i)) + 1)):
        num_primes += 1
print num_primes
Here's the code above translated to C++ by Nuitka: http://pastebin.com/41ueyTEB

# CPython 2.6.6
$ time python hello.py
41538
real 0m6.377s
user 0m6.350s
sys 0m0.020s

# Nuitka & g++-4.5
$ time ./hello.exe
41538
real 0m4.573s
user 0m4.270s
sys 0m0.300s

Python:
real 0m12.775s
user 0m12.636s
sys 0m0.037s
Nuitka:
real 0m7.096s
user 0m6.930s
sys 0m0.093s
Lua:
real 0m2.641s
user 0m2.410s
sys 0m0.010s
LuaJit:
real 0m0.613s
user 0m0.600s
sys 0m0.000s
From experience experimenting with a toy scripting language that I tried to make as minimal as possible: essentially every operation was a function call, so the interpreter just figured out what the right function was and called it directly via a C++ function pointer. In the end it was slightly faster than LuaJit at doing some math 100,000 times. (The test was a file with the same operation pasted 100,000 times, so it mostly tested parsing speed... anyway.)

TL;DR: If you want to know why Python and Nuitka are so much slower, run the test through callgrind or something else that reports the number of function calls being made. You will find Python (and possibly Nuitka as well) making billions of function calls and allocations, while Lua's count is maybe a couple hundred million at most.
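Short of callgrind, you can get a rough call count from within Python itself with `sys.setprofile`; here's a sketch against a scaled-down version of the benchmark (the limit is lowered to 2000 so it finishes quickly):

```python
# Count Python-level ("call") and built-in ("c_call") invocations made
# by the trial-division loop; every sqrt(), all(), and generator
# expression shows up here, which is exactly the kind of overhead
# callgrind would report.
import math
import sys

counts = {"call": 0, "c_call": 0}

def profiler(frame, event, arg):
    if event in counts:
        counts[event] += 1

def count_primes(limit):
    num_primes = 0
    for i in range(2, limit):
        if all(i % j for j in range(2, int(math.sqrt(i)) + 1)):
            num_primes += 1
    return num_primes

sys.setprofile(profiler)
result = count_primes(2000)
sys.setprofile(None)
print(result, counts)  # 303 primes below 2000, plus the call tallies
```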
Also, I tested my Lua code converted to Python, but it shaved less than a second off the fastest Python time, so no real difference.
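The converted version wasn't posted; a hypothetical Python equivalent of the Lua test below could look like this (my reconstruction, not the commenter's actual code, wrapped in a function for convenience):

```python
import math

sqrt = math.sqrt  # mirror the "local sqrt" caching in the Lua version

def count_primes(limit):
    num_primes = 0
    for i in range(2, limit):
        n = 1
        for j in range(2, int(sqrt(i)) + 1):
            if i % j == 0:
                n = 0
                break
        num_primes += n
    return num_primes

print(count_primes(500000))  # prints 41538, matching the benchmark output
```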
test.lua:

local sqrt = math.sqrt

num_primes = 0
for i = 2, 500000 do
    n = 1
    for j = 2, sqrt(i) do
        if (i % j) == 0 then
            n = 0
            break
        end
    end
    num_primes = num_primes + n
end
print (num_primes)

"Psyco is a Python extension module which can greatly speed up the execution of any Python code."
The PyPy team is already seeing quite nice results for computation-heavy benchmarks with their (tracing) JIT: http://speed.pypy.org/comparison/
Hopefully Unladen Swallow will make even more progress over the rest of the year.
Research like this is very important. I just don't think it's wise to be viewing it as a silver bullet for use in production.