Now that it is kind of done, I want to make it faster :) For example, having a home-grown vector_3d class kind of sucks (performance-wise). It might be better to base vector_3d on, say, numpy.array? Once that is done and I start seeing some real improvement, it might be possible to go the other route as well, i.e. write the hot spots in C/C++ and interface that with Python.
Or go with Lua all the way? Or maybe try a hybrid approach (which would let you see how embeddable the language really is).
The possibilities are endless, and as you so rightly said, gratification is instantaneous :)
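
To make the numpy idea a bit more concrete, here is a rough sketch of what a numpy-backed vector could look like. The class name `Vector3D` and its methods (`dot`, `cross`, `length`) are my own placeholders, not the actual vector_3d interface, and it assumes numpy is installed. One caveat worth keeping in mind: wrapping a single 3-element array per object does not automatically beat plain Python floats, because every numpy call carries some overhead. The real win tends to come from keeping many vectors in one (N, 3) array and operating on all of them at once, as the last few lines show.

```python
import numpy as np


class Vector3D:
    """Hypothetical stand-in for a home-grown vector_3d, backed by a numpy array."""

    __slots__ = ("_v",)

    def __init__(self, x=0.0, y=0.0, z=0.0):
        self._v = np.array([x, y, z], dtype=float)

    @classmethod
    def _wrap(cls, arr):
        # Build a Vector3D around an existing array without copying it.
        obj = cls.__new__(cls)
        obj._v = arr
        return obj

    def __add__(self, other):
        return self._wrap(self._v + other._v)

    def __sub__(self, other):
        return self._wrap(self._v - other._v)

    def __mul__(self, scalar):
        return self._wrap(self._v * scalar)

    def dot(self, other):
        return float(np.dot(self._v, other._v))

    def cross(self, other):
        return self._wrap(np.cross(self._v, other._v))

    def length(self):
        return float(np.linalg.norm(self._v))

    def as_tuple(self):
        return tuple(float(c) for c in self._v)


# Batched alternative: one (N, 3) array instead of N separate objects.
points = np.array([[3.0, 4.0, 0.0],
                   [0.0, 0.0, 2.0]])
lengths = np.linalg.norm(points, axis=1)  # one length per row
```

It would be worth timing both variants against the current class before committing to either; if the per-object version turns out no faster, the batched layout is probably the one to try first.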