Strong disagree, this is a skill issue. I have written C++ modules for Python with pybind11 to speed up some code significantly but ended up reverting to pure Python once I learned how to move memory efficiently in Python. Numpy is very good at what it does and if you really have some custom code that needs to go faster you can run it externally through something like pybind11. It is a skill issue mostly. If you are writing ultra low latency code then you're right. You can make Python really fast if you are hyper aware of how memory is managed; I recommend tracemalloc. For instance, instead of pickling numpy arrays to send them to child processes, you can use shared memory and mutexes to define a common buffer which can be represented as a numpy array and shared between parent and child processes. Massive performance win right there, and most people simply would have never realized Python is capable of such things.