> Note that no combination of any of the faster implementations of Python + Numpy libraries has ever been used at the most demanding level of scientific computation. That has always been Fortran, with some C and C++, and now Julia.
This still seems like an overstatement, but maybe it depends on what you mean by "most demanding level." I work on systems for the Rubin Observatory, which will be the largest astronomical survey by a wide margin. There's certainly a bunch of C++, but also heaps of Python. For example, catalog simulation (https://www.lsst.org/scientists/simulations/catsim) is pretty much entirely in Python.
Take a look at `lsst/imsim`, for example, from the Dark Energy collaboration at LSST: https://github.com/LSSTDESC/imSim.
Maybe this doesn't count as the "most demanding," but if so, I don't really see why not.
> But if A and B are numpy arrays, then A + B will calculate the elementwise sum on a single core only, correct? It will vectorize, but not parallelize.
That's correct, but numba can parallelize that computation for you (https://numba.pydata.org/numba-doc/latest/user/parallel.html). It's pretty common to reach for numba's parallelization when it's relevant.