(a) writing your research as a bunch of Fortran or C kernels and integrating them with automatic bindings such as f2py? Fortran especially is a great fit for NumPy data structures, because under the hood they are the same: a Fortran-ordered NumPy array is exactly the contiguous column-major buffer a Fortran kernel expects.
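To make the "same data structures" point concrete, here's a minimal sketch showing that NumPy can hold its data in Fortran (column-major) order, which is precisely the layout an f2py-wrapped kernel can consume without any conversion:

```python
import numpy as np

# A 2x3 array in C (row-major) order vs. a Fortran (column-major) copy.
a_c = np.arange(6, dtype=np.float64).reshape(2, 3)  # C order
a_f = np.asfortranarray(a_c)                        # same values, Fortran layout

# Same logical contents, different memory layout:
print(a_c.flags['C_CONTIGUOUS'], a_f.flags['F_CONTIGUOUS'])  # True True

# order='K' reads the buffer in memory order: rows for a_c, columns for a_f.
# The column-by-column layout of a_f is what a Fortran kernel expects,
# so f2py can hand the buffer straight through.
print(a_c.ravel(order='K'))  # [0. 1. 2. 3. 4. 5.]
print(a_f.ravel(order='K'))  # [0. 3. 1. 4. 2. 5.]
```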
(b) Use high-performance Python environments like Numba, NumbaPro (GPU), or even Cython?
If I have to write everything important and difficult in Fortran or C kernels, then I'm losing the productivity of the higher-level language. Changing data structures or implementing new algorithms for existing problems would mean writing kernels from scratch instead of tweaking high-level package code. For example, it makes it hard to directly re-purpose the internal line-search code from the SciPy optimizers inside my own Fortran optimizer, without Python in between, or to add new projections and the like to existing SciPy implementations and maintain that as a separate package that depends on SciPy.
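For reference, a sketch of what is reusable: SciPy does expose its Wolfe line search at the Python level as `scipy.optimize.line_search`, so from Python you can call it directly, but a Fortran optimizer has no way to reach the internals except through this Python layer (the convex quadratic below is just an illustrative test function, not anything from the thread):

```python
import numpy as np
from scipy.optimize import line_search

# Convex quadratic f(x) = x.x with gradient 2x.
def f(x):
    return x @ x

def grad(x):
    return 2.0 * x

xk = np.array([1.0, 1.0])
pk = -grad(xk)  # steepest-descent direction

# Wolfe line search along pk, via SciPy's Python-level entry point.
alpha, fc, gc, new_fval, old_fval, new_slope = line_search(f, grad, xk, pk)
print(alpha, new_fval)  # a step length satisfying the Wolfe conditions
```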
Even just talking about this starts to sound like a mess from the past, but fortunately Julia has already solved this problem, so there's no reason to have to deal with it.
>(b) Use high performance python environments like Numba, NumbaPro (GPU) or even Cython?
Well, the biggest strike against Numba is that, once again, you have to write everything from scratch if you want to do anything non-trivial. There is finally a way to extend Numba with your own classes (jitclass), but the Python standard library isn't compatible with it by default, and classes defined in other people's packages aren't automatically compatible either. So you have to start re-writing basic data structures and other types just to get the full algorithm JIT-compiled, and the standard library isn't set up with @generated_jit to dispatch everything effectively. That lack of library support means you end up building tons of things on your own, which adds development time. Then there are restrictions on allocations, no easy way to do distributed parallelism, no good way to take functions as arguments to my functions and have them inlined (this is important for writing things like ODE solvers; with Numba you have to use a hack that almost works, but it won't optimize well), etc. Essentially it hits a feature wall once you get past microbenchmarks and into real programming. This all hearkens back to your first comment:
>Let's say you're exploring with NumPy
Exactly: if you're content with building everything off of arrays yourself, then Numba will do fine. But its limitations significantly impact the ability to compose full packages in native Python with constant propagation, interprocedural optimization, etc.: all the stuff you depend on static languages doing in order to optimize large-scale software. On top of that, it's a difficult dependency to install. So in the end it's great for JIT-compiling little blurbs, but I quickly hit its edges when trying to fight it.
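The "functions as arguments" point is worth making concrete. A generic ODE solver is naturally a higher-order function: its inner loop calls the user's derivative function through a function argument, which is exactly the call a compiler needs to specialize and inline. Here's a minimal pure-Python sketch (a fixed-step explicit Euler method, chosen only for brevity) of the pattern in question:

```python
import math

def euler(f, y0, t0, t1, n):
    """Fixed-step explicit Euler integration of y' = f(t, y).

    The inner loop calls f through a function argument; this is the
    call site a JIT has to inline to make the solver fast."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y

# y' = -y with y(0) = 1 has exact solution y(1) = exp(-1).
approx = euler(lambda t, y: -y, 1.0, 0.0, 1.0, 10_000)
print(approx, math.exp(-1.0))  # approx agrees to ~1e-4
```

In plain CPython this works for any `f`, but naively JIT-compiling `euler` forces the compiler to either re-specialize per callback or go through an opaque function pointer, which is the wall described above.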
Cython is similar to Numba in that you can set up your own fused types and extension classes, re-write pieces of the standard library, etc., but at that point you've taken to writing essentially the stdlib of a language (or at least the parts you need) just to be allowed to use this "easy accelerator" for a demanding project.
If this were 10 years ago these issues would be okay, but these days they stand out as a productivity and performance barrier. Julia has already solved this problem, and hell, even LuaJIT is easier to build this kind of software with. Or, if you're going to fuss around with compiled code anyway, you can use D or Rust, or even go all the way back to C++ and Fortran. In fact, it's quite telling that even fairly recent big "Python" projects like TensorFlow, PyTorch, FEniCS, etc. all built their whole package in C++ and made Python bindings instead of trying to use these accelerators. Meanwhile, people are punching out large Julia projects in pure Julia without hesitation.
To an end user of packages it really doesn't matter how something was implemented, but when your work and research is in developing these kinds of large software projects, these limitations make a huge difference.