You weren't doing the same thing. Julia's `pmap` and `@parallel` are multiprocessing. These will parallelize across multiple computers, like multiple nodes of a cluster. It has much larger scaling potential (it's more like MPI) but at the cost of a larger overhead (like MPI). It for example was used to achieve >1 petaflops in the Celeste.jl application on the Cori supercomputer.
Cython's parallelism is via OpenMP which is shared memory multithreading. Of course multithreading is faster, but it's restricted to a single computer. Julia does have multithreading as well via `Threads.@threads`. This is shared memory and restricted to a single computer just like Cython, and will have a lot lower overhead than `pmap` and `@parallel`. If you want to directly compare something to Cython's parallelism, this is what you should be looking at.
On a side note, it looks like Cython doesn't have any native multiprocessing or multinode parallelism that would be the direct comparison to `pmap` or `@parallel`.