Zig, unlike C++ and Rust, doesn't need an optimized general-purpose allocator in order to be fast. Zig outperforms its peers despite currently having a slow GPA in the standard library, because the language encourages programmers down a path that avoids boxing the shit out of everything. Boxing everything is inherently slow, even if you have a global allocator optimized for that use case.
Rust switched away from jemalloc because Rust uses global allocation for everything. Zig's convention of explicit allocator argument passing means such a compromise will never be needed.
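To make "explicit allocator argument passing" concrete, here's a rough sketch of the shape of the idea. It's written in C rather than Zig so it compiles without depending on a particular Zig version, and every name in it (`Allocator`, `Arena`, `arena_alloc`, `arena_allocator`, `dup_string`) is made up for illustration; none of this is Zig's actual standard library API. The point is only that the function that needs memory takes the allocator as a parameter, so the caller decides where the bytes come from.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical allocator interface: a context pointer plus an alloc function. */
typedef struct Allocator {
    void *ctx;
    void *(*alloc)(void *ctx, size_t size, size_t align);
} Allocator;

/* A bump arena over a caller-provided buffer. Nothing is freed individually;
 * the whole arena is discarded at once by resetting `used` to 0. */
typedef struct Arena {
    uint8_t *buf;
    size_t cap;
    size_t used;
} Arena;

/* Bump-allocate `size` bytes. `align` must be a power of two.
 * Returns NULL when the arena does not have enough room left. */
static void *arena_alloc(void *ctx, size_t size, size_t align) {
    Arena *a = ctx;
    uintptr_t base = (uintptr_t)a->buf;
    uintptr_t cur = base + a->used;
    /* Round the current address up to the requested alignment. */
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(aligned - base);
    if (offset > a->cap || size > a->cap - offset) return NULL;
    a->used = offset + size;
    return (void *)aligned;
}

/* Wrap an arena in the generic interface. */
static Allocator arena_allocator(Arena *a) {
    Allocator al = { a, arena_alloc };
    return al;
}

/* "Library" code: it never touches a global allocator. The caller passes one in. */
static char *dup_string(Allocator al, const char *s) {
    size_t n = strlen(s) + 1; /* include the terminating NUL */
    char *out = al.alloc(al.ctx, n, 1);
    if (out) memcpy(out, s, n);
    return out;
}
```

A caller can back `dup_string` with a 64-byte stack buffer by filling in an `Arena`, wrapping it with `arena_allocator`, and passing the result along. A different caller could hand the same function a completely different allocator, and no global choice has to be made for the whole program.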
As for "beating trees with a stick", I'll probably end up doing what I did for WebAssembly, which is to ignore the preexisting work and make my own thing that is better. Here's my 160-line wasm-only allocator that achieves the trifecta: high performance, tiny machine code size, and fundamental simplicity.
https://github.com/ziglang/zig/blob/c1add1e19ea35b4d96fbab31...