That's true in literally any language. Some languages require inline assembly. Others require preprocessor directives. In almost all languages, you need to understand the difference between stack and heap, know how to minimize allocations, know how to minimize dynamic dispatch, and know how to design cache-friendly memory layouts. And of course, data structures & algorithms 101.
In terms of performance, Julia provides the following:
1. Zero-cost abstractions. And since Julia has homoiconic macros, users can build their own zero-cost abstractions, e.g. AoS-to-SoA conversions or auto-vectorization. Managing the complexity-performance trade-off is critical, but micro-benchmarks don't show that.
2. Fast iteration speed. Julia is optimized for interactive computing. I can view any function's SSA-form IR, its LLVM IR, or its native assembly, and I can inspect all of this in a Pluto notebook. Optimizing Julia is fun, which is less true in other languages.
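A rough sketch of what both points look like in practice. The type `Meters` and the functions `add_raw` and `add_wrapped` are hypothetical, made up for illustration. The wrapper type is the "abstraction", and the `@code_*` macros from the standard library's InteractiveUtils let you check each compilation stage to see whether the wrapper costs anything. I'd expect the wrapped version to reduce to a single floating-point add, the same as the raw version, but treat that as something to verify on your own machine rather than a guarantee.

```julia
using InteractiveUtils  # @code_typed, @code_llvm, @code_native (preloaded in the REPL)

# Hypothetical thin wrapper: an immutable struct holding one Float64.
struct Meters
    value::Float64
end

# Addition on the wrapper just unwraps, adds, and rewraps.
Base.:+(a::Meters, b::Meters) = Meters(a.value + b.value)

add_raw(x::Float64, y::Float64) = x + y
add_wrapped(x::Meters, y::Meters) = x + y

# Typed, optimized SSA-form IR (returned as a value, so display it explicitly).
display(@code_typed add_wrapped(Meters(1.0), Meters(2.0)))

# LLVM IR and native assembly are printed to stdout; compare against add_raw.
@code_llvm add_wrapped(Meters(1.0), Meters(2.0))
@code_llvm add_raw(1.0, 2.0)
@code_native add_wrapped(Meters(1.0), Meters(2.0))
```

If the abstraction really is zero-cost, the two `@code_llvm` listings should differ only in how the argument and return types are spelled, not in the arithmetic.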