Tracing GCs get to amortize their deletion costs over time. In theory, a tracing GC with two equal-sized spaces (so at least 2x memory overhead) can just copy the live data into the currently unused half and switch over, later overwriting the former region. Combined with thread-local allocation buffers, where an allocation is only a pointer bump, this is a really great combo. Real collectors add many smart modifications on top, but even this basic scheme defragments the heap for free.
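To make the two-space idea concrete, here is a toy sketch in Python of a semispace heap with bump allocation and a Cheney-style copy. Everything in it (`SemiSpace`, `Ref`, the header-cell layout, the `forward` dict) is made up for illustration and is not how any particular runtime does it; real collectors keep forwarding pointers in the old object headers and work on raw memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical pointer type: index of an object's header cell."""
    addr: int


class SemiSpace:
    """Toy heap: one flat list split into two halves.

    Object layout (an assumption of this sketch): a header cell holding the
    field count, followed by that many field cells. A field is either a
    plain value or a Ref to another object.
    """

    def __init__(self, half_size):
        self.half = half_size
        self.mem = [None] * (2 * half_size)
        self.from_base = 0   # start of the half currently in use
        self.top = 0         # bump pointer: next free cell
        self.roots = []      # Refs the "program" still holds

    def alloc(self, fields):
        """Allocation is just a bounds check plus a pointer bump."""
        need = 1 + len(fields)
        if self.top + need > self.from_base + self.half:
            raise MemoryError("active half is full; call collect() first")
        addr = self.top
        self.mem[addr] = len(fields)
        self.mem[addr + 1:addr + need] = fields
        self.top += need
        return Ref(addr)

    def collect(self):
        """Copy everything reachable from the roots into the unused half.

        Any Ref held outside self.roots is stale after this call.
        """
        to_base = self.half - self.from_base   # flips between 0 and half
        free = to_base
        forward = {}   # old address -> new address

        def copy(ref):
            nonlocal free
            if ref.addr not in forward:
                n = self.mem[ref.addr]
                self.mem[free:free + 1 + n] = self.mem[ref.addr:ref.addr + 1 + n]
                forward[ref.addr] = free
                free += 1 + n
            return Ref(forward[ref.addr])

        self.roots = [copy(r) if isinstance(r, Ref) else r for r in self.roots]

        # Cheney scan: walk the copied objects and fix up their pointers,
        # copying anything they reference that has not been moved yet.
        scan = to_base
        while scan < free:
            n = self.mem[scan]
            for i in range(scan + 1, scan + 1 + n):
                if isinstance(self.mem[i], Ref):
                    self.mem[i] = copy(self.mem[i])
            scan += 1 + n

        # Switch halves; garbage in the old half is never touched at all.
        self.from_base, self.top = to_base, free
```

Note that the cost of `collect()` depends only on the live data: dead objects are simply left behind in the old half, and the survivors end up packed contiguously, which is where the free defragmentation comes from.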
Now if you want general ref counting, where objects can be shared across threads, you have to use atomic counters, and those will tank your performance on modern machines beyond fixing. And that is before we even mention that dropping the last reference to a big object graph means freeing the whole graph recursively, right then, an overhead that can't be amortized in this case. Oh, and you do need a tracing step one way or another to free cycles.
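A toy Python sketch of those three costs, with hypothetical names throughout (`RcObj`, `incref`, `decref`, `link`, `build_chain`). The per-object lock is only a stand-in for an atomic counter, and the free is done with a worklist instead of recursion so a long chain does not blow the stack; this models the behavior, it does not measure real performance.

```python
import threading


class RcObj:
    """Hypothetical manually ref-counted object."""

    def __init__(self, name):
        self.name = name
        self.count = 1                 # the creator holds one reference
        self.children = []
        self.lock = threading.Lock()   # stand-in for an atomic counter


freed = []   # names of objects whose count reached zero


def incref(obj):
    # Every shared reference update is a synchronized read-modify-write.
    with obj.lock:
        obj.count += 1


def decref(obj):
    # Dropping the last reference frees the whole reachable graph right now.
    work = [obj]
    while work:
        o = work.pop()
        with o.lock:
            o.count -= 1
            dead = o.count == 0
        if dead:
            freed.append(o.name)
            work.extend(o.children)   # each child loses one reference
            o.children = []


def link(parent, child):
    incref(child)
    parent.children.append(child)


def build_chain(n):
    """Build n0 -> n1 -> ... where only the head is held from outside."""
    head = RcObj("n0")
    cur = head
    for i in range(1, n):
        nxt = RcObj(f"n{i}")
        link(cur, nxt)
        decref(nxt)   # creator drops its own reference; the parent keeps one
        cur = nxt
    return head
```

With this, a single `decref(build_chain(1000))` ends up freeing all 1000 nodes inside that one call, while two objects linked to each other and then dropped by their creator stay at count 1 forever, which is the cycle case that needs some form of tracing.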