> But it also frees memory immediately, meaning that many processes will appear to use less memory (unless fragmentation is an issue.)
I think where we disagree is that I do assume fragmentation is an issue, and perhaps also on what "immediately" means in this case. The total memory consumption you see in, say, Task Manager only drops when entire pages of memory are returned to the OS, not when individual objects are marked unused/free. In practice, fragmentation will always delay entire pages returning to the OS. Reference-counted languages build in a lot of tricks to avoid fragmentation, sure, but if you are also trying to use a "mark-and-sweep" heap you lose most of those optimizations, in part because you are then already assuming fragmentation is a problem to solve.
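
To make the page-versus-object distinction concrete, here is a toy sketch in Python. It is not a model of any real allocator: the page size of 64 slots, the 16-page heap, and the helper name `returnable_pages` are all assumptions made up for illustration. It frees the same number of objects twice, once scattered and once contiguous, and counts how many whole pages could go back to the OS.

```python
def returnable_pages(live, slots_per_page):
    """Count pages whose slots are all free; only those could be returned to the OS."""
    pages = [live[i:i + slots_per_page] for i in range(0, len(live), slots_per_page)]
    return sum(1 for page in pages if not any(page))


SLOTS_PER_PAGE = 64   # hypothetical number of object slots per page
NUM_PAGES = 16        # hypothetical heap size in pages
TOTAL_SLOTS = SLOTS_PER_PAGE * NUM_PAGES

# Case 1: free every other object. Half the objects are gone,
# but every page still holds live objects.
scattered = [True] * TOTAL_SLOTS
for i in range(0, TOTAL_SLOTS, 2):
    scattered[i] = False
fragmented_result = returnable_pages(scattered, SLOTS_PER_PAGE)

# Case 2: free the same number of objects, but contiguously.
contiguous = [True] * TOTAL_SLOTS
for i in range(TOTAL_SLOTS // 2):
    contiguous[i] = False
compact_result = returnable_pages(contiguous, SLOTS_PER_PAGE)

print(fragmented_result, compact_result)  # 0 8
```

In both cases the process has "freed" half its objects immediately, but only in the second case would a tool like Task Manager show any change.
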
> Don't forget that GC often adds memory overhead too: IE, mark and sweep sets a generation counter in each object that it can reach
I did mention it, but also that GCs have advanced from "include a generation counter in each object" to things like generation bitmaps, where that data is stored outside of the objects themselves, and from there further optimized into even more "compressed" forms à la Bloom filters. They may not track every object, only every cluster of objects or only objects crossing generation boundaries, or they use hash buckets and probability analysis. Many of these structures also don't need to be permanent and exist only transiently during specific types of garbage collections. There has been a lot of work in this space, with many decades of efficiencies studied and built. It's still overhead, but it is now a very different class of overhead from reference counts.
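
As a rough illustration of "stored outside of the objects themselves", here is a minimal Python sketch of a plain mark bitmap. It is a simplification and not how any particular collector does it: the class name `SideBitmapHeap`, the slot-index object model, and the example graph are all hypothetical. The objects carry no GC field at all; the collector keeps one bit per slot in a separate byte array that it rebuilds on each collection.

```python
class SideBitmapHeap:
    """Toy heap: objects are slot indices, mark bits live in a side bitmap."""

    def __init__(self, num_slots):
        # Outgoing references per slot; note there is no per-object GC header.
        self.refs = [[] for _ in range(num_slots)]
        # One bit per slot, stored outside the objects.
        self.mark_bits = bytearray((num_slots + 7) // 8)

    def _is_marked(self, slot):
        return bool(self.mark_bits[slot >> 3] & (1 << (slot & 7)))

    def _set_mark(self, slot):
        self.mark_bits[slot >> 3] |= 1 << (slot & 7)

    def collect(self, roots):
        """Mark everything reachable from roots; return the unreachable slots."""
        # The bitmap is transient: it is reset at the start of every collection.
        self.mark_bits = bytearray(len(self.mark_bits))
        stack = list(roots)
        while stack:
            slot = stack.pop()
            if self._is_marked(slot):
                continue
            self._set_mark(slot)
            stack.extend(self.refs[slot])
        return [s for s in range(len(self.refs)) if not self._is_marked(s)]


heap = SideBitmapHeap(10)
heap.refs[0] = [1, 2]
heap.refs[2] = [3]
heap.refs[5] = [6]   # 5 and 6 form a cycle that nothing reachable points to
heap.refs[6] = [5]

garbage = heap.collect(roots=[0])
print(garbage)              # [4, 5, 6, 7, 8, 9]
print(len(heap.mark_bits))  # 2 bytes of bookkeeping for 10 objects
```

The point of the sketch is the cost shape: one bit per object in a structure the collector owns, versus a counter word inside every object that has to be updated on every reference change. The unreachable cycle between slots 5 and 6 is collected without any extra machinery.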