Assuming you use a set of slabs of fixed-size objects and keep free objects in a linked list, both malloc and free are trivial O(1) operations.
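A minimal sketch of that slab-plus-free-list scheme (the class name, slot size, and single-slab layout here are illustrative choices, not any real allocator):

```cpp
#include <cstddef>
#include <vector>

// Fixed-size slab allocator with an intrusive free list. Each free slot
// doubles as a linked-list node, so no bookkeeping memory is needed.
class SlabPool {
    union Node {
        Node* next;                                      // link while free
        alignas(std::max_align_t) unsigned char storage[64];  // payload while allocated
    };
    std::vector<Node> slab_;   // one slab of fixed-size slots
    Node* free_ = nullptr;     // head of the free list

public:
    explicit SlabPool(std::size_t slots) : slab_(slots) {
        // Thread every slot onto the free list up front.
        for (std::size_t i = 0; i < slots; ++i) {
            slab_[i].next = free_;
            free_ = &slab_[i];
        }
    }

    void* alloc() {            // O(1): pop the free-list head
        if (!free_) return nullptr;
        Node* n = free_;
        free_ = n->next;
        return n;
    }

    void free(void* p) {       // O(1): push the slot back onto the list
        Node* n = static_cast<Node*>(p);
        n->next = free_;
        free_ = n;
    }
};
```

Both operations are a couple of pointer moves, which is what makes this pattern attractive for hot paths.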
Destructors with cascading deletions can take time bounded only by the amount of memory allocated, but you can work around that, for instance by destroying objects on a separate thread, or by keeping a linked list of objects pending destruction and destroying a constant number (or constant amount of memory) of them on each allocation.
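The second option, amortizing destruction across allocations, could look something like this sketch (all names and the per-tick budget are made up for illustration):

```cpp
#include <cstddef>
#include <deque>
#include <memory>

// Deferred destruction: "freeing" an object only enqueues it; each
// subsequent allocation pays down at most kBudget pending destructors,
// bounding the work done at any single allocation site.
class DeferredDestroyer {
    std::deque<std::shared_ptr<void>> pending_;  // type-erased doomed objects

public:
    static constexpr std::size_t kBudget = 4;    // max destructions per tick

    template <class T>
    void defer(std::unique_ptr<T> p) {
        // shared_ptr<void> remembers the real deleter, so ~T still runs.
        pending_.emplace_back(std::move(p));
    }

    void tick() {                                // call from the allocator
        for (std::size_t i = 0; i < kBudget && !pending_.empty(); ++i)
            pending_.pop_front();                // runs one destructor
    }

    std::size_t backlog() const { return pending_.size(); }
};
```

The deque itself still allocates, so a production version would want to use an intrusive list as the comment chain suggests; this just shows the amortization idea.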
Plus, the free list occasionally has to be walked to actually return pages to the OS. If you don't, then memory is never freed by free(); it is only marked for potential reuse.
There are several popular implementations of malloc (and their corresponding free), and they are all quite complex and have different tradeoffs. And, in fact, none of them is any more suitable for high performance or realtime code than a GC is. The golden rule for writing this type of code is to just never call malloc/free in critical sections.
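That golden rule usually means doing all allocation up front. A small sketch of the pattern (the buffer size and function names are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxSamples = 1024;  // illustrative worst-case bound

// All allocation happens once, before the critical section begins.
std::vector<int> make_buffer() {
    std::vector<int> buffer;
    buffer.reserve(kMaxSamples);  // the only heap allocation, outside the hot path
    return buffer;
}

// Inside the critical section: push_back is guaranteed not to reallocate
// while size() < capacity(), so no malloc/free can occur here.
void hot_path(std::vector<int>& buffer) {
    const int* storage = buffer.data();
    for (std::size_t i = 0; i < kMaxSamples; ++i)
        buffer.push_back(static_cast<int>(i));
    assert(buffer.data() == storage);  // capacity was never exceeded
}
```

The same idea generalizes: size everything for the worst case before entering the real-time section, and never touch the general-purpose allocator inside it.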
And I will note that probably the most commonly used malloc, the one in GNU's glibc, actually uses locks (futex-based mutexes), not lock-free atomics. Which means that you are virtually guaranteed to deadlock if you try to use malloc() after fork() but before exec() in a multi-threaded process: if another thread held a malloc lock at the moment of the fork, the child inherits that locked state with no thread left to ever release it.
Which means calculating how long free() is going to take for a given object is incredibly difficult. Which is my entire point. Of course it's not non-deterministic unless some destructor calls arbitrary code, but it is not easy to calculate, and in the real world it is basically unpredictable.
Spinning up a separate thread and doing an unknown amount of work there is no more predictable than doing it on the main thread.
Fundamentally destructors can do whatever the hell they want which means you never know how long destruction is going to take.
And of course, if you're freeing an object that has a linked list inside of it and you need to clear out that list, now you're jumping all around memory, and that's never fun. The run time is certainly not constant: it depends on what needs to be paged in and out and which data happens to be sitting in which cache lines.
I'm not saying these problems are unsolvable for a given use case, and there are good reasons why custom allocators abound throughout the industry. My main point is that manual memory management does not necessarily lead to easy-to-predict run times for freeing memory.
This is especially true because a lot of developers think that malloc and free are just magic somehow.
In reality, garbage collectors often aren't that complicated, and when it comes to large, complicated user applications, you're going to be spending a lot of time in the allocator and deallocator no matter what memory management scheme you choose.
(Edit: and then there's fragmentation, which is the death of many manually managed memory systems! A naively written Java or C# application can run for far longer than a naively written C++ application!)
You can run your garbage collector on a separate thread, too. Or use a real time garbage collector.