My amd64 and ARM CPUs also don't have it.
In languages where the algorithm is an implementation detail, one can use reference counting with a cycle collector, which takes just a few hundred lines.
Implementing one is pretty simple; making it perform well is another matter.
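To give a rough idea of the size, here is a toy sketch of reference counting plus a trial-deletion cycle collector, loosely following the synchronous Bacon-Rajan scheme. The `Obj` and `Heap` classes and all method names are made up for illustration, "freeing" just records the object's name, and the traversal is recursive, so this is the simple version and not the one that performs well.

```python
# Toy model: reference counting with a trial-deletion cycle collector.
# All names here (Obj, Heap, add_ref, decref, ...) are hypothetical.
BLACK, GRAY, WHITE, PURPLE = range(4)  # in use / being tested / garbage / candidate


class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 1            # the creator holds one reference
        self.children = []     # outgoing references
        self.color = BLACK
        self.buffered = False  # True while queued as a candidate root


class Heap:
    def __init__(self):
        self.roots = []        # candidate roots of garbage cycles
        self.freed = []        # names of "freed" objects, for inspection

    def add_ref(self, parent, child):
        child.rc += 1
        child.color = BLACK
        parent.children.append(child)

    def decref(self, obj):
        obj.rc -= 1
        if obj.rc == 0:
            # Plain reference counting: release children, then free.
            for child in obj.children:
                self.decref(child)
            obj.color = BLACK
            if not obj.buffered:
                self._free(obj)
        elif obj.color != PURPLE:
            # Count dropped but is nonzero: obj may sit on a dead cycle.
            obj.color = PURPLE
            if not obj.buffered:
                obj.buffered = True
                self.roots.append(obj)

    def _free(self, obj):
        self.freed.append(obj.name)

    def collect_cycles(self):
        candidates = []
        for obj in self.roots:
            if obj.color == PURPLE and obj.rc > 0:
                self._mark_gray(obj)       # trial-delete internal references
                candidates.append(obj)
            else:
                obj.buffered = False
                if obj.color == BLACK and obj.rc == 0:
                    self._free(obj)        # died while waiting in the buffer
        self.roots = []
        for obj in candidates:
            self._scan(obj)                # decide: garbage or still live
        for obj in candidates:
            obj.buffered = False
            self._collect_white(obj)

    def _mark_gray(self, obj):
        if obj.color != GRAY:
            obj.color = GRAY
            for child in obj.children:
                child.rc -= 1
                self._mark_gray(child)

    def _scan(self, obj):
        if obj.color == GRAY:
            if obj.rc > 0:
                self._scan_black(obj)      # externally referenced: restore
            else:
                obj.color = WHITE
                for child in obj.children:
                    self._scan(child)

    def _scan_black(self, obj):
        obj.color = BLACK
        for child in obj.children:
            child.rc += 1                  # undo the trial deletion
            if child.color != BLACK:
                self._scan_black(child)

    def _collect_white(self, obj):
        if obj.color == WHITE and not obj.buffered:
            obj.color = BLACK
            for child in obj.children:
                self._collect_white(child)
            self._free(obj)


heap = Heap()
a, b = Obj("a"), Obj("b")
heap.add_ref(a, b)
heap.add_ref(b, a)      # a <-> b form a cycle
heap.decref(a)
heap.decref(b)          # no outside references remain, yet both counts are 1
heap.collect_cycles()
print(heap.freed)       # ['b', 'a']
```

Plain reference counting alone would leak `a` and `b` here, since each keeps the other's count at 1. The collector tentatively subtracts the references internal to the candidate subgraph and frees whatever ends up at zero. Most of the performance work this sketch skips is about avoiding those recursive traversals and keeping the candidate buffer small.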