Good point. I was wondering how they'd deal with that in the actual implementation.
I think you've got the answer though: they match HTTP origins instead of IP addresses - so I imagine you could do the same in step 2: send an HTTP HEAD request to the search word and two more to random hostnames, following redirects. If all three end up at the same final origin, there is fakery going on.
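To make that concrete, here is a rough Python sketch of the origin comparison. It is only my guess at the idea, not how the actual implementation works; the names origin_of, final_origin, random_hostname and check are made up for illustration, and for simplicity an HTTP error status is treated the same as no answer.

```python
import http.client
import secrets
import urllib.request
from urllib.parse import urlsplit


def origin_of(url):
    """Return the (scheme, host, port) origin of a URL, filling in default ports."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, parts.port or default_port)


def final_origin(host, timeout=3.0):
    """HEAD http://host/, follow redirects, return the final origin (None on failure)."""
    request = urllib.request.Request(f"http://{host}/", method="HEAD")
    try:
        # urlopen follows redirects by default; geturl() is the final URL.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return origin_of(response.geturl())
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, HTTP error status, ...
        return None


def random_hostname():
    """A hypothetical random single-label hostname that should not exist."""
    return secrets.token_hex(6)


def looks_intercepted(origins):
    """True if every probe got an answer and all answers ended at one origin."""
    return None not in origins and len(set(origins)) == 1


def check(search_word, fetch=final_origin):
    """Probe the search word plus two random hostnames and compare final origins."""
    hosts = [search_word, random_hostname(), random_hostname()]
    return looks_intercepted([fetch(host) for host in hosts])
```

The fetch parameter is only there so the comparison logic can be tried with a stub instead of real network traffic.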
A problem with this could be unexpected HEAD requests to actual internal hosts: there is no guarantee that an internal host that was never meant to receive HEAD requests would react gracefully, or in any predictable way, to one.
I'm not sure how they solve this currently. Maybe this could at least be mitigated by only sending the HEAD request to the search word host if there is reasonable suspicion that requests are being redirected - e.g. only if the two random hosts resolved and were both redirected to the same origin.
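A minimal sketch of that ordering, assuming some fetch(host) helper that returns the final origin after redirects, or None if the host did not answer. Both staged_check and random_hostname are hypothetical names; the only point is that the search word host is contacted last, and only when both random probes already agree.

```python
import secrets


def random_hostname():
    """A hypothetical random single-label hostname that should not exist."""
    return secrets.token_hex(6)


def staged_check(search_word, fetch):
    """Return True if the network appears to redirect unknown hostnames.

    fetch(host) is assumed to return the final origin after redirects,
    or None if the host did not resolve or did not answer.
    """
    first = fetch(random_hostname())
    if first is None:
        # A random name did not resolve: no sign of interception, stop here.
        return False
    second = fetch(random_hostname())
    if second != first:
        # The random names did not land on a common origin.
        return False
    # Only now is there reasonable suspicion, so probe the search word host.
    return fetch(search_word) == first
```

On an honest network the first random probe fails and the function returns without ever touching the search word host, so a real internal machine sees no unexpected HEAD request.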
Finally, you could cut all of this short by also connecting to (search word):443 and trying to start a TLS handshake. If the host answers, you know it's probably a genuine internal host that talks HTTPS and you don't need any additional probes. (You can also abort the handshake, so you never need to send an actual HTTP request to the host.)
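Roughly what I have in mind for the TLS probe, as a sketch using Python's standard ssl module. speaks_tls is a made-up name. Note that this version completes the handshake and then closes the connection right away instead of aborting midway, but either way no HTTP request is sent. Certificate checks are switched off on purpose, since internal hosts often use self-signed certificates and the only question here is whether something answers TLS at all.

```python
import socket
import ssl


def speaks_tls(host, port=443, timeout=3.0):
    """Return True if host:port completes a TLS handshake, False otherwise."""
    context = ssl.create_default_context()
    # Assumption: we only care that a TLS server answers, not who it is,
    # so hostname and certificate verification are disabled for this probe.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake; leaving the with-block
            # closes the connection before any HTTP data is sent.
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts and ssl.SSLError.
        return False
```

One caveat: an intercepting server could also listen on 443, so a True result is a strong hint rather than proof.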