Getting proxy lists sourced from high-reputation networks is just a matter of $$ and a simple API integration these days.
As for stuff like Luminati: if you're being sufficiently sneaky, chances are you're not going to snowball into a block in the first place. I'm not sure why anyone would bother paying for Luminati to crawl sites like the one I work for, but I have seen people use it to scam.
We can’t really be bothered to waste resources blocking well-behaved crawlers. Just keep it at a reasonable pace, respect errors (especially 429, but also 410 and 503), and ensure we have a way to contact you if things go wrong.
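The "respect errors" part is mostly about honoring throttling signals. Here's a minimal sketch of what that looks like; the function names (`polite_get`, `backoff_delay`) are my own, and the specific delays are illustrative, not anything the comment above prescribes:

```python
import time
import urllib.request
import urllib.error


def backoff_delay(attempt, retry_after=None, base=5.0):
    """Delay before the next try: honor a Retry-After header if the
    server sent one, otherwise back off exponentially."""
    if retry_after is not None:
        return float(retry_after)
    return base * (2 ** attempt)


def polite_get(url, max_retries=3):
    """Fetch a URL while respecting 429/503 throttling and 410 Gone."""
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code == 410:
                # Gone: the resource is never coming back, stop asking for it
                return None
            if e.code in (429, 503):
                # Throttled or overloaded: wait as instructed, then retry
                time.sleep(backoff_delay(attempt, e.headers.get("Retry-After")))
                continue
            raise  # any other error is a real problem; surface it
    raise RuntimeError(f"still throttled after {max_retries} attempts: {url}")
```

The point isn't the exact numbers, it's that 429 and 503 mean "slow down", 410 means "stop", and everything else means "something is wrong, don't just hammer the retry button".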
Frankly, just any errors: if I see more than, say, 5-10 jobs fail within a 2-3 minute window, things are designed to wait X time and try again... and if they're still encountering errors, stop and ping me to come in and investigate.
Faulty retry logic is just as dangerous as the forked/distributed runaway situation.
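The stop-and-alert policy above can be sketched as a small failure-window tracker. This is my own illustration, not the commenter's code; the class name and the thresholds (5 failures in 180 seconds, matching the "5-10 jobs in 2-3 minutes" figure) are assumptions:

```python
import time
from collections import deque


class FailureWindow:
    """Trip a breaker when too many jobs fail within a rolling window,
    so the crawler can pause, retry once, and halt + alert if errors persist."""

    def __init__(self, max_failures=5, window_seconds=180, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock  # injectable for testing
        self.failures = deque()  # timestamps of recent failures

    def record_failure(self):
        """Record one failed job; return True if the breaker should trip."""
        now = self.clock()
        self.failures.append(now)
        # Drop failures that have aged out of the window
        while self.failures and now - self.failures[0] > self.window_seconds:
            self.failures.popleft()
        return len(self.failures) >= self.max_failures
```

The important design choice is the hard stop: after the breaker trips a second time, the crawler should halt and page a human rather than keep retrying, which is exactly how faulty retry logic turns into a runaway.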