The discovery process is essentially a crawl, I suppose, but it stays within a single site. The higher-speed process is guaranteed to access only data we want to parse; it does no navigation of its own.
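Restricting discovery to a single site can be sketched as a simple host filter over discovered links. This is a minimal illustration, not the actual crawler; the function name and inputs are hypothetical.

```python
from urllib.parse import urljoin, urlparse

def same_site_links(base_url, hrefs):
    """Resolve hrefs against base_url and keep only same-host links.

    Discovery stays within one site: any link that resolves to a
    different host is discarded rather than followed.
    """
    base_host = urlparse(base_url).netloc
    links = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        if urlparse(absolute).netloc == base_host:
            links.append(absolute)
    return links
```

Keeping the filter on the resolved absolute URL (rather than the raw href) handles both relative and absolute links uniformly.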
Aside from having the physical capacity to run the suite 24/7, our main challenge is speed. All data must be parsed, matched against existing records in our database, and published with the lowest possible latency.
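The parse, match, publish chain above can be sketched as a single pipeline step with end-to-end latency measured per item. The stage functions here are placeholders for the real implementations, which the source does not describe.

```python
import time

def process(item, parse, match, publish):
    """Run one item through the pipeline and return its latency.

    parse/match/publish are hypothetical stand-ins for the real
    stages: parse the raw data, enrich it against records already
    in the database, then publish the result.
    """
    start = time.monotonic()
    parsed = parse(item)
    enriched = match(parsed)  # join against existing database records
    publish(enriched)
    return time.monotonic() - start  # end-to-end latency in seconds
```

Measuring latency per item makes it easy to spot which stage dominates once the stand-ins are replaced with real parsing and matching code.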
We have pretty strict validation: we would rather hold data back and address errors after the fact than publish incorrect data.
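The validate-before-publish policy can be sketched as a gate that rejects a record on any validation error instead of publishing it. The field names (`sku`, `price`) are illustrative assumptions, not the real schema.

```python
def validate_record(record):
    """Return a list of validation errors; empty means publishable.

    The checks are examples only: a required identifier and a
    sane numeric price.
    """
    errors = []
    if not record.get("sku"):
        errors.append("missing sku")
    price = record.get("price")
    if not isinstance(price, (int, float)) or price < 0:
        errors.append("invalid price")
    return errors

def publish_if_valid(record, publish):
    """Publish only records that pass validation.

    Invalid records are held back with their errors so they can be
    addressed later, rather than published incorrect.
    """
    errors = validate_record(record)
    if errors:
        return False, errors
    publish(record)
    return True, []
```

The key design choice is that the gate fails closed: any error blocks publication, which matches preferring delayed, corrected data over fast but wrong data.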