Common Crawl already addresses permitted uses of its data in its FAQ and terms of use:
https://commoncrawl.org/terms-of-use/ https://commoncrawl.org/faq
Neither page currently addresses AI, but either could be updated to do so. That would avoid penalizing downstream consumers who don't use the data for AI.