No torrents at all in this data, all publicly available/open access. Mostly scientific pdfs, and a good portion of those are scans not just text. So the actual text amount is probably pretty low compared to the total. But still, a lot more than 8TB of raw data out there. I bet the total number of PDFs is close to a petabyte if not more.