I don't expect organizations to need to generate 1T output tokens, but 1T input tokens is common. Consider developers at a large company running queries with their entire codebase as context, or lawyers plugging in the entire tax code to ask questions about it. With each of them running dozens of queries per day against multi-million-token contexts, it's going to add up quick.
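A quick back-of-envelope sketch of how fast that adds up. All the numbers here are made up for illustration (team size, queries per day, context length), but the orders of magnitude are the point:

```python
# Hypothetical numbers -- adjust to your own org.
developers = 200            # a mid-size engineering team
queries_per_day = 30        # "dozens" of queries per developer
context_tokens = 2_000_000  # multi-million-token context per query

# Daily input-token volume for the whole team.
tokens_per_day = developers * queries_per_day * context_tokens
print(f"{tokens_per_day:,} input tokens/day")  # 12,000,000,000 (12B/day)

# At that rate, how long until 1T input tokens?
days_to_one_trillion = 1_000_000_000_000 / tokens_per_day
print(f"{days_to_one_trillion:.0f} days to 1T tokens")  # ~83 days
```

So even a single 200-person team at modest usage hits a trillion input tokens in under a quarter; a large company with thousands of developers gets there in days.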
Wouldn't a lawyer wanting to run queries against the entire tax code use a model that was fine-tuned on all of that data, though? I mean, versus stuffing the entire tax code into the context on each request.