I don't think it needs to be strictly text-based, but given that LLMs are currently trained primarily on text, I'm not sure they'd be meaningfully able to generate binary data directly.
As for using DBs, that's certainly an option (e.g. LangChain and similar tools), but at some point you still need to bring the data into the context, so I'd say it's still worth considering what an efficient text representation of that data would look like.
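As a rough illustration of why the representation matters, here is a minimal sketch that serializes the same hypothetical rows (the `rows` data and the helper names are made up for this example) as compact JSON and as CSV. JSON repeats every field name on every record, while CSV states them once in a header. Character count is only a crude proxy here; the real token cost depends on the specific model's tokenizer.

```python
import csv
import io
import json

# Hypothetical sample records standing in for rows pulled from a DB.
rows = [
    {"id": 1, "name": "Alice", "city": "Paris"},
    {"id": 2, "name": "Bob", "city": "Berlin"},
    {"id": 3, "name": "Carol", "city": "Madrid"},
]


def as_json(records):
    # Compact JSON (no spaces): field names are repeated on every record.
    return json.dumps(records, separators=(",", ":"))


def as_csv(records):
    # CSV: field names appear once, in the header row.
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    for label, text in (("json", as_json(rows)), ("csv", as_csv(rows))):
        # Character length is only a rough stand-in for token count.
        print(f"{label}: {len(text)} chars")
        print(text)
```

The gap widens as the number of rows grows, since JSON pays for the keys on every row. The trade-off is that CSV loses type information and nesting, so which format is "efficient" depends on the shape of the data.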