As far as the indexing issues you mentioned go, I'm not a Python guy, so I can't recommend a drop-in solution. But I have indexed pretty massive string datasets, and you definitely want to select an index method that was specifically devised for the key/value datatype you intend to ingest. So hashmaps are probably out :) It would also pay to adapt either the implementation or the data itself. For example, say you had a bunch of font file names you wanted to index (XyzSans.otf, AsdfSans.otf, AsdfSerif.otf, etc): a prefix tree would be a pretty good fit, especially if you reversed all the strings, since those names share long common suffixes rather than prefixes.
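To make the reversed-strings idea concrete, here's a toy sketch in Python (all names and the TrieNode structure are illustrative, not a production library): inserting each name reversed means shared suffixes like "Sans.otf" collapse into shared branches, and a suffix lookup becomes a prefix walk.

```python
# Toy prefix tree (trie) built over REVERSED file names, so that a
# suffix search (".otf", "Sans.otf", ...) becomes a simple prefix walk.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.words = []  # original strings whose reversed form ends here

def insert(root, word):
    node = root
    for ch in reversed(word):  # key on the reversed string
        node = node.children.setdefault(ch, TrieNode())
    node.words.append(word)

def find_by_suffix(root, suffix):
    """Return all indexed strings ending with `suffix`."""
    node = root
    for ch in reversed(suffix):
        if ch not in node.children:
            return []
        node = node.children[ch]
    # collect every word stored anywhere in this subtree
    out, stack = [], [node]
    while stack:
        n = stack.pop()
        out.extend(n.words)
        stack.extend(n.children.values())
    return out

root = TrieNode()
for name in ["XyzSans.otf", "AsdfSans.otf", "AsdfSerif.otf"]:
    insert(root, name)

print(sorted(find_by_suffix(root, "Sans.otf")))  # ['AsdfSans.otf', 'XyzSans.otf']
```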
1. PostgreSQL has supported indexing that makes regex searches faster since v9.3 (the current version is 13): https://www.postgresql.org/docs/12/pgtrgm.html . However, it only speeds up some (simpler) patterns; for the rest we are back to a plain old full table scan.
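A rough intuition for why only simpler patterns benefit (this is a simplification, not pg_trgm's actual algorithm): a trigram index can only help when the pattern contains literal runs long enough to yield trigrams, which the planner can look up before verifying candidates. A pattern made mostly of wildcards yields no trigrams, so there is nothing to look up.

```python
# Illustration only: extract trigrams from the literal runs of a pattern,
# treating regex metacharacters as breaks between runs. Real engines
# analyze the pattern far more carefully.
import re

def literal_trigrams(pattern):
    runs = re.split(r"[.\^\$\*\+\?\(\)\[\]\{\}\|\\]", pattern)
    return {run[i:i+3] for run in runs for i in range(len(run) - 2)}

print(literal_trigrams(r"Asdf.*\.otf"))  # {'Asd', 'sdf', 'otf'} -> index can help
print(literal_trigrams(r".a.b."))        # set() -> full scan
```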
2. Speaking of full table scans, that's not such a bad option: the average list of file paths in a torrent file is approximately 6 KB, which comes to 300 GB for 50 million torrents. That's completely within reach of some VPS providers (like BuyVM). Still, it means up to a few minutes per pass.
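The back-of-envelope math, with an assumed sequential read throughput (the 2 GB/s figure is my assumption for a decent SSD, not from the post):

```python
# Sizing a full scan from the numbers above.
torrents = 50_000_000
avg_paths_bytes = 6 * 1024            # ~6 KB of file paths per torrent
total = torrents * avg_paths_bytes    # ~300 GB of raw text
throughput = 2 * 1024**3              # assumed ~2 GB/s sequential read
print(f"{total / 1024**3:.0f} GiB, ~{total / throughput / 60:.1f} min per pass")
```

On slower disks, or with regex matching as the bottleneck instead of I/O, the per-pass time stretches accordingly.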
3. So, unless I find some efficient technology for regex search over large volumes of text, I will only be able to implement "offline" search (submit a query, receive a link to the search results in 5-30 minutes, depending on server load).
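The offline flow could be as simple as a job queue drained by one scanning worker. A minimal sketch, with made-up names and an in-memory dict standing in for the results file the user's link would point to:

```python
# Queries go into a queue; a single worker grinds through the corpus and
# stores results that the user fetches later via their link.
import queue
import re
import threading

jobs = queue.Queue()
results = {}  # job_id -> matching lines, polled by the user later

def worker(corpus):
    while True:
        job_id, pattern = jobs.get()
        rx = re.compile(pattern)
        results[job_id] = [line for line in corpus if rx.search(line)]
        jobs.task_done()

corpus = ["XyzSans.otf", "AsdfSerif.otf", "notes.txt"]
threading.Thread(target=worker, args=(corpus,), daemon=True).start()
jobs.put(("job1", r"\.otf$"))
jobs.join()  # in reality the user polls the result link instead
print(results["job1"])  # ['XyzSans.otf', 'AsdfSerif.otf']
```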
4. Another option would be to provide the file listings for download as CSV files (torrent_id, filepath) and let users search from the command line (or with a text viewer/editor, though most popular ones still load the whole file into RAM). Compressed size would be around 1 KB per torrent, i.e. roughly 50 GB for 50 million torrents.
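For users who'd rather not wrangle command-line tools, a few lines of Python can stream such a dump without loading it into RAM (the file name and gzip compression are my assumptions; the two-column layout is from the post):

```python
# Stream a gzipped (torrent_id, filepath) CSV dump line by line,
# yielding rows whose filepath matches the given regex.
import csv
import gzip
import re

def search_dump(path, pattern):
    rx = re.compile(pattern)
    with gzip.open(path, "rt", newline="") as f:
        for torrent_id, filepath in csv.reader(f):
            if rx.search(filepath):
                yield torrent_id, filepath
```

Usage would look like `for tid, p in search_dump("listings.csv.gz", r"\.otf$"): ...`, and memory use stays flat regardless of dump size.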
Give that second link a closer look if you change your mind. It demonstrates a way of indexing 1.6 billion 80-byte strings in 24 GB, and then returning the result of a fixed-string search in 100 ms. That is the reward you get when you venture outside of the LAMP stack: less resource consumption, greater performance, increased utility.