Yes, good guess! That's the size we have after deduplication across projects at
https://www.softwareheritage.org/. We archive all the source code we can find, and we would like to support some sort of full-text search on it at some point, so Glean looks interesting.