What infrastructure do news aggregators typically use for crawling, parsing, indexing, extracting, storing/formatting in a database, etc.? Are there open source news aggregators I can study to learn how those problems are solved?
There are a few, some still active and some not: Gregarius, Lilina, Tiny Tiny RSS, RSSLounge, and the more recent selfoss, from the same developer as RSSLounge.
More generally, you can find up-to-date information on the topic by searching for something like "self hosted rss reader" in a search engine.
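To give a rough idea of the parse-and-store steps the question asks about, here is a minimal sketch in Python using only the standard library. It is not taken from any of the projects above. The sample feed, the `entries` table layout, and the function names `parse_rss` and `store_items` are all assumptions made up for illustration, and a real aggregator would also need fetching, scheduling, Atom support, and error handling.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document standing in for a fetched feed.
# In practice the text could come from urllib.request.urlopen(feed_url).read().
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
      <guid>http://example.com/1</guid>
      <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/2</link>
      <guid>http://example.com/2</guid>
      <pubDate>Tue, 02 Jan 2024 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Extract a list of item dicts from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        items.append({
            # Fall back to the link when the feed has no <guid>.
            "guid": item.findtext("guid", default=link),
            "title": item.findtext("title", default=""),
            "link": link,
            "published": item.findtext("pubDate", default=""),
        })
    return items


def store_items(conn, items):
    """Insert items into an (assumed) entries table; return how many were new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "guid TEXT PRIMARY KEY, title TEXT, link TEXT, published TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
    # The primary key on guid makes re-crawling the same feed idempotent.
    conn.executemany(
        "INSERT OR IGNORE INTO entries (guid, title, link, published) "
        "VALUES (:guid, :title, :link, :published)",
        items,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
    return after - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    new_rows = store_items(conn, parse_rss(SAMPLE_FEED))
    print(new_rows, "new entries stored")
```

Keying the table on the item's guid is one simple way to avoid duplicates when the same feed is polled repeatedly; the projects listed above may well handle this differently.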