[0] https://github.com/DIYgod/RSSHub/
Edit: but the few I tried did not have the CSS Selector Bridge enabled, so go with the original link or an archive of it.
I have built my own solution that is automagical at https://awesomegoat.com/, but I am running into the next set of issues, which are various scraping protections. It seems that a reasonable RSS gateway today needs to include a botnet of residential proxies just to read content on the internet.
It works pretty well, although every once in a while Goodreads hiccups, and then RSS bridge gives me a bunch of "new posts" that are actually error messages.
* Generate RSS feeds from book series
* Filter out translations
* Filter out compilations (not sure if this one is really plausible)
Any pointers on how I might accomplish some of those?
I use it quite successfully to get data out of undocumented APIs and into RSS.
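For anyone curious what "undocumented API to RSS" boils down to, here is a minimal sketch (not the commenter's setup, and not RSS-Bridge's code). It assumes the endpoint returns a JSON array of objects with "title", "url" and an optional "summary" key; those field names are hypothetical, so rename them to whatever the API actually sends.

```python
import json
import xml.etree.ElementTree as ET

def json_to_rss(payload: str, feed_title: str, feed_link: str) -> str:
    """Turn a JSON array of objects into an RSS 2.0 document string.

    Hypothetical input shape: [{"title": ..., "url": ..., "summary": ...}, ...]
    """
    items = json.loads(payload)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = f"Generated from {feed_link}"
    for obj in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = obj["title"]
        ET.SubElement(item, "link").text = obj["url"]
        # Reuse the URL as a stable guid so readers can deduplicate items.
        ET.SubElement(item, "guid").text = obj["url"]
        if "summary" in obj:
            ET.SubElement(item, "description").text = obj["summary"]
    return ET.tostring(rss, encoding="unicode")
```

You would fetch the JSON however you like (urllib, requests, a cron job) and serve the returned string as application/rss+xml.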
You're at the mercy of rate-limiting firewalls (so you'll have to rotate proxies if you intend to use this heavily), on top of the standard CloudFront bot detection, reCAPTCHA, and div obfuscation (Facebook is a good example of this).
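Proxy rotation itself is not complicated; the hard part is having good proxies. A rough sketch, assuming a hypothetical fetch(url, proxy) callable that returns (status, body) so the logic can be tested without a network. It treats 403 and 429 as "blocked or rate limited", backs off exponentially and moves to the next proxy; it does nothing about captchas or obfuscated markup.

```python
import itertools
import time
from typing import Callable, Iterable, Tuple

# Hypothetical fetcher signature: (url, proxy) -> (status_code, body).
# It is passed in so the rotation logic can be tested offline; a real one
# might wrap urllib.request with a ProxyHandler.
Fetcher = Callable[[str, str], Tuple[int, str]]

def fetch_with_rotation(url: str, proxies: Iterable[str], fetch: Fetcher,
                        max_attempts: int = 5, base_delay: float = 1.0) -> str:
    """Fetch `url`, cycling through proxies and backing off on 403/429."""
    proxy_list = list(proxies)
    if not proxy_list:
        raise ValueError("need at least one proxy")
    pool = itertools.cycle(proxy_list)
    for attempt in range(max_attempts):
        proxy = next(pool)
        status, body = fetch(url, proxy)
        if status == 200:
            return body
        if status in (403, 429):
            # Blocked or rate limited: wait exponentially longer, then try the next proxy.
            time.sleep(base_delay * 2 ** attempt)
            continue
        raise RuntimeError(f"unexpected status {status} via {proxy}")
    raise RuntimeError(f"gave up on {url} after {max_attempts} attempts")
```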
At large scale, like the kind of traffic I started seeing when I ran a public rss-bridge Instagram/Telegram bridge, rate limits are unavoidable.
So using RSS Bridge to generate feeds from large platforms is often a lot more reliable than the typical scraping script I'd code up myself for other sites.
I've written two blog posts about how we go about using CSS selectors when working with Feed Creator. Might be useful for those looking to do the same with RSS-Bridge.
How to turn a webpage into an RSS feed using Feed Creator
Part 1: https://www.fivefilters.org/2021/how-to-turn-a-webpage-into-...
Part 2 (using more advanced selectors): https://www.fivefilters.org/2021/how-to-turn-a-webpage-into-...
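To give a feel for what a selector-driven scraper does under the hood, here is a deliberately tiny stand-in using only Python's html.parser. It approximates a single descendant selector of the form ".CLASS a" (links nested inside any element carrying that class) and turns matches into (title, link) pairs. It is not how Feed Creator or RSS-Bridge is implemented, and it assumes reasonably well-nested markup; for real selectors you'd use a proper library.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class LinkUnderClass(HTMLParser):
    """Collect (text, href) for <a> elements nested inside an element whose
    class list contains `css_class`, roughly what ".CLASS a" would match."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.stack = []    # one bool per open non-void element: does it carry the class?
        self.links = []    # collected (text, href) pairs
        self._href = None  # href of the <a> currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # The <a> must be a descendant of a matching element, so check before pushing it.
        if tag == "a" and any(self.stack) and attrs.get("href"):
            self._href, self._text = attrs["href"], []
        if tag not in VOID:
            self.stack.append(self.css_class in classes)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)
```

Usage: p = LinkUnderClass("post"); p.feed(html); p.close(); then p.links holds the items for the feed.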
- split the full feed by article theme into separate feeds, and at the same time
- remove a few keywords, and also
- get the article length and split it into a long/short feed
- or maybe get what you used to have on some news sites: subscribe only to a specific author instead of being bombarded with hundreds of items in a feed
I don't know of any service that does that automatically, but a generic way of doing what you need is attainable. That's the power of rss-bridge: make the feed you want from content that already exists.
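As a rough illustration of how generic those transformations are once items are parsed, here is a sketch that covers all four wishes in one pass: drop items containing blocked words, optionally keep a single author, then bucket the rest by category and by word count. The item shape (title, author, category, content) and the 800-word threshold are assumptions, not anything rss-bridge defines.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

Item = Dict[str, str]  # hypothetical shape: title, author, category, content

def split_feed(items: List[Item], blocked_words: List[str],
               long_threshold: int = 800,
               author: Optional[str] = None) -> Dict[str, List[Item]]:
    """Filter items, then group them into per-category and long/short feeds."""
    blocked = {w.lower() for w in blocked_words}
    feeds: Dict[str, List[Item]] = defaultdict(list)
    for item in items:
        # Whole-word match, so blocking "ad" does not drop "head".
        tokens = set(re.findall(r"\w+", f"{item.get('title', '')} {item.get('content', '')}".lower()))
        if blocked & tokens:
            continue  # keyword filter
        if author is not None and item.get("author") != author:
            continue  # single-author feed
        feeds[f"category:{item.get('category', 'uncategorized')}"].append(item)
        words = len(item.get("content", "").split())
        feeds["long" if words >= long_threshold else "short"].append(item)
    return dict(feeds)
```

Each key of the returned dict would then be rendered as its own feed.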
For me, "CSS selectors" always seems like a deceptive term if it means selecting HTML elements. What if the website does not use styling?
I read 1000s of websites, including all HN submissions, without using CSS. When I want to extract information from a website, I focus on patterns in the page. They might be HTML, they might be style elements, but they could be anything. I never assume that all websites will wrap the information I want in certain elements. There is a ridiculous amount of random variation amongst websites.
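One way to picture "patterns in the page" rather than elements: match the shape of the data directly in the raw source, wherever it happens to sit (an href, an inline JSON blob, a script, plain text). The sketch below assumes a hypothetical site whose article URLs look like /2021/03/some-slug; it is an illustration of the idea, not the commenter's actual tooling.

```python
import re
from typing import List

# Hypothetical pattern: article URLs shaped like https://host/YYYY/MM/slug.
ARTICLE_URL = re.compile(r"https?://[\w.-]+/\d{4}/\d{2}/[\w-]+")

def extract_article_urls(page: str) -> List[str]:
    """Return unique article URLs from raw page text, in order of first appearance."""
    seen = set()
    urls = []
    for match in ARTICLE_URL.finditer(page):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

The trade-off is the one described above: no assumptions about markup, but each site may need its own pattern.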
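The key here is that it uses selectors, not the style sheets themselves: a selector like "div.post a" matches tag names, classes and attributes in the markup, whether or not any stylesheet ever references them.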