However, I should caution that in this era of companies being particularly user-hostile and authoritarian, especially Big Tech, I would be more careful with sharing stuff like this. Being forced to run JS is bad enough; profiling users based on other traits, and essentially determining if they are using "approved" software, is a dystopia we should fight strongly against. Stallman's Right To Read comes to mind as a very relevant warning story.
Like, I get the need for some protective mechanisms for interactive content/posting/etc, but there should be zero cases where a simple HTTP 200 GET requires javascript/client side crap. If they serve me a slightly stale version of the remote resource (5 minutes/whatnot) that's fine.
They've effectively just turned into a Google protection racket. Small or special-purpose search and archive tools are simply stonewalled.
The best you've got is "essentially off", and the wording is that way because even with everything disabled there are still edge cases where their security will enforce a JS challenge or CAPTCHA.
I don't really know how you resolve that absent just like... putting everything behind logins, though.
Not all sites are configured to do this. Some pages are expensive to render and have no cache layer.
Right to Read indeed... fanfiction.net has become really annoying over the last few months. Especially at night, when you have the FFN UI set to dark and a bright white Cloudflare page suddenly appears out of nothing. Or the way the Cloudflare "anti bot" protection leads to an endless loop when the browser is the Android WebView inside a third-party Reddit client.
A browser is becoming a universal agent by itself, but many people (maybe increasingly) use the terminal to access resources, and stonewalling those paths is never OK in my book.
you should really be impersonating an ESR version (e.g. 91). Versions from the release channel are updated every month or so, and everyone has autoupdate enabled, so unless you keep your impersonation up to date, your fingerprint is going to stick out like a sore thumb in a few months. ESR, on the other hand, sticks to one version and shouldn't change significantly during its one-year lifetime. It's still going to stick out to some extent (most people don't use ESR), but at least there are enterprises running ESR that you can blend in with.
Perhaps you may get broken sites with Firefox, because no-one cared. But banning? Seems like a stretch.
As a user of non-browser clients (not curl though) I have not run into this in the wild.^1
Anyone have an example of a site that blocks non-browser clients based on TLS fingerprint?
1. As far as I know. The only site I know of today that is blocking non-browser clients appears to be www.startpage.com. Perhaps this is the heuristic they are using. More likely it is something simpler I have not figured out yet.
In my own country, for critical sites, I will probably have to go to court, since "noscript/basic (x)html" interop was broken in the last few years.
Always thought it was the myriad of cookies and expiry time of said cookies that tend to make non-browser clients more obvious to CF.
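A minimal sketch of that kind of heuristic, as a guess at what CF might be doing (the field names and threshold here are hypothetical, not anything Cloudflare has documented): a client that is repeatedly issued cookies but never sends any back looks automated.

```python
def looks_cookieless(requests_seen):
    """Hypothetical bot heuristic: flag a client that was issued cookies
    but never returned any on later requests.

    requests_seen: list of dicts, one per request from the same client, with
    'set_cookie_issued' (server sent Set-Cookie) and 'cookies_sent'
    (client included a Cookie header).
    """
    issued = any(r["set_cookie_issued"] for r in requests_seen)
    returned = any(r["cookies_sent"] for r in requests_seen)
    return issued and not returned

# A curl-style client that ignores Set-Cookie would trip this check:
print(looks_cookieless([
    {"set_cookie_issued": True, "cookies_sent": False},
    {"set_cookie_issued": True, "cookies_sent": False},
]))
```

A real browser returns the cookie on the second request, so the same function comes back False for it.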
The HTTP requests were exact copies of browser requests (in terms of how the server would've seen them), so it was something below HTTP. I ended up finding a lot of info about Cloudflare and the TLS stuff on StackOverflow, with others having similar issues. Someone even made an API to do the TLS stuff as a service, but it was too expensive for me. https://pixeljets.com/blog/scrape-ninja-bypassing-cloudflare...
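For anyone wondering what "below HTTP" means concretely: the usual scheme is a ClientHello fingerprint like JA3, which joins the TLS version, cipher suites, extensions, curves, and point formats (as decimal values) into one string and MD5s it. A sketch, using made-up field values rather than any real browser's hello:

```python
import hashlib

def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """JA3-style fingerprint of ClientHello fields.

    tls_version is an int; the rest are lists of the decimal values seen in
    the hello. JA3 joins the five fields with commas and the values within a
    field with dashes, then takes the MD5 of that string.
    """
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    ja3_string = ",".join(fields)
    return hashlib.md5(ja3_string.encode()).hexdigest()

# Illustrative values only -- not copied from a real client.
print(ja3_hash(771, [4865, 4866, 49195], [0, 23, 65281], [29, 23, 24], [0]))
```

Since the hash covers the order of ciphers and extensions, two clients speaking byte-identical HTTP can still hash differently, which is why copying the headers alone wasn't enough.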
What inspired the project?
There are some industries (virtually all of Wall Street, for example, and certain parts of government) where the company needs to surveil 100% of what their employees do on the web from inside the office. These companies have been running MITM proxies for decades.
Wouldn't any website that rejects a non-browsery TLS client be blocking out these people as well?