Breaking the 4Chan CAPTCHA

(nullpt.rs)

580 points by hazebooth 1y ago | 349 comments

The part about bad Keras<->TensorFlow.js interop is classic TensorFlow. Using TF always felt like using a bunch of vaguely related tools put under the same umbrella rather than an integrated, streamlined product.

Actually, I'll extend that to saying every open source Google library/tool feels like that.

alecco 1y ago

related (15 days ago)

https://news.ycombinator.com/item?id=42130881 on Francois Chollet is leaving Google

> "Why did you decide to merge Keras into TensorFlow in 2019": I didn't! The decision was made in 2018 by the TF leads -- I was a L5 IC at the time and that was an L8 decision.

Retr0id 1y ago

something something Conway's law

Dachande663 1y ago

Semi-related, but I needed a CAPTCHA on my site[0], mainly to block comment-form spam, and settled on repurposing a fun method I’d seen before. It’s definitely not foolproof (or hard at all), but I really liked making it.

[0] https://www.hybridlogic.co.uk/contact

vunderba 1y ago

Reminds me of the Doom captcha.

https://vivirenremoto.github.io/doomcaptcha/

Dachande663 1y ago

99% certain this is where I copied the idea from.

winrid 1y ago

It says I've been blocked when I try to view that. Not on a VPN.

Dachande663 1y ago

The site runs off a tiny little server at home, so I’ve got some very aggressive firewall rules. Anything from the usual bad countries, certain signatures, etc. is blocked. That reduced traffic to 1% of the previous load.

efilife 1y ago

What are the bad countries? Russia and China?

winrid 1y ago

I'm in Silicon Valley in the USA on Comcast lol

EasyMark 1y ago

Are you using Safari?

chamomeal 1y ago

No way, that is a cool fucking captcha!!

tayiorrobinson 1y ago

Cool, sure; good, probably not. I've never played Halo, so I didn't entirely know what I was doing (do I shoot the blue guys too? It's not letting me through, so I guess I do), and I suspect some people wouldn't even understand what it meant by "shoot". And god forbid anyone with a disability that affects their mouse accuracy, or who needs a screen reader, tries to use it.

Haven't looked at the dev console, but it'd probably be easily bypassed by someone dedicated.

account42 1y ago

Cool as a one-off use on some random blog contact form. Infuriatingly annoying if used somewhere you have to solve it with any frequency.

bawolff 1y ago

There is a reason why people moved away from distorted-text captchas. We are basically at the point where computers are better at them than humans.

https://www.usenix.org/system/files/conference/woot14/woot14... is a paper on the subject I think is really interesting

However, a surprising number of text-based captchas can be solved with a few-line shell script: use ImageMagick to convert to greyscale, dilate and undilate (i.e. erode), then pass the result to Tesseract.

However there are also sites like https://2captcha.net , so really captchas are more like putting a small min amount of effort.
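The preprocessing part of that shell pipeline can be sketched in plain Python. Here "dilate and undilate" is read as a morphological closing on the dark (ink) pixels: dilation bridges small gaps in strokes and erosion then thins them back, which leaves a cleaner image for OCR. The threshold of 128, the 3x3 window and the function names are assumptions. The Tesseract step is omitted.

```python
from typing import Iterator, List

Image = List[List[int]]  # 2D grid of 0/1, where 1 = ink (dark pixel)


def to_binary(grey: List[List[int]], threshold: int = 128) -> Image:
    """Threshold a greyscale image (0 = black .. 255 = white) so dark pixels become 1."""
    return [[1 if px < threshold else 0 for px in row] for row in grey]


def _window(img: Image, y: int, x: int) -> Iterator[int]:
    """Yield the in-bounds values of the 3x3 window centred on (y, x)."""
    h, w = len(img), len(img[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                yield img[ny][nx]


def dilate(img: Image) -> Image:
    """A pixel becomes ink if any pixel in its 3x3 window is ink."""
    return [[int(any(_window(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def erode(img: Image) -> Image:
    """A pixel stays ink only if every in-bounds pixel in its 3x3 window is ink."""
    return [[int(all(_window(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def closing(img: Image) -> Image:
    """Dilate then erode: fills small gaps in strokes without fattening them overall."""
    return erode(dilate(img))
```

For example, a horizontal stroke with a one-pixel break comes out of `closing` as one continuous stroke of the original width.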

noprocrasted 1y ago

Just because you can technically crack them doesn't mean they're useless.

There's a significant amount of time, skill and effort that went into the solution from this post, and the end result doesn't generalize well (you'd have to start all over for a different kind of captcha).

The vast majority of spammers would not be able to replicate this; those who do would either make money legitimately, or focus their skills on juicier targets (if you have AI/ML skills and want to do nefarious things there are other options that pay much better than spamming).

Such captchas still work well at raising the cost of successful spamming above the expected payoff from said spam.

reaperman 1y ago

So, I do this type of AI development for solving CAPTCHAs.

I can't get any real jobs that pay me for my more advanced skills. My primary sins were going to a second/third-tier university and some performance concerns in a portion of my previous roles due to divorce and burn-out. I make $80k/year in government IT, and $30-150k/year as the "AI" guy in a small 2-5 person group that offers a CAPTCHA-breaking API.

The spammers aren't the ones replicating this. They just pay B2B rates (combo of SaaS + Consulting, depending on client needs) to help them remove the roadblocks.

ryandrake 1y ago

Despite the spamming angle, I think CAPTCHA-breaking is, on the balance, noble and honorable work. These things are user-hostile blights on the web, and any effort towards making them disappear as useless is worthwhile. Sites worried about spam should invest more in automated spam classification/elimination instead of punishing real users with CAPTCHA-solving. Not that I can offer a solution--if I could, I'd be a millionaire.

TZubiri 1y ago

Ahh the good ole dilemma of selling your soul, you study what you love only to destroy it for profit. Like an entomologist hired by a pesticide company.

I get it man, gotta make the bucks helping spammers advertise their shitty products, even if they destroy the internet.

fragmede 1y ago

> there are other options that pay much better than spamming

Are there? Say you've got a felony record and can't get a legit AI/ML job at e.g. OpenAI/anywhere. What would you do instead? Most of the options I can think of involve getting paid for doing things that are basically spam if you zoom out enough.

hamilyon2 1y ago

Captchas are now useful to distinguish well-intentioned bots (they stop whenever they see captcha) from malicious ones, which solve them, but still behave a lot like bots.

Well-intentioned bots are first-class citizens.

brookst 1y ago

Wouldn’t a well-intentioned bot follow robots.txt anyway?

lostlogin 1y ago

Do you complete the circle and do the good bot bad bot classification with a mod bot?

TZubiri 1y ago

Interesting, subtle difference but I always thought of captchas as having computational difficulty, but that's clearly not the point as you say. The cost is not compute but developer time.

If you manage to crack it at 1 MHz per captcha, or 1 GHz, or 1000 GHz, it makes no difference, as the bottleneck is the network identifier (IP address/block).

While still a type of PoW, these economics are different from offline mechanisms like password hashing or crypto, where a 1 GHz cost is still significantly different from 1 MHz.
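Here is a minimal sketch of why the network identifier, not compute, is the bottleneck. A per-IP cooldown caps each address at one action per window, however quickly the captcha itself was solved. The class name and the 60-second window are hypothetical. This is not 4chan's actual implementation.

```python
import time
from typing import Callable, Dict


class PerIpCooldown:
    """Allow at most one action per `cooldown` seconds for each IP address.

    A faster captcha solver gains nothing here: throughput is capped by how many
    distinct IPs the client controls, not by how much compute it has.
    """

    def __init__(self, cooldown: float, clock: Callable[[], float] = time.monotonic):
        self.cooldown = cooldown
        self.clock = clock  # injectable so the behaviour can be tested deterministically
        self._last: Dict[str, float] = {}

    def try_acquire(self, ip: str) -> bool:
        now = self.clock()
        last = self._last.get(ip)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down for this IP
        self._last[ip] = now
        return True
```

A real deployment would also group addresses by block (e.g. a /24 or a residential-vs-datacenter classification) rather than keying on single IPs.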

atomicnumber3 1y ago

The watershed of "good enough at programming to just get a real job" vs "can code enough to be really annoying to businesses, but not enough to hack it as a dev" is a lot more on the annoying side than you'd think.

I say this with the chagrin of someone who works on a cool software product that is also coincidentally really well-shaped to make people want to abuse it.

delfinom 1y ago

> The vast majority of spammers would not be able to replicate this;

Eh? They just need to buy their software from someone who can. I would say much of the malware and spamware isn't created by each individual deploying it, but by vendors that got good at it and decided to make revenue by licensing their software to other bad actors.

brian-armstrong 1y ago

Makes me wonder what comes next. Could we create a forum where every member must do a 15 minute video interview with a moderator? I know this "doesn't scale" but I think it could make for a funny gimmick.

matchamatcha 1y ago

When I was a teenager, I stumbled upon a music forum that required phone interviews for signing up. They had other interesting sign-up rules, like you could not have silly usernames (judged by the admin). I guess it served as an effective filter for their member base.

jabroni_salad 1y ago

Private torrent trackers are/were doing that. It was really just to make sure you understood how P2P culture works and what the expectations are, and it was really easy to pass if you just followed a guide. However, I did see many people fail their interview.

jmb99 1y ago

Were there ever video interviews? Admittedly I wasn’t really paying attention, but back when I was getting into it, it was only IRC, and these days it still seems to be IRC anywhere that does interviews (otherwise it's class-restricted forum invites).

ggu7hgfk8j 1y ago

We are increasingly moving to ID checks; Australia just passed a law. For all its faults, it solves spam as a side effect.

bobsmooth 1y ago

A small signup fee is much easier.

3abiton 1y ago

I think captchas are just another line of defense to make things harder for actors abusing the system. It's not a solution, just a little (increasingly outdated) fortification.

poincaredisk 1y ago

Small? From your own link, reCAPTCHA v3 takes 10-15 s and costs $1.30 per 1000 captchas. That is actually huge, and prohibitively expensive for many things where you would want to use it (like scraping a large website).
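To put the quoted figures in perspective, here is a back-of-the-envelope sketch. It assumes a hypothetical scrape of 1,000,000 pages where every request triggers a captcha. That is a worst case, since captchas are not necessarily served on every request.

```python
from typing import Tuple

PRICE_PER_1000_USD = 1.30  # quoted reCAPTCHA v3 solving price
SOLVE_SECONDS = (10, 15)   # quoted solve latency range


def solving_cost_usd(n_captchas: int) -> float:
    """Total outsourcing cost for n solves at the quoted per-1000 rate."""
    return n_captchas * PRICE_PER_1000_USD / 1000


def serial_solve_hours(n_captchas: int) -> Tuple[float, float]:
    """(best, worst) hours if every solve waits for the previous one."""
    lo, hi = SOLVE_SECONDS
    return n_captchas * lo / 3600, n_captchas * hi / 3600


pages = 1_000_000  # hypothetical scrape size, one captcha per page
print(f"${solving_cost_usd(pages):,.2f}")
best, worst = serial_solve_hours(pages)
print(f"{best:,.0f}-{worst:,.0f} hours if solved one at a time")
```

That works out to roughly $1,300 and about 2,800 to 4,200 hours of serial solving. Running solves in parallel cuts the wall-clock time but not the dollar cost.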

costco 1y ago

Depends on the website, but you don't always get a reCAPTCHA, so the cost is a lot lower than that. You usually get one if you're exceeding some rate limit or doing a sensitive action like registering.

RobotToaster 1y ago

> so really captchas are more like putting a small min amount of effort.

At that point a proof-of-work captcha (mCaptcha.org is one, but there are others) is probably the best option, especially given that any reasonably effective traditional captcha is an accessibility nightmare.

cubefox 1y ago

It's completely unclear what a "proof of work" captcha is supposed to be.

nyclounge 1y ago

Wow, FunCaptcha costs the most, and it is open source.

mieko 1y ago

If you're into this, here's my 2014 breakdown of the Silk Road CAPTCHA: https://github.com/mieko/sr-captcha

mbs159 1y ago

Intriguing, thanks for sharing!

antirez 1y ago

Appropriate response by 4Chan to this: simplify the human work given that anyway it's simple to solve via NNs. We are at a point where designing very hard captchas has high probabilities to increase the human annoyance without decreasing the machine solvability.

codetrotter 1y ago

> simplify the human work given that anyway it's simple to solve via NNs. We are at a point where designing very hard captchas has high probabilities to increase the human annoyance without decreasing the machine solvability

Or disallow free users to post at all, and require everyone to buy the 4chan Pass for $20 USD per year if they want to post.

https://4chan.org/pass

This is already available to not have CAPTCHA. So if CAPTCHA is totally ineffective, it follows that they should do away with CAPTCHA and free users being able to post at all and everyone should buy the 4chan Pass if they want to post.

fullspectrumdev 1y ago

This kills the board. Users will go elsewhere, fuck all people pay for pass.

jachee 1y ago

And the spambots will follow them. Which kills the next board. Repeat ad nauseam until the end of the internet.

ranger_danger 1y ago

Agreed, charging for accounts is the only halfway viable solution I have seen any service use that gives a sizable downtick in the sheer number of bots/spam.

Of course it's not perfect, and it will still happen, but I have yet to hear any better solutions. Please prove me wrong though!

poincaredisk 1y ago

At this point I have to wait 90 seconds before every post (maybe because I don't persist cookies). I posted very rarely, but now I've just stopped; I get it when someone shows me the door.

matheusmoreira 1y ago

That would work. It would also kill the site.

efilife 1y ago

What? So you use 4chan? It would completely kill what makes this website special

YeahThisIsMe 1y ago

We've been stuck at that point for at least 5, if not 10, years.

hackernewds 1y ago

Just use Worldcoin retina scans next

gosub100 1y ago

"Drag each symbol to the group that is most likely to be offended by it."

xp84 1y ago

Ooh I love this, all off-the-shelf AI won’t touch it due to all their “safety” (aka anti-hurt-feelings) protocols

encom 1y ago

4chan doesn't care about human annoyance. They just started doing a 15 minute post delay, which is infuriating. I had to whitelist 4chan in Cookie AutoDelete.

poincaredisk 1y ago

Hi fellow cookie autodeleter, I experienced the same thing, but I just decided to stop posting. Whitelisting felt too much like giving in to terrorists. I'm considering just not going there in the future. Maybe after all this time I will finally be free.

matheusmoreira 1y ago

Just stop posting there. The whole point of it is to post anonymously in a high traffic forum. The rate limiting timers have reduced traffic to the point many boards feel dead, and their solution to that problem is to sell accounts.

hsbauauvhabzb 1y ago

What is NN?

numpad0 1y ago

"AI" but pre-COVID

layer8 1y ago

https://en.wikipedia.org/wiki/Neural_network_(machine_learni...

brodo 1y ago

I am totally in favor of increasing the annoyance of 4chan users.

somat 1y ago

I wonder if it would be better to pretend to have a captcha but really you are analysing the user timing and actions. Honestly I half suspect this is already going on.

If you wanted to go full meta ("never go full meta"), you would train an AI to figure out whether the agent on the other side was human or not, that is, invent the reverse Turing test: it's a human if the AI is unable to differentiate its responses from normal human responses, as opposed to marketing human responses.

Well now I have to go have a lay down, I feel a little ill from even thinking on the subject.

wraptile 1y ago

That's kinda what every major captcha distributor does already!

Even before a captcha is served, your TLS is fingerprinted, then your IP, then your HTTP/2, then your request, then your JavaScript environment (including font and image rendering capabilities) and the browser itself. These are used to calculate a trust score that determines whether a captcha will be served at all. Only then does it make sense to analyze the captcha input, but by that time you've caught 90% of bots either way.

The amount your browser can tell any server about you without your awareness is insane, to the point where every single one of us probably has a more unique digital fingerprint than our very own physical fingerprint!
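Here is a toy version of that gating: combine request signals into a trust score, let high scores through, block low ones, and serve a captcha only in the grey zone between. The signal names, weights and thresholds are invented for illustration. Real vendors use far more signals, and their models are not public.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical per-request signals (all names are illustrative)."""
    tls_fingerprint_known_browser: bool  # TLS handshake resembles a mainstream browser
    residential_ip: bool
    http2_settings_match_ua: bool        # HTTP/2 behaviour consistent with the claimed browser
    js_environment_consistent: bool      # fonts, rendering, etc. look like a real browser


# Made-up weights for illustration only; they sum to 1.0.
WEIGHTS = {
    "tls_fingerprint_known_browser": 0.3,
    "residential_ip": 0.3,
    "http2_settings_match_ua": 0.2,
    "js_environment_consistent": 0.2,
}


def trust_score(s: Signals) -> float:
    """Sum the weights of every signal that looks human."""
    return sum(w for name, w in WEIGHTS.items() if getattr(s, name))


def decide(s: Signals, allow_at: float = 0.8, block_below: float = 0.3) -> str:
    score = trust_score(s)
    if score >= allow_at:
        return "allow"      # no captcha served at all
    if score < block_below:
        return "block"
    return "challenge"      # only the grey zone ever sees a captcha
```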

encom 1y ago

This is how ClownFlare and its ilk make life hell on the internet when you use a "weird" browser on a "weird" OS.

gosub100 1y ago

Re: your last paragraph, https://coveryourtracks.eff.org/

EFF have been running this for years. Gives an estimate about how many unique traits your browser has. Even things like screen resolution are measured.

zoltrix303 1y ago

Would it be possible to serve a fake fingerprint that appears legitimate? Or, even better, mimic the fingerprint of real users who've visited a site you own, for example?

nullpt_rs 1y ago

yep, but it can get tricky.

some projects worth checking out: https://github.com/refraction-networking/utls https://github.com/berstend/puppeteer-extra

wraptile 1y ago

Yes, that's what web scraping services do (full disclosure: I work at scrapfly.io). Collecting fingerprints and patching the web browser against this fingerprinting is quite a bit of work, so most people outsource this to web scraping APIs.

barbolo 1y ago

https://github.com/lwthiker/curl-impersonate

PUSH_AX 1y ago

In that case why do I ever receive a captcha?

Pikamander2 1y ago

It adds another layer of analysis. For example:

If the user solves the CAPTCHA in 0.0001 seconds, they're definitely a bot.

If the user keeps solving every CAPTCHA in exactly 2.0000 seconds, each time makes it increasingly likely that they're a bot.

If the user sets the CAPTCHA entry's input.value property directly instead of firing individual key press events with keycodes, they're probably either a bot, copy-pasting the solution, or using some kind of non-standard keyboard (maybe accessibility software?).

Basically, even if the CAPTCHA service already has a decent idea of whether the user is a bot, forcing them to solve a CAPTCHA gives the service more data to work with and increases the barrier of entry for bot makers.
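The heuristics in this comment can be sketched as simple flags over solve times and input events. The thresholds (0.5 s minimum, 0.05 s spread) and the function name are hypothetical. As the comment notes, missing keystrokes can also mean pasting or accessibility software, so these are inputs to a score, not verdicts.

```python
from statistics import pstdev
from typing import List


def timing_flags(solve_times: List[float], key_events: int, answer_len: int,
                 min_human_seconds: float = 0.5, min_spread: float = 0.05) -> List[str]:
    """Return suspicious-looking patterns for one client's captcha history."""
    flags = []
    if any(t < min_human_seconds for t in solve_times):
        flags.append("too_fast")      # faster than a human could plausibly read and type
    if len(solve_times) >= 3 and pstdev(solve_times) < min_spread:
        flags.append("too_regular")   # e.g. every solve takes exactly 2.0000 s
    if key_events < answer_len:
        flags.append("no_keystrokes") # value set directly, pasted, or assistive input
    return flags
```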

sdk16420 1y ago

I've found several websites have switched to 'press here until the timer runs out'. They are probably doing the checks while the user holds the mouse button down; the long press by itself would be trivial to bypass with automated mouse clickers.

kccqzy 1y ago

That's what reCAPTCHA does.

benreesman 1y ago

In my opinion the granddaddy of all 4chan CAPTCHA busts is still Yannic Kilcher’s GPT-J fine-tune on the “Raiders of the Lost Kek” dataset, and it might be the coolest thing an LLM has ever done on video: https://youtu.be/efPrtcLdcdM?si=errY0PrEhnX9ylDw

chiph 1y ago

Nearly a full minute of disclaimers and warnings about 4chan. That's got to be a record.

ValentinA23 1y ago

>I released the model, the code and I evaluated the model on a huge set of benchmarks and it turns out this horrible, terrible, model is more truthful-yes more truthful-than any other GPT out there

Pikamander2 1y ago

> The official TensorFlow-to-TFJS model converter doesn't work on Python 3.12. This doesn't seem to really be documented.

> TensorFlow.js doesn't support Keras 3.

I tried getting into some casual machine learning stuff a few years ago and more or less gave up because of stuff like this. It was staggering how many recent tutorials were already outdated, how many random pitfalls there were, and how many "getting started" guides assumed you were already an expert.

sigmoid10 1y ago

As someone who has been working in ML for years, I can only recommend staying away from anything recent. Grab an old Bayesian statistics textbook and learn the fundamentals, then progress to learning the major frameworks like PyTorch. Try to write every part of a CNN, RNN and Transformer architecture and training pipeline yourself the first time (including data loaders, but maybe leave out CUDA matrix kernels). Stay the hell away from wrappers for other people's wrappers like LangChain. Their documentation is often not just outdated, but flat-out wrong regarding the fundamentals. Hugging Face is great if you know the basics and thus how to fix things when their standard wrappers break.
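To show how small the "write it yourself" pieces can be, here is a from-scratch mini-batch loader in plain Python. It is a sketch, not PyTorch's `DataLoader`; the function name `batches` and its parameters are made up for illustration.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def batches(xs: Sequence[X], ys: Sequence[Y], batch_size: int,
            shuffle: bool = True, seed: int = 0) -> Iterator[Tuple[List[X], List[Y]]]:
    """Yield (inputs, labels) mini-batches for one epoch.

    The last batch may be smaller than batch_size (nothing is dropped).
    """
    if len(xs) != len(ys):
        raise ValueError("inputs and labels must have the same length")
    order = list(range(len(xs)))
    if shuffle:
        random.Random(seed).shuffle(order)  # local RNG keeps runs reproducible
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [xs[i] for i in idx], [ys[i] for i in idx]
```

Shuffling indices rather than the data itself keeps inputs and labels paired without copying either sequence.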

rohansuri 1y ago

Any book you would recommend?

sigmoid10 1y ago

You can try Theodoridis if you can find a first or second edition. It is old enough to not be diluted by the recent craze but still recent enough to cover all the necessary fundamentals. There is also a new edition coming out soon, but that seems to have been heavily tainted by the ChatGPT hype.

ChrisMarshallNY 1y ago

That’s like spending a few hours, learning to take the lid off your septic tank.

blackjackfoe 1y ago

Little bit, but at least you learned something :)

gherkinnn 1y ago

Oddly enough, I find most of 4chan less brainrot-inducing than Twitter, even pre-Musk.

JasserInicide 1y ago

It's still brainrot, it's just on the opposite end of the political spectrum.

tovej 1y ago

There's no smart algorithm for sorting posts, and there's a limited number of active threads, so it's not rage-baiting in quite the same way. Only active threads stay alive, though, so it has the exact same issue as Twitter and other social media: only engaging content is served to users, and the most engaging things are rage bait, conspiracy theories, and porn. Things that get someone riled up enough to respond.
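The mechanics described here can be sketched with an ordered dict: a fixed number of live threads, replies bumping a thread to the top, and the least recently bumped thread falling off. This is a simplified, hypothetical model. It ignores bump limits, stickies and other board-specific rules, and the capacity is an arbitrary parameter.

```python
from collections import OrderedDict
from typing import List


class Board:
    """Simplified imageboard index: a reply bumps a thread; the stalest one is pruned."""

    def __init__(self, max_threads: int):
        self.max_threads = max_threads
        self._threads: "OrderedDict[int, int]" = OrderedDict()  # thread id -> reply count
        self._next_id = 1

    def new_thread(self) -> int:
        tid = self._next_id
        self._next_id += 1
        self._threads[tid] = 0
        self._threads.move_to_end(tid, last=False)  # new threads start at the top
        if len(self._threads) > self.max_threads:
            self._threads.popitem(last=True)        # drop the least recently bumped thread
        return tid

    def reply(self, tid: int) -> None:
        """Bump a live thread (raises KeyError if it has already been pruned)."""
        self._threads[tid] += 1
        self._threads.move_to_end(tid, last=False)  # engagement keeps a thread alive

    def index(self) -> List[int]:
        """Thread ids in display order, most recently bumped first."""
        return list(self._threads)
```

Nothing here ranks by content. Survival depends only on how recently someone replied, which is the "engagement without an algorithm" point above.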

thrance 1y ago

I have bad news for you, then...

meowface 1y ago

I am a liberal and also genuinely find many 4chan boards less politically awful than current Twitter most of the time.

The chronological sorting at least offers some diversity of opinion. The first 50 replies to a 4chan thread about Trump (in the right board) will usually contain many, maybe even mostly, anti-Trump posts. On Twitter you usually need to scroll through the sea of blue checkmark replies for a while to find even one anti-Trump post.

Some 4chan boards are majority neo-Nazis who want all minorities expelled or murdered. But stumble across a particular Twitter thread and it's the same thing but with even more ideological uniformity within the thread, and with 4000 neo-Nazis in the thread instead of 60.

That said, both sites definitely are not great to use if you aren't very right-wing.

salawat 1y ago

...Don't underestimate the things to be learned studying a septic system.

morkalork 1y ago

Following the links to the captcha-solving service, you can read profiles of the humans doing the work, where it's pitched as more ethical than them working in hazardous factories!

tumsfestival 1y ago

I can only imagine how much worse they'll make the captcha after stuff like this picks up speed with the users all the while being ineffective against the bots.

rany_ 1y ago

I really doubt that they're the first to do this.

OmarShehata 1y ago

Captchas are broken, forever. There is no way to prevent bots without also preventing a bottom tier of human users (visually impaired people, old people, or just impatient people). As this xkcd comic [1] suggests, we need to just focus on rewarding and punishing specific behavior, regardless of whether the agent is human or not.

[1] https://xkcd.com/810/

shortrounddev2 1y ago

Jokes aside, we don't want any bots at all. Even if they're posting constructive comments, we should interact with humans, not machines

hsbauauvhabzb 1y ago

That doesn’t mean that web crawlers have no legitimate value (think: search indexers) or illegitimate value (think: intellectual property theft via data scraping for AI purposes), and bots that communicate, while they have no place here, aren’t going to go away.

Philpax 1y ago

In the interest of provoking discussion: why?

If a bot can meaningfully pass and act as a productive member of the community, what does it matter?

echelon 1y ago

I think a better approach is to make account creation frictionful (e.g. charge money, set karma thresholds, require an invite, etc.), score each account, and ban or time out accounts when they break community rules.

But an even better approach would be to go fully P2P and leave the scoring, ranking and filtering to the end nodes, with the possibility of friendly networks of interest-group peers assisting with the task. BitTorrent for social media, PGP-signed accounts, fully flexible annotation and ingestion. It's also less subject to cabal-based censorship.

webstrand 1y ago

PoW like hashcash (not a cryptocurrency thing) might be a better solution. Users could even delegate solving the PoW puzzles to a 3rd party for low power devices like phones. But it imposes a cost on spammers that's inescapable.

cchance 1y ago

I mean at some point ... the average visitor is dumber than the AI and your now just blocking dumb people

OmarShehata 1y ago

yes, we're creating websites that are gated by IQ tests. This isn't the way

djbusby 1y ago

*you're

makifoxgirl 1y ago

This project also solves the 4chan captcha https://github.com/moffatman/chan

Alifatisk 1y ago

If there is one blog I've fallen in love with, it's nullpt.rs. Still waiting for part 2 of Reverse Engineering TikTok's VM Obfuscation.

ranger_danger 1y ago

For those that don't know, the JKCS extension has been doing this for years already:

https://addons.mozilla.org/en-US/firefox/addon/jkcs/

https://chromewebstore.google.com/detail/joshi-koukousei-cap...

Userscript version: https://github.com/drunohazarb/4chan-captcha-solver

blackjackfoe 1y ago

I really hope my post didn't come off as if I was trying to make it sound like this was a new idea. Regardless, this is good information, because it counters the posts of the form "great, now that you made this, you're going to make it harder."

ranger_danger 1y ago

I didn't look at it that way, just maybe that you (and/or others) might not have been aware of its existence since I didn't see it mentioned anywhere.

Yeul 1y ago

I understand why Cloudflare has to exist. But it's beyond annoying that it forces you into using an unmodified Chrome without a VPN.

hobom 1y ago

Does 4Chan also have bot BEHAVIOR detection (e.g. unnatural mouse movements) like Google's captcha has?

blackjackfoe 1y ago

It does not, at least not once you pass the Cloudflare Turnstile challenge (which can be done with an API as well.)

ipnon 1y ago

The results here suggest it does not.

kalleboo 1y ago

Yeah, I had been under the impression that the point of captchas like this (and those "slide a puzzle piece" ones) wasn't the puzzle itself so much as checking for human-like mouse movements.

chad1n 1y ago

I've built 3 iterations of captcha solvers for that crappy website based on https://github.com/drunohazarb/4chan-captcha-solver/issues/1 . The only thing I've learned along the way is that it's mostly pointless outside of a "learning" exercise, since they'll change the captcha (in terms of letter count or background entropy). Initially it was 4 characters with a pretty obvious background, then it went to 5, then it was both 4 and 5, and the current iteration is also either 4 or 5 characters but with a lot of entropy surrounding them.

blackjackfoe 1y ago

This project was really my first decent introduction to computer vision and machine learning (along with that of those who helped me in various ways; none of them desired to be credited here other than the guy who collected some of the data for me.)

It was definitely a successful learning exercise, and it's made me more confident tackling some other problems I've had in mind for a while.

spookie 1y ago

To help you out if you're interested:

- Smearing with a Gaussian along one axis, and separately along the other axis, can really help with segmenting chars and finding lines of text in OCR

- You can unshear chars using the Radon or Hough transform as a basis to estimate the shear angle
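One way to read the first tip is projection-profile segmentation. Sum the ink in each column, smear that profile with a 1D Gaussian along x so isolated noise pixels are less likely to open or split a segment, then cut wherever the smoothed profile drops below half its peak. The helper names, sigma and the 0.5 cut-off are assumptions. Applying the same idea to row sums finds text lines. Unshearing is not covered here.

```python
import math
from typing import List, Tuple


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalised 1D Gaussian weights covering roughly +/- 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal: List[float], sigma: float) -> List[float]:
    """Convolve with the Gaussian; samples beyond either edge are treated as 0."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    n = len(signal)
    return [
        sum(kernel[j + r] * signal[i + j] for j in range(-r, r + 1) if 0 <= i + j < n)
        for i in range(n)
    ]


def column_segments(img: List[List[int]], sigma: float = 1.0,
                    frac: float = 0.5) -> List[Tuple[int, int]]:
    """Split a binary image (1 = ink) into [start, end) column ranges, one per glyph."""
    profile = [float(sum(col)) for col in zip(*img)]  # ink per column (x projection)
    smoothed = smooth(profile, sigma)
    peak = max(smoothed, default=0.0)
    if peak == 0:
        return []  # blank image: nothing to segment
    cut = frac * peak
    segments: List[Tuple[int, int]] = []
    start = None
    for x, v in enumerate(smoothed):
        if v >= cut and start is None:
            start = x                      # entering a glyph
        elif v < cut and start is not None:
            segments.append((start, x))    # leaving a glyph
            start = None
    if start is not None:
        segments.append((start, len(smoothed)))
    return segments
```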

Went through MNIST a few weeks ago and I agree it's interesting!

blackjackfoe 1y ago

I am always interested! Thank you for the tips, I'll definitely research these.

sorenjan 1y ago

Shearing is a linear operation that should be trivial for a NN to learn. Have you found that unshearing is actually useful? Was it to feed the image to an existing OCR program?

normie3000 1y ago

How did this project help you to learn computer vision? I'd also like to write a basic captcha solver as an intro, but superficially this project just looks like a dump of generated code.

blackjackfoe 1y ago

What do you mean by "generated code"? All of the code in the linked GitHub repo was written by me, with the assistance of a couple friends who helped here and there, but didn't request to be credited.

I learned a lot because I had to do a ton of research and experimentation (fancy word for trial-and-error) to write the code and have it work as I expected.

bryan0 1y ago

In the article it mentions they changed the number of characters in the captcha after he trained the model, and the model could still solve it

oefrha 1y ago

Changing the number of characters barely registers as a change. They merely need to use a variety of fonts (according to the post right now there are a grand total of 15 possible glyphs which is tiny) and it would vastly increase the difficulty of generating the training set, and probably affect model accuracy by a lot. Not to mention more complex backgrounds. What’s seen here is an ancient and relatively simple form of captcha.
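A quick way to see why changing the character count barely registers: if a model reads each character correctly with probability p, independently of the others, it solves a whole n-character captcha with probability p^n. The per-character accuracy of 0.99 below is hypothetical.

```python
def captcha_accuracy(per_char: float, n_chars: int) -> float:
    """Whole-captcha accuracy, assuming independent per-character errors."""
    return per_char ** n_chars


for n in (4, 5):
    print(n, round(captcha_accuracy(0.99, n), 4))
```

Going from 4 to 5 characters costs about one percentage point (0.9606 vs 0.951). New fonts or harder backgrounds lower p itself, which compounds across every character.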

kattagarian 1y ago

I remember trying to use 4chan once and I couldn't even get past the captcha.

morkalork 1y ago

I remember using it before it had a captcha

HaZeust 1y ago

There was a chaotic neutral time in my life where I used it daily for an extended period of time; and then found myself out of that rut and would only go back to see unhinged takes on a particular current event that I was interested in seeing the hivemind's thoughts on. Each and every time I went back, and tried to contribute to a thread, the Captchas and the CloudFlare checks were increasingly intrusive.

During this election, I completely gave up even trying to participate and just lurked.

not_your_vase 1y ago

   ▲
  ▲ ▲

__turbobrew__ 1y ago

I tried to post and it gave me a 900-second cooldown, and I wasn't even on a VPN. I too remember the good old days when there was no captcha.

smithcoin 1y ago

I’ll never forget spending the evening of the 2016 election on /pol/

BrandonY 1y ago

What happened?

poincaredisk 1y ago

A lot of memes and shitposting, I assume. /pol/ was always political, pro-trump, and according to some was even important enough to influence elections. I find that claim dubious, but it's true that many pro-trump memes (and memes in general) were created on 4chan.

trallnag 1y ago

Made a profit of 40 bucks betting 10 bucks on Trump that evening / night

m3kw9 1y ago

Very tasteful title animation, I must say. It's fast enough that you feel it without it being distracting, and it gives a vibe even at a glance.

asynchronous 1y ago

[meta] what blog site is this? Is it a joint among authors? I can’t find more information on their GitHub. Looks neat.

nullpt_rs 1y ago

I (veritas) run the blog but accept contributions from anyone. The blog itself is open source :-) https://github.com/nullpt-rs/blog

2Gkashmiri 1y ago

Hey dude. Any idea if 1000 labelled images are enough for training, and how much time it would take to train on an NVIDIA A40 like on https://www.runpod.io/pricing ?

unit149 1y ago

Parsing the visualization data, within a JSON script tasked with parsing it is a complex endeavor when the site requires verifying email.

If the JSON file is corrupt, it shows the following if tt1 and cd do not align.

> "error": "You have to wait a while before doing this again"

lofenfew 1y ago

It might be worth noting that this, including the harder version OP encountered, is not the hardest captcha 4chan can serve. There is a still harder version which is sent to less trustworthy IPs. I imagine it would still be tractably solved with computer vision. This in part misses the point, though, since 4chan has been continuously altering their captcha since it was released, making it difficult to create a permanent solution that won't be broken down the road.

chatmasta 1y ago

Datacenter IPs can’t even post at all, never mind needing to solve a CAPTCHA. That’s why the accusations of “VPN shill” are usually wrong, as is the assumption of anonymity – 4chan is in fact one of the least anonymous sites on the internet. The optional username feature gives it a veneer of anonymity, but the strict IP requirements ensure almost every post is attributable to a residential internet connection, and reliably associable with other posts from that same connection.

jterrys 1y ago

4chan tries to make its users anonymous to each other. There's nothing in there about you being anonymous to their servers.

blackjackfoe 1y ago

Some datacenter IPs can post fine, mostly just not those belonging to any large hosting company. I would mention a list of ones I know aren't blocked, but, well, that might get them blocked.

chatmasta 1y ago

That’s surprising to me. I assumed they were using some service (like Cloudflare) with an updated list of non-residential IP addresses.

I’ve only ever tried to post through Cloudflare WARP (or Apple Private Relay, which is also Cloudflare but different exit IP range). Once I realized that didn’t work, I thought maybe it wasn’t worth posting at all :) I don’t like the idea of my ISP having any suspicion I posted to 4Chan (even if it’s technically https yadda yadda…)

codexon 1y ago

You can get residential ips nowadays. They are much more expensive for an individual, but for a business or nation-state, it is a feasible option.

gruez 1y ago

What about users behind CGNAT, like mobile users?

blackjackfoe 1y ago

Yeah, I encountered those as well in my data gathering. I threw them out from the training set, but I kept them for possible future experimentation.

Shank 1y ago

Can you upload a few of these samples somewhere?

blackjackfoe 1y ago

I need to manipulate the data a bit, because right now it's just raw, unaligned foreground/background images with solutions. I need to do the alignment and save them as images rather than JSON files. I'll do that when I have the time.

cchance1y ago

Jesus, looking at both example captchas... as a human... I have no fucking clue what the answer is lol

anigbrowl1y ago

You get used to them; there are various heuristics built in that make them easier than they at first appear.

blackjackfoe1y ago

I initially wrote the alignment-only script (in the source repo as `user-scripts/4chan-captcha-aligner.ts`) before the rest of the project because the person who was collecting the data manually for me couldn't wrap their head around the slider-style CAPTCHAs. There's definitely a learning curve.
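The thread doesn't describe how `user-scripts/4chan-captcha-aligner.ts` chooses the slider position. As a rough illustration only, here is one generic heuristic, sketched in Python. It assumes that the foreground layer has transparent gaps and that the wider background slides horizontally underneath it. It then keeps the offset at which the composite image is smoothest, meaning it has the lowest total variation, because the strokes line up there. The `None`-as-transparent representation, the function names, and the scoring rule are all assumptions, not the project's actual method.

```python
from typing import List, Optional

Gray = List[List[int]]             # rows of grayscale pixel values
Layer = List[List[Optional[int]]]  # None marks a transparent pixel


def composite(fg: Layer, bg: Gray, offset: int) -> Gray:
    """Fill fg's transparent pixels from bg, with bg slid left by `offset` columns."""
    return [
        [bg[y][x + offset] if px is None else px for x, px in enumerate(row)]
        for y, row in enumerate(fg)
    ]


def total_variation(img: Gray) -> int:
    """Sum of absolute differences between horizontally and vertically adjacent pixels."""
    h, w = len(img), len(img[0])
    tv = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                tv += abs(img[y][x + 1] - img[y][x])
            if y + 1 < h:
                tv += abs(img[y + 1][x] - img[y][x])
    return tv


def best_offset(fg: Layer, bg: Gray) -> int:
    """Try every offset that keeps fg inside bg; return the smoothest composite's offset."""
    max_offset = len(bg[0]) - len(fg[0])
    return min(range(max_offset + 1),
               key=lambda o: total_variation(composite(fg, bg, o)))


if __name__ == "__main__":
    # Synthetic demo: a 4x6 horizontal gradient with a 2-column hole, where the
    # missing columns are hidden at offset 4 inside a 12-column noisy background.
    h, w, W, k = 4, 6, 12, 4
    truth = [[10 * x for x in range(w)] for _ in range(h)]
    fg = [[None if x in (2, 3) else truth[y][x] for x in range(w)] for y in range(h)]
    bg = [[255 if (x + y) % 2 == 0 else 0 for x in range(W)] for y in range(h)]
    for y in range(h):
        bg[y][k + 2], bg[y][k + 3] = truth[y][2], truth[y][3]
    print(best_offset(fg, bg))  # 4
```

Real CAPTCHA images are noisier than this toy example. A smoothness score like this one is a starting point for a labeling helper, not a guaranteed solver.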

paulpauper1y ago

And now we can look forward to even harder ones now that those have been broken. Soon the web will be unusable to everyone but robots.

axpy9061y ago

It’s nice to see this posted, and interesting that it’s in TensorFlow. I wonder for how many years the captcha was already broken but just not posted about publicly.

b81y ago

Glad to see Blackjack and Jordin. We used to hack on Minecraft together. nullpt.rs and secret.club are full of former video game hackers :)

thrance1y ago

4Chan is probably one of the only social platforms where genuine users and Russian bots share the same views, so why even bother with CAPTCHAs?

mgaunard1y ago

I remember when they introduced their new captcha; it was so tedious to solve it I stopped interacting there entirely.

chistev1y ago

Man, is there anything computers won't be able to break!

crazy

cubefox1y ago

Not a word on how describing and releasing this code is obviously unethical!? Captchas have a legitimate use to keep bots out.

saagarjha1y ago

Well, for one, it's not obviously unethical.

matrix871y ago

the blacked out minimalist aesthetic on this site looks really cool

bhasi1y ago

I really like it too. I'm always excited to see the themes of personal and other tech blogs I come across here.

nfRfqX5n1y ago

Hi veritas

dmitrygr1y ago

  > The official TensorFlow-to-TFJS model converter doesn't work on Python 3.12. This doesn't seem to really be documented, and the error messages thrown when you try to use it on Python 3.12 are non-obvious. I tried an older version of Python (3.10) on a hunch, using PyEnv, and it worked like a charm.

Amazing. And then people wonder why "just use python 2" is still a thing.

orhmeh091y ago

Do you have examples of "just use python 2" still being a thing in 2024?

dmitrygr1y ago

Yeah, whenever I need to write a quick script and have no time to suffer "$library needs python 3.x, where x must be > $value and <= $value2, and not a prime except when that ends in a 3, except on leap days"

2 is stable and does not change from under you. Which is what you want in a programming language.

Zopieux1y ago

In my recent experience, this dependency hell is quite specific to scientific / ML python.

The general state of ML code is abysmal, as it attracts a lot of inexperienced developers, and Python's duck/relaxed typing spirit makes it easy to write incomprehensible code with megabytes of unnecessary or bloated dependencies.

It's not bad per se; the amount of innovation is impressive, but a lot of it is a house of cards, from low-level libraries to end-user software.

sadeshmukh1y ago

Python 3.10 seems to work for almost everything, and Python 2 most certainly doesn't. In fact, even latest works for almost everything - there's an alternative to 99.9% of Python 2 stuff in Python 3.

tomxor1y ago

Bet it can't break reCAPTCHA on a VPN.

[edit]

More specifically, I mean when they insidiously give you infinite tests even though it's impossible to pass because the IP has been blacklisted... There's a special place in hell for the anti-humans who made that decision, and yes, it involves captchas.

blackjackfoe1y ago

I would also be inclined to believe that my project to solve the proprietary 4Chan text CAPTCHA cannot solve an unrelated image CAPTCHA. I'd bet a lot of money on it, in fact!

fresh_broccoli1y ago

I wasn't a very active 4chan poster to begin with, but when they introduced this awful CAPTCHA, and later the 300s countdown before making the first post, I completely lost interest in using the website.

Anonymous boards were supposed to be low-friction, but now 4chan is one of the most user-hostile social media platforms around. It takes a special kind of dedication to post there, which I seriously doubt helps the quality of the site.

alekratz1y ago

one of the biggest problems that 4chan has to combat is spam. unfortunately, at 4chan's scale, hcaptcha and recaptcha are not free. 4chan is not exactly a font of money, either. the only reason they turned to this awful homebrew captcha was because recaptcha stopped being free. is there any better way to do it with a single developer for a website that serves millions of people a day?

joe-collins1y ago

Not the rampant racism or sexism or simple misanthropy or outright calls to violence or overflowing hostility.

It's the spam that tops the problem list.

avar1y ago

    > is there any better way to do
    > it with a single developer for
    > a website that serves millions
    > of people a day?

No, the other reason they're using this is to make it so annoying that you'll spend $20/yr to buy a 4chan pass to bypass it.

If you're not making your free website annoying to drive revenue, there are obvious ways to make it less annoying.

E.g. keep the annoying captcha, but don't show one again for the lifetime of a cookie, validate users who can make a money transfer of $0.01 etc.
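"Don't show one again for the lifetime of a cookie" is typically done with a signed, expiring token that the server issues after a successful solve. Below is a minimal sketch using only Python's standard library. The token layout (`<expiry>.<signature>`), `SECRET_KEY`, and the function names are hypothetical. The thread does not describe how 4chan's own cookie works.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-random-server-secret"  # hypothetical; never sent to clients


def _sign(payload: str) -> str:
    mac = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(mac).decode("ascii")


def issue_pass_token(ttl_seconds: int, now: Optional[float] = None) -> str:
    """Cookie value to hand out after a solved CAPTCHA: '<expiry>.<signature>'."""
    now = time.time() if now is None else now
    expiry = str(int(now) + ttl_seconds)
    return f"{expiry}.{_sign(expiry)}"


def has_valid_pass(token: str, now: Optional[float] = None) -> bool:
    """True if the token carries our signature and its expiry is still in the future."""
    now = time.time() if now is None else now
    try:
        expiry, sig = token.split(".", 1)
        expiry_at = int(expiry)
    except ValueError:
        return False  # malformed cookie
    # Constant-time comparison, so the signature can't be probed byte by byte.
    if not hmac.compare_digest(sig.encode(), _sign(expiry).encode()):
        return False
    return now < expiry_at
```

On each post, the server would render a CAPTCHA only when `has_valid_pass(cookie)` returns False. This is a bearer token, so it can be copied to other machines until it expires. It raises the cost of spamming rather than proving that the poster is human.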

alekratz1y ago

>No, the other reason they're using this is to make it so annoying that you'll spend $20/yr to buy a 4chan pass to bypass it.

I think this is a really cynical outlook, especially for a website that is not run as a modern tech-centric company. 4chan's roots are in the Old Internet, a creative and messy and interesting place to be. why would they be banking solely on using a terrible captcha as a method to drive user subscriptions, when they have the option to run circus-tent ads? if making money was their sole purpose, why would they not kick the problematic and porn boards to the curb and ban the use of slurs to make room for more friendly advertisers? there are so many other avenues to increase profitability that most websites have taken which 4chan has staunchly refused to follow. why would they choose the 4chan pass and ads as their only opportunities at making money?

Anon10961y ago

> keep the annoying captcha, but don't show one again for the lifetime of a cookie

This is already being done: there's a cookie and heuristics in place that will give you an easier captcha or occasionally skip it entirely. But 4chan really does have a small number of bad actors (and I truly mean a handful of super, super dedicated users) who constantly spam and try to work around any roadblock given, just to annoy the rest of the userbase. You cannot give them a reliable way to spam, no matter what. That's why there are now many country and region blocks in addition to the standard VPN/DC IP range blocks, plus the Cloudflare check added a couple of years ago.

blackjackfoe1y ago

Do a Web search for "4Chan CAPTCHA" sometime. All the top results will likely be people complaining about how terrible it is. You're certainly not alone.

The worst part about the countdown: if you wait too long to make a post after waiting the 10 minutes (e.g., you get distracted), it will expire, and you have to wait another 10 minutes.
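The behaviour described here is a wait, then a limited window to post, then a fresh wait if the window is missed. It can be modeled as a small timer object. The wait lengths quoted in this thread vary (300 s, 10 minutes, 15 minutes), and nobody states how long the posting window lasts. For that reason both durations are parameters below, and the window length in particular is hypothetical. This sketch illustrates the described behaviour and is not 4chan's code.

```python
from dataclasses import dataclass


@dataclass
class PostCooldown:
    """Hypothetical model of a 'wait, then post before it expires' timer."""
    wait_seconds: float    # e.g. the 600 s (10 min) wait described above
    window_seconds: float  # hypothetical: how long the unlocked state lasts
    started_at: float

    def can_post(self, now: float) -> bool:
        """Posting is allowed only between the end of the wait and the end of the window."""
        elapsed = now - self.started_at
        return self.wait_seconds <= elapsed < self.wait_seconds + self.window_seconds

    def restart_if_expired(self, now: float) -> bool:
        """If the posting window was missed, start a fresh wait. Returns True if restarted."""
        if now - self.started_at >= self.wait_seconds + self.window_seconds:
            self.started_at = now
            return True
        return False
```

With `PostCooldown(600, 300, started_at=0)`, posting is possible from t = 600 s up to (but not including) t = 900 s. After that point, `restart_if_expired` resets the clock and the full wait starts again.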

scrlk1y ago

The addition of the post countdown has had a pretty noticeable effect on posts/day across multiple boards: https://4stats.io/

When an earlier version was trialled on /biz/ (mandatory email verification - https://warosu.org/biz/thread/58388587), it nuked the board and it hasn't recovered.

shortrounddev21y ago

They had a gigantic spam problem, captcha saved the site

raincole1y ago

- Obscure proprietary algorithm decides what you read

- Obscure CAPTCHA and other anti-spam features

- Pay to post

Choose one.

paulpauper1y ago

Then how do Reddit and Twitter work without such an obnoxious captcha? I find it hard to believe those sites get less spam. Or any other community.

mikeyouse1y ago

You need accounts with unique emails to post everywhere else, and those sites are massive with hundreds/thousands of devs, some of whom work exclusively on anti-spam. If you make a site immune to advertising revenue and any other source of profit, you’re going to struggle to pay for “internet-scale” efforts.

RockRobotRock1y ago

First, they aren't anonymous. It's a lot more friction when you have to generate an account (which also requires a captcha).

Second, Twitter absolutely does make you perform captchas if they suspect you are a bot. I say this as someone who ran Twitter bots previously.

heavensteeth1y ago

Twitter is extremely user hostile. Every time I've made an account it has inevitably asked for an email and a phone number, and at least a few captchas.

blackjackfoe1y ago

Reddit and Twitter both have huge bot problems. On Reddit it's a bit less obvious due to the upvote/downvote system, and on Twitter it's a bit less obvious because you usually only follow people you want to see. Make a post on Twitter that mentions something like cryptocurrency, and you'll get a dozen bot replies immediately.

KaoruAoiShiho1y ago

Unlike 4chan, they don't surface every post to everyone, so spam is much less visible, though it still exists.

shortrounddev21y ago

Reddit and Twitter are replete with bots

anigbrowl1y ago

By selling your data to advertisers.

WeylandYutani1y ago

Patient dead, operation successful.

123yawaworht4561y ago

recaptcha is terrible if you are cursed with an ISP that Google deems icky for some indiscernible reason. at the time, I was getting slowly fading bullshit that invariably gaslit me with "try again" several times. when they switched to the custom captcha, I actually started posting again instead of just lurking.

yeah, the recent 5-15 minute countdown before your first post is a bizarre thing, but I assume the volume of spam and ban-evading schizos they're dealing with is ungodly. a single dedicated shithead can shit up a general or a slow board indefinitely by just resetting their router or switching airplane mode on/off for a few minutes when they get banned.

>but now 4chan is one of the most user-hostile social media platforms around.

virtually every single big platform requires your phone number.

paulpauper1y ago

Same here. The captcha is the tip of the iceberg. VPNs, proxies... all blocked. Tons of ghosting and censoring of posts too. Also crawling with feds and people trying to get you to incriminate yourself. I love the option to bypass it with crypto. Yeah, like I am going to give them btc, which will be traced by every agency and coin analysis firm and also get my wallet/exchange account restricted by being linked to 4chan. The owners are more than happy to comply with every 3-letter agency request for info.

Der_Einzige1y ago

taps the sign

jimbob451y ago

> but now 4chan is one of the most user-hostile social media platforms around

Stay off /v/, /tv/, /pol/, and /a/ and you’ll have a pretty good time.

yungporko1y ago

certainly won't have a good time on /b/ either

prettywoman1y ago

> 300s countdown

I don't get why they added that nasty "feature" to the post form. It really discourages you from posting (maybe it's because they want to sell you their 4chan pass). I don't understand why 4chan is still active.

hombre_fatal1y ago

Presumably, anyone who regularly uses 4chan would register. Once you register and click the login link in your email, you just get the easy Cloudflare captcha and no countdown.

The horrible captcha + 300s countdown is for completely unauthed users. Most sites don't even allow unauthed users to post at all.

Hamuko1y ago

If you don't get it, you probably don't spend too much time on 4chan.

There is A LOT of ban evasion on 4chan. If you have a dynamic IP address from your ISP, you just spam/derail threads with personal crusades/whatever until you get banned, reset your router and repeat.

This countdown increases the cost of ban evasion, since you can't get right back in to continue. Everyone on your targeted board/thread now gets at least a 15-minute respite.

They've also had to blacklist entire ISPs from making any posts because some people are constantly ban evading on them. Especially mobile ISPs, where there's basically an unlimited number of fresh IPv6 addresses available.
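To put "basically unlimited" in numbers: mobile IPv6 subscribers are commonly given a whole /64 prefix, which contains 2^64 addresses. A ban keyed on one exact address can be evaded by switching to another address in the same prefix. Keying the ban on the enclosing /64 is a common mitigation. The sketch below shows that normalization with Python's `ipaddress` module. The `ban_key` helper is hypothetical. The thread does not say that 4chan bans by prefix; it only says that some ISPs are blocked wholesale.

```python
import ipaddress

# 2**64 addresses per /64: rotating addresses inside it is effectively free.
ADDRESSES_PER_SLASH_64 = ipaddress.ip_network("2001:db8::/64").num_addresses


def ban_key(addr: str) -> str:
    """Normalize an address to the unit a ban should apply to.

    IPv4: the single address. IPv6: the enclosing /64, since one subscriber
    typically controls every address inside it.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        return str(ipaddress.ip_network(f"{ip}/64", strict=False))
    return str(ip)
```

Prefix-level bans still fail if reconnecting hands the user an entirely new prefix, which is consistent with the whole-ISP blocks described above.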

anigbrowl1y ago

Congratulations, now it will get upgraded and become more work for humans to solve, increasing the burden on every non-malicious user.

jeroenhd1y ago

It's not like bots aren't already bypassing these CAPTCHAs. One author writing a blog post about how they accomplished what spammers and bots have been doing for ages isn't going to change anything.

I just opened 4chan and after the initial Cloudflare bot detection I was told to register an email or wait 15 minutes before I was allowed to even obtain a CAPTCHA. Looks like they're already taking a layered approach to combat bots.

blackjackfoe1y ago

(author here) Interestingly, the email registration/time-limit was added after I started this project, but before I told anyone about it.

credus1y ago

It only took about three days until the very first captcha solver was made back in 2021, and the dev's only response was to blanket ban the author's name sitewide until he became popular again for other reasons so they had to remove the filter. They know it's only a matter of time for someone to train a new model no matter how much they update the captcha so they don't really care much about it these days.

sunaookami1y ago

There are already loads of extensions and scripts out there that can solve these captchas with a great success rate.

anigbrowl1y ago

Adding one more will degrade rather than improve that. Notwithstanding all the downvotes, the author's comment (just above) seems to endorse my argument.

I dislike the captcha a lot, but I wish people would invest the same effort in attacking spam that they do in defeating anti-spam techniques. Spam and similar kinds of abuse are the bane of the internet, but most people seem to shrug it off by declaring it a 'hard problem' so they can ignore it.

tomcam1y ago

If there's one place on the web I would apply anonymity with great diligence, it would be posting any article that might put me at odds with the good people of 4Chan.

mostly kidding! mostly

blackjackfoe1y ago

The 4Chan userbase hates the CAPTCHA as much as I do :)

snvzz1y ago

This, but unironically.

NoMoreNicksLeft1y ago

I suspect really strongly that the available characters in the 4chan captcha were chosen to be able to spell out the most racist/nazi/extreme slurs and slogans imaginable. For instance, not all numerals are ever used, but 1, 4, and 8 are. K is often there, and whatever the algo is, pseudorandom or not, it often doubles/triples characters. I've personally seen "kkk" twice over the years. Mind you, it does seem random. But even randomly, these must happen often enough to set that crowd off; they make a game of posting a screenshot of the "good ones".

blackjackfoe1y ago

All the worst slurs I can think of in my limited vocabulary can't even be spelled with the characters available. I suspect the opposite - they might have been chosen to avoid spelling things like that.

NoMoreNicksLeft1y ago

You either know some radioactively hot slurs, or you've just not hung out there enough. Only the "i" is missing, and a week doesn't go by that someone doesn't post it with the 1 instead. Granted, I think that one's a repost (never bothered to try to check).

Der_Einzige1y ago

4chan was gaming the previous captchas for a while to label some of the data with racial slurs, as they had discovered the threshold that you’re allowed to be wrong by, and were aggressively abusing it.

BriggyDwiggs421y ago

Oh no you’re probably on the money

chatmasta1y ago

jterrys1y ago

4chan tries to make its users anonymous to each other. There's nothing in there about you being anonymous to their servers.

blackjackfoe1y ago

Some datacenter IPs can post fine, mostly just not those belonging to any large hosting company. I would mention a list of ones I know aren't blocked, but, well, that might get them blocked.

chatmasta1y ago

That’s surprising to me. I assumed they were using some service (like Cloudflare) with an updated list of non-residential IP addresses.

codexon1y ago

You can get residential ips nowadays. They are much more expensive for an individual, but for a business or nation-state, it is a feasible option.

gruez1y ago

What about users behind CGNAT, like mobile users?


blackjackfoe1y ago

Yeah, I encountered those as well in my data gathering. I threw them out from the training set, but I kept them for possible future experimentation.

Shank1y ago

Can you upload a few of these samples somewhere?

blackjackfoe1y ago

cchance1y ago

Jesus, looking at both example captchas... as a human... I have no fucking clue what the answer is lol

anigbrowl1y ago

You get used to them; there are various heuristics built in that make them easier than they first appear.

blackjackfoe1y ago

paulpauper1y ago

And now that those have been broken, we can look forward to even harder ones. Soon the web will be unusable to everyone but robots.

axpy9061y ago

It’s nice to see this posted, and interesting that it’s in TensorFlow. I wonder for how many years the captcha was already broken but just not posted about publicly.

b81y ago

Glad to see Blackjack and Jordin. We used to hack on Minecraft together. nullpt.rs and secret.club are full of former video game hackers :)

thrance1y ago

4Chan is probably one of the only social platforms where genuine users and Russian bots share the same views, so why even bother with CAPTCHAs?

mgaunard1y ago

I remember when they introduced their new captcha; it was so tedious to solve it I stopped interacting there entirely.

chistev1y ago

Man, is there anything computers won't be able to break!

Crazy.

cubefox1y ago

Not a word on how describing and releasing this code is obviously unethical!? Captchas have a legitimate use to keep bots out.

saagarjha1y ago

Well, for one, it's not obviously unethical.

matrix871y ago

The blacked-out minimalist aesthetic on this site looks really cool.

bhasi1y ago

I really like it too. I'm always excited to see the themes of personal and other tech blogs I come across here.

nfRfqX5n1y ago

Hi veritas

dmitrygr1y ago

  > The official TensorFlow-to-TFJS model converter doesn't work on Python 3.12. This doesn't seem to really be documented, and the error messages thrown when you try to use it on Python 3.12 are non-obvious. I tried an older version of Python (3.10) on a hunch, using PyEnv, and it worked like a charm.

Amazing. And then people wonder why "just use python 2" is still a thing.

orhmeh091y ago

Do you have examples of "just use python 2" still being a thing in 2024?

dmitrygr1y ago

2 is stable and does not change from under you, which is what you want in a programming language.

Zopieux1y ago

In my recent experience, this dependency hell is quite specific to scientific / ML python.

It's not bad per se, and the amount of innovation is impressive, but a lot of it is a house of cards, from low-level libraries to end-user software.

sadeshmukh1y ago

Python 3.10 seems to work for almost everything, and Python 2 most certainly doesn't. In fact, even latest works for almost everything - there's an alternative to 99.9% of Python 2 stuff in Python 3.

tomxor1y ago

Bet it can't break reCAPTCHA on a VPN.

[edit]

blackjackfoe1y ago

I would also be inclined to believe that my project to solve the proprietary 4Chan text CAPTCHA cannot solve an unrelated image CAPTCHA. I'd bet a lot of money on it, in fact!

fresh_broccoli1y ago

alekratz1y ago

joe-collins1y ago

Not the rampant racism or sexism or simple misanthropy or outright calls to violence or overflowing hostility.

It's the spam that tops the problem list.


avar1y ago

> is there any better way to do it with a single developer for a website that serves millions of people a day?

No, the other reason they're using this is to make it so annoying that you'll spend $20/yr to buy a 4chan pass to bypass it.

If you're not making your free website annoying to drive revenue, there are obvious ways to make it less annoying.

E.g. keep the annoying captcha but don't show one again for the lifetime of a cookie, or validate users who can make a $0.01 money transfer, etc.

alekratz1y ago

>No, the other reason they're using this is to make it so annoying that you'll spend $20/yr to buy a 4chan pass to bypass it.


Anon10961y ago

> keep the annoying captcha, but don't show one again for the lifetime of a cookie


blackjackfoe1y ago

Do a Web search for "4Chan CAPTCHA" sometime. All the top results will likely be people complaining about how terrible it is. You're certainly not alone.

The worst part about the countdown: if you wait too long to make a post after waiting the 10 minutes (e.g. you get distracted), it will expire, and you have to wait another 10 minutes.

scrlk1y ago

The addition of the post countdown has had a pretty noticeable effect on posts/day across multiple boards: https://4stats.io/

When an earlier version was trialled on /biz/ (mandatory email verification - https://warosu.org/biz/thread/58388587), it nuked the board and it hasn't recovered.

shortrounddev21y ago

They had a gigantic spam problem; the captcha saved the site.

raincole1y ago

- Obscure proprietary algorithm decides what you read

- Obscure CAPTCHA and other anti-spam features

- Pay to post

Choose one.


paulpauper1y ago

Then how do Reddit and Twitter work without such an obnoxious captcha? I find it hard to believe those sites get less spam. Or any other community, for that matter.

mikeyouse1y ago

RockRobotRock1y ago

First, they aren't anonymous. It's a lot more friction when you have to generate an account (which also requires a captcha).

Second, Twitter absolutely does make you perform captchas if they suspect you are a bot. I say this as someone who ran Twitter bots previously.

heavensteeth1y ago

Twitter is extremely user-hostile. Every time I've made an account it has inevitably asked for an email and a phone number, and made me solve at least a few captchas.

blackjackfoe1y ago

KaoruAoiShiho1y ago

Unlike 4chan, they don't surface every post to everyone, so spam is much less visible, though it still exists.

shortrounddev21y ago

Reddit and Twitter are replete with bots

anigbrowl1y ago

By selling your data to advertisers.

WeylandYutani1y ago

Patient dead, operation successful.

123yawaworht4561y ago

>but now 4chan is one of the most user-hostile social media platforms around.

Virtually every single big platform requires your phone number.

paulpauper1y ago

Der_Einzige1y ago

taps the sign

jimbob451y ago

> but now 4chan is one of the most user-hostile social media platforms around

Stay off /v/, /tv/, /pol/, and /a/ and you’ll have a pretty good time.

yungporko1y ago

Certainly won't have a good time on /b/ either.


prettywoman1y ago

> 300s countdown

hombre_fatal1y ago

Presumably, anyone who regularly uses 4chan would register. Once you register and click the login link in your email, you just get the easy Cloudflare captcha and no countdown.

The horrible captcha + 300s countdown is for completely unauthed users. Most sites don't even allow unauthed users to post at all.

Hamuko1y ago

If you don't get it, you probably don't spend too much time on 4chan.

This countdown increases the cost of ban evasion, since you can't get right back in to continue. Everyone on your targeted board/thread now gets at least a 15-minute respite.

anigbrowl1y ago

Congratulations, now it will get upgraded and become more work for humans to solve, increasing the burden on every non-malicious user.

jeroenhd1y ago

It's not like bots aren't already bypassing these CAPTCHAs. One author writing a blog post about how they accomplished what spammers and bots have been doing for ages isn't going to change anything.

blackjackfoe1y ago

(author here) Interestingly, the email registration/time-limit was added after I started this project, but before I told anyone about it.

credus1y ago

sunaookami1y ago

There are already loads of extensions and scripts out there that can solve these captchas with a great success rate.

anigbrowl1y ago

Adding one more will degrade rather than improve that. Notwithstanding all the downvotes, the author's comment (just above) seems to endorse my argument.

tomcam1y ago

If there's one place on the web I would apply anonymity with great diligence, it would be posting any article that might put me at odds with the good people of 4Chan.

Mostly kidding! Mostly.

blackjackfoe1y ago

The 4Chan userbase hates the CAPTCHA as much as I do :)

snvzz1y ago

This, but unironically.

NoMoreNicksLeft1y ago

blackjackfoe1y ago

NoMoreNicksLeft1y ago

Der_Einzige1y ago

BriggyDwiggs421y ago

Oh no, you’re probably on the money.
