Such a great resource. It's surprisingly easy to build your own massive datasets using it. I re-derived WebText2, one of the datasets in GPT-3's training mix, just on a home machine. And with some image scraping you can build up image datasets for training interesting GAN models.
> the training process they used are not.
Seems like it'd be fairly straightforward to finetune an existing language model: GPT-3 if you've got spare change, GPT-J-6B can be finetuned in Colab for free, and GPT-NeoX-20B could be finetuned cheaply. Use simple concats of AITA posts followed by a top comment, balance for NTA/YTA like the Training Data page mentions, and I'll bet you'd get comparable results.
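The dataset prep step above is tiny. Here's a minimal sketch of what I mean (the `posts` records and their fields are made up for illustration; in practice you'd fill them from scraped AITA threads):

```python
import random

# Hypothetical scraped records: each AITA post paired with its top
# comment and that comment's verdict. Field names are illustrative.
posts = [
    {"post": "AITA for eating the last slice?",
     "top_comment": "NTA, it was up for grabs.", "verdict": "NTA"},
    {"post": "AITA for skipping the wedding?",
     "top_comment": "YTA, you promised you'd go.", "verdict": "YTA"},
    {"post": "AITA for returning the gift?",
     "top_comment": "NTA, it's your money.", "verdict": "NTA"},
]

def build_examples(records, seed=0):
    """Balance NTA/YTA counts, then emit post-then-top-comment concats."""
    by_verdict = {"NTA": [], "YTA": []}
    for r in records:
        by_verdict[r["verdict"]].append(r)
    # Downsample the majority class so both verdicts appear equally often.
    n = min(len(by_verdict["NTA"]), len(by_verdict["YTA"]))
    rng = random.Random(seed)
    balanced = rng.sample(by_verdict["NTA"], n) + rng.sample(by_verdict["YTA"], n)
    rng.shuffle(balanced)
    # Simple concat: post text, blank line, top comment.
    return [f"{r['post']}\n\n{r['top_comment']}" for r in balanced]

examples = build_examples(posts)
print(len(examples))  # 2: one NTA is dropped to match the single YTA
```

Feed strings like these straight into whatever finetuning pipeline your chosen model uses.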
That said, the _idea_ of this bot is really cool and fun.