I know the pointy-haired boss wants to use APIs for everything, even feature flags and user login, but he's been canceled.
It is almost scary how productive I can be working with the new tools for embeddings and neural networks. Regularly I get things done in half an hour that used to take a whole weekend. It used to be a real black art to train networks on a GPU but now I can go down a short checklist and... it's easy.
The worst problem I had with my first "classification based on pretraining" project was that the network trained way too fast. I'd trained plenty of neural networks before and just didn't believe it could learn that fast. I wasted time looking for something wrong but really... it is that fast.
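To make "classification based on pretraining" concrete, here is a minimal sketch of one common reading of it: keep the pretrained embeddings frozen and train only a tiny classifier head on top. Everything here is an assumption for illustration. The 2-D "embeddings" are synthetic Gaussian clusters standing in for real model output, and `train_head`, `predict` and `make_samples` are hypothetical helper names, not part of any library.

```python
import math
import random

def train_head(embeddings, labels, epochs=3, lr=0.5):
    """Fit a logistic-regression head on fixed embeddings with plain SGD."""
    dim = len(embeddings[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            z = max(-30.0, min(30.0, z))  # clamp so exp() cannot overflow
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict(weights, bias, x):
    """Return class 1 if the point falls on the positive side of the boundary."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0

# Toy stand-in for pretrained embeddings (assumption): two separated clusters.
rng = random.Random(0)

def make_samples(center, label, n):
    return [([rng.gauss(center, 0.5), rng.gauss(center, 0.5)], label)
            for _ in range(n)]

data = make_samples(-2.0, 0, 500) + make_samples(2.0, 1, 500)
rng.shuffle(data)
embeddings = [x for x, _ in data]
labels = [y for _, y in data]

# 1000 samples, 3 passes over the data.
weights, bias = train_head(embeddings, labels, epochs=3)
correct = sum(predict(weights, bias, x) == y for x, y in zip(embeddings, labels))
accuracy = correct / len(data)
print(f"training accuracy after 3 epochs: {accuracy:.3f}")
```

When the pretrained representation already separates the classes, the head has very little left to learn, which is one plausible reason this kind of training finishes so quickly.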
If you don't believe me, look at this paper:
https://arxiv.org/abs/2304.01238
They were getting good results at spam classification with just 1000 samples repeated 3 times, and that is very consistent with what I'm seeing with my problems. I used to train networks for between 20 minutes and 20 hours and now it's more like 2 minutes.
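As a back-of-the-envelope sketch of why such a run is short: 1000 samples repeated 3 times is only a few hundred optimizer steps. The batch size of 16 and the throughput of 2 steps per second below are assumptions picked for illustration, not figures from the paper or from measured runs, and `optimizer_steps` and `estimated_minutes` are hypothetical helper names.

```python
import math

def optimizer_steps(num_samples, epochs, batch_size):
    """Steps for a run that sees every sample once per epoch."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs

def estimated_minutes(steps, steps_per_second):
    """Wall-clock estimate for a given (assumed) training throughput."""
    return steps / steps_per_second / 60

# 1000 samples, 3 epochs; batch_size=16 is an assumption.
steps = optimizer_steps(num_samples=1000, epochs=3, batch_size=16)
print(steps, "optimizer steps")

# steps_per_second=2.0 is an assumption; substitute your own hardware's figure.
minutes = estimated_minutes(steps, steps_per_second=2.0)
print(f"{minutes:.1f} minutes at the assumed throughput")
```

Swap in your own batch size and measured throughput; the point is only that the step count is small enough for the whole run to fit in minutes rather than hours.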
So really you can be up and running with huggingface transformers in not much longer than it takes to set up API keys.
I agree that networks are training faster, and you don't need PhDs to train these networks anymore. But training still has quirks. Understanding the model's behaviour still requires some DS knowledge. People with that knowledge are expensive to hire. APIs will get you in a few minutes what a DS with 1-2 months of work will get you.
https://arxiv.org/abs/2303.17564
Here you have a company that can build a document collection about the same size as "The Pile", add it to "The Pile", and train a model on the combination. They're not just a big company; they are in the information business, so it is clear that it's worth it to them.