
0 points · harpersealtako · 4y ago

I know that at least on most common performance benchmarks these claims are measurably false (GPT-J has a number of key performance improvements over equivalently sized models), and in particular code generation for 6B is very clearly a strength of GPT-J even above the 275B GPT-3. None of that is very controversial as far as I can tell.

But even just subjectively, EleutherAI's model is clearly a step above GPT-3 in most practical applications. I used the GPT-3-based AI Dungeon for fiction writing until OpenAI forced them to censor outputs, effectively smothering it in its sleep, and I now use NovelAI, which is a GPT-J-6B-based alternative. And this isn't even getting into OpenAI's privacy and control issues.


sailingparrot · 4y ago

> I know that at least on most common performance benchmarks these claims are measurably false

What "these claims" are you referring to? It seems you are taking issue with only one specific claim in my comment, namely that GPT-3 6B is of better quality than GPT-J 6B. Evaluations run by the EleutherAI folks are available here [1], and my subjective experience is the opposite of yours.

But even assuming I'm wrong, that doesn't change the substance of what I am saying at all: if you need better quality than GPT-J, then GPT-3 (Davinci, 175B) is your only option.

And if you care about latency, last time I checked (6 months ago) OpenAI was miles ahead.

> in particular code generation for 6B is very clearly a strength of GPT-J even above the 275B GPT-3.

A note on that: ~8% of GPT-J's training data is GitHub code, which is not the case for GPT-3, hence the difference. But OpenAI has a separate model available in their API, called Codex, that is specifically tailored for code generation (it is also the model behind GitHub Copilot), and it is much better than GPT-J: even the 300M-parameter version of Codex outperforms it [2], and the API gives you access to a 12B version.
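For context on the comparison in [2]: that paper scores models on the HumanEval benchmark using pass@k, the probability that at least one of k sampled completions passes a problem's unit tests. Below is a minimal Python sketch of the unbiased estimator it describes, assuming n samples are generated per problem and c of them pass. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical, and any numbers used with them here are illustrative, not results from the paper.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (hypothetical helper).

    n: number of completions sampled for the problem
    c: how many of those completions pass the unit tests
    k: sample budget the metric allows
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains a passing one.
        return 1.0
    # 1 - C(n - c, k) / C(n, k): one minus the chance that a random
    # size-k subset contains only failing samples.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, with 4 samples of which 1 passes, `pass_at_k(4, 1, 2)` gives 1 - C(3, 2)/C(4, 2) = 0.5. Exact integer binomials via `math.comb` (Python 3.8+) keep the sketch simple; [2] presents an equivalent product form for numerical stability.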

I'm not trying to sell you OpenAI's API, though; it does have pretty severe limitations. I'm only saying that, contrary to what the comment I was replying to claimed, there are real reasons people might want to use it, and just replicating what OpenAI does isn't exactly a walk in the park.

[1]: https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/
[2]: https://arxiv.org/pdf/2107.03374.pdf
