The False Promise of Imitating Proprietary LLMs (opens in new tab)

(arxiv.org)

126 pointslebek3y ago83 comments

83 comments

ofou3y ago

From the Conclusion:

"Finally, our work raises ethical and legal questions, including whether the open-source community should continue to advance progress by “stealing” what OpenAI and other companies have done, as well as what legal countermeasures companies can take to protect and license intellectual property."

Really???

croes3y ago

I think the creators of all the scraped training data would like to talk about intellectual property too

senttoschool3y ago

I've written hundreds of thousands of words, possibly millions on various sites including HN, Reddit, blogs, Stackoverflow, forgotten platforms, etc. Doesn't seem right that OpenAI/LLMs can use my intellect but I can't use theirs.

winddude3y ago

their work didn't they did.

I'm going to need verifiable proof this wasn't written by chatGPT as propaganda.

RobotToaster3y ago

Was this "study" sponsored by "open"ai?

washadjeffmad3y ago

That counts as a question, I guess.

blazespin3y ago

The breathtaking audacity of calling distilling GPT4 'stealing' when GPT4 trained on data it has no proprietary right to.

microtherion3y ago

"We ignore what created us; we adore what we create." — Aleister Crowley, The Book of Lies

RobotToaster3y ago

"You are trying to kidnap what I've rightfully stolen, and I think it quite ungentlemanly."

wilg3y ago

They put "stealing" in scare quotes, so it's probably not worth getting fired up about.

fasterik3y ago

Was GPT-4 trained on data that was acquired illegally? Or was it trained on data acquired legally that OpenAI didn't have the rights to redistribute? There is a difference. In the latter case, whether it counts as "stealing" would come down to whether or not GPT-4 counts as a derivative work, or some similar legal concept.

svaha17283y ago

https://www.washingtonpost.com/technology/interactive/2023/a...

Scribd has lots of pdfs of books that are copyrighted. The Washington Post article mentions there are several other places it downloaded and scraped pdfs of copyrighted textbooks, etc

fasterik3y ago

That's interesting to know, but that doesn't by itself imply that it's illegal. For example, Google Books, which has massive amounts of scanned PDFs of copyrighted works, is considered fair use under US copyright law.

2 more replies

kordlessagain3y ago

Just because someone can convert text to numbers doesn’t mean they have a right to the numbers. That’s like trying to own the emotion a book has on someone, or the things they see in mind when they read it.

blazespin3y ago

What I find rather amusing is they spend the whole paper dismissing it as ineffective yet still feel the need to worry about the 'ethics' and 'legality'. They don't cite anything with regards to a discussion/evidence of either, of course, and looking at the authorship list I don't believe any of them are lawyers or ethics experts.

colordrops3y ago

No one should have "rights" to any data, information, bits, or whatever. It's not physical and any attempt to apply artificial scarcity to replicate the physical world is a crime against humanity. The lines around which data is protected and which is copyable is arbitrary bullshit. You aren't stealing a fire when you light one candle with another. It's my storage device and I'm not breaking the law all of a sudden because the gates are holding a different set of charges.

looping__lui3y ago

By that logic, you also need to accept that no one should ever need to pay you for creating artifacts that are not bound to the physical world solely. I assume you work for free for your employer or in a space that is not “dealing” with data, information, bits, whatsoever.

1 more reply

quickthrower23y ago

Like a torrent of the last GoT season then?

… with compression.

croes3y ago

Imagine the GoT producers used GRRM's books without licensing and then claim copyright on the series.

Does OpenAI have the rights on all the texts they used to train their GPTs?

1 more reply

politician3y ago

I would like the big players to argue that they have some right to the numbers as it has important applications to BitTorrent and cryptography too for that matter.

runsWphotons3y ago

yeah this is insane thinking haha

layer83y ago

Stolen twice is still stolen.

cs7023y ago

The authors conduct automated, more methodical evaluations of LLMs finetuned to imitate ChatGPT outputs, and find that, despite superficial/informal appearances to the contrary, the base LLMs close little to none of the gap to ChatGPT on tasks that are not heavily supported in the imitation data.

It's not good news for the open LLM ecosystem.

evrydayhustling3y ago

This is a very weird type of paper. They take a specific approach, then make arguments about a broad class of approaches that are under constant development. The finding that distilled LLMs must be more specialized than the giant LLMs that train them is unsurprising; nobody at this point expects a 13B parameter model to succeed with the same accuracy at the broad range of tasks supported by what may be a 1T parameter model.

lebekOP3y ago

> nobody at this point expects a 13B parameter model to succeed with the same accuracy at the broad range of tasks supported by what may be a 1T parameter model

I think a lot of people believe exactly that. To take one example from the "We Have No Moat" essay:

"It doesn’t take long before the cumulative effect of all of these fine-tunings overcomes starting off at a size disadvantage. Indeed, in terms of engineer-hours, the pace of improvement from these models vastly outstrips what we can do with our largest variants, and the best are already largely indistinguishable from ChatGPT." - https://www.semianalysis.com/p/google-we-have-no-moat-and-ne...

evrydayhustling3y ago

That essay works in a context of specific datasets and tasks, which are referenced in the surrounding sentences and paragraphs. They are saying that for a particular "emergent" capability you might reach with a giant LLM, you might get there more efficiently with distillation / LoRa.

My comment is about generality, which is the remaining advantage of giant models.

int_19h3y ago

That is exactly what people are expecting, and largely because of misleading metrics thrown around to claim ridiculous things like e.g. Vicuna-13b being nearly as good as GPT-3.5. It even shows up in the comments here, and if you go to any tangentially related subreddit, that's the kind of stuff that gets told as "everybody knows" to people setting up a local LLM for the first time.

evrydayhustling3y ago

The Vicuna headline was def an overreach, although the main text admits pretty readily that their performance test (asking ChatGPT to evaluate quality) is not rigorous [1]. I'm sure that has set a lot of people pontificating about AGI with tiny models, but I can't imagine anyone who has worked directly with fine tuning having that impression.

The comments I see here are not about that. They are about small models succeeding at specific tasks, which is affirmed by this paper. Most applications of LLMs are not general purpose chat bots, so this is not bad news for most of the distill/fine tune community.

[1] https://lmsys.org/blog/2023-03-30-vicuna/

skybrian3y ago

Even if they don't start out expecting that, people might be fooled by how it behaves when they try it out. So it seems useful to point out that initial impressions based on crowdsourced evaluations are misleading:

> We were initially surprised by how much imitation models improve over their base models: they are far better at following instructions, and their outputs appear similar to ChatGPT’s. This was further supported by both human and GPT-4 evaluations, where the outputs of our best imitation model were rated as competitive with ChatGPT (e.g., Figure 1, left).

("Competitive" meaning that 70% outputs seemed about as good.)

mdale3y ago

I don't know if it's bad news per say. It helps to know where to deploy a tool, it's limitations and where to focus to build something competitive / better.

a0zU3y ago

They don't even use more methodological evaluations of fine-tuned LLMs, they use metrics that are specifically built to support a (false) contrarian conclusion in order to generate attention for their "paper."

flangola73y ago

Good news for alignment though. This gives me a tiny amount of hope.

int_19h3y ago

So, LLMs aligned with the interests of our corporate overlords and that nebulous "national security" thing that somehow always translates to more surveillance and less due process?

flangola73y ago

This tech has about as much chance to continue unregulated as highly enriched uranium. There is no future-path that includes unregulated AI.

I don't like horrific government abuse of residents,and I would not mind throwing most billionaire CEOs into a pool of alligators and dissolving their corporations. I don't like Altman, I think he's a smart person with NOBUS-level reckless hubris who is softballing the magnitude of the dangera to wet. The status quo is not good and it's getting worse.z

It doesn't matter. 5 people with launch-all-the-nukes buttons is better than 500 million.

2 more replies

winddude3y ago

"Second, given the large gap between LLaMA and ChatGPT (the latter model is faster, cheaper, and more accurate), "

No it's not, llama would be cheaper and likely faster if you ran it on the same scale, actually there've been a few calcs done, that running llama 65b if you're at 100% usage is cheaper than 3.5turbo per token. Also comparing them for accuracy isn't fair comparison, one is a foundational model, one is an instruct tuned model. Perhaps compare llama 65b with gpt3.

fomine33y ago

Isn't it the comparison is like "My home PC server is far cheaper than EC2"?

throwaway69773y ago

If they really didn't test anything bigger than 13b, as their abstract states, then this doesn't even seem worth reading through.

wmf3y ago

The "Google has no moat" thing claimed that Vicuna-13B was almost as good as ChatGPT and this paper seemingly refutes that.

ShamelessC3y ago

Claims made in a leaked blog post shouldn't be considered as having any sort of scientific authority. That whole "no moat" piece has exactly the tone I would expect from an over-confident Googler who has essentially been following all of this by watching various Discord channels and browsing hacker news. That isn't how science is done. It shouldn't be how business is done, but people seem to really enjoy these everything-is-actually-simple narratives.

space_fountain3y ago

That claim wasn't sourced from the Google has no moat paper, but from the announcement for Vicuna-13B if I recall (or some other similar model). It shouldn't be taken as an independent quality assessment

nptacek3y ago

my eyebrows went up at a number of choices made in their assessment

Sai_3y ago

Could you explain what other choices were red flags for you? I’m somewhat familiar with the open source LLMs space but not enough to know why some choices are better than others.

brucethemoose23y ago

The jump between llama 13B and 30B is quite significant. And their instruction finetuning is not SOTA I don't think, though the point about general knowledge is a good one: instruction llama lies very confidently.

But one great thing about open source LLMs is that you can specialize them in various tasks with affordable LORA training, enough to easily beat GPT4 in a specific niche.

ThorsBane3y ago

Any recommended starting points for LORA training llama 30B on a specific niche? Books, tutorials, videos are all appreciated. Thanks for your time!

f_devd3y ago

Currently SOTA for specialization of LLMs is QLoRA: https://github.com/artidoro/qlora

ThorsBane3y ago

Thank you very much my friend. Be blessed in all that you do. <3

ImprobableTruth3y ago

This isn't a new result really. We already know through the gpt-4 paper that rlhf style fine-tuning just makes the model more compliant, not more capable.

YetAnotherNick3y ago

Exactly. Few hundreds of thousands of interactions with chatgpt is definitely too less for model to learn lot of new things. The thing it does well is make them much better at following instructions. It also makes it much better at working with given context.

mxwsn3y ago

This is an important study and I've been waiting for something like this ever since Alpaca and the following wave of imitating models that have had lackluster, non rigorous evaluation.

dspoka3y ago

Sensational title that misrepresents the message in paper.

However, when conducting more targeted automatic evaluations, we found that the imitation models close little to none of the large gap between LLaMA and ChatGPT. In particular, we demonstrate that imitation models improve on evaluation tasks that are heavily supported in the imitation training data. On the other hand, the models do not improve (or even decline in accuracy) on evaluation datasets for which there is little support. For example, training on 100k ChatGPT outputs from broad-coverage user inputs provides no benefits to Natural Questions accuracy (e.g., Figure 1, center), but training exclusively on ChatGPT responses for Natural-Questions-like queries drastically improves task accuracy.

Just because this might not be the way to replicate the performance of ChatGPT across all tasks, it seems to work quite well on whichever tasks are in the imitation learning. That is still a big win.

Later on this also works for factual correctness. (leaving aside the argument whether this is the right approach for factuality)

For example, training on 100k ChatGPT outputs from broad-coverage user inputs provides no benefits to Natural Questions accuracy (e.g., Figure 1, center), but training exclusively on ChatGPT responses for Natural-Questions-like queries drastically improves task accuracy.

blazespin3y ago

To be fair, this paper has been made obsolete in its entirety with recent research. It's not really their fault, but folks need to start publishing faster as posters or something if they want to provide something relevant.

A better title, knowing what we now, might be "To outperform GPT4, do more than imitating"

lebekOP3y ago

Link to said research?

kamranjon3y ago

I'd be really curious what the authors of the recent (3 days ago) paper on QLora would think of this article? https://arxiv.org/abs/2305.14314 - they claim "Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU"

Particularly this statement seems relevant: "We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Furthermore, we find that current chatbot benchmarks are not trustworthy to accurately evaluate the performance levels of chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to ChatGPT."

courseofaction3y ago

Does this mean that, if one wants GPT-4 quality outputs on a topic, one should specifically generate a dataset on that topic to fine-tune their own model?

There's still room for closing the gap, but ultimately it's only going to be a pale imitation when the underlying model's representations aren't as useful.

nologic013y ago

> imitation models are adept at mimicking ChatGPT's style but not its factuality

this is largely a pot calling the kettle black. The LLM game is not about not mimicking somebody else. It is about not being caught doing so :-)

winddude3y ago

"However, imitation falls short in improving LMs across more challenging axes such as factuality, coding, and problem solving."

Brilliant observation captain obvious.

nmca3y ago

nobody with lots of experience with proprietary LMs is surprised

Sai_3y ago

And neither is anyone who has played with these new LLMs, found them so-so, and wondered whether the hype was warranted.

devjab3y ago

I’m curios as to why you think the hype isn’t warranted. If you go through my history (you don’t have too I’ll sum it up), you’ll see that I’m not impressed by the capabilities of LLM to actually do my work. Not for a lack of trying, but because ChatGPT simply tells too many lies. We’ve yet to get it to really do anything that wasn’t fairly basic, or solved a billion times on the internet anyway. Similarly we’ve stopped using co-pilot because it takes too much time to make it go away when it’s being bad to make up for the good it does.

Or to put it differently in SWE the LLM seem very bad at building things. What they are good at, however, is helping us build things. I’m not sure I’ll ever need to write JSDoc again on anything that isn’t too sensitive to share. Which is a significant efficiency and quality improvement on the work I do. I think of them as Swagger generators, but instead of being for an OpenAPI standard they are for everything. I imagine they’ll become very good at automating testing as another example, which will again be a further improvement on the work a single developer does.

In terms management might understand. I think you can view LLMs similarity to the way we’ve seen frameworks and tooling reduce the team size needed to build an application significantly over the previous 30 years. If you wanted to build a web-portal for asset management in 1999 you’d need a large team to do what a single developer and a good PO can do today. Maybe we won’t see the same reduction manpower, but instead an increase in quality.

Sai_3y ago

I meant the hype around open source LLMs, not OpenAI's LLM. On reading your response and my original comment, I suspect you thought I was including OpenAI's LLM as a hype-driven product. Sorry if you didn't think that.

That said, the rest of your comment is spot-on.

Paul G says this too that ChatGPT expertise is the same as a journalist's expertise. Its output seems impressive until it is on a subject you know very well.

GPT-x is like a wide-eyed intern or junior team member who loves to shoot its mouth because it has been told to be assertive and vocal. The good thing is that it is willing to learn.

Now, if this is true of GPT-x which is pretty much the benchmark against which every open source LLM is being measured, you can guess for yourself how much room these open source LLMs still have to cover.

airgapstopgap3y ago

This is exactly the reason OpenAI isn't afraid of the open-source community, like many kneejerk opponents of regulatory capture assume (they are probably still afraid of Google). Also why they still do the expensive and cumbersome RLHF training, instead of those deceptively cheap and fast finetunes. They understand their own tech and why there isn't free lunch.

Recently, John Schulman explained the issue with behavior cloning and it's a very typical ML problem.[1] Basically: what are we training the model to do? The model updates after finetuning in a holistic manner, based on the sum total of its content and capability. Suppose GPT-4 can correctly answer to many requests because it knows correct answers, in the sense that it has something isomorphic to an internal knowledge graph and tools for querying it, and that graph contains sufficient data for its tools to derive an answer at inference. RLHF reinforces this behavior by constraining the distribution of outputs (essentially, steering the model away from applying inappropriate tools for respective inputs, e.g. employing fantasy-narrative or bad-yahoo-answers cognitive routines when asked something that looks like a straightforward factual question).

Now suppose you teach LLaMA-13B to imitate those responses by SFTing it on a dump of successful GPT-4 conversations. But LLaMA doesn't have internals that would have enabled it to find the same answers; so on the object level it shallowly memorizes specific items of the post-training dataset, and on the meta-level it learns the stylistic flourish of a high-powered model. But it starts to hallucinate confident nonsense whenever you step out of the training distribution, because it doesn't actually learn to query its own knowledge graph. A little anthropomorphism won't hurt: you create an incapable impostor this way, a wannabe nerd, a character who is used to guessing the teacher's password and being praised, instead of understanding the subject, and keeps raising its hand whenever a question is asked, but is painfully clueless.

Indeed, the early and cheap success of behavior cloning was a massive red flag unto itself. There's no way all the compute and data that went into training GPT-3/3.5/4 tier models can be substituted with gently demonstrating the attitude vector. If we had models that were markedly less capable but comparably honest, we would have reasons for hope that this line terminates in a genuine open-source peer competitor; instead, we have total fraud.

It is a nontrivial task to have a model generalize epistemic honesty and not a lower-order behavior like clamping up and kowtowing or bullshitting from external examples; train it to say "I don't know" whenever it actually does not, but only then.

There are clever approaches here, but they're not such a low-hanging fruit as what passes for open-source right now.

1. https://youtu.be/hhiLw5Q_UFg?t=685

65103y ago

Before the internet I use to laugh my ass of at school friends who would wonder/debate about something but never bother to look it up, in stead they had elaborate collective "hallucinations", they imagined the facts until they were satisfied their answer was well reasoned enough, then they would consider it a fact. We all do this at times (at all ages) but one must learn to shut down the train of thought, stop polluting your memory.

I remember one from very early in life. I postulated out loud that Jerusalem, being the birth place of Jesus, must be the most peaceful place on earth. All those loving and caring religious people who work so hard to be good people. That their religions are slightly different shouldn't matter to Jesus message?

That the LLM's can consume such huge amounts of data doesn't mean they matured beyond that rather infantile mind set.

In the video you linked he explained that training it to learn to say it doesn't know will trigger false negatives.

The correct formula I imagine (hah!) is to wonder if the question is of interest to the model and to ask someone else for answers or some help figuring out the question. The human will just have to wait.

What is completely hilarious to me is that we all have heads full of learned answers for which we have no idea "why it is so" or at the very least lack that what would have one arrive at that solution. I get what Archimedes realized in the bathtub but what I want is the mechanism by which he arrived at such wild ideas. Could it be that learning a lot of facts would be the exact opposite kind of conditioning?

My mind now says this must be why we humans expire so fast. You keep the calcified brains around for a while as data storage but the focus of project humanity must be to create young ones. I will have to ponder this fact free line of reasoning some more. Perhaps I will find ways to convince myself I know something.

It is a fun thought that people created AI, we really want to believe we did. If enough pretend it is true no one can take it away from us.

If you want people to think you are intelligent you tell them things they already know and hide your sources.

imtringued3y ago

Humans don't expire very quickly.

Most humans are going to outlast whatever they produced during their lives. If anything, human bodies are among the most durable "goods" in the economy. Only real estate, public infrastructure and recorded knowledge (including genes) last longer than a human lifetime. How many of the things you buy and own are going to outlast you?

65103y ago

I was referring to the age distribution. If having a smaller percentage young people gave us an evolutionary edge it would have happened(?) Say the ratio rebellious exploration vs applied knowledge. What is ideal?

All the goods we produce are designed to last for a specific time. We can easily make them more durable and with some serious effort they could last longer than we can imagine. It would be expensive, it might be beneficial but who wants to pay for benefits 100 or 200 years into the future?

a0zU3y ago

>Grammatical error in the abstract.

luckystarr3y ago

Conspiracy theory: Is that the reason why GPT-4 is not available as an API? So people wouldn't siphon off it's capabilities?

int_19h3y ago

Not only it is available, but people have been fine-tuning LLaMA using GPT-4 outputs and training data generated by it for a while now; that's why you see models with names like gpt4-x-alpaca on HuggingFace.

arugulum3y ago

1. It is available via API 2. Likely not a conspiracy theory. Newer models don't have logits available and that's almost certainly because they didn't want other labs distilling from them.

luckystarr3y ago

> Newer models don't have logits available [...]

This definitely is a smoking gun. Keeping the crown jewels for themselves.

Semaphor3y ago

But it is, waitlisted-gated, but available. I have API access.

nmfisher3y ago

I think it is available now.

j / k navigate · click thread line to collapse

83 comments

ofou3y ago

From the Conclusion:

Really???

croes3y ago

I think the creators of all the scraped training data would like to talk about intellectual property too

senttoschool3y ago

winddude3y ago

their work didn't they did.

I'm going to need verifiable proof this wasn't written by chatGPT as propaganda.

RobotToaster3y ago

Was this "study" sponsored by "open"ai?

washadjeffmad3y ago

That counts as a question, I guess.

blazespin3y ago

The breathtaking audacity of calling distilling GPT4 'stealing' when GPT4 trained on data it has no proprietary right to.

microtherion3y ago

"We ignore what created us; we adore what we create." — Aleister Crowley, The Book of Lies

RobotToaster3y ago

"You are trying to kidnap what I've rightfully stolen, and I think it quite ungentlemanly."

wilg3y ago

They put "stealing" in scare quotes, so it's probably not worth getting fired up about.

fasterik3y ago

svaha17283y ago

https://www.washingtonpost.com/technology/interactive/2023/a...

Scribd has lots of pdfs of books that are copyrighted. The Washington Post article mentions there are several other places it downloaded and scraped pdfs of copyrighted textbooks, etc

fasterik3y ago

2 more replies

kordlessagain3y ago

blazespin3y ago

colordrops3y ago

looping__lui3y ago

1 more reply

quickthrower23y ago

Like a torrent of the last GoT season then?

… with compression.

croes3y ago

Imagine the GoT producers used GRRM's books without licensing and then claim copyright on the series.

Does OpenAI have the rights on all the texts they used to train their GPTs?

1 more reply

politician3y ago

I would like the big players to argue that they have some right to the numbers as it has important applications to BitTorrent and cryptography too for that matter.

runsWphotons3y ago

yeah this is insane thinking haha

layer83y ago

Stolen twice is still stolen.

cs7023y ago

It's not good news for the open LLM ecosystem.

evrydayhustling3y ago

lebekOP3y ago

> nobody at this point expects a 13B parameter model to succeed with the same accuracy at the broad range of tasks supported by what may be a 1T parameter model

I think a lot of people believe exactly that. To take one example from the "We Have No Moat" essay:

evrydayhustling3y ago

My comment is about generality, which is the remaining advantage of giant models.

int_19h3y ago

evrydayhustling3y ago

[1] https://lmsys.org/blog/2023-03-30-vicuna/

skybrian3y ago

("Competitive" meaning that 70% outputs seemed about as good.)

mdale3y ago

I don't know if it's bad news per say. It helps to know where to deploy a tool, it's limitations and where to focus to build something competitive / better.

a0zU3y ago

flangola73y ago

Good news for alignment though. This gives me a tiny amount of hope.

int_19h3y ago

So, LLMs aligned with the interests of our corporate overlords and that nebulous "national security" thing that somehow always translates to more surveillance and less due process?

flangola73y ago

This tech has about as much chance to continue unregulated as highly enriched uranium. There is no future-path that includes unregulated AI.

It doesn't matter. 5 people with launch-all-the-nukes buttons is better than 500 million.

2 more replies

winddude3y ago

"Second, given the large gap between LLaMA and ChatGPT (the latter model is faster, cheaper, and more accurate), "

fomine33y ago

Isn't it the comparison is like "My home PC server is far cheaper than EC2"?

throwaway69773y ago

If they really didn't test anything bigger than 13b, as their abstract states, then this doesn't even seem worth reading through.

wmf3y ago

The "Google has no moat" thing claimed that Vicuna-13B was almost as good as ChatGPT and this paper seemingly refutes that.

ShamelessC3y ago

space_fountain3y ago

nptacek3y ago

my eyebrows went up at a number of choices made in their assessment

Sai_3y ago

Could you explain what other choices were red flags for you? I’m somewhat familiar with the open source LLMs space but not enough to know why some choices are better than others.

brucethemoose23y ago

But one great thing about open source LLMs is that you can specialize them in various tasks with affordable LORA training, enough to easily beat GPT4 in a specific niche.

ThorsBane3y ago

Any recommended starting points for LORA training llama 30B on a specific niche? Books, tutorials, videos are all appreciated. Thanks for your time!

f_devd3y ago

Currently SOTA for specialization of LLMs is QLoRA: https://github.com/artidoro/qlora

ThorsBane3y ago

Thank you very much my friend. Be blessed in all that you do. <3

ImprobableTruth3y ago

This isn't a new result really. We already know through the gpt-4 paper that rlhf style fine-tuning just makes the model more compliant, not more capable.

YetAnotherNick3y ago

mxwsn3y ago

This is an important study and I've been waiting for something like this ever since Alpaca and the following wave of imitating models that have had lackluster, non rigorous evaluation.

dspoka3y ago

Sensational title that misrepresents the message in paper.

Just because this might not be the way to replicate the performance of ChatGPT across all tasks, it seems to work quite well on whichever tasks are in the imitation learning. That is still a big win.

Later on this also works for factual correctness. (leaving aside the argument whether this is the right approach for factuality)

blazespin3y ago

A better title, knowing what we now, might be "To outperform GPT4, do more than imitating"

lebekOP3y ago

Link to said research?

kamranjon3y ago

courseofaction3y ago

Does this mean that, if one wants GPT-4 quality outputs on a topic, one should specifically generate a dataset on that topic to fine-tune their own model?

There's still room for closing the gap, but ultimately it's only going to be a pale imitation when the underlying model's representations aren't as useful.

nologic013y ago

> imitation models are adept at mimicking ChatGPT's style but not its factuality

this is largely a pot calling the kettle black. The LLM game is not about not mimicking somebody else. It is about not being caught doing so :-)

winddude3y ago

"However, imitation falls short in improving LMs across more challenging axes such as factuality, coding, and problem solving."

Brilliant observation captain obvious.

nmca3y ago

nobody with lots of experience with proprietary LMs is surprised

Sai_3y ago

And neither is anyone who has played with these new LLMs, found them so-so, and wondered whether the hype was warranted.

devjab3y ago

Sai_3y ago

That said, the rest of your comment is spot-on.

Paul G says this too that ChatGPT expertise is the same as a journalist's expertise. Its output seems impressive until it is on a subject you know very well.

GPT-x is like a wide-eyed intern or junior team member who loves to shoot its mouth because it has been told to be assertive and vocal. The good thing is that it is willing to learn.

airgapstopgap3y ago

There are clever approaches here, but they're not such a low-hanging fruit as what passes for open-source right now.

1. https://youtu.be/hhiLw5Q_UFg?t=685

65103y ago

That the LLM's can consume such huge amounts of data doesn't mean they matured beyond that rather infantile mind set.

In the video you linked he explained that training it to learn to say it doesn't know will trigger false negatives.

It is a fun thought that people created AI, we really want to believe we did. If enough pretend it is true no one can take it away from us.

If you want people to think you are intelligent you tell them things they already know and hide your sources.

imtringued3y ago

Humans don't expire very quickly.

65103y ago

a0zU3y ago

>Grammatical error in the abstract.

luckystarr3y ago

Conspiracy theory: Is that the reason why GPT-4 is not available as an API? So people wouldn't siphon off it's capabilities?

int_19h3y ago

arugulum3y ago

1. It is available via API 2. Likely not a conspiracy theory. Newer models don't have logits available and that's almost certainly because they didn't want other labs distilling from them.

luckystarr3y ago

> Newer models don't have logits available [...]

This definitely is a smoking gun. Keeping the crown jewels for themselves.

Semaphor3y ago

But it is, waitlisted-gated, but available. I have API access.

nmfisher3y ago

I think it is available now.

j / k navigate · click thread line to collapse