
throwaw12 15d ago

Because this is Microsoft, experimenting and failing is not encouraged; taking less risky bets and getting promoted is. Also, no customer asked them for a 1-bit model, hence the PM didn't prioritize it.

But that doesn't mean the idea is worthless.

You could have said the same about Transformers: Google released it but didn't move forward, and it turned out to be a great idea.

embedding-shape 15d ago

> You could have said the same about Transformers: Google released it but didn't move forward,

I don't think you can. Google looked at the research results and continued researching Transformers and related technologies, because they saw the value in them, particularly for translation. The original paper discusses which directions to take; give it a read, it's relatively approachable for a machine learning paper :)

Sure, it took OpenAI to make it into an "assistant" that answered questions, but it's not like Google was completely sleeping on the Transformer, they just had other research directions to go into first.

> But that doesn't mean the idea is worthless.

I agree, ideas aren't worthless; hope that wasn't what my message read as :) But ideas that don't actually pan out in reality are slightly less useful than ideas that do pan out once put into practice. The root commenter seems to be saying "This is a great idea, it's all ready, the only missing piece is for someone to do the training and it'll pan out!", which I'm a bit skeptical about, since it's been two years since they introduced the idea.

zozbot234 15d ago

What OpenAI did was train increasingly large transformer model instances, which was sensible because transformers allowed for a scaling up of training compared to earlier models. The resulting instances (GPT) showed good understanding of natural language syntax and generated mostly sensible text (which was unprecedented at the time), so they made ChatGPT by adding new stages of supervised fine-tuning and RLHF to their pretrained text-prediction models.

mattalex 14d ago

There were plenty of models the size of GPT-3 in industry.

The core insight necessary for ChatGPT was not scaling (that was already widely accepted): the insight was that instead of finetuning for each individual task, you can finetune once for the meta-task of instruction following, which brings the problem specification directly into the data stream.
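
To make the contrast concrete, here is a minimal sketch. The task names, the instruction strings, the prompt template, and the helper `to_instruction_examples` are all hypothetical and not taken from any real training pipeline; the point is only that the task description moves out of "which model you call" and into the text the model sees.

```python
# Hypothetical illustration only: task names, template, and data are made up.

# Per-task finetuning: one dataset (and one finetuned model) per task.
# The task is implicit in which dataset/model you pick.
per_task_data = {
    "translate_en_fr": [("cheese", "fromage")],
    "sentiment": [("I loved it", "positive")],
}

# Instruction finetuning: a single dataset in which the task description
# travels with the input, so one model reads the problem specification as text.
INSTRUCTIONS = {
    "translate_en_fr": "Translate the following English text to French.",
    "sentiment": "Classify the sentiment of the following text as positive or negative.",
}


def to_instruction_examples(data, instructions):
    """Flatten per-task (input, output) pairs into one instruction-following dataset."""
    examples = []
    for task, pairs in data.items():
        for source, target in pairs:
            prompt = f"{instructions[task]}\n\n{source}\n\n"
            examples.append({"prompt": prompt, "completion": target})
    return examples


instruction_data = to_instruction_examples(per_task_data, INSTRUCTIONS)
for example in instruction_data:
    print(repr(example["prompt"]), "->", example["completion"])
```

A new task then only needs a new instruction string at inference time, rather than a new finetuning run.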

joquarky 14d ago

I miss having completion models like davinci-003; what they lacked in ease of getting what you wanted out, they made up for in performance.

It was fun to come up with creative ways to get it to answer your question or generate data by setting up a completion scenario.

I guess "chat" became the universal completion scenario. But I still feel like it could be "smarter" without the RLHF layer of distortion.

Schlagbohrer 14d ago

Google had been working on a big LLM but they wanted to resolve all the safety concerns before releasing it. It was only when OpenAI went "YOLO! Check this out!" that Google then internally said, "Damn the safety concerns, full speed ahead!" and now we find ourselves in this breakneck race in which all safety concerns have been sidelined.

gardnr 14d ago

Scaling seemed like the important idea that everyone was chasing. OpenAI used to be a lot more safety-minded because it was in their nonprofit charter; now they've gone for-profit and weaponized their tech for the US military. Pretty wild turnaround. Saying OpenAI was cavalier with safety in the early days is inaccurate. It was a skill issue. Remember Bard? Google was slow.

joquarky 14d ago

They thought people might prefer quality and safety.

wongarsu 15d ago

On the one hand, not publishing any new models for an architecture in almost a year seems like forever given how things are moving right now. On the other hand, I don't think that's very conclusive either way on whether they've given up on it or just have other, higher-priority research directions to go into first.

vineyardmike 14d ago

> You could have said the same about Transformers: Google released it but didn't move forward, and it turned out to be a great idea

Google released Transformers as research because they invented them while improving Google Translate. They had been running them for customers for years.

Beyond that, they had publicly used transformer-based LMs ("mums") integrated into search before GPT-3 (pre-chat mode) was even trained. They were shipping transformer models generating text for years before the ChatGPT moment. Being available right on the Google SERP is probably the widest deployment a technology can have today.

Transformers are also used widely in ASR technologies, like Google Assistant, which of course was available to hundreds of millions of users.

Finally, they had private-to-employees experimental LLMs available, as well as various research initiatives released (Meena, LaMDA, PaLM, BERT, etc.) and other experiments; they just didn't productize everything (but see earlier points). They even experimented with scaling (see "Chinchilla scaling laws").
