This turns out to be a significant FLOPs and quality win, even accounting for the initial scoring-model training and data-scoring cost: they claim roughly a 10x improvement in the quality/FLOP tradeoff, and they show numbers that significantly beat SOTA on some tasks at their model size.
The bad part, to me, is that this is some significant engineering: it requires known high-quality datasets, training of the scoring model, and selection and scoring of the data for the big training run. This isn't a bold new leap that's going to be easy for hobbyists to implement; it's a practitioner's excellent engineering showing the way forward for certain training needs.
As always, I appreciate the publishing from DeepMind; this looks like great work. It would be nice to see a company like together.ai or others turn it into an actual pipeline; it might be a while, though. It looks relatively gnarly in the details on the data and scoring side.
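For intuition, the selection step described above (train a small scorer on known-good data, then score the big pool and keep only the best of it) might look roughly like this. A minimal sketch only: the function name, `keep_fraction`, and the toy scorer are my own, not from the paper.

```python
def select_training_data(candidates, scorer, keep_fraction=0.1):
    """Score every candidate example with a small pretrained quality
    model, then keep only the top fraction for the big training run."""
    ranked = sorted(candidates, key=scorer, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]

# Toy usage: pretend longer strings are "higher quality".
pool = ["a", "bb", "ccc", "dddd", "eeeee",
        "ffffff", "ggggggg", "hhhhhhhh", "iiiiiiiii", "jjjjjjjjjj"]
top = select_training_data(pool, scorer=len, keep_fraction=0.2)
# keeps the 2 highest-scoring examples
```

The gnarly part in practice is everything this sketch hides: building the scorer, and scoring a web-scale pool cheaply enough that the selection still pays for itself.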
DeepMind people invent transformers, and then they watch people laugh at Bard (or whatever it's called nowadays) because product and engineering lost the plot. Kodak is paging you from the grave, Google. Read the message.
The launch demo was faked, and I don't think the real thing is here yet: https://techcrunch.com/2023/12/07/googles-best-gemini-demo-w...
Total disaster. On tasks similar to what I give OpenAI and Claude, it just borks. And it complains about my wanting to use a gender-guesser Python library, tells me that's inappropriate for non-binary people, and won't do it.
That's fun.
Edit 1: Also, it refuses to print the entire script. I've tried many workarounds; it seems to only want to output a very small number of total lines.
Threw it into ChatGPT, which immediately fixed all the issues from Gemini, and it worked on the first try.
Edit 2: The only thing better about Gemini, as far as I can tell, is that the copy-code button is at the bottom. ChatGPT's is at the top, and that's dumb.
Edit 3: I'm being downvoted heavily now. To be clear, I didn't intentionally seek out the gender issue; it's just what I was working on.
I'm currently trying to generate infographics based on wrestlers, and I needed to split the men from the women for championship title rankings.
I have no problem with it in general, it just came up, so I communicated it.
Multiple times Gemini removed the code that used the gender-guesser library because it decided I shouldn't use it. For sorting wrestlers and their title chances, using it makes a lot of sense...
But Gemini just refused to allow me to use it, which seems like a ridiculous thing. I want to make the choices here.
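For context, the task was something like the sketch below. To keep it self-contained, a tiny hypothetical lookup table stands in for `gender_guesser`'s `Detector` (a real run would call the library), and the roster names are just examples.

```python
# Hypothetical stand-in for gender_guesser.detector.Detector().get_gender(first_name).
NAME_GENDER = {"John": "male", "Becky": "female",
               "Roman": "male", "Charlotte": "female"}

def split_roster(wrestlers):
    """Split a roster into men's and women's title-ranking pools."""
    men, women, unknown = [], [], []
    for name in wrestlers:
        guess = NAME_GENDER.get(name.split()[0], "unknown")
        if guess == "male":
            men.append(name)
        elif guess == "female":
            women.append(name)
        else:
            unknown.append(name)
    return men, women, unknown

men, women, unknown = split_roster(
    ["John Cena", "Becky Lynch", "Roman Reigns",
     "Charlotte Flair", "Finn Balor"])
```

Nothing exotic: a name-based guess with an "unknown" bucket you can resolve by hand. That's the code Gemini kept stripping out.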
https://arxiv.org/abs/1706.03762 https://arxiv.org/abs/1810.04805
And your statement of it is incorrect. It can result in greater demand, but it doesn't necessarily result in greater resource usage.
A minority of efficiency improvements can lead to greater resource consumption, but overall, efficiency does result in less resource usage.
This is just saying throughput is increased, yes? The time to train, and thus to iterate (i.e. dialing in hyperparams), will decrease.
I.e.: just as more efficient steam engines led to an increase in both steam-engine throughput and coal consumption, an increase in AI efficiency can lead to an increase in both training throughput and energy consumption.
The paradox is a result of prevalence scaling faster than efficiency and efficiency driving prevalence.
Though even when you add the efficiency improvements I think we're still lagging behind Moore's Law overall.