Technical paper: https://goo.gle/GeminiPaper
Some details:
- 32k context length
- efficient attention mechanisms, e.g. multi-query attention (Shazeer, 2019); see the sketch after this list
- audio input via Universal Speech Model (USM) (Zhang et al., 2023) features
- no audio output? (Figure 2)
- visual encoding of the Gemini models is inspired by Google's own foundational work on Flamingo (Alayrac et al., 2022), CoCa (Yu et al., 2022a), and PaLI (Chen et al., 2022)
- output images using discrete image tokens (Ramesh et al., 2021; Yu et al., 2022b)
- supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF)
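For anyone who hasn't seen multi-query attention before: the idea from Shazeer (2019) is that all query heads share a single key/value head, which shrinks the KV cache and speeds up decoding. A minimal NumPy sketch of just the attention step (shapes and names are mine, not from the paper):

```python
import numpy as np

def multi_query_attention(q, k, v):
    """Multi-query attention: many query heads, one shared K/V head.

    q: (num_heads, seq_len, head_dim)  -- separate queries per head
    k: (seq_len, head_dim)             -- single shared key head
    v: (seq_len, head_dim)             -- single shared value head
    Returns: (num_heads, seq_len, head_dim)
    """
    d = q.shape[-1]
    # Every query head attends over the same shared keys/values.
    scores = q @ k.T / np.sqrt(d)               # (heads, seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)   # softmax over keys
    return weights @ v                          # (heads, seq, head_dim)

# Toy usage: 8 query heads, 4 tokens, head_dim 16.
q = np.random.randn(8, 4, 16)
k = np.random.randn(4, 16)
v = np.random.randn(4, 16)
print(multi_query_attention(q, k, v).shape)  # (8, 4, 16)
```

The win is that only one K/V head has to be cached per token during decoding, instead of one per head as in standard multi-head attention.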
I think this is already more detail than we got from OpenAI about GPT-4, but on the other hand, it's still very little.
For MMLU, the report highlights the CoT@32 result, where Ultra beats GPT-4, but Ultra loses to GPT-4 with 5-shot, for example.
For GSM8K it uses Maj1@32 for Ultra and 5-shot CoT for GPT-4, etc.
On top of that, for some reason, it reports different metrics for Ultra and Pro, making them hard to compare.
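To make the comparison concrete: 5-shot means a single greedy answer conditioned on five exemplars, while Maj1@32 samples 32 chain-of-thought completions and takes a majority vote; the report's CoT@32 additionally falls back to the greedy answer when the consensus is weak. A rough sketch of the sampling side, where `generate_cot_answer` is a hypothetical model call and the threshold value is illustrative:

```python
from collections import Counter

def generate_cot_answer(question: str, temperature: float = 0.7) -> str:
    """Hypothetical model call: returns the final answer extracted from
    one sampled chain-of-thought completion."""
    raise NotImplementedError  # stand-in for a real LLM API call

def majority_at_k(question: str, k: int = 32):
    """Maj1@k: sample k chain-of-thought answers, return the most common one."""
    answers = [generate_cot_answer(question) for _ in range(k)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / k  # answer plus the fraction of samples agreeing

def uncertainty_routed_cot(question: str, k: int = 32, threshold: float = 0.6):
    """Roughly the CoT@k idea: keep the majority answer only when consensus
    is strong, otherwise fall back to a single greedy answer."""
    answer, agreement = majority_at_k(question, k)
    if agreement >= threshold:
        return answer
    return generate_cot_answer(question, temperature=0.0)  # greedy fallback
```

The point being argued here is that which of these procedures you pick can move MMLU by several points, so quoting each model under a different one makes the headline numbers hard to interpret.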
What a mess of a "paper".
They simply compare the prompting strategies that work best with each model. Otherwise it would just be a comparison of how each model responds to one specific piece of prompt engineering.
Incorrect.
# Gemini marketing website, MMLU
- Gemini Ultra 90.0% with CoT@32*
- GPT-4 86.4% with 5-shot* (reported)
# gemini_1_report.pdf, MMLU
- Gemini Ultra 90.0% with CoT@32*
- Gemini Ultra 83.7% with 5-shot
- GPT-4 87.29% with CoT@32 (via API*)
- GPT-4 86.4% with 5-shot (reported)
The Gemini marketing website compared the best Gemini Ultra prompting strategy against a worse-performing (5-shot) GPT-4 prompting strategy.
(nitter: https://nitter.net/a_a_cabrera/status/1732454328307511807#m)
Ultra is out sometime next year, with GPT-4 level capability.
Pro is out now (?) with ??? level capability.
Sadly it's GPT-3.5 quality :(
Apple does this, and it's obvious they do it to exploit the "decoy effect" when customers shop: why buy a measly regular iPhone when you can spend a little more and get the Pro version?
But when it comes to AI, this tierification only leads to disappointment. Everyone expects the best models from FAANGO (including OpenAI); no one expects Google or OpenAI to offer shitty models that underperform their flagships when you can literally run Llama 2 and Mistral models that you actually own.
They're tiers of computing power and memory. More performance costs more money to produce. The "nano" can fit on a phone, while the others can't.
Are you really objecting to the existence of different price/performance tiers...? Do you object to McDonald's selling 3 sizes of soft drink? There's nothing "decoy" about any of this.
Unless you expect Apple to just sell the high end devices at a loss? Or do you want the high end chips to be sold in the mass market devices and for Apple to just eat the R&D costs?
Large AI models have tight resource requirements. You physically can't run X billion parameters without roughly X billion bytes of memory (at 8-bit precision; about twice that at fp16).
It makes complete sense to have these 3 "tiers". You have a max capability option, a price-performance scaling option, and an edge compute option.
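As a back-of-the-envelope check on the memory point: parameter count times bytes per parameter gives the floor for weight memory, before activations or KV cache. A small helper; the model sizes below are illustrative, not official Gemini parameter counts:

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Minimum memory just to hold the weights: params * bytes per parameter.
    fp16 = 2 bytes/param, int8 = 1, int4 = 0.5. Ignores activations and KV cache."""
    return params_billion * bytes_per_param  # billions of params * bytes = GB

# Illustrative tier sizes only -- not official Gemini figures.
for name, size_b in [("phone-sized (~3B)", 3), ("mid-tier (~50B)", 50), ("flagship (~500B)", 500)]:
    print(f"{name}: ~{weight_memory_gb(size_b, 2):.0f} GB at fp16, "
          f"~{weight_memory_gb(size_b, 0.5):.0f} GB at int4")
```

Even aggressively quantized, a flagship-scale model simply doesn't fit in a phone's RAM, which is why an edge-compute tier has to be a different, smaller model rather than the same one repackaged.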
IMO, tiers can be useful when they make sense and aren't just artificial market segmentation.