undefined | Better HN

0 pointsm3at2y ago0 comments

For others that were confused by the Gemini versions: the main one being discussed is Gemini Ultra (which is claimed to beat GPT-4). The one available through Bard is Gemini Pro.

For the differences, looking at the technical report [1] on selected benchmarks, rounded score in %:

Dataset | Gemini Ultra | Gemini Pro | GPT-4

MMLU | 90 | 79 | 87

BIG-Bench-Hard | 84 | 75 | 83

HellaSwag | 88 | 85 | 95

Natural2Code | 75 | 70 | 74

WMT23 | 74 | 72 | 74

[1] https://storage.googleapis.com/deepmind-media/gemini/gemini_...

0 comments

Traubenfuchs2y ago

formatted nicely:

  Dataset        | Gemini Ultra | Gemini Pro | GPT-4

  MMLU           | 90           | 79         | 87

  BIG-Bench-Hard | 84           | 75         | 83

  HellaSwag      | 88           | 85         | 95

  Natural2Code   | 75           | 70         | 74

  WMT23          | 74           | 72         | 74

teleforce2y ago

Excellent comparison, it seems that GPT-4 is only winning in one dataset benchmark namely HellaSwag for sentence completion.

Can't wait to get my hands on Bard Advanced with Gemini Ultra, I for one welcome this new AI overlord.

aroo2y ago

Horrible comparison given one score was achieved using 32-shot CoT (Gemini) and the other was 5-shot (GPT-4).

1 more reply

carbocation2y ago

I realize that this is essentially a ridiculous question, but has anyone offered a qualitative evaluation of these benchmarks? Like, I feel that GPT-4 (pre-turbo) was an extremely powerful model for almost anything I wanted help with. Whereas I feel like Bard is not great. So does this mean that my experience aligns with "HellaSwag"?

p_j_w2y ago

>Like, I feel that GPT-4 (pre-turbo) was an extremely powerful model for almost anything I wanted help with. Whereas I feel like Bard is not great. So does this mean that my experience aligns with "HellaSwag"?

It doesn't mean that at all because Gemini Turbo isn't available in Bard yet.

1 more reply

tarruda2y ago

I get what you mean, but what would such "qualitative evaluation" look like?

1 more reply

nathanfig2y ago

Thanks, I was looking for clarification on this. Using Bard now does not feel GPT-4 level yet, and this would explain why.

dkarras2y ago

not even original chatgpt level, it is a hallucinating mess still. Did the free bard get an update today? I am in the included countries, but it feels the same as it has always been.

tiziano882y ago

Permanent link to the result table contents: https://static.space/sha2-256:ea7e5d247afa8306cb84cbbd4438fd...

make32y ago

the numbers are not at all comparable, because Gemini uses 34 shot and variable shot vs 5 for gpt 4. this is very deceptive of them.

bitshiftfaced2y ago

Yes and no. In the paper, they do compare apples to apples with GPT4 (they directly test GPT4's CoT@32 but state its 5-shot as "reported"). GPT4 wins 5-shot and Gemini wins CoT@32. It also came off to me like they were implying something is off about GPT4's MMLU.

j / k navigate · click thread line to collapse

0 pointsm3at2y ago0 comments

For others that were confused by the Gemini versions: the main one being discussed is Gemini Ultra (which is claimed to beat GPT-4). The one available through Bard is Gemini Pro.

For the differences, looking at the technical report [1] on selected benchmarks, rounded score in %:

Dataset | Gemini Ultra | Gemini Pro | GPT-4

MMLU | 90 | 79 | 87

BIG-Bench-Hard | 84 | 75 | 83

HellaSwag | 88 | 85 | 95

Natural2Code | 75 | 70 | 74

WMT23 | 74 | 72 | 74

[1] https://storage.googleapis.com/deepmind-media/gemini/gemini_...

0 comments

Traubenfuchs2y ago

formatted nicely:

  Dataset        | Gemini Ultra | Gemini Pro | GPT-4

  MMLU           | 90           | 79         | 87

  BIG-Bench-Hard | 84           | 75         | 83

  HellaSwag      | 88           | 85         | 95

  Natural2Code   | 75           | 70         | 74

  WMT23          | 74           | 72         | 74

teleforce2y ago

Excellent comparison, it seems that GPT-4 is only winning in one dataset benchmark namely HellaSwag for sentence completion.

Can't wait to get my hands on Bard Advanced with Gemini Ultra, I for one welcome this new AI overlord.

aroo2y ago

Horrible comparison given one score was achieved using 32-shot CoT (Gemini) and the other was 5-shot (GPT-4).

1 more reply

carbocation2y ago

p_j_w2y ago

It doesn't mean that at all because Gemini Turbo isn't available in Bard yet.

1 more reply

tarruda2y ago

I get what you mean, but what would such "qualitative evaluation" look like?

1 more reply

nathanfig2y ago

Thanks, I was looking for clarification on this. Using Bard now does not feel GPT-4 level yet, and this would explain why.

dkarras2y ago

not even original chatgpt level, it is a hallucinating mess still. Did the free bard get an update today? I am in the included countries, but it feels the same as it has always been.

tiziano882y ago

Permanent link to the result table contents: https://static.space/sha2-256:ea7e5d247afa8306cb84cbbd4438fd...

make32y ago

the numbers are not at all comparable, because Gemini uses 34 shot and variable shot vs 5 for gpt 4. this is very deceptive of them.

bitshiftfaced2y ago

j / k navigate · click thread line to collapse