Groq - mainly hardware, the LPU (https://wow.groq.com/lpu-inference-engine/)
Grok - Elon's AI endeavor du jour
I might well be wrong about the etymology here, but I understand "grokking" to be a term for a phenomenon in training neural networks.
What I'm not sure about is which came first – the AI companies named after some version of "grok", or the term itself.
My bullshit detector went off when I first saw Groq posted on HN – a startup making its own chips (doubt) that perform faster than anything Nvidia has for inference (doubt) and accelerate LLMs to hundreds or thousands of tokens per second?? Mega doubt.
But... then I tried their demo, and... yeah, it's that good. Such an amazing company of talented individuals.
The other issues they don't mention are power, space, efficiency, etc. We want to run larger models with less power, fewer server blades, and at lower cost – not with more server blades, more chips, and more power.
If anything, Google's TPU advancements chart a viable course. I suspect both Groq and Cerebras will overcome these challenges and offer competitive compute options, depending on the context.
When will Groq support a real API (not an experimental beta preview)?
When will Groq support logprobs?!
When will Groq actually tell us what their rate limit is?!
Until these are answered, many of us can't actually build on Groq.
Edit: It seems I'm getting downvoted by Groq employees...
For GroqCloud the rate limits are fairly clear [1]. For example, for llama3-8b-8192 you get 30 requests per minute, 14,400 requests per day, and 30,000 tokens per minute. That said, it's the free beta tier, so it sometimes goes down randomly and the limits may be different once they start charging for it.
I'm not affiliated with Groq, but I use GroqCloud to make some simple chatbots since it's currently free.
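If it helps anyone working around those limits, here's a minimal sketch of staying under the 30 requests/minute cap with a client-side throttle plus a retry on HTTP 429. It assumes Groq's OpenAI-compatible chat completions route and a GROQ_API_KEY environment variable; check the docs [1] before relying on the exact URL, model name, or limits.

```python
# Sketch: call GroqCloud's (assumed OpenAI-compatible) chat endpoint while
# staying under the free-tier limit of 30 requests/minute, retrying on 429.
import os
import time
import requests

API_URL = "https://api.groq.com/openai/v1/chat/completions"  # assumed route
API_KEY = os.environ["GROQ_API_KEY"]  # assumed env var name

MIN_INTERVAL = 60 / 30  # 30 requests/minute -> at most one request every 2s
_last_call = 0.0

def chat(prompt: str, model: str = "llama3-8b-8192", max_retries: int = 5) -> str:
    """Send one chat completion, throttled client-side and retried on 429."""
    global _last_call
    for attempt in range(max_retries):
        # Client-side throttle so we never exceed the per-minute request limit.
        wait = MIN_INTERVAL - (time.time() - _last_call)
        if wait > 0:
            time.sleep(wait)
        _last_call = time.time()

        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"model": model, "messages": [{"role": "user", "content": prompt}]},
            timeout=30,
        )
        if resp.status_code == 429:  # rate limited: back off exponentially
            time.sleep(2 ** attempt)
            continue
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
    raise RuntimeError("still rate-limited after retries")

print(chat("Say hello in one sentence."))
```

The throttle handles the requests-per-minute cap; the token-per-minute cap can still trigger 429s on long prompts, which is why the backoff loop is there too.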