Groq - mainly hardware, the LPU (https://wow.groq.com/lpu-inference-engine/)
Grok - Elon's AI endeavor du jour
I might well be wrong about the etymology here, but I understand "grokking" to be a term for a phenomenon in training neural networks.
What I'm not sure about is which came first – the AI companies named after some version of "grok", or the term itself.
My bullshit detector went off when I first saw Groq posted on HN – a startup making its own chips (doubt) that perform faster than anything Nvidia has for inference (doubt) and accelerate LLMs to hundreds or thousands of tokens per second?? Mega doubt.
But... then I tried their demo, and... yeah, it's that good. Such an amazing company of talented individuals.
The other issues they don't mention are power, space, efficiency, etc. We want to run larger models with less power, fewer server blades, and at lower cost – not with more server blades, more chips, and more power.
If anything, Google's TPU advancements chart a viable course. I suspect both Groq and Cerebras will overcome these challenges and offer competitive compute options, depending on the context.
When will Groq support a real API (not an experimental beta preview)?
When will Groq support logprobs?!
When will Groq actually tell us what their rate limit is?!
Until these are answered, many of us can't actually build on Groq.
Edit: It seems I'm getting downvoted by Groq employees...
For GroqCloud the rate limits are fairly clear [1]. For example, for llama3-8b-8192 you get 30 requests per minute, 14,400 requests per day, and 30,000 tokens per minute. That said, it's the free beta tier, so it sometimes goes down randomly and the limits may be different once they start charging for it.
I'm not affiliated with Groq, but I use GroqCloud to make some simple chatbots since it's currently free.
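If it helps anyone working around those limits, here's a minimal sketch of staying under the 30 requests/minute cap with a client-side throttle plus a retry on HTTP 429. It assumes Groq's OpenAI-compatible chat completions route and a GROQ_API_KEY environment variable; check the docs [1] before relying on the exact URL, model name, or limits.

```python
# Sketch: call GroqCloud's (assumed OpenAI-compatible) chat endpoint while
# staying under the free-tier limit of 30 requests/minute, retrying on 429.
import os
import time
import requests

API_URL = "https://api.groq.com/openai/v1/chat/completions"  # assumed route
API_KEY = os.environ["GROQ_API_KEY"]  # assumed env var name

MIN_INTERVAL = 60 / 30  # 30 requests/minute -> at most one request every 2s
_last_call = 0.0

def chat(prompt: str, model: str = "llama3-8b-8192", max_retries: int = 5) -> str:
    """Send one chat completion, throttled client-side and retried on 429."""
    global _last_call
    for attempt in range(max_retries):
        # Client-side throttle so we never exceed the per-minute request limit.
        wait = MIN_INTERVAL - (time.time() - _last_call)
        if wait > 0:
            time.sleep(wait)
        _last_call = time.time()

        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"model": model, "messages": [{"role": "user", "content": prompt}]},
            timeout=30,
        )
        if resp.status_code == 429:  # rate limited: back off exponentially
            time.sleep(2 ** attempt)
            continue
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
    raise RuntimeError("still rate-limited after retries")

print(chat("Say hello in one sentence."))
```

The throttle handles the requests-per-minute cap; the token-per-minute cap can still trigger 429s on long prompts, which is why the backoff loop is there too.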