Fireworks has Llama 3 for the same effective speed with much more realistic rate limits (and billing)
1. Filtering by model should be enabled by default. Mixtral-8x7b-instruct on Perplexity is almost as fast as Llama 2 7B on Fireworks, but the two models are quite different in size.
2. Pricing is a very important factor that is not included.
3. Overall service reliability should also be an important signal.
We also have pricing, long/medium/short prompt lengths (decode time can vary between providers), and parallel query benchmarking, plus model details (context window, etc.)
In classification for example, you could ask Llama 8B to reason through each possibility, rank them, rate them, make counterarguments, etc. - all in the same time that GPT-4 would take to output one classification without reasoning. Which does better?
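A minimal sketch of that idea (the prompt wording, labels, and function names are illustrative, not from the comment above):

```python
# Sketch: "reason-then-classify" prompting for a small, fast model,
# versus a direct one-word classification for a slower model.
# The label set and prompt text here are hypothetical examples.

def build_reasoning_prompt(text: str, labels: list[str]) -> str:
    """Ask the model to argue for and against each label before choosing."""
    label_list = ", ".join(labels)
    return (
        f"Classify the text into one of: {label_list}.\n"
        "For each label, give one argument for it and one counterargument, "
        "then rate its fit from 1-10.\n"
        "Finally, answer with the single best label on the last line.\n\n"
        f"Text: {text}"
    )

def build_direct_prompt(text: str, labels: list[str]) -> str:
    """Baseline: ask for the label alone, with no reasoning."""
    return (
        f"Classify the text into one of: {', '.join(labels)}. "
        "Answer with the label only.\n\n"
        f"Text: {text}"
    )
```

The reasoning variant generates far more tokens, but on a model that decodes several times faster, both prompts can finish in roughly the same wall-clock time; whether the extra reasoning actually improves accuracy is the open question.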
But there was something it did way better than GPT-4. I asked it to create 10 phrases where the last word was an animal, excluding equines, in alphabetical order. GPT-3.5 and GPT-4 aren't able to follow such instructions, but the 8B model did it with mastery.