What's interesting about this demo is the speed at which it runs, which showcases the "Groq LPU™ Inference Engine".
That's explained here: https://groq.com/lpu-inference-engine/
> This is the world’s first Language Processing Unit™ Inference Engine, purpose-built for inference performance and precision. How performant? Today, we are running Llama-2 70B at over 300 tokens per second per user.
I think the LPU is a custom hardware chip, though the page talking about it doesn't make that as clear as it could.
https://groq.com/products/ makes it a bit clearer - there's a custom chip, the "GroqChip™ Processor".
https://groq.com/wp-content/uploads/2023/05/GroqISCAPaper202...
EDIT: I work at Groq, but I'm commenting in a personal capacity.
happy to answer clarifying questions or forward them along to folks who can :)
Are they also used for training or just inference?
I can’t find any information about an API, though I’m guessing that the costs are eye-watering.
If they offered a Mixtral endpoint that did 300-400 tokens per second at a reasonable cost, I can’t imagine ever using another provider.
I actually have a final round interview with a subsidiary of Groq coming up, and I'm very undecided about whether to pursue it, so this thread felt extraordinarily serendipitous. Plenty of food for thought here.
But if it was generating high-quality responses, would that not make it go slower?
[0]: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...
I asked about creating illicit substances — an obvious (and reasonable) target for censorship. And, admirably, it suggested getting help instead. That’s fine.
But I asked for a poem about pumping gas in the style of Charles Bukowski, and it moaned that I shouldn’t ask for such mean-spirited, rude things. It wouldn’t dare create such a travesty.
To test which underlying model I asked it what a good sexy message for my girlfriend for Valentine's Day would be, and it lectured me about objectification.
It makes sense that the chat interface is using the chat model; I just wish people were more consistent about labeling the use of Llama-2-chat vs Llama-2, as the fine-tuning really does lead to significant underlying differences.
Really impressed by their hardware.
I'm still wondering why uptake has been so slow. My understanding from their presentations was that compiling a model is relatively simple. Why isn't it more talked about? And why not demo Mixtral, or showcase multiple models?
edit: I re-ran the same prompt on Perplexity's llama-2-70b and got 59 tokens per second there
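For anyone wanting to reproduce that kind of comparison, here's a minimal Python sketch of measuring throughput from any streaming token iterator. The fake stream is a stand-in; in practice you'd wrap the streaming iterator of whichever API you're benchmarking.

```python
import time

def measure_tokens_per_second(token_stream):
    """Consume an iterable of tokens and time the full run.

    For a real benchmark, pass the streaming iterator returned by
    an LLM API; here we just count whatever the iterable yields."""
    start = time.perf_counter()
    n_tokens = sum(1 for _ in token_stream)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed if elapsed > 0 else float("inf")

# Fake stream as a placeholder; a real run would stream from the API.
fake_stream = (tok for tok in ["Hello", ",", " world"] * 100)
rate = measure_tokens_per_second(fake_stream)
```

Note that measuring a local generator like this gives an absurdly high number; the point is only the shape of the measurement, which is dominated by network and model latency against a real endpoint.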
They probably weren't specific enough about which model it was built on, referring to Llama-2-chat as a Llama-2 model (which is technically correct).
This doesn't mean much without comparing dollars or watts against GPU equivalents.
You're right that it's important to compare cost per token as well, not just raw speed. Unfortunately I don't have those figures to hand, but I think our customer offerings are price-competitive with OpenAI's. The biggest takeaway, though, is that we just don't believe GPU architectures can ever scale to the performance we can get, at any cost.
May as well wait for the whole response and render it. Or render paragraph at a time.
Don’t jiggle the UI while rendering.
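Rendering paragraph-at-a-time is easy to do on top of a token stream; here's a small Python sketch (splitting on blank lines is an assumption about how the model formats paragraphs):

```python
def paragraphs(token_stream):
    """Re-chunk a stream of text fragments into complete paragraphs,
    so the UI re-renders once per paragraph instead of once per token."""
    buf = ""
    for chunk in token_stream:
        buf += chunk
        # A blank line ("\n\n") marks a paragraph boundary.
        while "\n\n" in buf:
            para, buf = buf.split("\n\n", 1)
            yield para
    if buf:
        yield buf  # flush whatever trails after the last boundary

chunks = ["First par", "agraph.\n\nSecond", " paragraph."]
paras = list(paragraphs(chunks))  # two complete paragraphs
```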
But what are these LPUs optimized for: tensor operations (like Google's TPUs) or LLMs/Transformers architecture?
If it is the latter, how would they/their clients adapt if a new (improved) architecture hits the market?
It said December 2022, but the answer to another question was not correct for that time or now. It also went into some kind of repeating loop up to its maximum response length.
Still pretty cool that our standards for chat programs have risen.
What are some relevant speed metrics? Output tokens per second? How about the number of input tokens -- does that matter, and how does it factor in?
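One common split is time-to-first-token (where input length mostly shows up, as prefill cost) versus decode throughput after the first token. A small Python sketch with made-up timestamps (the numbers are illustrative, not measured):

```python
def latency_metrics(t_request, t_first_token, t_done, n_output_tokens):
    """Split a generation into two metrics:
    - ttft_s: time to first token (dominated by prompt/prefill work,
      so it grows with input length)
    - decode_tokens_per_s: steady-state output throughput after that."""
    ttft = t_first_token - t_request
    decode_time = t_done - t_first_token
    tps = n_output_tokens / decode_time if decode_time > 0 else float("inf")
    return {"ttft_s": ttft, "decode_tokens_per_s": tps}

# Hypothetical run: first token after 0.25 s, 600 tokens done at 2.25 s.
m = latency_metrics(t_request=0.0, t_first_token=0.25,
                    t_done=2.25, n_output_tokens=600)
```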
Just FYI: your textbox seems to suppress autocorrect on iOS (at least for me); you might want to fix that.
I'd pay quite a bit of money to have a Mixtral box at home, then we'd all have our own, local assistant/helper/partner/whatever. Basically, the plot of the movie Her.
Or is it a completely custom ASIC?
"I am building an api in spring boot that persists users documents. This would be for an hr system. There are folders, and documents, which might have very sensitive data. I will need somewhere to store metadata about those documents. I was thinking of using postgres for the emtadata, and s3 for the actual documents. Any better ideas? or off the shelf libraries for this?"
Both were at about parity, except Groq suggested using the Spring Cloud Storage library, which GPT-4 did not. It turns out that library might be great for my use case. I think OpenAI's days are numbered; the pressure on them to release the next-gen model is very high.
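The Postgres-for-metadata, S3-for-blobs split described in the prompt is a common pattern regardless of framework. A minimal Python sketch of the flow, with sqlite3 and a dict standing in for Postgres and S3 (the schema and names are illustrative assumptions, not from either model's answer):

```python
import hashlib
import sqlite3
import uuid

# Stand-ins so the sketch is self-contained:
# sqlite3 in place of Postgres, a dict in place of an S3 bucket.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE document (
        id         TEXT PRIMARY KEY,
        folder     TEXT NOT NULL,
        filename   TEXT NOT NULL,
        s3_key     TEXT NOT NULL,
        sha256     TEXT NOT NULL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
fake_s3 = {}

def store_document(folder, filename, content: bytes):
    """Write the blob to object storage, then record metadata."""
    doc_id = str(uuid.uuid4())
    # Key objects by id, not user-supplied names, to avoid collisions
    # and path-injection issues with sensitive HR data.
    s3_key = f"{folder}/{doc_id}"
    fake_s3[s3_key] = content  # real S3: s3.put_object(Bucket=..., Key=s3_key, Body=content)
    db.execute(
        "INSERT INTO document (id, folder, filename, s3_key, sha256)"
        " VALUES (?, ?, ?, ?, ?)",
        (doc_id, folder, filename, s3_key,
         hashlib.sha256(content).hexdigest()),
    )
    return doc_id

doc_id = store_document("hr/contracts", "offer.pdf", b"%PDF- fake bytes")
```

Storing a content hash alongside the metadata makes it cheap to detect duplicate uploads and to verify blob integrity later.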
Not only that, but GPT-4 is quite slow, often times out, etc. These responses are so much faster, which really does matter.