I’m openly skeptical.
Most examples I’ve seen of this have frankly been rubbish, which matches my own experience closely.
The larger models, like the 70B ones, are capable of generating reasonably good structured output, and some of the smaller ones, like codellama, are also quite good.
The 7B models are unreliable.
Some trivial tasks (e.g. a chatbot) can be done, but most complex tasks (e.g. generating code) require larger models and multiple iterations.
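To be concrete about what I mean by “multiple iterations”: a minimal sketch of a validate-and-retry loop for structured output, with a stubbed `generate()` standing in for whatever local model you’re running. All the names here are mine for illustration, not from any particular library.

```python
import json

def generate(prompt: str, attempt: int) -> str:
    # Stub standing in for a call to a local model (e.g. via llama.cpp).
    # It fails once, then returns valid JSON, to exercise the retry path.
    if attempt == 0:
        return "Sure! Here is the JSON you asked for: {broken"
    return '{"name": "widget", "price": 9.99}'

def generate_json(prompt: str, max_retries: int = 3) -> dict:
    """Keep re-prompting until the model emits parseable JSON."""
    for attempt in range(max_retries):
        raw = generate(prompt, attempt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Feed the failure back so the next attempt can correct itself.
            prompt += "\n\nYour last reply was not valid JSON. Reply with JSON only."
    raise ValueError(f"no valid JSON after {max_retries} attempts")

result = generate_json("Describe a widget as JSON with keys 'name' and 'price'.")
print(result)  # {'name': 'widget', 'price': 9.99}
```

With a 7B model you end up in this loop constantly; with a 70B model you mostly don’t, which is the whole point.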
Still, I’m happy to be shown how wrong I am. Post some examples of good stuff you’ve done on /r/localllama.
…but so far, beyond porn, the 7B models haven’t impressed me.
Examples that actually do useful things are almost always either a) claimed with no way of verifying or reproducing them yourself, or b) actually using the OpenAI API.
That’s been my experience anyway.
I stand by what I said: prompt engineering can only take you so far. There’s a hard quantitative limit on what you can do with just a prompt.
Proof: if that were false, you could do what GPT-4 does with a 10-parameter model and a good prompt.
You can’t.