I’m openly skeptical.
Most examples I’ve seen of this have frankly been rubbish, which matches my own experience closely.
The larger models, like the 70B ones, are capable of generating reasonably good structured output, and some of the smaller ones, like codellama, are also quite good.
The 7B models are unreliable.
Some trivial tasks (e.g. a chatbot) can be done, but most complex tasks (e.g. generating code) require larger models and multiple iterations.
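To be concrete about what I mean by “multiple iterations”: a minimal sketch of a validate-and-retry loop for structured output, with a stubbed `generate()` standing in for whatever local model you’re running. All the names here are mine for illustration, not from any particular library.

```python
import json

def generate(prompt: str, attempt: int) -> str:
    # Stub standing in for a call to a local model (e.g. via llama.cpp).
    # It fails once, then returns valid JSON, to exercise the retry path.
    if attempt == 0:
        return "Sure! Here is the JSON you asked for: {broken"
    return '{"name": "widget", "price": 9.99}'

def generate_json(prompt: str, max_retries: int = 3) -> dict:
    """Keep re-prompting until the model emits parseable JSON."""
    for attempt in range(max_retries):
        raw = generate(prompt, attempt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Feed the failure back so the next attempt can correct itself.
            prompt += "\n\nYour last reply was not valid JSON. Reply with JSON only."
    raise ValueError(f"no valid JSON after {max_retries} attempts")

result = generate_json("Describe a widget as JSON with keys 'name' and 'price'.")
print(result)  # {'name': 'widget', 'price': 9.99}
```

With a 7B model you end up in this loop constantly; with a 70B model you mostly don’t, which is the whole point.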
Still, I’m happy to be shown how wrong I am. Post some examples of good stuff you’ve done on /r/localllama.
…but so far, beyond porn, the 7B models haven’t impressed me.
Examples that actually do useful things are almost always either a) claimed with no way of verifying or reproducing them yourself, or b) actually using the OpenAI API.
That’s been my experience anyway.
I stand by what I said: prompt engineering can only take you so far. There’s a hard quantitative limit on what you can do with just a prompt.
Proof: if that were false, you could do what GPT-4 does with a 10-parameter model and a good prompt.
You can’t.