[1] Reinforcement learning from human feedback; basically participants got two model responses and had to judge them on multiple criteria relative to the prompt
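To make that concrete, here's a hypothetical sketch of what one of those comparison records might look like; the field names and criteria are illustrative, not the provider's actual schema:

    # Hypothetical shape of a single pairwise RLHF judgment.
    # Field names and criteria are made up for illustration.
    judgment = {
        "prompt": "Explain what a mutex is to a junior developer.",
        "response_a": "...",  # first model output shown to the rater
        "response_b": "...",  # second model output shown to the rater
        "ratings": {
            "accuracy":    "A",  # which response is more factually correct
            "helpfulness": "A",  # which better addresses the prompt
            "style":       "B",  # which is better written
        },
        "overall_preference": "A",
    }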
I suspect in part because the provider also didn't want to create an easy cop-out for the people working on the fine-tuning part (a lot of my work was auditing and reviewing output, and there was indeed a lot of really sloppy work, up to and including copy-pasting output from other LLMs - we know because, on more than one occasion, I caught people who had managed to include part of Claude's website footer in their answer...)
I upgraded to a new model (gpt-4o-mini to grok-4.1-fast), and suddenly all my workflows were broken. I was like "this new model is shit!" - then I looked into my prompts and realized the new model was actually better at following instructions, and my instructions were wrong or contradictory.
After I fixed my prompts it did exactly what I asked for.
Maybe models should have another tunable parameter for how strictly they should follow the user prompt. This reminds me of imagegen models, where you can choose the CFG/guidance scale or diffusion strength.
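For anyone who hasn't played with those imagegen knobs, here's a minimal sketch of how a guidance scale trades off prompt adherence in classifier-free guidance. The names are illustrative; the real thing runs inside the denoising loop of a diffusion pipeline:

    import torch

    def apply_guidance(noise_uncond: torch.Tensor,
                       noise_cond: torch.Tensor,
                       guidance_scale: float) -> torch.Tensor:
        # Classifier-free guidance: the model predicts noise once without
        # the prompt and once with it; the guidance scale pushes the
        # combined prediction toward the prompt-conditioned one.
        # scale = 1.0 -> just the conditional prediction;
        # higher values -> stricter prompt adherence.
        return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

A hypothetical text-model equivalent would be a dial for how literally the model treats your instructions versus how much it fills in with its own defaults.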