It suggests that synthetic training could be the future of increasing the capability of smaller models (and perhaps bigger ones too). AI will train AI.
The difference being that it's not just training on unvalidated synthetic data: this specific method (per the Unnatural Instructions paper) increases instruction diversity, which confers some added advantage and, I'm assuming, explains the performance gain over the also-synthetic self-instruct code?
I may be misunderstanding, but this seems more nuanced than just training on synthetically AI-generated code: it reads more as a validation of synthetic instructions (i.e. the low-resource setting) than of synthetic code (i.e. the high-resource setting).
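To make concrete what I mean by "synthetic instructions with more diversity", here's the rough shape I have in mind. This is a toy sketch of my reading, not the paper's actual pipeline; `llm_complete`, the prompt format, and the filtering step are all placeholders I made up:

```python
# Sketch of an unnatural-instructions-style data pipeline: few-shot prompt a
# teacher model to invent NEW instructions, filter for diversity, then pair
# each with a teacher-generated answer to form a fine-tuning set.
import json
import random


def llm_complete(prompt: str) -> str:
    # Stand-in for a real teacher-model call; returns a canned string so the
    # sketch runs end to end. Swap in an actual LLM call in practice.
    return f"[model output for: {prompt[:40]}...]"


def propose_instructions(seed_examples: list[str], n_rounds: int = 100) -> list[str]:
    """Ask the teacher model to invent new instructions, seeding each round
    with a different random sample of existing ones to push for diversity."""
    generated: set[str] = set()
    for _ in range(n_rounds):
        shots = random.sample(seed_examples, k=min(3, len(seed_examples)))
        prompt = (
            "Here are some example tasks:\n"
            + "\n".join(f"- {s}" for s in shots)
            + "\nWrite one new, different task:\n- "
        )
        candidate = llm_complete(prompt).strip()
        # Crude diversity filter: drop repeats of the seeds or prior outputs.
        if candidate and candidate not in generated and candidate not in seed_examples:
            generated.add(candidate)
    return list(generated)


def build_finetune_set(instructions: list[str]) -> list[dict]:
    """Pair each synthetic instruction with a teacher-model answer. Note the
    answers are still unvalidated synthetic data; only the instruction side
    gets the diversity treatment."""
    return [{"instruction": ins, "output": llm_complete(ins)} for ins in instructions]


if __name__ == "__main__":
    seeds = ["Reverse a string in Python.", "Explain what a hash map is."]
    data = build_finetune_set(propose_instructions(seeds, n_rounds=10))
    print(json.dumps(data[:2], indent=2))
```

If that's roughly right, the interesting part is the instruction-generation loop, not the answers, which is why I read it as a low-resource result.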