undefined | Better HN

0 pointskarpathy9mo ago0 comments

I kind of say it in words (agreeing with you) but I agree the versioning is a bit confusing analogy because it usually additionally implies some kind of improvement. When I’m just trying to distinguish them as very different software categories.

0 comments

miki1232119mo ago

What do you think about structured outputs / JSON mode / constrained decoding / whatever you wish to call it?

To me, it's a criminally underused tool. While "raw" LLMs are cool, they're annoying to use as anything but chatbots, as their output is unpredictable and basically impossible to parse programmatically.

Structured outputs solve that problem neatly. In a way, they're "neural networks without the training". They can be used to solve similar problems as traditional neural networks, things like image classification or extracting information from messy text, but all they require is a Zod or Pydantic type definition and a prompt. No renting GPUs, labeling data and tuning hyperparameters necessary.

They often also improve LLM performance significantly. Imagine you're trying to extract calories per 100g of product, but some product give you calories per serving and a serving size, calories per pound etc. The naive way to do this is a prompt like "give me calories per 100g", but that forces the LLM to do arithmetic, and LLMs are bad at arithmetic. With structured outputs, you just give it the fifteen different formats that you expect to see as alternatives, and use some simple Python to turn them all into calories per 100g on the backend side.

abdullin9mo ago

Even more than that. With Structured Outputs we essentially control layout of the response, so we can force LLM to go through different parts of the completion in a predefined order.

One way teams exploit that - force LLM to go through a predefined task-specific checklist before answering. This custom hard-coded chain of thought boosts the accuracy and makes reasoning more auditable.

solaire_oa9mo ago

I also think that structured outputs are criminally underused, but it isn't perfect... and per your example, it might not even be good, because I've done something similar.

I was trying to make a decent cocktail recipe database, and scraped the text of cocktails from about 1400 webpages. Note that this was just the text of the cocktail recipe, and cocktail recipes are comparatively small. I sent the text to an LLM for JSON structuring, and the LLM routinely miscategorized liquor types. It also failed to normalize measurements with explicit instructions and the temperature set to zero. I gave up.

hellovai9mo ago

have you tried schema-aligned parsing yet?

the idea is that instead of using JSON.parse, we create a custom Type.parse for each type you define.

so if you want a:

   class Job { company: string[] }

And the LLM happens to output:

   { "company": "Amazon" }

We can upcast "Amazon" -> ["Amazon"] since you indicated that in your schema.

https://www.boundaryml.com/blog/schema-aligned-parsing

and since its only post processing, the technique will work on every model :)

for example, on BFCL benchmarks, we got SAP + GPT3.5 to beat out GPT4o ( https://www.boundaryml.com/blog/sota-function-calling )

3 more replies

handfuloflight9mo ago

Which LLM?

1 more reply

coderatlarge9mo ago

note the per 100g prompt might lead the llm to reach for the part of its training distribution that is actually written in terms of the 100g standard and just lead to different recall rather than a suboptimal calculation based on non-standardized per 100g training examples.

BobbyJo9mo ago

The versioning makes sense to me. Software has a cycle where a new tool is created to solve a problem, and the problem winds up being meaty enough, and the tool effective enough, that the exploration of the problem space the tool unlocks is essentially a new category/skill/whatever.

computers -> assembly -> HLL -> web -> cloud -> AI

Nothing on that list has disappeared, but the work has changed enough to warrant a few major versions imo.

TeMPOraL9mo ago

For me it's even simpler:

V1.0: describing solutions to specific problems directly, precisely, for machines to execute.

V2.0: giving machine examples of good and bad answers to specific problems we don't know how to describe precisely, for machine to generalize from and solve such indirectly specified problem.

V3.0: telling machine what to do in plain language, for it to figure out and solve.

V2 was coded in V1 style, as a solution to problem of "build a tool that can solve problems defined as examples". V3 was created by feeding everything and the kitchen sink into V2 at the same time, so it learns to solve the problem of being general-purpose tool.

BobbyJo9mo ago

That's less a versioning of software and more a versioning of AI's role in software. None -> Partial -> Total. Its a valid scale with regard to AI's role specifically, but I think Karpathy was intending to make a point about software as a whole, and even the details of how that middle "Partial" era evolves.

lymbo9mo ago

What are some predictions people are anticipating for V4?

My Hail Mary is it’s going to be groups of machines gathering real world data, creating their own protocols or forms of language isolated to their own systems in order to optimize that particular system’s workflow and data storage.

1 more reply

gchamonlive9mo ago

> versioning is a bit confusing analogy because it usually additionally implies some kind of improvement

Exactly what I felt. Semver like naming analogies bring their own set of implicit meanings, like major versions having to necessarily supersede or replace the previous version, that is, it doesn't account for coexistence further than planning migration paths. This expectation however doesn't correspond with the rest of the talk, so I thought I might point it out. Thanks for taking the time to reply!

poorcedural9mo ago

Andrej, maybe Software 3.0 is not written in spoken language like code or prompts. Software 3.0 is recorded in behavior, a behavior that today's software lacks. That behavior is written and consumed by machine and annotated by human interaction. Skipping to 3.0 is premature, but Software 2.0 is a ramp.

mclau1579mo ago

Would this also be more of a push towards robotics and getting physical AI in our every day lives

poorcedural9mo ago

Very insightful! How you would describe boiling an egg is different than how a machine would describe it to another machine.

1 more reply

swyx9mo ago

no no, it actually is a good analogy in 2 ways:

1) it is a breaking change from the prior version

2) it is an improvement in that, in its ideal/ultimate form, it is a full superset of capabilities of the previous version

j / k navigate · click thread line to collapse

0 comments

miki1232119mo ago

What do you think about structured outputs / JSON mode / constrained decoding / whatever you wish to call it?

abdullin9mo ago

Even more than that. With Structured Outputs we essentially control layout of the response, so we can force LLM to go through different parts of the completion in a predefined order.

solaire_oa9mo ago

I also think that structured outputs are criminally underused, but it isn't perfect... and per your example, it might not even be good, because I've done something similar.

hellovai9mo ago

have you tried schema-aligned parsing yet?

the idea is that instead of using JSON.parse, we create a custom Type.parse for each type you define.

so if you want a:

   class Job { company: string[] }

And the LLM happens to output:

   { "company": "Amazon" }

We can upcast "Amazon" -> ["Amazon"] since you indicated that in your schema.

https://www.boundaryml.com/blog/schema-aligned-parsing

and since its only post processing, the technique will work on every model :)

for example, on BFCL benchmarks, we got SAP + GPT3.5 to beat out GPT4o ( https://www.boundaryml.com/blog/sota-function-calling )

3 more replies

handfuloflight9mo ago

Which LLM?

1 more reply

coderatlarge9mo ago

BobbyJo9mo ago

computers -> assembly -> HLL -> web -> cloud -> AI

Nothing on that list has disappeared, but the work has changed enough to warrant a few major versions imo.

TeMPOraL9mo ago

For me it's even simpler:

V1.0: describing solutions to specific problems directly, precisely, for machines to execute.

V2.0: giving machine examples of good and bad answers to specific problems we don't know how to describe precisely, for machine to generalize from and solve such indirectly specified problem.

V3.0: telling machine what to do in plain language, for it to figure out and solve.

BobbyJo9mo ago

lymbo9mo ago

What are some predictions people are anticipating for V4?

1 more reply

gchamonlive9mo ago

> versioning is a bit confusing analogy because it usually additionally implies some kind of improvement

poorcedural9mo ago

mclau1579mo ago

Would this also be more of a push towards robotics and getting physical AI in our every day lives

poorcedural9mo ago

Very insightful! How you would describe boiling an egg is different than how a machine would describe it to another machine.

1 more reply

swyx9mo ago

no no, it actually is a good analogy in 2 ways:

1) it is a breaking change from the prior version

2) it is an improvement in that, in its ideal/ultimate form, it is a full superset of capabilities of the previous version

j / k navigate · click thread line to collapse