Here's the core of the message sent to the LLM: https://github.com/microsoft/TypeChat/blob/main/src/typechat...
You are basically getting a fixed prompt to return structured data, with a small amount of automation and vendor lock-in. All these LLM libraries are just crappy APIs on top of the underlying API. It is trivial to write a script that does the same and will be much more flexible as models and user needs evolve.
As an example, think about how you could change the prompt or use python classes instead. How much work would this be using a library like this versus something that lifts the API calls and text templating to the user like: https://github.com/hofstadter-io/hof/blob/_dev/flow/chat/llm...
1. Running the TypeScript type checker against what is returned by the LLM.
2. If there are type errors, combining those into a "repair prompt" that will (it is assumed) have a higher likelihood of eliciting an LLM output that type checks.
3. Gracefully handling the cases where the heuristic in #2 fails.
https://github.com/microsoft/TypeChat/blob/main/src/typechat...
In my experience experimenting with the same basic idea, the heuristic in #2 works surprisingly well for relatively simple types (i.e. records and arrays not nested too deeply, limited use of type variables). It turns out that prompting LLMs to return values inhabiting relatively simple types can be used to create useful applications. Since that is valuable, this library is valuable inasmuch as it eliminates the need to hand roll this request pattern, and provides a standardized integration with the TypeScript codebase.
https://github.com/dzhng/zod-gpt
And by better I mean it doesn't tie you to OpenAI for no good reason.
Why would I want to add all this extra stuff just for that? The opaque retry until it returns valid JSON? That sounds like it will make for many pleasant support cases or issues
Personally, I have found that investing more effort in the actual prompt engineering improves success rates and reduces the need to retry with an appended error message. Especially helpful are input/output pairs (i.e. few-shot), and while we haven't tried it yet, I imagine fine-tuning and distillation would improve the situation even more.
But that said it still feels like using a library is the right thing to do... so I'm still watching this space to see what matures and emerges as a good-enough approach.
Basically, since it reduces the user input space, you are giving up flexibility and control for some questionably valuable abstractions, such as a predefined prompt, no ability to prompt engineer, no CoT/ToT, etc.
If anything, choose a broader framework like langchain and have something like this as an extension or plugin to the framework; there's no need for a whole library for this one little thing.
For example: you have 1000 free-text survey responses about your product, building a schema and for-each `TypeChat`ing them would get you a dataset for that free-text. It's mind-bogglingly useful.
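As a sketch of that loop, assuming a hypothetical `translate` function (stubbed here with a keyword rule so the example is self-contained; a real version would call TypeChat against the schema):

```typescript
// The schema you'd hand to the model: one typed record per response.
interface SurveyLabel {
  sentiment: "positive" | "negative" | "neutral";
}

// Stub standing in for the LLM call; a real version would send the
// free text plus the schema and validate the returned JSON.
function translate(text: string): SurveyLabel {
  if (/love|great/i.test(text)) return { sentiment: "positive" };
  if (/hate|broken/i.test(text)) return { sentiment: "negative" };
  return { sentiment: "neutral" };
}

// For-each over the free-text responses yields a structured dataset.
const responses = ["I love the new dashboard", "Export is broken", "It works"];
const dataset = responses.map((text) => ({ text, ...translate(text) }));
```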
There was a similar example a few months back using XML instead, but I haven't heard much about it since, because again, the library did not add value on top of doing these things in a more open or scripted setting.
MSFT has another project in a similar vein, guardrails: an interesting idea, but made worse by wrapping it in a library. Most of these LLM ideas are better as a function than a library; make them transform the I/O rather than have every library write its own wrappers around the LLM APIs as well.
There are several more making use of OpenAPI / JSONSchema rather than TS.
We use a subset of CUE, essentially JSON without as many quotes or commas. The LLMs are quite flexible with few-shot learning. They can be made more reliable with fine-tuning. They can be made faster and cheaper with distillation.
Sure, your engineers could implement it themselves, but don’t they have better things to do?
There are other questionable decisions, and a valuable use of engineering time is indeed to evaluate candidate abstractions and think about the long-term cost of adopting them. In this case, it does not seem to save much effort, and in the long run it means a lot of important LLM knobs are out of your control. Not a good tradeoff.
Why all the rigamarole of hoping you get a valid response, adding last-mile validators to detect invalid responses, trying to beg the model to pretty please give me the syntax I'm asking for...
...when you can guarantee a valid JSON syntax by only sampling tokens that are valid? Instead of greedily picking the highest-scoring token every time, you select the highest-scoring token that conforms to the requested format.
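A toy illustration of that idea (the vocabulary, scores, and validity rule below are all made up; real constrained decoding checks against a full grammar, not a hard-coded token list):

```typescript
type Scored = { token: string; score: number };

// Instead of argmax over all tokens, take argmax over only the tokens
// the output grammar would accept next.
function pickConstrained(scored: Scored[], isValid: (t: string) => boolean): string {
  const allowed = scored.filter((s) => isValid(s.token));
  if (allowed.length === 0) throw new Error("grammar dead end: no valid token");
  return allowed.reduce((a, b) => (b.score > a.score ? b : a)).token;
}

// Example: right after an opening `{`, JSON permits only `"` (starting a
// key) or `}` (an empty object), no matter what the model prefers.
const next: Scored[] = [
  { token: "hello", score: 0.9 }, // model's favorite, but invalid here
  { token: '"', score: 0.5 },
  { token: "}", score: 0.3 },
];
const validAfterBrace = (t: string) => t === '"' || t === "}";
```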
This is what Guidance does already, also from Microsoft: https://github.com/microsoft/guidance
But OpenAI apparently does not expose the full scores of all tokens; it only exposes the highest-scoring token. Which is so odd, because if you run models locally, using Guidance is trivial, and you can guarantee your JSON is correct every time. It's faster to generate, too!
Also I believe that such a method cannot capture the full complexity of TypeScript types.
Yes, you can guarantee syntactically correct JSON that way, but will it be semantically correct? If the model really, really, really wanted to put another token there, but you are forcing it to put a `{`, maybe the following generated text won't be as good.
I'm not sure, I'm just wondering out loud.
I experimented a bit with finetuning open source LLMs for JSON parsing (without guided token sampling). Depending on one's use case, 70B parameters might be overkill. I've seen promising results with much, much smaller models. Finetuning a small model combined with guided token sampling would be interesting.
Then again, finetuning is perhaps not perfect for very general applications. When you get input that you didn't anticipate in your training dataset, you're in trouble.
Of course, if you are on the web, it makes no sense. It is much easier to use the mouse to click on a couple of items.
Structured requests and responses are 100% the next evolution of LLMs. People are already getting tired of chatbots. Being able to plug in any backend without worrying about text parsing and prompts will be amazing.
Yup, a general desire of mine is to locally run an LLM which has actionable interfaces that i provide. Things like "check time", "check calendar", "send message to user", etc.
TypeChat seems to be in the right area. I can imagine an extra layer of "fit this JSON input to a possible action, if any", etc.
I see a neat hybrid future where a bot (LLM/etc) works to glue layers of real code together. Sometimes part of ingestion, tagging, etc - sometimes part of responding to input, etc.
All around this is a super interesting area to me but frankly, everything is moving so fast i haven't concerned myself with diving too deep in it yet. Lots of smart people are working on it so i feel the need to let the dust settle a bit. But i think we're already there to have my "dream home interface" working.
`useMakeCopilotActionable` = you pass the type of the input, and an arbitrary typescript function implementation.
https://github.com/RecursivelyAI/CopilotKit
Feedback welcome
For example, try to keep up with (frequent) API payload changes around a consumer in Java. We implemented a NodeJS layer just to stay sane. (Banking, huge JSON payloads, backends in Java)
Mapping is really something where LLMs could shine.
Code/functionality archeology is already insanely hard in orgs with old codebases. Imagine the facepalming that Future You will have when you see that the way the system works is some sort of nondeterministic translation layer that magically connects two APIs where versions are allowed to fluctuate.
DeFi/crypto went through this phase 2 years ago. Mark my words, it's going to end up being this weird limbo for a few years where people will slowly realize that AI is a feature, not a product. And that its applicability is limited and that it won't save the world. It won't be able to self-drive cars due to all the edge cases, it won't be able to perform surgeries because it might kill people, etc.
I keep mentioning that even the most useful AI tools (Copilot, etc.) are marginally useful at best. At the very best it saves me a few clicks on Google, but the agents are not "intelligent" in the least. We went through a similar bubble a few years ago with chatbots[1]. These days, no one cares about them. "The metaverse" was much more short-lived, but the same herd mentality applies. "It's the next big thing" until it isn't.
[1] https://venturebeat.com/business/facebook-opens-its-messenge...
> It won't be able to self-drive cars due to all the edge cases, it won't be able to perform surgeries because it might kill people, etc.
You literally just cherry-picked the most difficult applications of AI. The vast majority of people's jobs don't involve life or death, and thus are ripe for automation. And even if the life-or-death jobs retain a human element, they will most certainly be augmented by AI agents. For example, a surgery might still be handled by a human, but it will probably become mandatory for a doctor or nurse to diagnose a patient in conjunction with an AI.
> We went through a similar bubble a few years ago with chatbots
Are you honestly comparing that to now? ChatGPT got to 100 million users in a few months and everyone and their grandma has used it. I wasn't even aware of any chatbot bubble a few years ago, it certainly wasn't that significant.
> even the most useful AI tools (Copilot, etc.) are marginally useful at best
Sure, but you're literally seeing them in their worst versions. ChatGPT has been a life-changer for me, and it doesn't even execute code yet (Code Interpreter does though, which I haven't tested yet)
By 2030, humans probably won't be typing code anymore; it'll just be prompting machines and directing AI agents. By then, most people's jobs will also be automated.
AI isn't just some fad; it's going to change literally every industry, and way faster than people think. The cynicism here trying to dismiss the implications of AI by comparing it to the metaverse is just absurd and utterly lacking in imagination. Yes, there is still a lot of work that needs to be done, specifically on the AI agent side of things, but we will get there, probably way faster than people realize, and the implications are enormous.
Eventually, perhaps. But by 2023? Definitely not.
I think both you and the GP are at opposite ends of the extreme, and the reality is somewhere in the gulf in between.
That said, AlphaGo went from "hallucinating" bad moves to the best player in the world in a fairly short period of time. If this is at all doable for language models, GPT-x may blow all this out of the water.
I think the state space when looking at something like Go v. natural language (or even formal languages like programming languages or first/second order logic) is not even remotely comparable. The number of states in Go is 3^361. The number of possible sentences in English, while technically infinite, has some sensible estimates (Googling shows the relatively tame 10^570 figure).
Hard disagree. A very clear counterexample from my usage:
Gpt-4 is phenomenal at helping a skilled person work on tangential tasks where their skills generally translate but they don’t have strong domain knowledge.
I’ve been writing code for a decade, and recently I’ve been learning some ML for the first time. I’m using gpt-4 everyday and it’s been a delight.
To be fair, I can see one might find the rough edges annoying on occasion. For me, it’s quite manageable and not much of a bother. I’ve gotten better at ignoring or working around them. There is definitely an art to using these tools.
I expect the value provided to continue growing. We haven’t plucked all of the low-hanging or mid-hanging fruit yet.
I can share chat transcripts if you are interested.
A key difference is that these things, no matter how impressive their technical merits, required people to completely reshape whatever they were doing to get the first bit of benefit.
Modern AI (and really, usually LLMs) has immediate and broad applicability across nearly every economic sector, and that's why so many of us are already building and releasing features with it. There's incredible value in this stuff. Completely world-changing? No. But enough to create new product categories and fundamentally improve large swaths of existing product capabilities? Absolutely.
Also, like RSS: if websites exposed some standard URL for AI interaction, using this TypeChat to expose the interfaces, we'd be well on our way here.
Not to mention that none of these assistants actually makes any money (they all lose money, really), and they are only worthwhile to big companies with other ways to make cash or drive other parts of their business (phones, shopping, whatever), so there's less incentive for a startup to do it.
I worked on both Cortana and Alexa in the past and thought a lot about trying to build a new version of them from the ground up with the LLM advancements. While the tech was all straightforward, and I even had some new ideas for use cases that are enabled now, I could not figure out a business model that would work (and hence am working on something completely different now).
When I first learned what ChatGPT was my thought was "oh so like what Siri is supposed to be."
But I feel you. My Google Assistant doesn't even seem to look for answers to questions anymore. All I get, even for simple queries, is a "sorry, I don't understand".
    type Item = {
        name: string;
        ...
        size?: string;

I'm not really following how this would avoid `name: "grande latte"`? But then the example response:

    "size": 16

> This is pretty great!

Is it? It's not even returning the type being asked for?
I'm guessing this is more of a typo in the example, because otherwise this seems cool.
    {
        name: "the brown one",
        size: "the espresso cup",
        ...
    }

Like that's just as bad as parsing the original string. You probably want big string union types for each one of those, representing whatever known values you want, so the LLM can try to match them. But now why would you want that to be locked into the type syntax? You probably want something more like Zod, where you can use some runtime data to build up those union types.
You also want restrictions on the types, like quantity being a positive, non-fractional integer. Of course you can just validate the JSON values afterwards, but then the user gets two kinds of errors: one from the LLM, which is fluent and human-sounding, and another which is a weird, technical "oops! You provided a value that is too large for quantity" error.
The type syntax seems like the wrong place to describe this stuff.
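To make that concrete, here is a minimal hand-rolled runtime check in the spirit of the comment (the size list and error messages are invented; a real project would likely reach for Zod instead):

```typescript
// Known sizes built from runtime data, which a bare TS type can't express
// without regenerating the schema text.
const KNOWN_SIZES = ["short", "tall", "grande", "venti"];

// Returns a list of human-readable problems instead of throwing, so the
// caller can decide how to phrase the errors back to the user.
function checkItem(raw: any): string[] {
  const errors: string[] = [];
  if (typeof raw?.name !== "string") errors.push("name must be a string");
  if (!KNOWN_SIZES.includes(raw?.size))
    errors.push(`size must be one of: ${KNOWN_SIZES.join(", ")}`);
  if (!Number.isInteger(raw?.quantity) || raw.quantity <= 0)
    errors.push("quantity must be a positive whole number");
  return errors;
}
```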
There would be no way for a system to map "grande" to 16 based on the code provided, and 16 does not seem to be used anywhere else.
This is a cute idea and it looks like it should work, but I could see this getting expensive with larger models and input prompts. Probably not a fix for all scenarios.
I imagine closing the loop (using the TS compiler to restrict token output weights) is in the works, though it's probably not totally trivial. You'd need:
* An incremental TS compiler that could report "valid" or "valid prefix" (i.e., valid as long as the next token is not EOF)
* The ability to backtrack the model
Idk how hard either piece is.
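A rough sketch of how those two pieces could fit together (the prefix check below is a crude brace/quote balancer standing in for an incremental compiler, and the backtracking is plain depth-first search over hypothetical candidate tokens):

```typescript
// "Valid prefix" oracle: could this string still grow into valid JSON?
// A real version would ask an incremental checker; this only tracks
// string literals and brace depth.
function isValidJsonPrefix(s: string): boolean {
  let depth = 0;
  let inString = false;
  for (const c of s) {
    if (inString) {
      if (c === '"') inString = false;
    } else if (c === '"') inString = true;
    else if (c === "{") depth++;
    else if (c === "}") {
      depth--;
      if (depth < 0) return false; // unbalanced: no continuation can fix this
    }
  }
  return true;
}

// Backtracking over candidate tokens: try them in (model-preferred) order,
// prune prefixes the oracle rejects, and back up from dead ends.
function generate(
  candidatesAt: (prefix: string) => string[],
  done: (s: string) => boolean,
  prefix = ""
): string | null {
  if (done(prefix)) return prefix;
  for (const token of candidatesAt(prefix)) {
    const next = prefix + token;
    if (!isValidJsonPrefix(next)) continue;
    const result = generate(candidatesAt, done, next);
    if (result !== null) return result;
  }
  return null;
}
```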
Here’s an example of one of my implementations of logit bias.
https://github.com/ShelbyJenkins/shelby-as-a-service/blob/74...
There's also a fair assumption that models will get better at structured output, as the market is demanding it.
My take on this is, it should be easy for an engineer to spin up a new "bot" with a given LLM. There's a lot of boring work around translating your functions into something ChatGPT understands, then dealing with the response and parsing it back again.
With systems like these you can just focus on writing the actual PHP code, adding a few clear comments, and then the bot can immediately use your code like a tool in whatever task you give it.
Another benefit to things like this, is that it makes it much easier for code to be shared. If someone writes a function, you could pull it into a new bot and immediately use it. It eliminates the layer of "converting this for the LLM to use and understand", which I think is pretty cool and makes building so much quicker!
None of this is perfect yet, but I think this is the direction everything will go so that we can start to leverage each others code better. Think about how we use package managers in coding today, I want a package manager for AI specific tooling. Just install the "get the weather" library, add it to my bot, and now it can get the weather.
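A sketch of what such a registry might look like (all names and shapes here are invented; real systems like OpenAI function calling use a similar name/description/parameters layout):

```typescript
// A tool is a function plus the metadata the LLM needs to pick it.
type Tool = {
  name: string;
  description: string;
  parameters: Record<string, "string" | "number">;
  run: (args: Record<string, any>) => string;
};

const tools: Record<string, Tool> = {};

function register(tool: Tool): void {
  tools[tool.name] = tool;
}

// The model is asked to answer with {"tool": ..., "args": {...}}; this is
// the boring parse-and-dispatch glue the comment is talking about.
function dispatch(modelReply: string): string {
  const { tool, args } = JSON.parse(modelReply);
  const found = tools[tool];
  if (!found) throw new Error(`unknown tool: ${tool}`);
  return found.run(args);
}

// Installing a "get the weather" tool is then one registration call.
register({
  name: "get_weather",
  description: "Return the weather for a city",
  parameters: { city: "string" },
  run: (args) => `sunny in ${args.city}`,
});
```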
https://github.com/microsoft/TypeChat/blob/4d34a5005c67bc494...
It's interesting because I've always been under the impression the TS team was against the use of types at runtime (that's why projects like https://github.com/nonara/ts-patch exist), but now they're doing it themselves with this project...
I wonder what the performance overhead of starting up an instance of tsc in memory is? Is this suitable for low latency situations? Lots of testing to do...
It would not only allow them to suggest that required fields be completed (avoiding the need for validation [1]), but would probably also save them GPU time in the end.
There must be a reason and I'm dying to know what it is! :)
Side note: I was in the process of building this very thing, and good ol' Microsoft just swung in and ate my lunch... :/
[0] https://github.com/microsoft/guidance
[1] https://github.com/microsoft/TypeChat/blob/main/src/typechat...
They both seem to aim to solve the problem of getting typed, valid responses back from LLMs
The thing to keep in mind with these different libraries is that they are not necessarily perfect substitutes for each other. They often serve different use-cases, or can be combined in various ways -- possibly using the techniques directly and independent of the libraries themselves.
    const schema = fs.readFileSync(path.join(__dirname, "sentimentSchema.ts"), "utf8");
    const translator = typechat.createJsonTranslator<SentimentResponse>(model, schema, "SentimentResponse");
It would have been much nicer if they took this as an opportunity to build generic runtime type introspection into TypeScript. Honestly, this is getting beyond embarrassing. How is this the world we live in?
I think the (arguably very prototypical) implementation is not what's interesting here. It's the concept itself. Natural language may soon become the default interface for most of the computing people do on a day to day basis, and tools like these will make it easier to create new applications in this space.
At best, all these "retry until successful" loops are just hacks to bridge the formal world with the stochastic one. It's useless without some stats on how it performs.
And even if the output conforms, you're not sure the data makes sense. Probably... but it's exactly that "probably".
I would not recommend using this in production.
Anyway, TIL that Hejlsberg is also involved with TypeScript...
But I heard from MS friends that AI is an absolute "need to have". If you're not working on AI, you're not getting (as much) budget. I suspect this is more about ticking the box than producing some complex project. Unfortunately, throughout the company, folks are doing all kinds of weird things to tick the box like writing a "copilot" (with associated azure openai costs) fine-tuned on a handful of documentation articles :(
Define a struct and tag it with golang's JSON comments. Then give it a prompt and...

    type dinnerParty struct {
        Topic       string   `json:"topic" jsonschema:"required" jsonschema_description:"The topic of the conversation"`
        RandomWords []string `json:"random_words" jsonschema:"required" jsonschema_description:"Random words to prime the conversation"`
    }

    completer := openai.NewClient(os.Getenv("OPENAI_API_KEY"))
    d := gollum.NewOpenAIDispatcher[dinnerParty]("dinner_party", "Given a topic, return random words", completer, nil)
    output, _ := d.Prompt(context.Background(), "Talk to me about dinosaurs")

and you should get a response like:

    expected := dinnerParty{
        Topic:       "dinosaurs",
        RandomWords: []string{"dinosaur", "fossil", "extinct"},
    }

Just like many similar methods, this is based on logit biasing, so it may have an impact on quality.
Or are they solving different problems?
It seems jsonformer has some advantages, such as only generating tokens for the values and not the structure of the JSON. But this project seems to have more of a closed feedback loop to prompt the model to do the right thing.
All this is, it's just talking to an AI model sitting on someone else's server.
[0] https://github.com/microsoft/TypeChat/blob/main/src/model.ts...
[1] https://medium.com/@canadaduane/using-zod-to-build-structure...
- Lots of examples / prompt engineering techniques
- MS Guidance
- TypeChat
- OpenAI functions (the model itself is tuned to do this, a key differentiator)
- ...others?
In the end, both methods try to coax the model into returning a JSON object; one method can be used with any model, the other is tied to a specific, ever-changing vendor API.
Why would one choose to only support "OpenAI" and nothing else?
I'm totally happy to be able to receive structured queries, but I'm also not 100% sure TypeScript is the right tool, it seems to be an overkill. I mean obviously you don't need the power of TS with all its enums, generics, etc.
Plus, given that it will run multiple queries in a loop, it might end up very expensive for it to abide by your custom-made complex type.