Here's the core of the message sent to the LLM: https://github.com/microsoft/TypeChat/blob/main/src/typechat...
You are basically getting a fixed prompt to return structured data, with a small amount of automation and vendor lock-in. All these LLM libraries are just crappy APIs on top of the underlying API. It is trivial to write a script that does the same and will be much more flexible as models and user needs evolve.
As an example, think about how you could change the prompt or use python classes instead. How much work would this be using a library like this versus something that lifts the API calls and text templating to the user like: https://github.com/hofstadter-io/hof/blob/_dev/flow/chat/llm...
1. Running the TypeScript type checker against what is returned by the LLM.
2. If there are type errors, combining those into a "repair prompt" that will (it is assumed) have a higher likelihood of eliciting an LLM output that type checks.
3. Gracefully handling the cases where the heuristic in #2 fails.
https://github.com/microsoft/TypeChat/blob/main/src/typechat...
In my experience experimenting with the same basic idea, the heuristic in #2 works surprisingly well for relatively simple types (i.e. records and arrays not nested too deeply, limited use of type variables). It turns out that prompting LLMs to return values inhabiting relatively simple types can be used to create useful applications. Since that is valuable, this library is valuable inasmuch as it eliminates the need to hand roll this request pattern, and provides a standardized integration with the TypeScript codebase.
https://github.com/dzhng/zod-gpt
And by better I mean it doesn't tie you to OpenAI for no good reason.
Why would I want to add all this extra stuff just for that? The opaque retry until it returns valid JSON? That sounds like it will make for many pleasant support cases or issues
Personally, I have found that investing more effort in the actual prompt engineering improves success rates and reduces the need to retry with an appended error message. Especially helpful are input/output pairs (i.e. few-shot), and while we haven't tried it yet, I imagine fine-tuning and distillation would improve the situation even more.
But that said it still feels like using a library is the right thing to do... so I'm still watching this space to see what matures and emerges as a good-enough approach.
Basically, since it reduces the user input space, you are giving up flexibility and control for some questionably valuable abstractions, such as a predefined prompt, no ability to prompt engineer, no CoT/ToT, etc.
If anything, choose a broader framework like langchain and have something like this as an extension or plugin to the framework; there's no need for a whole library for this one little thing.
For example: you have 1000 free-text survey responses about your product, building a schema and for-each `TypeChat`ing them would get you a dataset for that free-text. It's mind-bogglingly useful.
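As a sketch of that loop, assuming a hypothetical `translate` function (stubbed here with a keyword rule so the example is self-contained; a real version would call TypeChat against the schema):

```typescript
// The schema you'd hand to the model: one typed record per response.
interface SurveyLabel {
  sentiment: "positive" | "negative" | "neutral";
}

// Stub standing in for the LLM call; a real version would send the
// free text plus the schema and validate the returned JSON.
function translate(text: string): SurveyLabel {
  if (/love|great/i.test(text)) return { sentiment: "positive" };
  if (/hate|broken/i.test(text)) return { sentiment: "negative" };
  return { sentiment: "neutral" };
}

// For-each over the free-text responses yields a structured dataset.
const responses = ["I love the new dashboard", "Export is broken", "It works"];
const dataset = responses.map((text) => ({ text, ...translate(text) }));
```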
There was a similar example a few months back using XML instead, but I haven't heard much about it since, because again, the library did not add value on top of doing these things in a more open or scripted setting.
MSFT has another project in a similar vein, guardrails: an interesting idea, but made worse by wrapping it in a library. Most of these LLM ideas are better as a function than a library; make them transform the I/O rather than have every library write its own wrappers around the LLM APIs as well.
There are several more making use of OpenAPI / JSONSchema rather than TS.
We use a subset of CUE, essentially JSON without as many quotes or commas. The LLMs are quite flexible with few-shot learning. They can be made more reliable with fine-tuning. They can be made faster and cheaper with distillation.
Sure, your engineers could implement it themselves, but don’t they have better things to do?
There are other questionable decisions, and a valuable use of engineering time is indeed to evaluate candidate abstractions and think about the long-term cost of adopting them. In this case, it does not seem to save much effort, and in the long run it means a lot of important LLM knobs are out of your control. Not a good tradeoff.
Why all the rigamarole of hoping you get a valid response, adding last-mile validators to detect invalid responses, trying to beg the model to pretty please give me the syntax I'm asking for...
...when you can guarantee a valid JSON syntax by only sampling tokens that are valid? Instead of greedily picking the highest-scoring token every time, you select the highest-scoring token that conforms to the requested format.
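A toy illustration of that idea (the vocabulary, scores, and validity rule below are all made up; real constrained decoding checks against a full grammar, not a hard-coded token list):

```typescript
type Scored = { token: string; score: number };

// Instead of argmax over all tokens, take argmax over only the tokens
// the output grammar would accept next.
function pickConstrained(scored: Scored[], isValid: (t: string) => boolean): string {
  const allowed = scored.filter((s) => isValid(s.token));
  if (allowed.length === 0) throw new Error("grammar dead end: no valid token");
  return allowed.reduce((a, b) => (b.score > a.score ? b : a)).token;
}

// Example: right after an opening `{`, JSON permits only `"` (starting a
// key) or `}` (an empty object), no matter what the model prefers.
const next: Scored[] = [
  { token: "hello", score: 0.9 }, // model's favorite, but invalid here
  { token: '"', score: 0.5 },
  { token: "}", score: 0.3 },
];
const validAfterBrace = (t: string) => t === '"' || t === "}";
```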
This is what Guidance does already, also from Microsoft: https://github.com/microsoft/guidance
But OpenAI apparently does not expose the full scores of all tokens; it only exposes the highest-scoring token. Which is so odd, because if you run models locally, using Guidance is trivial, and you can guarantee your JSON is correct every time. It's faster to generate, too!
Also I believe that such a method cannot capture the full complexity of TypeScript types.
Yes, you can guarantee syntactically correct JSON that way, but will it be semantically correct? If the model really, really, really wanted to put another token there, but you are forcing it to put a `{`, maybe the following generated text won't be as good.
I'm not sure, I'm just wondering out loud.
I experimented a bit with finetuning open source LLMs for JSON parsing (without guided token sampling). Depending on one's use case, 70B parameters might be overkill. I've seen promising results with much, much smaller models. Finetuning a small model combined with guided token sampling would be interesting.
Then again, finetuning is perhaps not perfect for very general applications. When you get input that you didn't anticipate in your training dataset, you're in trouble.
Of course, if you are on the web, it makes no sense. It is much easier to use the mouse to click on a couple of items.
Structured requests and responses are 100% the next evolution of LLMs. People are already getting tired of chatbots. Being able to plug in any backend without worrying about text parsing and prompts will be amazing.
Yup, a general desire of mine is to locally run an LLM which has actionable interfaces that i provide. Things like "check time", "check calendar", "send message to user", etc.
TypeChat seems to be in the right area. I can imagine an extra layer of "fit this JSON input to a possible action, if any", etc.
I see a neat hybrid future where a bot (LLM/etc) works to glue layers of real code together. Sometimes part of ingestion, tagging, etc - sometimes part of responding to input, etc.
All around this is a super interesting area to me but frankly, everything is moving so fast i haven't concerned myself with diving too deep in it yet. Lots of smart people are working on it so i feel the need to let the dust settle a bit. But i think we're already there to have my "dream home interface" working.
`useMakeCopilotActionable` = you pass the type of the input, and an arbitrary typescript function implementation.
https://github.com/RecursivelyAI/CopilotKit
Feedback welcome
For example, try to keep up with (frequent) API payload changes around a consumer in Java. We implemented a NodeJS layer just to stay sane. (Banking, huge JSON payloads, backends in Java)
Mapping is really something where LLMs could shine.
Code/functionality archeology is already insanely hard in orgs with old codebases. Imagine the facepalming that Future You will have when you see that the way the system works is some sort of nondeterministic translation layer that magically connects two APIs where versions are allowed to fluctuate.
DeFi/crypto went through this phase 2 years ago. Mark my words, it's going to end up being this weird limbo for a few years where people will slowly realize that AI is a feature, not a product. And that its applicability is limited and that it won't save the world. It won't be able to self-drive cars due to all the edge cases, it won't be able to perform surgeries because it might kill people, etc.
I keep mentioning that even the most useful AI tools (Copilot, etc.) are marginally useful at best. At the very best it saves me a few clicks on Google, but the agents are not "intelligent" in the least. We went through a similar bubble a few years ago with chatbots[1]. These days, no one cares about them. "The metaverse" was much more short-lived, but the same herd mentality applies. "It's the next big thing" until it isn't.
[1] https://venturebeat.com/business/facebook-opens-its-messenge...
> It won't be able to self-drive cars due to all the edge cases, it won't be able to perform surgeries because it might kill people, etc.
You literally just cherry-picked the most difficult applications of AI. The vast majority of people's jobs don't involve life or death, and thus are ripe for automation. And even if the life-or-death jobs retain a human element, they will most certainly be augmented by AI agents. For example, a surgery might still be handled by a human, but it will probably become mandatory for a doctor or nurse to diagnose a patient in conjunction with an AI.
> We went through a similar bubble a few years ago with chatbots
Are you honestly comparing that to now? ChatGPT got to 100 million users in a few months and everyone and their grandma has used it. I wasn't even aware of any chatbot bubble a few years ago, it certainly wasn't that significant.
> even the most useful AI tools (Copilot, etc.) are marginally useful at best
Sure, but you're literally seeing them in their worst versions. ChatGPT has been a life-changer for me, and it doesn't even execute code yet (Code Interpreter does though, which I haven't tested yet)
By 2030, humans probably won't be typing code anymore; it'll just be prompting machines and directing AI agents. By then, most people's jobs will also be automated.
AI isn't just some fad; it's going to change literally every industry, and way faster than people think. The cynicism here trying to dismiss the implications of AI by comparing it to the metaverse is just absurd and utterly lacking in imagination. Yes, there is still a lot of work that needs to be done, specifically on the AI agent side of things, but we will get there, probably way faster than people realize, and the implications are enormous.
Eventually, perhaps. But by 2023? Definitely not.
I think both you and the GP are at opposite ends of the extreme, and the reality is somewhere in the gulf in between.
That said, AlphaGo went from "hallucinating" bad moves to the best player in the world in a fairly short period of time. If this is at all doable for language models, GPT-x may blow all this out of the water.
I think the state space when looking at something like Go v. natural language (or even formal languages like programming languages or first/second order logic) is not even remotely comparable. The number of states in Go is 3^361. The number of possible sentences in English, while technically infinite, has some sensible estimates (Googling shows the relatively tame 10^570 figure).
Hard disagree. A very clear counterexample from my usage:
Gpt-4 is phenomenal at helping a skilled person work on tangential tasks where their skills generally translate but they don’t have strong domain knowledge.
I’ve been writing code for a decade, and recently I’ve been learning some ML for the first time. I’m using gpt-4 everyday and it’s been a delight.
To be fair, I can see one might find the rough edges annoying on occasion. For me, it’s quite manageable and not much of a bother. I’ve gotten better at ignoring or working around them. There is definitely an art to using these tools.
I expect the value provided to continue growing. We haven’t plucked all of the low-hanging or mid-hanging fruit yet.
I can share chat transcripts if you are interested.
A key difference is that these things, no matter how impressive their technical merits, required people to completely reshape whatever they were doing to get the first bit of benefit.
Modern AI (and really, usually LLMs) has immediate and broad applicability across nearly every economic sector, and that's why so many of us are already building and releasing features with it. There's incredible value in this stuff. Completely world-changing? No. But enough to create new product categories and fundamentally improve large swaths of existing product capabilities? Absolutely.
Also, like RSS: if websites exposed some standard URL for AI interaction, using this TypeChat to expose the interfaces, we'd be well on our way here.
Not to mention that none of these assistants actually makes any money (they all lose money, really), and they are only worthwhile to big companies with other ways to make cash or drive other parts of their business (phones, shopping, whatever), so there's less incentive for a startup to do it.
I worked on both Cortana and Alexa in the past and thought a lot about trying to build a new version of them from the ground up with the LLM advancements. While the tech was all straightforward, and I even had some new ideas for use cases that are enabled now, I could not figure out a business model that would work (and hence am working on something completely different now).
When I first learned what ChatGPT was my thought was "oh so like what Siri is supposed to be."
But I feel you. My Google Assistant doesn't even seem to look for answers to questions anymore. All I get, even for simple queries, is a "sorry, I don't understand".
    type Item = {
        name: string;
        ...
        size?: string;

I'm not really following how this would avoid `name: "grande latte"`? But then the example response:

    "size": 16

> This is pretty great!

Is it? It's not even returning the type being asked for?
I'm guessing this is more of a typo in the example, because otherwise this seems cool.
    {
        name: "the brown one",
        size: "the espresso cup",
        ...
    }

Like that's just as bad as parsing the original string. You probably want big string union types for each one of those, representing whatever known values you want, so the LLM can try to match them. But now why would you want that to be locked into the type syntax? You probably want something more like Zod, where you can use some runtime data to build up those union types.
You also want restrictions on the types, like quantity being a positive, non-fractional integer. Of course you can just validate the JSON values afterwards, but then the user gets two kinds of errors: one from the LLM, which is fluent and human-sounding, and another which is a weird, technical "oops! You provided a value that is too large for quantity" error.
The type syntax seems like the wrong place to describe this stuff.
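To make that concrete, here is a minimal hand-rolled runtime check in the spirit of the comment (the size list and error messages are invented; a real project would likely reach for Zod instead):

```typescript
// Known sizes built from runtime data, which a bare TS type can't express
// without regenerating the schema text.
const KNOWN_SIZES = ["short", "tall", "grande", "venti"];

// Returns a list of human-readable problems instead of throwing, so the
// caller can decide how to phrase the errors back to the user.
function checkItem(raw: any): string[] {
  const errors: string[] = [];
  if (typeof raw?.name !== "string") errors.push("name must be a string");
  if (!KNOWN_SIZES.includes(raw?.size))
    errors.push(`size must be one of: ${KNOWN_SIZES.join(", ")}`);
  if (!Number.isInteger(raw?.quantity) || raw.quantity <= 0)
    errors.push("quantity must be a positive whole number");
  return errors;
}
```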
There would be no way for a system to map "grande" to 16 based on the code provided, and 16 does not seem to be used anywhere else.
This is a cute idea and it looks like it should work, but I could see this getting expensive with larger models and input prompts. Probably not a fix for all scenarios.
I imagine closing the loop (using the TS compiler to restrict token output weights) is in the works, though it's probably not totally trivial. You'd need:
* An incremental TS compiler that could report "valid" or "valid prefix" (i.e., valid as long as the next token is not EOF)
* The ability to backtrack the model
Idk how hard either piece is.
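A rough sketch of how those two pieces could fit together (the prefix check below is a crude brace/quote balancer standing in for an incremental compiler, and the backtracking is plain depth-first search over hypothetical candidate tokens):

```typescript
// "Valid prefix" oracle: could this string still grow into valid JSON?
// A real version would ask an incremental checker; this only tracks
// string literals and brace depth.
function isValidJsonPrefix(s: string): boolean {
  let depth = 0;
  let inString = false;
  for (const c of s) {
    if (inString) {
      if (c === '"') inString = false;
    } else if (c === '"') inString = true;
    else if (c === "{") depth++;
    else if (c === "}") {
      depth--;
      if (depth < 0) return false; // unbalanced: no continuation can fix this
    }
  }
  return true;
}

// Backtracking over candidate tokens: try them in (model-preferred) order,
// prune prefixes the oracle rejects, and back up from dead ends.
function generate(
  candidatesAt: (prefix: string) => string[],
  done: (s: string) => boolean,
  prefix = ""
): string | null {
  if (done(prefix)) return prefix;
  for (const token of candidatesAt(prefix)) {
    const next = prefix + token;
    if (!isValidJsonPrefix(next)) continue;
    const result = generate(candidatesAt, done, next);
    if (result !== null) return result;
  }
  return null;
}
```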
Here’s an example of one of my implementations of logit bias.
https://github.com/ShelbyJenkins/shelby-as-a-service/blob/74...
There's also a fair assumption that models will get better at structured output, as the market is demanding it.
My take on this is, it should be easy for an engineer to spin up a new "bot" with a given LLM. There's a lot of boring work around translating your functions into something ChatGPT understands, then dealing with the response and parsing it back again.
With systems like these you can just focus on writing the actual PHP code, adding a few clear comments, and then the bot can immediately use your code like a tool in whatever task you give it.
Another benefit to things like this, is that it makes it much easier for code to be shared. If someone writes a function, you could pull it into a new bot and immediately use it. It eliminates the layer of "converting this for the LLM to use and understand", which I think is pretty cool and makes building so much quicker!
None of this is perfect yet, but I think this is the direction everything will go so that we can start to leverage each others code better. Think about how we use package managers in coding today, I want a package manager for AI specific tooling. Just install the "get the weather" library, add it to my bot, and now it can get the weather.
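A sketch of what such a registry might look like (all names and shapes here are invented; real systems like OpenAI function calling use a similar name/description/parameters layout):

```typescript
// A tool is a function plus the metadata the LLM needs to pick it.
type Tool = {
  name: string;
  description: string;
  parameters: Record<string, "string" | "number">;
  run: (args: Record<string, any>) => string;
};

const tools: Record<string, Tool> = {};

function register(tool: Tool): void {
  tools[tool.name] = tool;
}

// The model is asked to answer with {"tool": ..., "args": {...}}; this is
// the boring parse-and-dispatch glue the comment is talking about.
function dispatch(modelReply: string): string {
  const { tool, args } = JSON.parse(modelReply);
  const found = tools[tool];
  if (!found) throw new Error(`unknown tool: ${tool}`);
  return found.run(args);
}

// Installing a "get the weather" tool is then one registration call.
register({
  name: "get_weather",
  description: "Return the weather for a city",
  parameters: { city: "string" },
  run: (args) => `sunny in ${args.city}`,
});
```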
https://github.com/microsoft/TypeChat/blob/4d34a5005c67bc494...
It's interesting because I've always been under the impression the TS team was against the use of types at runtime (that's why projects like https://github.com/nonara/ts-patch exist), but now they're doing it themselves with this project...
I wonder what the performance overhead of starting up an instance of tsc in memory is? Is this suitable for low latency situations? Lots of testing to do...
It would not only allow them to suggest that required fields be completed (avoiding the need for validation [1]), but would probably also save them GPU time in the end.
There must be a reason and I'm dying to know what it is! :)
Side note: I was in the process of building this very thing, and good ol' Microsoft just swung in and ate my lunch... :/
[0] https://github.com/microsoft/guidance
[1] https://github.com/microsoft/TypeChat/blob/main/src/typechat...
They both seem to aim to solve the problem of getting typed, valid responses back from LLMs
The thing to keep in mind with these different libraries is that they are not necessarily perfect substitutes for each other. They often serve different use-cases, or can be combined in various ways -- possibly using the techniques directly and independent of the libraries themselves.
    const schema = fs.readFileSync(path.join(__dirname, "sentimentSchema.ts"), "utf8");
    const translator = typechat.createJsonTranslator<SentimentResponse>(model, schema, "SentimentResponse");
It would have been much nicer if they took this as an opportunity to build generic runtime type introspection into TypeScript. Honestly, this is getting beyond embarrassing. How is this the world we live in?
I think the (arguably very prototypical) implementation is not what's interesting here. It's the concept itself. Natural language may soon become the default interface for most of the computing people do on a day to day basis, and tools like these will make it easier to create new applications in this space.
At best, all these "retry until successful" loops are just hacks to bridge the formal world with the stochastic one. It's useless without some stats on how it performs.
And even if the output conforms, you're not sure the data makes sense. Probably... but it's exactly that "probably".
I would not recommend using this in production.
Anyway, TIL that Hejlsberg is also involved with TypeScript...
But I heard from MS friends that AI is an absolute "need to have". If you're not working on AI, you're not getting (as much) budget. I suspect this is more about ticking the box than producing some complex project. Unfortunately, throughout the company, folks are doing all kinds of weird things to tick the box like writing a "copilot" (with associated azure openai costs) fine-tuned on a handful of documentation articles :(
Define a struct and tag it with golang's JSON comments. Then give it a prompt and...

    type dinnerParty struct {
        Topic       string   `json:"topic" jsonschema:"required" jsonschema_description:"The topic of the conversation"`
        RandomWords []string `json:"random_words" jsonschema:"required" jsonschema_description:"Random words to prime the conversation"`
    }

    completer := openai.NewClient(os.Getenv("OPENAI_API_KEY"))
    d := gollum.NewOpenAIDispatcher[dinnerParty]("dinner_party", "Given a topic, return random words", completer, nil)
    output, _ := d.Prompt(context.Background(), "Talk to me about dinosaurs")

and you should get a response like:

    expected := dinnerParty{
        Topic:       "dinosaurs",
        RandomWords: []string{"dinosaur", "fossil", "extinct"},
    }

Just like many similar methods, this is based on logit biasing, so it may have an impact on quality.
Or are they solving different problems?
It seems jsonformer has some advantages, such as only generating tokens for the values and not the structure of the JSON. But this project seems to have more of a closed feedback loop to prompt the model to do the right thing.
All this is, it's just talking to an AI model sitting on someone else's server.
[0] https://github.com/microsoft/TypeChat/blob/main/src/model.ts...
[1] https://medium.com/@canadaduane/using-zod-to-build-structure...
- Lots of examples / prompt engineering techniques
- MS Guidance
- TypeChat
- OpenAI functions (the model itself is tuned to do this, a key differentiator)
- ...others?
In the end, both methods try to coax the model into returning a JSON object; one method can be used with any model, the other is tied to a specific, ever-changing vendor API.
Why would one choose to only support "OpenAI" and nothing else?
I'm totally happy to be able to receive structured queries, but I'm also not 100% sure TypeScript is the right tool, it seems to be an overkill. I mean obviously you don't need the power of TS with all its enums, generics, etc.
Plus, given that it will run multiple queries in a loop, it might end up very expensive for it to abide by your custom-made complex type.