Yeah, the semantics are very weird, but I guess “prompt engineering” is weird too, so it makes sense :).
Everything between “sample” and “from” is basically a script that generates a prompt, which is incrementally fed to the LM.
Each double-quoted line gets appended to the prompt using f-string syntax, like a normal LM template. So if you have a local Python variable “foo”, you can say “how do I make {foo}?” and its value gets substituted into the prompt (nothing surprising there).
But things in square brackets are called “hole variables”, and do the opposite. If you follow the previous line with “you make it by [instructions]”, the prompt up to that point is passed to the LM, the hole is filled in by the model, and the result is stored in a local variable “instructions” that you can reference later in the prompt, or in Python code.
Any lines in between that aren’t double-quoted are interpreted as Python. So program logic and LM calls can be conditional on the results of previous LM calls, or of any other process. For example, you could build a critique loop like the critique chain in the LC docs out of an actual while loop, where the loop breaks when the LM decides the output is acceptable.
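To make that concrete, here’s a rough sketch of such a critique loop, written from memory. Treat it as pseudocode rather than a verified program: the decoder keyword, model name, and exact query syntax may differ from what current LMQL accepts.

```lmql
sample
    "Write a short screenplay about {topic}:\n"
    "[screenplay]\n"
    while True:
        "Is this screenplay good enough, or does it need improvement? [rating]\n"
        if rating == "good enough":
            break
        "Rewrite the screenplay to fix the problems:\n"
        "[screenplay]\n"
from
    "openai/text-davinci-003"
where
    rating in ["good enough", "needs improvement"]
```

The double-quoted lines build the prompt, {topic} is substituted from a local variable, [screenplay] and [rating] are holes the model fills, and the plain-Python while/if lines drive the control flow.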
The exact same thing is possible with LangChain already, but it would involve creating templates, instantiating chains, and so on, which isn’t bad but adds complexity. In LMQL syntax you can glance at the program and plainly see what it does using your programming brain: “yeah, this while loop breaks when the screenplay is good enough, and the refined version gets returned”, whereas I think LC’s abstractions make something simple like this look complex.
The “where” clause is where you specify constraints, which limit what the value of a hole can be. Here you could constrain a hole variable [rating] so that it must be either “good enough” or “needs improvement”, and nothing else can possibly be sampled from the token distribution. This makes pipelines a lot more efficient by eliminating the need for “correction chains” in a lot of places. Also, once the tokens for “ne” or “go” have been generated, LMQL doesn’t have to request any more tokens: the result is already uniquely determined, so it can substitute the rest and move on.
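That early-stopping trick is easy to illustrate in plain Python. This is just a toy model of the idea (the names `ALLOWED`, `remaining_candidates`, and `complete_if_determined` are mine, not LMQL’s): once the generated prefix is consistent with exactly one allowed value, the rest is determined and no more tokens need to be requested.

```python
# Toy illustration of a fixed-choice constraint enabling early stopping.
ALLOWED = ["good enough", "needs improvement"]

def remaining_candidates(prefix):
    """Allowed values still consistent with the text generated so far."""
    return [v for v in ALLOWED if v.startswith(prefix)]

def complete_if_determined(prefix):
    """Return the full value once the prefix pins it down, else None."""
    matches = remaining_candidates(prefix)
    return matches[0] if len(matches) == 1 else None

print(complete_if_determined(""))    # None: both values still possible
print(complete_if_determined("ne"))  # "needs improvement"
print(complete_if_determined("go"))  # "good enough"
```

A real implementation works at the token level by masking the distribution, but the prefix logic is the same.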
The other thing that I love about LMQL is that everything is async. Last time I tried, maybe two months ago, making an LC chain asynchronous didn’t feel natural. In my use cases, chains were async more often than not, and it was kind of annoying.
In fact, under the hood the LMQL query is compiled to a decorated async function, so at the end of the day you can use any of your queries as plain async functions. If you want to build ReAct agents, or any other LM abstraction you like, you pretty much just stick a few @lmql.query-decorated functions inside a class definition and you’re good to go. That’s what I meant by the TensorFlow/Keras analogy.
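Something like this, roughly. Again a from-memory sketch, not runnable as-is: the class, the method name, and the query body are mine, and you should check the LMQL docs for the exact decorator signature and query-string syntax before relying on it.

```python
# Sketch only: wrapping @lmql.query functions in a class to build a
# higher-level abstraction on top of compiled queries.
import lmql

class ScreenplayCritic:
    @lmql.query
    async def rate(draft):
        '''lmql
        "Is this draft good enough, or does it need improvement?\n"
        "{draft}\n"
        "Rating: [rating]" where rating in ["good enough", "needs improvement"]
        '''

# Each decorated query compiles to a plain async function, so it plugs
# straight into asyncio, an agent loop, or whatever framework you like:
#     rating = await ScreenplayCritic.rate("...")
```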
LMQL still isn’t mature and there’s a lot on the roadmap. Prompting is a wild west, and collectively we haven’t even discovered a lot of the problems we’ll need to solve. I like to think the situation is like how I imagine operating systems, and a lot of software in general, looked before Bell Labs. For now at least, of all the options, I think LMQL is closest to the golden path.
Let me know if you have any more questions — feel free to send an email!