from events -- table is first, which enables autocomplete
select
count(), -- * is implied, easier to type
customer_id, -- trailing commas allowed everywhere
group by all -- automatically groups by all non-aggregate columns
order by all -- orders rows by all columns in selected order
https://duckdb.org/docs/stable/sql/dialect/friendly_sqlImagine three joins of three queries A,B and C, where first join J1 joins A and B, second join J2 joins A and C and third join J3 joins J1 and J2. Note that I said "queries," not "tables" - these A, B and C can be complex things one would not want or be able to compute more than once. Forget about compute, A, B and C can be quite complex to even write down and the user may really do not want to repeat itself. Look at TPC-DS, there are subqueries in the "with" sections that are quite complex.
This is why pipeline replacements for SQL are more or less futile efforts. They simplify simple part and avoid touching complex one.
I think that something like Verse [1] is more or less way to go. Not the Verse itself, but functional logic programming as an idea, where you can have first class data producers and effect system to specify transactions.
> SQL is not a pipeline, it is a graph.
Maybe it's both? and maybe there will always be hard-to-express queries in SQL, and that's ok?
the RDBMS's relational model is certainly a graph and joins accordingly introduce complexity.
For me, just as creators of the internet regret that subdomains come before domains, I really we could go back in time and have `FROM` be the first predicate and not `SELECT`. This is much more intuitive and lends itself to the idea of a pipeline: a table scan (FROM) that is piped to a projection (SELECT).
Yes, there will always be hard-to-express queries, the question is how far can we go?
I haven't seen anyone make the point about graphs before. FWIW PRQL allows defining named subqueries that can be reused, like J1 and J2 in your example.
It will be around for a long time, there's an infinite number of resources and examples for it and if you ever have to onboard someone into your code they don't need to learn something new. You can get pretty far by just using CTEs to "pipeline".
At the same time, it's challenging that SQL cannot be iteratively improved and experimented upon.
IMHO, PRQL is a reasonable approach to extending SQL without replacing SQL.
But what I'd love to see is projects like Google's zeta-sql [1] and Substrait [2] get more traction. It would provide a more stable, standardized foundation upon which SQL could be improved, which would make the case for "SQL forever" even more strong.
I've blogged about this before [3].
[1]: https://github.com/google/googlesql [2]: https://substrait.io/ [3]: https://roundup.getdbt.com/p/problem-exists-between-database...
Is there even a db vendor that offers full ANSII SQL support? Last I'd checked the answer was no.
The fact that you need to replicate the same complex expressions in multiple values that you select or multiple parts of a where clause is bad enough.
That there’s no way to pipe the result of a query into another query is just adding insult to injury. (Just create a custom view bro).
But if technology competed in quality and not in vendor lock in, we wouldn’t have to deal with C++ or JavaScript.
It's not as elegant as PRQL, because of course it's bolted onto the existing SQL syntax, rather than a redesign from scratch. But it has a big name behind it, and it's actually running in prod in Google Cloud... so it might have more momentum.
[0]: https://cloud.google.com/blog/products/data-analytics/simpli...
Indeed we're doing fewer new features (and haven't posted to the posts page in a long time, as you noticed).
But it's still maintained, folks are still using it, if anyone finds bugs in simple-to-moderate queries then we'll fix them.
LLMs probably took a bit of the wind out of our sails for making this "the new standard". But I still think it's a really nice language and interface; if the world changed again such that it became more widely useful, I'd jump to spending lots of time on it again.
I just found out about PRQL yesterday! I was looking for a query language that is more token efficient and easier to reason about for LLMs than SQL.
PRQL looks amazing for data analytics agents. Our first few test are quite promising.
I also really appreciate the python bindings. We don't give our agent direct access to the database, we only provide the schema information. The python api makes it super easy to convert a query into an AST, which lets us do some basic offline validation of table names, etc.
I'm sad to hear that about LLMs. I sometimes wonder if the software world is going to be "locked into" our existing languages, because it's what the LLMs can work with.
FWIW, I think the PRQL syntax is beautiful.
I always find that funny. If you have to provide a pronunciation guide for your product, perhaps consider a different name. I guarantee you’ll still have people pronouncing each individual letter, either because they don’t know or because it’ll be less ambiguous.
— D. Richard Hipp
Even people who know because then they have to explain it which wastes time for no benefit.
But even then, it makes sense to choose to pronounce it “the wrong way”. I say “sequelite” because that’s fairly clear in context, but “sequel” might not be so I pronounce each letter in that case.
Did know PNG is supposed to be pronounced “ping”? I don’t know anyone who chooses to do that, even if they know.
Personally, it's weird to me that `FROM` (scan) comes after `SELECT` (projection). IMHO the datasource should come first!
CTEs don't solve this problem they just let you chain multiple SELECTs together.
A real use case is that it would allow intellisense to kick in a lot earlier!
Instead you have to write `SELECT * FROM my_table` and only after can you edit the `*` and get auto-complete suggestions of the columns from `my_table`
They kind of do in my head. "WITH" reads to me exactly like the datasource you are looking for.
How does it work if you want to join multiple complex subqueries?
How far can a new query language like this go? Could this be added as a native query language in e.g. postgresql?