Pipelined Relational Query Language, Pronounced "Prequel"

(prql-lang.org)

66 points · dmit · 3mo ago · 58 comments

jelder 3mo ago

DuckDB had the right idea: just allow some flexibility in the relative order of the `select` and `from` clauses, and make a few other concessions for ergonomics. This then becomes valid:

    from events      -- table is first, which enables autocomplete
    select
        count(),     -- * is implied, easier to type
        customer_id, -- trailing commas allowed everywhere
    group by all     -- automatically groups by all non-aggregate columns
    order by all     -- orders rows by all columns in selected order

https://duckdb.org/docs/stable/sql/dialect/friendly_sql

andrew_lettuce 3mo ago

I get the ease of use, and sometimes use them myself, but implied (or relative) shortcuts are IMO a bad habit that can lead to serious issues that don't manifest as errors. I do like the from clause first, which better matches the underlying relational algebra!

thesz 3mo ago

SQL is not a pipeline, it is a graph.

Imagine three joins of three queries A, B and C, where the first join J1 joins A and B, the second join J2 joins A and C, and the third join J3 joins J1 and J2. Note that I said "queries," not "tables": A, B and C can be complex things one would not want, or be able, to compute more than once. Forget about compute; A, B and C can be quite complex to even write down, and the user may really not want to repeat themselves. Look at TPC-DS: there are subqueries in the "with" sections that are quite complex.

This is why pipeline replacements for SQL are more or less futile efforts. They simplify the simple part and avoid touching the complex one.

I think that something like Verse [1] is more or less the way to go. Not Verse itself, but functional logic programming as an idea, where you can have first-class data producers and an effect system to specify transactions.

[1] https://en.wikipedia.org/wiki/Unreal_Engine#Verse
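
For readers who want to see the J1/J2/J3 shape concretely, here is a minimal sketch using Python's built-in `sqlite3` module. The `events` table, its columns, and the bodies of `a`, `b` and `c` are made up for illustration; the point is only that `a` is written once and read by both `j1` and `j2`, which then meet again in `j3`. Whether the engine also computes `a` only once is up to its planner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (customer_id INTEGER, kind TEXT, amount REAL);
    INSERT INTO events VALUES
        (1, 'order', 10.0), (1, 'refund', 4.0), (1, 'visit', 0.0),
        (2, 'order', 25.0), (2, 'visit', 0.0);
""")

# Hypothetical stand-ins for the "complex queries" A, B and C.
dag_query = """
WITH
  a AS (SELECT customer_id, SUM(amount) AS ordered
        FROM events WHERE kind = 'order' GROUP BY customer_id),
  b AS (SELECT customer_id, SUM(amount) AS refunded
        FROM events WHERE kind = 'refund' GROUP BY customer_id),
  c AS (SELECT customer_id, COUNT(*) AS visits
        FROM events WHERE kind = 'visit' GROUP BY customer_id),
  -- j1 and j2 both read from a: this fan-out is what a straight pipeline lacks.
  j1 AS (SELECT a.customer_id, a.ordered, COALESCE(b.refunded, 0.0) AS refunded
         FROM a LEFT JOIN b ON a.customer_id = b.customer_id),
  j2 AS (SELECT a.customer_id, c.visits
         FROM a JOIN c ON a.customer_id = c.customer_id),
  -- j3 joins the two branches back together.
  j3 AS (SELECT j1.customer_id, j1.ordered, j1.refunded, j2.visits
         FROM j1 JOIN j2 ON j1.customer_id = j2.customer_id)
SELECT customer_id, ordered, refunded, visits FROM j3 ORDER BY customer_id
"""

rows = conn.execute(dag_query).fetchall()
print(rows)
```

The dependency structure here is a small DAG (a feeds j1 and j2, which feed j3) rather than a single chain.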

data_ders 3mo ago

TIL about Verse. Looks cool, I'll have to check it out.

> SQL is not a pipeline, it is a graph.

Maybe it's both? And maybe there will always be hard-to-express queries in SQL, and that's ok?

The RDBMS's relational model is certainly a graph, and joins accordingly introduce complexity.

For me, just as creators of the internet regret that subdomains come before domains, I really wish we could go back in time and have `FROM` be the first clause and not `SELECT`. This is much more intuitive and lends itself to the idea of a pipeline: a table scan (FROM) that is piped to a projection (SELECT).

thesz 3mo ago

A pipeline is a specific kind of graph.

Yes, there will always be hard-to-express queries. The question is how far we can go.

snthpy 3mo ago

Thanks, I'll check out Verse.

I haven't seen anyone make the point about graphs before. FWIW PRQL allows defining named subqueries that can be reused, like J1 and J2 in your example.

jnpnj 3mo ago

Crazy to think that Fortnite might unleash a new population of people who toyed with functional-logic as their first paradigm.

lloydatkinson 3mo ago

Does it really help to call SQL a graph?

data_ders 3mo ago

Right? Like, it's a graph and a relational model query and a pipeline and a language and an abstract syntax tree and a declarative logical plan.

thesz 3mo ago

It does. Just like any other programming language.

lloydatkinson 3mo ago

May as well call everything a graph at that point; meaningless.

dewey 3mo ago

Every time I see these layers on top of SQL I think: just use regular, boring SQL.

It will be around for a long time, there's an infinite number of resources and examples for it, and if you ever have to onboard someone into your code they don't need to learn something new. You can get pretty far by just using CTEs to "pipeline".

data_ders 3mo ago

I'm as big a SQL stan as the next person and I'm also very skeptical anytime anyone says that SQL needs to be replaced.

At the same time, it's challenging that SQL cannot be iteratively improved and experimented upon.

IMHO, PRQL is a reasonable approach to extending SQL without replacing SQL.

But what I'd love to see is projects like Google's zeta-sql [1] and Substrait [2] get more traction. They would provide a more stable, standardized foundation upon which SQL could be improved, which would make the case for "SQL forever" even stronger.

I've blogged about this before [3].

[1]: https://github.com/google/googlesql
[2]: https://substrait.io/
[3]: https://roundup.getdbt.com/p/problem-exists-between-database...

itishappy 3mo ago

Does anybody just use "regular, boring SQL" in practice though? All queries I have seen are loaded with regex and other non-standard extensions.

Is there even a db vendor that offers full ANSI SQL support? Last I'd checked the answer was no.

dewey 3mo ago

In my case I consider Postgres / MariaDB "regular, boring SQL".

itishappy 3mo ago

The problem persists, as Postgres and MariaDB use incompatible SQL dialects, right down to (imo) core concepts such as how to specify an automatically generated primary key.
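
To make the primary-key point concrete, here is a small sketch of how the same "auto-generated integer key" is commonly spelled in three dialects. The `users` table is hypothetical, and these are just one usual spelling per database (PostgreSQL also has `SERIAL`, for instance). Only the SQLite statement is actually executed, since that is the engine Python ships with.

```python
import sqlite3

# Hypothetical `users` table; only the key column differs between dialects.
AUTO_KEY_DDL = {
    "postgresql": "CREATE TABLE users (id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, name TEXT)",
    "mariadb": "CREATE TABLE users (id BIGINT AUTO_INCREMENT PRIMARY KEY, name TEXT)",
    "sqlite": "CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)",
}

conn = sqlite3.connect(":memory:")
conn.execute(AUTO_KEY_DDL["sqlite"])

# No id is supplied; the database generates it.
conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])
ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
print(ids)
```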

kubb 3mo ago

Complex queries in SQL can quickly get out of control.

The fact that you need to replicate the same complex expressions in multiple values that you select or multiple parts of a where clause is bad enough.

That there’s no way to pipe the result of a query into another query is just adding insult to injury. (Just create a custom view bro).

But if technology competed in quality and not in vendor lock-in, we wouldn’t have to deal with C++ or JavaScript.
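
A minimal sketch of the repeated-expression complaint, run through Python's built-in `sqlite3`. The `orders` table and its columns are invented for the example. The first query spells the net-price expression twice; the second names it once in a derived table and filters on the name, which is the usual portable workaround.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, price REAL, qty INTEGER, discount REAL);
    INSERT INTO orders VALUES (1, 10.0, 3, 0.0), (2, 100.0, 2, 0.5), (3, 5.0, 1, 0.0);
""")

# The expression appears in both the select list and the WHERE clause.
repeated = """
SELECT id, price * qty * (1 - discount) AS net
FROM orders
WHERE price * qty * (1 - discount) > 20
ORDER BY id
"""

# The expression is written once, then referred to by name.
named_once = """
SELECT id, net
FROM (SELECT id, price * qty * (1 - discount) AS net FROM orders) AS priced
WHERE net > 20
ORDER BY id
"""

rows_repeated = conn.execute(repeated).fetchall()
rows_named = conn.execute(named_once).fetchall()
print(rows_repeated, rows_named)
```

Both queries return the same rows; the difference is only in how many places the expression has to be kept in sync.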

tomjakubowski 3mo ago

DuckDB's "friendly SQL" variant fixes some of these little problems with SQL, including giving the ability to use a column alias in WHERE clauses.

https://duckdb.org/docs/stable/sql/dialect/friendly_sql

Taikonerd 3mo ago

Google's "pipe syntax" is a similar idea: [0]

It's not as elegant as PRQL, because of course it's bolted onto the existing SQL syntax, rather than a redesign from scratch. But it has a big name behind it, and it's actually running in prod in Google Cloud... so it might have more momentum.

[0]: https://cloud.google.com/blog/products/data-analytics/simpli...

hardwaregeek 3mo ago

There’s probably a good reason why not, but I’d love a query language with sum types. They just feel like a natural way to model a lot of data.

Taikonerd 3mo ago

Is this project stalling out? The last post on the "posts" page is from March 2023. But the last commit to the git repo was last week...

maximilianroos 3mo ago

Maintainer here!

Indeed we're doing fewer new features (and haven't posted to the posts page in a long time, as you noticed).

But it's still maintained, folks are still using it, and if anyone finds bugs in simple-to-moderate queries we'll fix them.

LLMs probably took a bit of the wind out of our sails for making this "the new standard". But I still think it's a really nice language and interface; if the world changed again such that it became more widely useful, I'd jump to spending lots of time on it again.

marcoglauser 2mo ago

Thank you for building and maintaining PRQL! I'm surprised to hear that growth stalled due to LLMs.

I just found out about PRQL yesterday! I was looking for a query language that is more token efficient and easier to reason about for LLMs than SQL.

PRQL looks amazing for data analytics agents. Our first few tests are quite promising.

I also really appreciate the Python bindings. We don't give our agent direct access to the database; we only provide the schema information. The Python API makes it super easy to convert a query into an AST, which lets us do some basic offline validation of table names, etc.
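
As a rough illustration of what "offline validation of table names" can look like, here is a sketch that deliberately does not use the PRQL Python bindings, since their API is not shown in this thread. It swaps in a naive regular expression over the query text instead of a real AST, so treat it as a stand-in for the idea only. `KNOWN_TABLES`, `unknown_tables` and the sample query are all hypothetical.

```python
import re

# Hypothetical schema information handed to the agent instead of database access.
KNOWN_TABLES = {"events", "customers"}

# Naive pattern, not a parser: a `from` or `join` step at the start of a line
# (or after a pipe), an optional `side:...` argument, an optional `alias=`
# prefix, then the table name.
TABLE_REF = re.compile(
    r"(?:^|\|)\s*(?:from|join)\s+(?:side:\w+\s+)?(?:\w+\s*=\s*)?(\w+)",
    re.MULTILINE,
)

def unknown_tables(prql: str) -> set[str]:
    """Return table names referenced in the query that are not in the schema."""
    return set(TABLE_REF.findall(prql)) - KNOWN_TABLES

query = """
from events
join side:left c=customers (==customer_id)
join refunds (==customer_id)
select {customer_id, amount}
"""

missing = unknown_tables(query)
print(missing)
```

Walking a real AST would be more robust than this regex, which will misfire on strings, comments, or nested pipelines.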

Taikonerd 3mo ago

Hi Maximilian -- nice to hear from you!

I'm sad to hear that about LLMs. I sometimes wonder if the software world is going to be "locked into" our existing languages, because it's what the LLMs can work with.

FWIW, I think the PRQL syntax is beautiful.

maximilianroos 3mo ago

Thanks!

I think at the moment that's indeed the case...

But also maybe that will change — LLMs can learn new languages faster than people, and can _write_ new languages much faster than people. So wide confidence bounds for the future!

jeltz 3mo ago

Matching SQL in features is very hard, especially if you also want to make it more sane and more powerful at the same time while also wanting to be able to generate valid SQL from your syntax. So I am not surprised that it stalled out.

Taikonerd 3mo ago

Sure, but they never intended to support everything you can do in SQL. For example, they say on the Roadmap page that they're only going to support SELECTs -- there won't be a PRQL way to do an INSERT, UPDATE, etc.

jeltz 3mo ago

I was only thinking about SELECT queries when I wrote my comment because those are the hard things to implement.

andrew_lettuce 3mo ago

When it comes to SQL, it's the select that's by far the majority of the work, though. The hard work with mutating operations is in the database implementation, not really the syntax or query plan.

zerkten 3mo ago

latexr 3mo ago

The title of the submission is literally the first line on the website.

I always find that funny. If you have to provide a pronunciation guide for your product, perhaps consider a different name. I guarantee you’ll still have people pronouncing each individual letter, either because they don’t know or because it’ll be less ambiguous.

chuckadams 3mo ago

For the first half of the 90's I pronounced Linux as "LINE-nucks". Then while he still had a thick accent, Linus told us all how he pronounced it "LEE-nooks".

rgovostes 3mo ago

> I wrote SQLite, and I think it should be pronounced "S-Q-L-ite". Like a mineral. But I'm cool with y'all pronouncing it any way you want. :-)

— D. Richard Hipp

dmit OP 3mo ago

I mean, as someone who grew up pronouncing it "Ess-Cue-Ell", I wish I had learned earlier on that "Sequel" was the intended pronunciation. :)

yhavr 3mo ago

Yes, in Ukrainian/Russian PRQL can be easily read as "prikol" (joke/gag/quirk). But I guess the best name would be "perkele" (emotional, like "damn") in Finnish.

tremon 3mo ago

I always used ess-cue-ell to refer to the language, and sequel to refer to the Microsoft product. It would never occur to me to pronounce the Open Source alternative as postgressequel either, that's also invariably called post-gress-cue-ell here.

ajuc 3mo ago

Nobody calls it sequel in my country.

Not even people who know, because then they have to explain it, which wastes time for no benefit.

latexr 3mo ago

Which is my point. A better name wouldn’t have had that problem. How could you ever know how it’s pronounced if you bump into it on a blog or social media post instead of the official website? We don’t write “SQL (pronounced “sequel”)” every time, we just write “SQL”.

But even then, it makes sense to choose to pronounce it “the wrong way”. I say “sequelite” because that’s fairly clear in context, but “sequel” might not be, so I pronounce each letter in that case.

Did you know PNG is supposed to be pronounced “ping”? I don’t know anyone who chooses to do that, even if they know.

randallsquared 3mo ago

I pronounce PNG "ping". Also JPEG as "jay peg" but, counter to the creator's intention, GIF with a hard "g".

WorldMaker 3mo ago

SQL's terrible name is IBM's fault for multiple reasons. They were going to spell it Sequel, but IBM's lawyers found another company had a trademark on Sequel and forced them to rename it to avoid lawsuits. Rather than pick a new, more original name, they instead shrank the acronym, and IBM's legal team also made them tell people to spell it out instead of calling it "sequel", to keep avoiding the other company's trademark. So IBM made sure the name and its pronunciation were always a mixed message from very early in the language's history. Thanks, IBM.

(I've always pronounced PNG "ping". It's interesting that there is a split there, I'm not sure I would have expected a large one.)

Avshalom 3mo ago

I continue to pronounce it S-Q-L... and G-U-I; generally I pronounce most things as initialisms and I'm right to do so.

bob1029 3mo ago

"Pipelined" SQL already exists in the form of common table expressions. I don't know of any providers where this is not available. SQLite has had support since 2014.
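
A minimal sketch of CTEs used as a top-to-bottom pipeline, run through Python's built-in `sqlite3` (which needs an SQLite build new enough to support `WITH`). The `events` table and the step names `source`, `totals` and `big` are invented for the example; each step reads only from the one before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (customer_id INTEGER, amount REAL);
    INSERT INTO events VALUES (1, 10.0), (1, 15.0), (2, 3.0), (3, 40.0);
""")

# Each CTE is one pipeline stage: scan, aggregate, filter, then sort.
pipeline = """
WITH
  source AS (SELECT customer_id, amount FROM events),
  totals AS (SELECT customer_id, SUM(amount) AS total
             FROM source GROUP BY customer_id),
  big    AS (SELECT customer_id, total FROM totals WHERE total >= 20)
SELECT customer_id, total FROM big ORDER BY total DESC
"""

rows = conn.execute(pipeline).fetchall()
print(rows)
```

Note that every stage still has to be written as a full `SELECT ... FROM ...`, which is the part the pipelined languages try to shorten.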

data_ders 3mo ago

I agree that CTEs help solve the problem of being able to read a SQL query from top to bottom, but I wouldn't say they're a panacea!

Personally, it's weird to me that `FROM` (scan) comes after `SELECT` (projection). IMHO the datasource should come first!

CTEs don't solve this problem; they just let you chain multiple SELECTs together.

A real use case is that it would allow intellisense to kick in a lot earlier!

Instead you have to write `SELECT * FROM my_table` and only after can you edit the `*` and get auto-complete suggestions of the columns from `my_table`.

bob1029 3mo ago

> CTEs don't solve this problem

They kind of do in my head. "WITH" reads to me exactly like the datasource you are looking for.

infogulch 3mo ago

This looks pretty nice to use!

How does it work if you want to join multiple complex subqueries?

How far can a new query language like this go? Could this be added as a native query language in e.g. postgresql?

jdkfjdo 3mo ago

Sometimes I wonder if the only thing needed in SQL is to switch the order of FROM and SELECT. I think that would satisfy many people who are bothered by the syntax.

andrew_lettuce 3mo ago

You might already know this, but in relational algebra the selection is SQL's where and the projection is SQL's select, which makes more sense. I always preferred the LINQ syntax with the from first too.
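
For anyone who hasn't met the relational algebra terms: selection (σ) keeps the rows that satisfy a predicate, and projection (π) keeps a subset of columns. A small sketch over plain Python lists of dicts, with made-up data and helper names `select` and `project`, shows why the source-first order reads like a pipeline:

```python
from typing import Callable

Row = dict[str, object]

def select(rows: list[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """Selection: keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]

def project(rows: list[Row], columns: list[str]) -> list[Row]:
    """Projection: keep only the named columns, dropping duplicate results."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[col] for col in columns)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(columns, key)))
    return out

# Hypothetical relation standing in for a table.
events = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 15.0},
    {"customer_id": 2, "amount": 3.0},
]

# Source first, then filter rows, then pick columns.
result = project(select(events, lambda r: r["amount"] >= 10), ["customer_id"])
print(result)
```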

cyanydeez 3mo ago

It's not the only thing needed. To do anything, first you need the major databases to implement it.

hbarka 3mo ago

Procedural language fanatics have been trying for years to overturn the best declarative language for relational data.

ux266478 3mo ago

That would be the domain of logic programming languages like Prolog. SQL and its dialects are more for very specific and restricted applications of relational calculus, not general languages for expression of relations, conditions and categories.

esafak 3mo ago

PRQL is declarative. They are just heeding the maxim "If it's broke, fix it".

otabdeveloper4 3mo ago

Typing fields before table name is like the least bad thing about SQL and doesn't need fixing.

data_ders 3mo ago

What do you think is the "most bad" thing about SQL?
