>> The core message [...] is that there is potentially a huge amount of value to be unlocked by replacing SQL
To me, a lot of people defend SQL by saying that "perfect is the enemy of good" and that SQL simply works. It's nobody's favourite, but everyone kinda accepts it.
And yeah, it's true. People use SQL because it's good enough, and trying to reinvent the wheel would take more work (individually speaking) than just dealing with SQL as it is right now. For large organizations where the effort could be justified, all your engineers already know SQL anyway, so it's not so great either.
But for something as relevant as relational databases, perfect is not the enemy of good. We do deserve better. We generally agree that SQL has many pitfalls and that it's not great for any kind of user: for non-technical users, a visual programming language would work better (more like what Airtable does, bridging the gap between spreadsheet and hardcore database), while for technical users it feels unwieldy and quirky. We should be more open to at least considering critiques and proposals for something better. We might find that people are, from time to time, making some good points.
I agree with Jamie. I think SQL is irritatingly non-composable, many operations require gymnastics to express, and I'd like a more expressive and regular language to write queries in.
I also suspect that such a language would be harder to optimize, and might be more practical to implement if it was less declarative and closer to the plan.
I also think that as you scale up, offloading computation to the database is a false economy. It's closer to the data, but the database itself is a bottleneck. As you scale, you want to limit the complexity of your queries. But this is actually an argument for things like Materialize, i.e. making copies of the data elsewhere that bake in some of the computation ahead of time.
One approach to radically simplify operations with data is to use mathematical functions (in addition to mathematical sets) which is implemented in Prosto data processing toolkit [0] and (new) Column-SQL [1].
[0] https://github.com/asavinov/prosto Prosto is a data processing toolkit - an alternative to map-reduce and join-groupby
[1] https://prosto.readthedocs.io/en/latest/text/column-sql.html Column-SQL (work in progress)
On paper, a more expressive language that spits out SQL queries sounds great, but I've never seen a single one not become a pain in the ass to use.
Speaking of which, what happened to Hadoop and HDFS? They used to be the flavor of the month about 10 years ago, but now I hardly ever hear people talk about them.
Trying to change to the Dvorak layout taught me a lot about enacting change on such a grand scale.
After a lot of hassle switching machines and OSes, typing on other users' computers, them typing on mine, and general headaches with internationalization and inaccessible key combinations, I switched back to QWERTY in the end.
As said in other threads, there is no shortage of attempts to replace SQL. A lot of them are pretty good. But having learned SQL the hard way, I feel zero urge to learn another language from scratch right now.
It's why nearly all big databases eventually switched to it despite the FORTRAN vibe and its general ugliness.
Anyway, probably time to SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED and call it a day ^^
For example, in iOS development (and, more in general, on Apple platforms), there has been a huge shift from Objective-C to Swift.
The same arguments should apply there. Swift is much better, but Objective-C got the work done, and many codebases were written in it, especially at Apple. And yet, the whole community switched pretty quickly.
One could argue that Swift was easier to pick up for newcomers. While that's true, I would then expect the argument to apply also to SQL alternatives.
So, what is the difference here?
On the other hand, the fact that many new time series databases (and other engines, including Materialize, which the author of the article worked on) wanted SQL (or fake SQL-like languages when they couldn't manage SQL) suggests that not being SQL can be a big hindrance.
Off-topic, but changing layouts is easy. Also, using other people's keyboards is not the best for hygiene even before covid.
Stop fighting SQL so much, and just focus on bringing a better solution. If potential users see it has significant benefits they'll start using it.
Many of these blog articles seem to be written by devs that have only really experienced the user-facing application side of things, and really don’t realise the sheer number of financial, analytic & business systems, that keep the world spinning round, that are all happily buzzing away using SQL.
Moaning about JSON manipulation in SQL is madness. A client should use appropriate joins to get the data from the database & then do the transformation. Storing JSON when you actually need to perform operations on said data makes 0 sense. It’s a huge overhead, an utter nightmare to index etc.
I think that's why NoSQL is where the gains are being made. You can't magically improve things with a query language without putting the effort into performance, but you can get huge gains by changing your assumptions about how data is structured.
There have been multiple optimizing compilers that can do a lot of these asks, allowing composable queries that return nested types while still producing SQL queries. I think the Pathfinder compiler had the most real-world use; it was meant to efficiently mix SQL and XQuery to query SQL+XML documents stored in Postgres. It had a C# LINQ frontend, but also fairly hefty compile times.
But in my world, SQL is much more of a Human Data Interface than Application Programming Interface. SQL is used for both ad-hoc and automated data querying and transformation. It is something manufacturing intelligence and supply chain experts and researchers and others learn to empower themselves to get to the data - and yes, you won't see any CEO running reports in SQL themselves, but it is not for programmers only.
Those people would not benefit from syntax with more brackets instead of words, easier way to define and call functions, or the ability to install libraries - in fact I think it would make it harder and less accessible for them.
The OP is right that for machine-to-machine interface, all the syntax baroqueness is not worth it. And of course, having more portability would be great.
But while machine-to-machine interfaces written by skilled developers who know exactly what they are doing might actually be the most common use of SQL (since half the internet runs on Wordpress and MySQL, and most smartphone apps probably have SQLite in them), it is not where the majority of complexity or time is spent - that lies with the people working with data.
In the event that nothing exists that is right for the job: don't worry about replacing SQL, worry about creating/using a matched DSL/API/(whatever is appropriate) and data store for the particular problem you're trying to solve that SQL isn't a good fit for. If you create something that solves its particular problem even half as well as SQL did for relational databases you'll probably have a winner in that space that people with the same problems would be happier using.
Scalable databases are just so difficult that we’re still driving a ‘64 IMPALA
Most of this opinion comes from “SQL” being vendor-specific. Is JSON vendor-specific? Is anything else, that we actually use by choice?
Mad at you too, Graph DBs, for sending us on another snipe hunt by adding vendor-imposed innovations, because it makes the enterprise marginally profitable. It’s how the world works, I’m still not happy about it.
(Disclaimer: I possibly just traveled back to 2002 and said this on slashdot)
Two things that come to mind are Markdown with all its flavors and Regex with multiple engines.
edit: and to a lesser extent maybe C/C++ compilers and JS engines.
edit2: also JVM, Python and Ruby runtimes
But both edits describe technologies with an official spec and slightly different implementations. Markdown/Regex are more comparable to SQL because they have vendor-specific syntax.
What type of database engine is used to execute that query is independent of the language. It could be highly scalable or it could be focussed on single user, single process.
What is the problem that needs solving?
Totally! SQL is like shell, C, and JS: somehow they all generate this "never ever try to improve them for real, let's stay suffering for all eternity!" attitude.
And the AMOUNT of energy spent masking them (transpilers, linters, "good practices", tricks, etc.) is a monumental waste of effort that costs far more than actually going in and fixing the core issues.
And in this case, SQL is the most trivial to fix. It could be simplified and streamlined far more easily than C could be replaced, for example.
And a lot of interactions with DBs already go through bridges. Creating a "good SQL" would be similar to creating WASM: for legacy reasons you keep supporting all the old SQL in end-of-life fashion, while everything new goes the new way.
But this means getting at least 2 or 3 major database vendors on board (I dream: SQLite & PostgreSQL).
P.S.: I'm exploring building a relational language, so I already have a simplified dialect for it: https://tablam.org/tutorial
In this you might be able to guess my objection to many of the OP's points. The key thing we look for when we decide to adopt tech isn't how good it is, but how easy it is to work with and, more importantly, how easy it is to find other people who can pick up the work when someone's career takes them forward in life.
SQL is the one technology that has been stable enough over the years that no matter who we hire, we can always count on them being capable of working with our databases. Yes, some of the more expansive functions, especially some of the recursive ones, take a little time to get accustomed to, and there will always be some back-and-forth over preferences regarding stored procedures, but on the whole, it's the one technology that never causes us any pain and never requires retraining.
I'm sure it's not efficient, and the OP is completely correct about the untapped potential, and I fully support people developing rational alternatives to SQL. I'm simply pointing out why SQL has been the workhorse it has been since, well, basically since we moved beyond punch cards, and why it'll likely still be so in 50 years.
Because at the end of the day, technology doesn't really matter to the business. Cost vs. benefit does, and as long as it's easier and therefore cheaper to have people work with SQL (in the big picture), it'll be what businesses work with.
Why you don't see something similar in declarative (or even functional) programming is beyond me, but it probably has to do with how things evolved and how much easier it is to change that part of the tech stack compared to your databases. If we get a new hire who can do magic with JavaScript/Python/whatever, we may consider allowing him/her to do that. We don't want to, again because the big picture is easier to manage with fewer technologies, but it's much easier to justify the cost-benefit of that than of someone wanting to use a different database than the one in our managed cluster. Right away you need more staff than that one person to maintain it, as developers don't run database server operations, aren't part of the contingency staff, and so on.
Like I said, I fully encourage the direction OP wants to go. Enterprise rarely invent things, we’d be happy with ball frames if that’s what made us the most money, so we need people like OP to invent things for us to adopt.
I also happen to really dislike how the author hasn't capitalized syntax like SELECT FROM WHERE or CREATE TABLE, which, to me, hurts legibility and therefore makes me less interested in reading the argument overall.
Unfortunately, building a new database is a huge project and there appears to be no party currently willing to sponsor it.
I would also guess that we could have a better SQL but I do not think it could and should look anything like a general purpose programming language because otherwise you might get in the way of efficient query execution. Maybe some kind of declarative data flow description with sufficient facilities to name and reuse bits and pieces.
And maybe you actually want two languages which SQL with its procedural parts already kind of has. One limited but well optimizable language to actually access the data, one more general language to perform additional processing without having to leave the server. Maybe the real problem of SQL lies mostly in its procedural part and how it interfaces and interacts with the query part.
Probably in exactly the opposite way: the limitations of SQL put a lot of work on the back of the query optimiser without allowing for said optimiser to easily reason about the queries, or for the writer to easily feed the optimiser (short of dedicated extensions e.g. Oracle's optimizer hints).
Or so I would think, I've never heard of SQL being praised for its optimisability.
An example here is how, sure, in theory, JITs can outpace AOT compilation because they have all the information the AOT compiler has plus runtime insights. But the ability to truly do that always seems to be a decade of compiler development away, with many giving up on the idea entirely.
It's also important to consider what we're comparing SQL's optimizability against. If it's against typical NoSQL databases, most of which seem to favour a lower-level query specification, I can defend SQL's optimizability to the end - with SQL databases having the freedom to dynamically switch algorithms to adapt to the actual contents of the database at the time of the query. Something which, ironically, a stale optimizer hint (i.e. written with the size & shape of the database as it was last year in mind) can actually get in the way of. Not that I'm saying that SQL planners never produce stupid query plans of course.
Maybe? But I kind of feel that the query optimizer often gets in the way of efficient query execution.
There’s so much stuff that’s semantically the same, but if you shuffle some strings around, the query is suddenly 10 times faster.
There are quite a lot of pretty basic things many programmers would want to do that are just way more stupid than they should be. One that irks me is things like "return me the top 100 items" or "join against the most recent event for item X in a history log", which end up requiring way more shit than they should just because there's no standards-compliant way to say "select first X rows from ... where ... order by ..." or "join first row where ... order by ...". In the case of top 100 you just wrap it in another query, but for more analytical stuff like "join the most recent X for this item" you have to use window functions, and the syntax fucking sucks.
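For reference, the window-function workaround being described looks something like this - a sketch against SQLite, with a made-up `event` table:

```python
import sqlite3

# Hypothetical history log: several events per item, want only the
# most recent one for each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (item_id INTEGER, ts INTEGER, payload TEXT);
    INSERT INTO event VALUES
        (1, 10, 'old'), (1, 20, 'new'),
        (2, 5,  'only');
""")

# The standard answer: a window function plus a wrapping subquery --
# exactly the extra ceremony being complained about.
rows = conn.execute("""
    SELECT item_id, ts, payload FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY item_id ORDER BY ts DESC) AS rn
        FROM event
    ) WHERE rn = 1
    ORDER BY item_id
""").fetchall()
print(rows)  # [(1, 20, 'new'), (2, 5, 'only')]
```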
Since you mention optimization, perhaps it would help to allow the abstraction to be peeled away and write clauses that express imperative commands. Like being able to write that what you want is an "index join table X on index Y", etc. That's sorta what hints do, but roll them into the language in a portable way. It could also allow the query planner to look at it and tell you "that's just not possible and you need to check your keys/indexes/queries/etc", rather than having to run it and guess what it's trying to do.
Because I kinda feel that's a lot of the tension in SQL queries (beyond the QOL stuff that everyone knows is stupid). It's a guessing game. You're trying to write queries that will be planned well, but the query planner is not a simple thing you can easily predict. The closest analogy is probably something like OpenGL/DirectX, where you're just guessing at what behaviors might trigger some heuristic in the driver, and there are a lot of unseen weights and flags. There are "equivalent" SQL expressions that are much heavier than other seemingly similar ones (like "select from ... join ... where exists (select ...)" vs "select where x in (select ...)"). There are operations that are mysteriously expensive because you're missing an index or the query isn't getting planned the way you expect.
The suggestion of a "procedural" expression is, I think, also probably correct for some situations. PL/SQL functions are extremely useful, just obnoxious and in some cases arcane to write (speaking as someone who never had formal DBA education). It would be even nicer if you could have an embedded Python-style thing that gives you "python" syntax with iteration and list comprehensions, representing DB queries via cursors, and perhaps defers execution until some final "execute" command while transforming it all into a SQL query. Like C#'s LINQ but for Python: instead of buffering streams of objects, transform it into a database query. Turn operations on fields into SQL statements that work on columns, etc.
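A minimal sketch of that deferred, LINQ-style idea - all names here (`Query`, `where`, `select`) are hypothetical, not a real library:

```python
import sqlite3

class Query:
    """Toy deferred query builder: chain operations, emit SQL on demand."""
    def __init__(self, table):
        self.table, self.filters, self.cols = table, [], ["*"]

    def where(self, clause):       # accumulate, don't execute
        self.filters.append(clause)
        return self

    def select(self, *cols):
        self.cols = list(cols)
        return self

    def to_sql(self):
        sql = f"SELECT {', '.join(self.cols)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return sql

    def execute(self, conn):       # only now does any SQL actually run
        return conn.execute(self.to_sql()).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2), (3, 4)])

q = Query("t").select("a").where("b > 2")
print(q.to_sql())       # SELECT a FROM t WHERE b > 2
print(q.execute(conn))  # [(3,)]
```

A real implementation would translate Python expressions rather than raw strings, but the shape - compose first, execute once at the end - is the point.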
Or Java, if you will. Imagine a Java Streams API that compiles down to planner ops. I know Hibernate Criteria and JPA exist, but skip that API and express it as streams: iterations and transforms and so on, mapped onto the DB. You could build subqueries, then attach them to larger queries, etc. That way they execute in bytecode rather than PL/Java.
The problem with query optimization is that it needs to be done at runtime; you can't easily optimize it ahead of time in some procedural language. The optimal way to retrieve data depends on the number of records in your tables, the where clauses you use, the actual values you are filtering by, the records already in cache, etc.
99% of all programmers would not be able to write better-performing queries with a procedural language or stream expressions, or it would take them way too long.
It uses Python expressions and generates SQL.
It also does static typing, so you can run a type checker on the code.
When you are sitting in a properly normalized database, it is a lot easier to write joins and views such that you can compose higher order queries on top.
If you are doing any sort of self-joins or other recursive/case madness, the SQL itself is typically not the problem. Whoever sat down with the business experts on day 1 in that conference room probably got the relational model wrong and they are ultimately to blame for your suffering.
If you have an opportunity to start over on a schema, don't try to do it in the database the first few times. Build it in Excel and kick it around with the stakeholders for a few weeks. Once 100% of the participants are comfortable with and understand why things are structured (related) the way they are, you can then proceed with the prototype implementation.
Achieving 3NF or better is usually a fundamental requirement for ensuring any meaningfully-complex schema doesn't go off the rails over time.
Only after you get it correct (facts/types/relations) should you even think about what performance issues might arise from what you just modeled. Premature optimization is how you end up screwing yourself really badly 99% of the time. Model it correctly, then consider an optimization pass if performance cannot be reconciled with basic indexing or application-level batching/caching.
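To illustrate the earlier point about composing higher-order queries on a normalized base, here is a sketch - table and view names are made up:

```python
import sqlite3

# Once the schema is normalized, views stack: the join is written once,
# and higher-order queries compose on top of it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(id),
                         total REAL);
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);

    -- first-order view: the join baked in once
    CREATE VIEW customer_orders AS
        SELECT c.name, o.total
        FROM customer c JOIN orders o ON o.customer_id = c.id;

    -- higher-order view composed on top of the first
    CREATE VIEW customer_spend AS
        SELECT name, SUM(total) AS spend
        FROM customer_orders GROUP BY name;
""")
rows = conn.execute(
    "SELECT * FROM customer_spend ORDER BY name").fetchall()
print(rows)  # [('Ada', 15.0), ('Lin', 7.5)]
```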
A not-shitty schema is always to be preferred over a shitty one, to be sure. But for any data schema, there will be queries which cut against unexpected joints (joins?).
And SQL is bad for this. The entire Fine Article is a detailed exploration of how it's bad for this. A decent query language would keep the easy things easy (SQL is ok at this) and wouldn't make the hard things pointlessly difficult or impossible-except-at-the-application-layer.
Sometimes it's just a weird or one-off query, and sometimes there are so many of them that migrating to a better schema is indicated. Since it's difficult to do the dozen or so things the article sketches out, it's also difficult to iterate your way to a better schema arrangement.
The sort of waterfall database design you advocate (which, yes, think long and hard about your data!) always breaks down, because building business logic is a process of discovery. There are always missing pieces.
Which one of the frustrations from the article boils down to fighting against shitty schema?
It is really easy to take a negative stance on a technology and poke holes in it without having some grounding in reality. If you are dealing with a problem domain that has hundreds of types/facts/relations, and the need to author logic spanning all of these things all at once, you are going to have a hell of a time managing this with any other technology. Looking at SQL through the lens of toy problems is a massive mistake. You have to look at it through the lens of the worst problems imaginable to begin seeing the light.
I used to think SQL was some ancient trash until I saw the glory of a 40 table join in a factory automation setting. If you need dynamic, hard answers right now, SQL is the only option that makes sense at this kind of scale.
I felt it would contribute more meaningfully to the overall discussion on HN if I were to sidestep a direct critique and present a view of the problem from a different perspective.
Personal experience incoming: At startups, it's usually a mess because hiring someone who knows databases seems to always come so late in the game. At bigger corporations, well, hopefully the developers and database people get along and talk - otherwise, one of those teams is going to be a bottleneck.
> Model it correctly, then consider an optimization pass if performance cannot be reconciled with basic indexing or application-level batching/caching
So true. This also extends into general purpose languages, everything is so much easier when you take the time to model things correctly.
Schema design always results in endless "are we gonna need this?" questions, and usually everything is postponed down the road because it adds needless upfront complexity.
All data are naturally triplets, and with the relational model we are forced to group the triplet attributes into tables. The difficulty arises because these groupings are often incorrect, and then difficult to change. There is always premature optimization, because the SQL query statements become insane with too much normalization.
A bigger problem is how sticky schemas are due to how difficult they are to change, and how difficult it is to update your code to support a schema modification.
I think these two problems need more attention.
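The triplet framing can be made concrete with an entity-attribute-value table - and the self-join "pivot" needed to reassemble even two columns gives a taste of why the groupings matter. A sketch:

```python
import sqlite3

# Everything as (entity, attribute, value) triples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triple (entity INTEGER, attr TEXT, value TEXT);
    INSERT INTO triple VALUES
        (1, 'name', 'Ada'), (1, 'city', 'London'),
        (2, 'name', 'Lin'), (2, 'city', 'Oslo');
""")

# One self-join per attribute, just to get one row per entity back.
rows = conn.execute("""
    SELECT n.entity, n.value AS name, c.value AS city
    FROM triple n JOIN triple c ON c.entity = n.entity
    WHERE n.attr = 'name' AND c.attr = 'city'
    ORDER BY n.entity
""").fetchall()
print(rows)  # [(1, 'Ada', 'London'), (2, 'Lin', 'Oslo')]
```

Each additional attribute means another self-join, which is the "insane" query growth the comment refers to.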
We've gone too far down the code-first / code-generation ORM approach which makes tinkering difficult.
I think all database design should happen visually. MS Access style. You should design your schema alongside your data (as you suggested by using spreadsheets). But instead of then transferring that to code, it should be a living schema that connects to your IDE code refactoring. The more declarative queries are and the closer they are to what needs to be finally rendered the more chance of reliable refactoring. The more imperative code modifying the queried data, the more complex and difficult to refactor things become. A lot of the cruft in a codebase is when schema changes are additive because people are too lazy or too risk averse to refactor.
E.g. think about a User having one address and then needing to support multiple addresses. There's a quick and dirty way: add more columns, or add to a JSON column. Or we add a new Address entity and keep the existing User address columns. Or we delete those columns and migrate the data to a new Address table - which is the cleanest way to represent this schema change. I think few would take the last option, though, and that's the problem. Hiding this behind an API also causes more problems down the line, because our frontend data model starts to drift from our actual DB data model.
We need better tools, and less code.
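For concreteness, the "cleanest" option from the address example above might look like this in SQLite - names are illustrative, and `DROP COLUMN` needs SQLite >= 3.35:

```python
import sqlite3

# Migrate a single embedded address column out to a new Address table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, street TEXT);
    INSERT INTO user VALUES (1, 'Ada', '1 Loop Rd');

    CREATE TABLE address (id INTEGER PRIMARY KEY,
                          user_id INTEGER REFERENCES user(id),
                          street TEXT);
    -- copy the existing single addresses into the new table
    INSERT INTO address (user_id, street)
        SELECT id, street FROM user WHERE street IS NOT NULL;
    -- then remove the now-duplicated column (SQLite >= 3.35)
    ALTER TABLE user DROP COLUMN street;
""")
rows = conn.execute(
    "SELECT u.name, a.street FROM user u "
    "JOIN address a ON a.user_id = u.id").fetchall()
print(rows)  # [('Ada', '1 Loop Rd')]
```

The migration itself is three statements; the painful part, as the comment says, is updating all the code that assumed the old shape.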
>create table json_value(id integer);
>create table json_bool(id integer, value bool);
>create table json_number(id integer, value double);
No, the usual response is "Don't do that!"
99% of the time, either you know the data types (so each JSON object becomes a row in a table whose column names are the keys) or you don't know the data types and store the whole object as a BLOB
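A sketch of the first case - known keys, one row per object (the `person` schema is hypothetical):

```python
import json
import sqlite3

# Each JSON object maps to one row; its keys become the columns.
docs = ['{"name": "Ada", "age": 36}', '{"name": "Lin", "age": 29}']

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (:name, :age)",
    [json.loads(d) for d in docs])   # dicts bind to named parameters
rows = conn.execute("SELECT * FROM person ORDER BY age").fetchall()
print(rows)  # [('Lin', 29), ('Ada', 36)]
```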
I'd be on board with adding unions to SQL, but I doubt I'd use that feature very often.
That said, I'm not averse to the idea if someone can provide a realistic use case. The JSON example in the article is misguided though - you should not save the structure of the syntax of a serialization format, you should save the data model which is represented by the syntax.
You're right, unions are everywhere. Right now a human has to think about each union and how to represent it in a database. It would be really cool if I could store capnp objects like the one below and still get optimal query performance and aesthetics without thinking about it:
struct Shape {
  area @0 :Float64;
  union {
    circle :group {
      radius @1 :Float64;
    }
    rectangle :group {
      width @2 :Float64;
      height @3 :Float64;
    }
  }
}

But I also feel that maybe they are asking a bit much from SQL. The complaint that complex subqueries are complex... well, then don't use them? I would use WITH constructs in that situation because I find them easier to read, but that's beside the point. I think it's perfectly fine to pull out multiple result sets from simple queries and then do the complex stuff in your host language.
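For what it's worth, a conventional relational encoding of the Shape union above is a tag column plus nullable variant fields - one possible mapping, not the only one:

```python
import sqlite3

# Tagged-union-as-table: 'kind' names the active variant, and CHECK
# constraints keep the variant fields consistent with the tag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shape (
        id     INTEGER PRIMARY KEY,
        area   REAL NOT NULL,
        kind   TEXT NOT NULL CHECK (kind IN ('circle', 'rectangle')),
        radius REAL,           -- circle only
        width  REAL,           -- rectangle only
        height REAL,           -- rectangle only
        CHECK ((kind = 'circle') = (radius IS NOT NULL)),
        CHECK ((kind = 'rectangle') =
               (width IS NOT NULL AND height IS NOT NULL))
    );
    INSERT INTO shape VALUES (1, 3.14, 'circle', 1.0, NULL, NULL);
    INSERT INTO shape VALUES (2, 6.0, 'rectangle', NULL, 2.0, 3.0);
""")
rows = conn.execute(
    "SELECT kind, radius, width FROM shape ORDER BY id").fetchall()
print(rows)  # [('circle', 1.0, None), ('rectangle', None, 2.0)]
```

This is exactly the "human has to think about each union" step: the mapping works, but nothing derives it for you.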
But this article is thought provoking to say the least. It follows the courtroom logic of holding the defendant SQL on trial for as much as possible. And SQL is guilty of a lot of crimes.
I do hope GraphQL and similar query languages become more prevalent and standardized, as it seems SQL could really use some stiffer competition.
Going to graph databases is certainly throwing the baby out with the bathwater. The relational model was invented to address shortcomings in the hierarchical and graph database models.
In GraphQL, you can select fields of JSON objects, optionally a field may have parameters which affect the values it returns in unspecified ways. That's it. Because of this design, unlike in SQL where you are concerned with modeling the structure of your data, GraphQL also requires you to think about the API all users will use to access the data. In SQL, I can write a table defining the structure of my dataset, and then users of that table can perform arbitrary queries when they know what data they need (aggregate with a month average grouped by account ID, filter to only rows where x = 7, JOIN this to grab some data from some other table etc.).
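The kind of ad-hoc query described - filter, group, aggregate - written once against a table defined once (schema and values are made up):

```python
import sqlite3

# Define the structure once; any consumer can then write arbitrary
# queries against it without touching the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ledger (account_id INTEGER, month TEXT,
                         amount REAL, x INTEGER);
    INSERT INTO ledger VALUES
        (1, '2021-01', 10.0, 7), (1, '2021-01', 20.0, 7),
        (1, '2021-02', 30.0, 7), (2, '2021-01', 40.0, 3);
""")

# Monthly average per account, filtered to x = 7 -- no API change needed.
rows = conn.execute("""
    SELECT account_id, month, AVG(amount)
    FROM ledger
    WHERE x = 7
    GROUP BY account_id, month
    ORDER BY account_id, month
""").fetchall()
print(rows)  # [(1, '2021-01', 15.0), (1, '2021-02', 30.0)]
```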
GraphQL has no aggregates (sum, average...), no grouping, no joins, no sorting, no filtering, other than what you manually design each using parameters at schema design time. Good luck anticipating the use cases of every future consumer of your data. Miss one? Better dig back into your implementation code & implement that use case each & every time a new one comes up.
The only part of GraphQL that is standardized is the query syntax. In SQL, the actual underlying relational data model exists and the syntax of queries exists within that context; not so in GraphQL land. In SQL, I define my data structures, and users can write queries and access the data. But GraphQL throws up its hands and says "not my problem, try one of these libraries that ask you to implement your own custom data access functionality for all your data types".
OK, so it's a rubbish query language, but even the graph part of the name is misleading. Assuming that you even have a domain that it makes sense to model with a graph of types, GraphQL provides you no tools for dealing with the backend complexity of such a design. Because the syntax is so simplified, there is no mechanism within the syntax to define rules about relationships between types. For example, imagine a simple parent/child relationship. There is no mechanism within the syntax to tell GraphQL that for parent X, parent.child = parent.child.parent . So you can't even think about writing a GraphQL query optimizer, because there isn't enough information about the structure of the data encoded into the schema or query to do so.
So in practice no GraphQL implementations that I know of have anything resembling a query optimizer - someone asks for cart.item, and then item.cart for a cart with 1000 items? Have fun re-requesting the cart from your DB 1000 times (yes you can cache the item by ID to save a DB lookup, but we shouldn't even need to hit cache here! Every programmer involved knows the data is the same, it's just dumb GraphQL has no clue about the details of the relationship).
It seems like having multiple vendors is only valuable if their products are to some degree differentiated, no?
As to SQL: it's a weird feeling to read all that. I've spent 15 years working with very large relational databases, and about 5 years ago I ditched it all in favor of key-value object stores, wasting some storage but saving a metric ton of man-hours of development work. Not looking back.
Mostly the NoSQL pattern is a mistake, because you almost certainly have columns that recur reasonably frequently. But if you do have high-dimensional data or want to store processed documents, you can represent them as a JSON/JSONB column in Postgres etc., and even run indexes or queries on them.
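A sketch of that hybrid approach using SQLite's JSON1 functions (PostgreSQL's jsonb operators are the analogue; the schema is illustrative):

```python
import sqlite3

# Irregular fields live in a JSON column, yet stay queryable -- and
# even indexable via an expression index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT);
    INSERT INTO doc VALUES
        (1, '{"kind": "invoice", "total": 12.5}'),
        (2, '{"kind": "receipt", "total": 3.0}');
    -- expression index over a field inside the JSON
    CREATE INDEX doc_kind ON doc (json_extract(body, '$.kind'));
""")
rows = conn.execute("""
    SELECT id, json_extract(body, '$.total')
    FROM doc WHERE json_extract(body, '$.kind') = 'invoice'
""").fetchall()
print(rows)  # [(1, 12.5)]
```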
> The original idea of relational databases was that they would be queried directly from the client. With the rise of the web this idea died - SQL is too complex to be easily secured against adversarial input, cache invalidation for SQL queries is too hard, and there is no way to easily spawn background tasks (eg resizing images) or to communicate with the rest of the world (eg sending email). And the SQL language itself was not an appealing target for adding these capabilities.
> So instead we added the 'application layer' - a process written in a reasonable programming language that would live between the database and the client and manage their communication. And we invented ORM to patch over the weaknesses of SQL, especially the lack of compressibility.
> When Ray and I were designing Sequel in 1974, we thought that the predominant use of the language would be for ad-hoc queries by planners and other professionals whose domain of expertise was not primarily database management. We wanted the language to be simple enough that ordinary people could "walk up and use it" with a minimum of training.
In that case it has been an abject failure. I have been using SQL since the mid 1980s (so pretty much since the start of its widespread adoption) and I have never met "ordinary people" (by which I assume is meant intelligent, business-oriented professionals) who could (or wanted to) cope with it.
I like it, but the idea of sets does not come naturally to most people (me included). But I once worked with a programmer who had never been exposed to SQL - I lent him an introductory book on it, and he came in the next day and said "Oh, of course, it's all sets, isn't it?" and then went on to write some of the most fiendish queries I have ever seen.
If SQL was designed "by engineers for engineers", you would be using esoteric Git commands just to blow off steam.
Any argument that users will "write their own" language is basically flawed. Users want results; if there's no alternative, they'll do it themselves, in the simplest but probably most inefficient way possible.
If that sounds interesting, you can find it here: https://github.com/erezsh/Preql
df{*, -column1}
df{column1: new_name1, column2, column3: new_name3}
I noticed that in working with pandas you often need to do lookups into other dataframes. It's partially solved by the assignment operator when the left field equals the right index, or by the .map method in the same way. But often you need a lookup that merges on an arbitrary column, then groups and aggregates by the left table's items. That's partially doable without special functions. But it could be a killer feature if someone made it work for spatial joins.
Very often you need to do the following:
gdf1 = geodataframe of points
gdf2 = geodataframe of points
need to make gdf1.geometry.buffer(500 m) and sjoin it with gdf2.geometry, then look up gdf2 fields and bring them to gdf1, keeping the original gdf1.geometry (points). This operation takes a dozen lines and leaves lots of variable garbage if not put into a separate function. But IMO it could be condensed to something more natively supported.
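The non-spatial core of that pattern (merge on an arbitrary column, aggregate, then bring the result back onto the left table) can at least be condensed into one chain in plain pandas; the column names below are invented for illustration:

```python
import pandas as pd

# Left table: the rows we want to enrich.
left = pd.DataFrame({"id": [1, 2, 3], "key": ["a", "b", "a"]})
# Right table: lookup values keyed by an arbitrary (non-index) column.
right = pd.DataFrame({"key": ["a", "a", "b"], "value": [10, 20, 30]})

# Merge on the arbitrary column, aggregate per left-table row,
# and attach the result as a new column - all in one chain.
left["value_sum"] = (
    left.merge(right, on="key", how="left")
        .groupby("id")["value"]
        .sum()
        .reindex(left["id"])
        .to_numpy()
)
print(left[["id", "value_sum"]])
```

The spatial version (buffer + sjoin) additionally needs geopandas, but the shape of the chain would be the same.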
Something like.. BQL http://intelligiblebabble.com/a-better-query-language-bql-la... which went nowhere https://github.com/lelandrichardson/BQL
My approach has been to design a very simple (in the lisp sense) syntax, kind of the opposite to SQL where everything is hard-coded into the parser. I've adopted a "pipeline programming"-like approach, where the operations are just (special) functions, which also helps with extensibility. Have you thought about this? From a cursory look, it seems Preql does rely on keywords. Admittedly fewer than SQL, but it also doesn't cover all of its features.
Having said that, the amount of keywords and operators that you see in Preql right now, isn't likely to grow by much. I have the basic and common operators covered, and for the most part, the rest can be done with regular functions.
I agree about introspection, which is why in Preql you can ask for the type of any value, list the columns of a table, get the resulting SQL of an expression as a string, and so on. And certainly more can and should be done towards that.
I think many of the comments here are missing the point by saying "Oh you can get around that issue in that example snippet by doing X Y Z". Sure there are workarounds for everything if you know the One Weird Trick with these 10 gotchas that I won't tell you about... but that just makes the author's point.
We can do better. We deserve better.
What could things look like if you could radically alter the SQL language, replace it altogether, or even move layers from databases or applications into each other?
Who knows if it will be better or worse, but I'd like to find out.
As someone who frequently used SQL for analytics and less frequently for app development, I would gladly use a language that would transparently translate to SQL while adding some syntactic niceties, like Coffeescript did to JS:
- Join / subquery / CTE shortcuts for common use cases (e.g. for the FK lookups that are mentioned in the article)
- More flexible treatment of whitespace (e.g. allow trailing commas, allow reordering of clauses etc.)
And for the language to be usable, it would probably need:
- First-class support for some extended SQL syntax commonly used in practice (e.g. Postgres's additions)
- integration with console tools (e.g. psql), common libraries (e.g. pandas, psycopg2) and schema introspection tools
- editor support / syntax highlighting.
It would probably be good to model the syntax of that language on some DSL-friendly general purpose language (like Scala, Kotlin or Ruby).
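As a sketch of how thin such a translation layer could start out, here is a hypothetical clause assembler (all names invented) that already delivers two of the niceties above - reordering of clauses and no trailing-comma worries - by accepting keyword arguments in any order and emitting plain SQL:

```python
def to_sql(table, select="*", where=None, order=None, limit=None):
    """Assemble clauses, given in any order at the call site, into one SQL string."""
    cols = ", ".join(select) if isinstance(select, (list, tuple)) else select
    parts = [f"SELECT {cols}", f"FROM {table}"]
    if where:
        parts.append("WHERE " + " AND ".join(where))
    if order:
        parts.append("ORDER BY " + ", ".join(order))
    if limit is not None:
        parts.append(f"LIMIT {int(limit)}")
    return "\n".join(parts)

# Clause order and trailing commas no longer matter at the call site.
print(to_sql("users", where=["age > 21"], select=["id", "name"], limit=10))
```

A real implementation would of course need parameter binding, dialect awareness, and the schema introspection mentioned below on top of this.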
It's a skill I used every working day, I'm pretty sure I will still use it in 20 years.
On the other side, it's very unlikely that the ORM 'du jour' will still exist three years from now.
Isn't a union type essentially a de-normalized field?
This seems like attacking arithmetic operators for their lousy character string support.
Weren't XML databases (briefly) a (marketing) thing some decades back?
One idea might be to have everyone integrate jq[1] into their database engines. My understanding is that one can make the JSON do back flips with jq. Then we can move to complaining about queries that appear to have been written in Klingon instead of boring ol' SQueaL.
No? You have to denormalize to emulate unions when they're missing. Sum types are a fundamental category of types, that SQL only supports product types is a problem you have to work around.
The argument for union types seems to get weak when one asks: how do we index their components?
Because there seems little middle ground between needing discrete fields and safely just parking the data as a memo field and deferring the management to the application.
Unix won by letting the system utilities specialize.
SQL need not be "one language to rule them all".
If you are working with unions and, for whatever reason, want to put unions into a data store, this may seem to be a capricious limitation, but this rule, together with the other principles Codd stipulated, are the basis of the semantic transparency of the relational model, in which each relation expresses an atomic fact about the world of discourse. This, in turn, is the basis of its desirable features, such as its openness to ad-hoc querying, and the applicability of referential integrity constraints.
To look at it in more concrete terms, suppose you had an attribute with a union type: the meaning of any particular bit-pattern would be ambiguous - it might depend on the value of a different datum, or, worse, be context-dependent in a more complex way. This is going to make querying more complicated, whether you are using SQL or some replacement for it, and while one or two cases may seem expedient and harmless, these are the sort of accommodations that, as they accumulate, lead to programs becoming hard to understand and brittle.
At this point, I am unsure whether it would be acceptable, within the relational model, to have types that are, structurally, a union together with a flag disambiguating it. On the one hand, this would avoid the problem of disambiguation I mentioned above, but if it were implemented in such a way that the flag value is independently queryable and/or settable, that would seem to open a back door to let in all the seems-expedient-but-ends-badly design choices that raw union types would facilitate.
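For what it's worth, the structurally-a-union-plus-flag option can be emulated in today's SQL with a tag column and CHECK constraints that force exactly one variant's columns to be populated. A minimal sqlite3 sketch (schema invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A sum type Payment = Card(card_no) | Cash(amount_tendered),
# encoded as a tag column plus nullable per-variant columns.
conn.execute("""
    CREATE TABLE payment (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL CHECK (kind IN ('card', 'cash')),
        card_no TEXT,
        amount_tendered REAL,
        -- a variant's columns are non-null iff that variant is active
        CHECK ((kind = 'card') = (card_no IS NOT NULL)),
        CHECK ((kind = 'cash') = (amount_tendered IS NOT NULL))
    )
""")
conn.execute("INSERT INTO payment VALUES (1, 'card', '4242', NULL)")
conn.execute("INSERT INTO payment VALUES (2, 'cash', NULL, 9.99)")
rows = conn.execute("SELECT kind FROM payment ORDER BY id").fetchall()
print(rows)  # [('card',), ('cash',)]
```

The flag is still independently queryable and settable, so this doesn't settle the semantic question raised above; it only shows the mechanics are cheap.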
Increment on SQL, write a translation layer, and see if people adopt it. Maybe 10 years from now your idea will be more popular than standard SQL. Most likely your idea sucks though and you will stay in the easy land of criticising things.
The front-end is infinitely more complex than SQL on the backend. I write fairly common web applications and the SQL part is maybe 10% of my time, and very easy. React is where I spend most of my time. I don't have any problem that really needs to be solved. SQL works for me even though it isn't perfect. Any imperfections can most likely be incrementally fixed. I use tagged templates in JavaScript to deal with parameters, composability, and reusability.
The fact that the author highlights GraphQL as supposedly the great alternative shows how ridiculous the proposition is. GraphQL does basically nothing. It is 10% of the functionality of SQL.
I'm curious how much work has been done on optimizers for Tutorial D or other D variants. It looks way nicer to use, but I wonder if it is easier to stumble into pathological cases.
https://clickhouse.tech/docs/en/sql-reference/data-types/nul...
This approach doesn't resolve all of the author's complaints but it does solve many.
Disclaimer: I'm the author of Opaleye. Rel8 is built on Opaleye. Other relational query EDSLs are available.
[1] https://github.com/tomjaguarpaw/haskell-opaleye/
[2] https://github.com/circuithub/rel8/
Truly it is blind men evaluating an elephant.
Given SQL's roots as a human-friendly declarative interface, the only thing I see completely replacing it in the near future is a Copilot-style neural implant where you just think of the results you want.
I agree with the desire for a data-based language, rather than a text-based one as SQL is. A classic example of this is MongoDB: you can add a new filter by just adding a new entry to a dict in Python or object in JS etc. I think 99% of the reason MongoDB was successful, at least in the early days, was its data-based API. (Polite request to all: please don't reply to this comment with pros/cons of MongoDB except its query language.)
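To make the "data-based" point concrete: because a Mongo-style filter is plain data, adding a condition is a key assignment rather than string splicing. No database is needed to show the shape:

```python
# A Mongo-style filter is just nested data.
query = {"status": "active"}

# Adding another condition is a plain dict update, not string surgery.
query["age"] = {"$gte": 21}

# Conditions can also be built programmatically, e.g. from user input.
for field, value in [("country", "NZ")]:
    query[field] = value

print(query)
# {'status': 'active', 'age': {'$gte': 21}, 'country': 'NZ'}
```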
I especially agree with the point about having to trick query planners into using the indices you wrote. I get that sometimes it's nice to let the database engine cleverly choose the best strategy (dynamically building queries with a data-based API would be a case in point). But in other situations you'll have carefully designed the tables and indices around one or more specific queries, and then it's frustrating not being able to directly express that design in the code.
I don't have any experience with live migration of production databases (thankfully!) so that was interesting, especially the conclusion that MySQL is best for this, which I didn't expect. The idea of separating out the type system into lower-level "storage types" and higher-level "semantic types" was also food for thought.
F1, Google's SQL database, uses Protocol Buffer in an interesting way: https://storage.googleapis.com/pub-tools-public-publication-...
MySQL is a strange choice, but I think I understand why the author picked it - from his other critiques he seems to look at databases as a building block of hyper-scalable applications, not as a tool for humans to do often ad-hoc things with data.
I would never recommend MySQL for "business data" - it had and possibly still has way too many footguns with regards to number behavior, character encodings and Unicode, date and timestamp handling, and so on - hell, it doesn't have a proper MERGE and it only got CTEs in the latest version.
But if you're using it as a persistence store that's barely more than a key-value store, why not? I have no problem believing the author that that kind of use is more common.
In most CRUD apps we currently have, on the backend, layers and layers of software with ORMs, frameworks etc., and it all boils down to "Writing/Generating the correct (good-enough) SQL".
We now have added stuff like GraphQL, which if you squint hard enough (ok, very hard) can be seen as a SQL alternative (a language to get the actual data).
Maybe SQL + "GraphQL-Like" Layers should "evolve" into ONE common "data scripting language" ?
Maybe we have something like "ClientSide-SQL" - which can be a subset of ServerSide-SQL ?
We need the "TypeScript" of "data-querying", which can run on the server, client, moon and my device, and where one can also define any "Types" only ONCE.
Anywhoo - I think there is still a lot to be done, researched and discovered in this section of CS :)
I'm not sure you need a full SQL implementation on the frontend though, as the data is not going to get all that large to need the optimizations it affords, but it would be nice to be able to use the same queries on your browser DB as your backend DB.
Anything SQL that can be made simpler via dynamic generation (which is safe as long as you use proper parameters for user inputs) is favored over creating logical branches in queries. Anything that can be processed further quickly in memory in the app (mapping operations, string ops, ordering/filtering predictably small data sets, etc.) we tend to offload from SQL into something more suitable.
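A minimal sqlite3 sketch of that pattern: the query text is generated dynamically from whichever filters are present, while user-supplied values only ever travel through bound parameters (table and column names here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, price REAL, category TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?, ?)",
                 [("pen", 1.5, "office"), ("mug", 8.0, "kitchen")])

def find_items(conn, category=None, max_price=None):
    """Generate WHERE clauses dynamically; user values go via placeholders."""
    clauses, params = [], []
    if category is not None:
        clauses.append("category = ?")
        params.append(category)
    if max_price is not None:
        clauses.append("price <= ?")
        params.append(max_price)
    sql = "SELECT name FROM item"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY name"
    return [r[0] for r in conn.execute(sql, params)]

print(find_items(conn, category="office"))   # ['pen']
print(find_items(conn, max_price=10.0))      # ['mug', 'pen']
```

The generation logic stays linear as the number of optional filters grows, instead of branching per combination.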
And we tend to solve a class of problem in our data layer and reuse those generalized patterns heavily. This makes our codebase predictable even when dealing with unfamiliar subject matter.
Of course there are always places where some complex query is necessary (especially when building reports), but if it's the status quo then you're doing something wrong - it's only a matter of time until you end up with a performance nightmare on your hands.
I do agree with this - after all SQL is supposed to be the data layer - why should we think that data processing shouldn't happen there?
Can you give an illustrative example of one? I suspect that framing it this way biases designs away from 'poor ones that use complex queries' into ones that forego other good aspects such as normalization. Sometimes the best design uses a complex query for something other than a report.
Design is not something that should be done by application of dogma and avoiding smells.
In reality most tables and queries start out simple enough, and poor schema choices are usually accompanied by poor architectural choices. It can be painful to come up with a decent migration scheme when the business needs change, especially if there’s fear/pressure involved, but often that’s going to be better than trying to keep the data layer the same/similar and shoehorning in data to represent new scenarios. This is what leads to a fragmented design IMO and allows the schema to diverge from the actual goal of efficient data storage/retrieval.
If we ever do get a replacement, I hope it retains the declarative set theory approach of SQL while addressing the warts.
You can use NATURAL JOIN
select * from foo natural join bar
Works as long as the keys are named the same. However, a lot of people have a habit of naming keys differently in the two tables.
A foreign key is effectively a reference to another column, but to de-reference it you have to tell the database which table and column it's a reference to. Every time. Even when this information is already specified in a foreign key constraint.
The author is talking about (not) being able to specify the "from" table and column without having specify the "to" table and column (i.e. tell the database how to de-reference it) on each query. A natural join removes the need to specify the columns, but still requires specifying both tables. So besides requiring a de facto single namespace for columns across tables and generally seeming like a footgun, it doesn't achieve the same thing.
This makes naming key columns differently a defence technique, so you stop people from using natural joins.
select * from foo join bar on (foo.x = bar.y) if the columns have a different name.
I tend to write my joins first, then use where clauses as filters. A select * from foo inner join bar on (foo.x = bar.y) is semantically equivalent to foo, bar where foo.x = bar.y, but keeping the joins separate from the filters makes the query clearer.
>the only solution is to change half of the lines in the query
How about adding a second subquery for the salary.
> You can use as to name scalar values anywhere they appear. Except in a group by.
-- can't name this value
> select x2 from foo group by x+1 as x2;
ERROR: syntax error at or near "as"
LINE 1: select x2 from foo group by x+1 as x2;
-- sprinkle some more select on it
> select x2 from (select x+1 as x2 from foo) group by x2;
?column?
----------
(0 rows)
Looking at that first one I'm just kinda like "well duh, there's nothing special there" - it doesn't work with ORDER BY either; you use that to rename columns (on SELECT) or tables (on FROM and JOIN). And then it goes on to show ways to work around that:
> Rather than fix this bizarre oversight, the SQL spec allows a novel form of variable naming - you can refer to a column by using an expression which produces the same parse tree as the one that produced the column.
Instead of just... using the renamed column?
select x+1 as x2 from foo group by x2;

Also agree that GraphQL is a pretty fantastic language for working with graphs. And that relational databases are essentially graphs. Hasura is neat.
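For what it's worth, grouping by the renamed column really does work in common implementations; a quick sqlite3 check:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (x INTEGER)")
conn.executemany("INSERT INTO foo VALUES (?)", [(1,), (1,), (2,)])

# Alias the expression in SELECT, then group by the alias.
rows = conn.execute(
    "SELECT x + 1 AS x2, COUNT(*) FROM foo GROUP BY x2 ORDER BY x2"
).fetchall()
print(rows)  # [(2, 2), (3, 1)]
```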
Of course they are incompatible. That's just par for the course when it comes to SQL.
20% longer to write than what alternative? And how is this being measured?
And.. am I missing something?
By far the most common case for joins is following foreign keys. SQL has no special syntax for this:
select foo.id, quux.value
from foo, bar, quux
where foo.bar_id = bar.id and bar.quux_id = quux.id
Why can't this be expressed as an INNER JOIN? And can't some of these subqueries be written using a WHERE EXISTS or a windowing function?
select foo_id, quux.value
from foo natural join bar natural join quux
Unfortunately this doesn't actually use the foreign key relation; it matches on the same element name. So you have to have `foo.bar_id` and `bar.bar_id`, as well as `bar.quux_id` and `quux.quux_id`. But I find that actually makes queries more readable.

`from foo, bar, quux` is an inner join; it's a shorthand syntax. He's lamenting that he has to keep specifying and matching ids, when the database could figure it out on its own from the foreign keys.
As someone who only uses SQL a couple of times a year, I feel that SQL shares the same fate as everything in IT: invented almost 50 years ago, not with today's world in mind, it has been blown up somewhat. Reminds me a bit of JavaScript: everything that can be done in JavaScript, will be done in JavaScript.
Just as C was followed by C++, and then Java and others, there will be new DSLs and techniques on top of SQL.
The article has its merits. Better abstractions for different use cases.
To get that level of applicability, of course, you have to make your problem match the form SQL needs. For applications on its home turf, for example a simple inventory system, this can be both easy to do and beneficial (since if you're on SQL's home turf and you want to do something you can't do, there's probably a Very Good Reason). Unfortunately, this is not always easy to do, or even possible at all, and even when it is possible you usually need to know some basic relational algebra. (Tangentially, I am convinced that many of SQL's critics would be quieter if they knew a bit of relational algebra themselves, though I don't think that applies to this article.)
As you say, though, trying to make a tool do something it just shouldn't is the road to madness. The article's discussion of JSON in SQL is a pretty decent indicator of how that goes wrong even when it goes right. For further snapshots of the road to madness, the interested reader might examine C++, JavaScript, and a competent psychiatrist. Sometimes it really is time to move on, or at least add on.
Most languages have something they are the best at. SQL is probably THE language with the strongest value proposition - relational databases are even more important and ubiquitous than web browsers. But why doesn't SQL have any competition?
I would love to see an alternative to SQL in the style Jamie suggests. Maybe SQL would immediately not be the best anymore?
The other half is just having no meaningful competition in that one domain. So I agree with the author (and you presumably) that building something better on top of the relational algebra should be a priority for the profession.
It is not that we didn't try to replace it, but just as other comments have said, SQL was good enough, and already has the biggest mind share.
More details about "Why functions and column-orientation" (as opposed to sets) can be found in [2]. In short, the problems with set-orientation and SQL arise because producing sets is not what we frequently need - we often need new columns, not a new table. Hence applying set operations is a kind of workaround for the absence of column operations.
This approach is implemented in the Prosto data processing toolkit [0] and Column-SQL[1] is a syntactic way to define its operations.
[0] https://github.com/asavinov/prosto Prosto is a data processing toolkit - an alternative to map-reduce and join-groupby
[1] https://prosto.readthedocs.io/en/latest/text/column-sql.html Column-SQL (work in progress)
[2] https://prosto.readthedocs.io/en/latest/text/why.html Why functions and column-orientation?
EdgeQL, indirectly linked at the end of the article, looks at a glance like it might score well. EdgeDB's blog post [1] criticizing SQL and introducing EdgeQL seems to cover the same concepts (inexpressive, incompressible, non-porous) with slightly differing language in some cases (e.g. system cohesion for porousness).
Noticed after posting this comment that there's a post today about EdgeQL. [2]
[1] https://www.edgedb.com/blog/we-can-do-better-than-sql [2] https://news.ycombinator.com/item?id=27793398
If you want to fix SQL, contribute to the next version of the standard, or provide example by implementing what you want to see out there.
Complaining about SQL is the easy part. Actually, it's the first skill most new SQL developers truly master.
I'm waiting for the viable alternative. There are a lot (a LOT) of solutions that handle some cases, but inevitably you need to get into the SQL anyway because that's the DBMS' native API (and now you also need to fight your way through the abstraction, oh and since there are a LOT of solutions a different one is used every chance someone gets, so you need to relearn how to fight through the abstraction all the time).
I doubt it's going to change. There's actually no significant reason. SQL (actually, the set of mutually incompatible SQL variants) is thoroughly entrenched and a small problem... that is, it's rarely the dominant reason a project/product succeeds or fails, or takes too long, or becomes unmaintainable, etc.
I've used and even written SQL databases for much of my career. SQL is pretty satisfactory for what it was designed to do. I view SQL like classic inheritance-based OOP; it works well for the problem domains for which it was originally designed, but is poor for efficiently expressing problem domains that are better expressed in a composition-based or functional way. Yet it worked so well in its original domain that we try to apply it everywhere. The diversity of data models and the kinds of operations we want to do with them today is far greater than was considered when SQL crystallized into its current form.
The limitation of most nominal SQL replacements I've seen is that they commit the same sin of SQL originally: overfitting for a problem domain that the designer was most interested in. There is an appetite for a really good SQL replacement if done well, and in principle anything SQL can do could be directly translated into a new language for compatibility.
That's because there can be more than one FK relationship between the same two tables. For example, if we model a binary tree, there could be references to left, right and parent nodes.
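Sketching that in sqlite3 makes the ambiguity obvious: with three foreign keys into the same table, every join has to say which relationship it follows (FK enforcement is left at SQLite's default of off here so the rows can be inserted in any order):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table, three foreign keys into itself: no single "natural"
# relationship exists, so the join column must be named each time.
conn.execute("""
    CREATE TABLE node (
        id INTEGER PRIMARY KEY,
        left_id   INTEGER REFERENCES node(id),
        right_id  INTEGER REFERENCES node(id),
        parent_id INTEGER REFERENCES node(id)
    )
""")
conn.execute("INSERT INTO node VALUES (1, 2, 3, NULL)")
conn.execute("INSERT INTO node VALUES (2, NULL, NULL, 1)")
conn.execute("INSERT INTO node VALUES (3, NULL, NULL, 1)")

# Each traversal must say which of the three FKs it follows.
kids = conn.execute("""
    SELECT child.id FROM node AS root
    JOIN node AS child ON child.parent_id = root.id
    WHERE root.id = 1 ORDER BY child.id
""").fetchall()
print(kids)  # [(2,), (3,)]
```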
shopify from pyspark -> sql
https://shopify.engineering/build-production-grade-workflow-...
Once that’s in place I don’t know which features I’d want first… but there’s a lot of them!
SQL is very well-established, but it's also old, and it shows its age. It's kinda weird how easily we jump from one programming language to another, and yet we can't seem to move on from our main relational query language.
There is internal software where I work where, to create a temp table, you just assign the result of a query to a variable. It is so much nicer. Creating a temp table becomes as simple as the below - no need to declare each column, do an insert, or drop the table at the end:
@t = select colA, colB from tbl
select top 10 * from @t order by colB

> There is an internal software where I work where to create a tmp table you just assign the result of the query to a variable. It is so much nicer. So for instance creating a tmp table becomes as simple as the below, no need to declare each columns, to do an insert, to drop the table in the end:
> @t = select colA, colB from tbl
> select top 10 * from @t order by colB
Unless I am misunderstanding what you are looking for, 'SELECT INTO' works the way you want: https://www.postgresql.org/docs/9.1/sql-selectinto.html
It's on every RDBMS I've used, IIRC.
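In most engines this is also spelled `CREATE TEMP TABLE ... AS SELECT ...`: one statement, columns inferred from the query, and the table dropped automatically at session end. A sqlite3 check:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (colA TEXT, colB INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)",
                 [("x", 3), ("y", 1), ("z", 2)])

# One statement: columns are inferred from the query, no INSERT needed.
conn.execute("CREATE TEMP TABLE t AS SELECT colA, colB FROM tbl")
rows = conn.execute("SELECT * FROM t ORDER BY colB LIMIT 2").fetchall()
print(rows)  # [('y', 1), ('z', 2)]
```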
I have other issues with SQL:
Resource usage grows linearly with the amount of data, with no built-in way to handle that.
That integer ids are way overused and basically locking every database to a specific environment.
The index tweaking.
The workarounds for write speed.
The fact that you can do anything in SQL and people know it.
Another example complaint hidden behind a ominous-sounding word boils down to "Using a table expression inside a scalar expression is generally not possible, unless the table expression returns only 1 column and either a) the table expression is guaranteed to return at most 1 row or b) your usage fits into one of the hard-coded patterns such as exists."
Uh, great, I've never needed to do that in my career, and if you care so much, make a PR; but suggesting that SQL itself is somehow the problem is laughable. It would be orders of magnitude more effort to standardize the industry on a new query language than to patch table expressions. I can scarcely imagine what a productivity loss it would be to the industry if SQL standardization were dropped; it would be much worse than the Python 2/3 debacle.
Also "incompressible" - Sounds like the author doesn't use views/materialized-views.
Finally the "fragile" example is just the author writing a bad query. The example here is performant and less fragile: https://stackoverflow.com/questions/612231/how-can-i-select-...
etc.
This example has the same number of semantic entities as the SQL. Also, there is USING. And why does the author need strict modeling over JSON when one can model in native types? It's a very strange article.
That said, in the case of wanting to abstract out or reuse joins, just write a view, I guess. And I get a lot of mileage in Postgres from just writing functions to abstract out predicates, because it allows you to write things like `select * from order where order.is_complete` instead of `where is_complete(order)`
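The reusable-join case is worth spelling out: the join is written once in the view, and every consumer "compresses" to a plain select against it. A sqlite3 sketch with an invented schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "order" (id INTEGER PRIMARY KEY,
                          customer_id INTEGER REFERENCES customer(id),
                          total REAL);
    -- The join is spelled out exactly once, in the view.
    CREATE VIEW order_with_customer AS
        SELECT o.id AS order_id, c.name AS customer_name, o.total
        FROM "order" o JOIN customer c ON c.id = o.customer_id;

    INSERT INTO customer VALUES (1, 'Ada');
    INSERT INTO "order" VALUES (10, 1, 99.0);
""")
# Consumers never repeat the join condition.
rows = conn.execute(
    "SELECT customer_name, total FROM order_with_customer"
).fetchall()
print(rows)  # [('Ada', 99.0)]
```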
Knowing your schema design is just as important as knowing SQL.
In PostgreSQL [2], foreign key constraint names only need to be unique per table, which allows using the foreign table name "as is" as the constraint name, which allows for nice short names. In other databases, the names will just need to be a little longer.
Given this schema:
CREATE TABLE baz (
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE bar (
id integer NOT NULL,
baz_id integer,
PRIMARY KEY (id),
CONSTRAINT baz FOREIGN KEY (baz_id) REFERENCES baz
);
CREATE TABLE foo (
id integer NOT NULL,
bar_id integer,
PRIMARY KEY (id),
CONSTRAINT bar FOREIGN KEY (bar_id) REFERENCES bar
);
We could write a normal SQL query like this:

SELECT
bar.id AS bar_id,
baz.id AS baz_id
FROM foo
JOIN bar ON bar.id = foo.bar_id
LEFT JOIN baz ON baz.id = bar.baz_id
WHERE foo.id = 123
I suggest adding a new binary operator, allowed anywhere a table name is expected, taking the table alias to join from as the left operand and the name of the foreign key constraint to follow as the right operand. Perhaps "->" could be used for this purpose, since it's currently not used by the SQL spec in the FROM clause.
This would allow rewriting the above query into this:
SELECT
bar.id AS bar_id,
baz.id AS baz_id
FROM foo
JOIN foo->bar
LEFT JOIN bar->baz
WHERE foo.id = 123
Where e.g. "foo->bar" means: follow the foreign key constraint named "bar" on the table/alias "foo"
If the same join type is desired for multiple joins, another idea is to allow chaining the operator:

SELECT
bar.id AS bar_id,
baz.id AS baz_id
FROM foo
LEFT JOIN foo->bar->baz
WHERE foo.id = 123
Which would cause both joins to be left joins.
[1] https://scattered-thoughts.net/writing/against-sql/

Alternatively there are still the NATURAL JOIN and USING syntaxes, which have been standard like forever.
I think it would be a nice feature to add to the SQL standard.
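A toy preprocessor suggests the proposal is mostly mechanical: given the foreign-key catalog, each `alias->constraint` can be macro-expanded into today's JOIN syntax. Everything below (the catalog structure, the names) is hypothetical:

```python
import re

# Hypothetical FK catalog: (table, constraint_name) -> (fk_column, target_table),
# mirroring the CONSTRAINT names in the schema above.
FKS = {
    ("foo", "bar"): ("bar_id", "bar"),
    ("bar", "baz"): ("baz_id", "baz"),
}

def expand(sql):
    """Rewrite 'JOIN a->name' into 'JOIN target ON target.id = a.fk_col'."""
    def repl(m):
        left, name = m.group(1), m.group(2)
        fk_col, target = FKS[(left, name)]
        return f"JOIN {target} ON {target}.id = {left}.{fk_col}"
    return re.sub(r"JOIN (\w+)->(\w+)", repl, sql)

print(expand("SELECT bar.id FROM foo JOIN foo->bar JOIN bar->baz"))
# SELECT bar.id FROM foo JOIN bar ON bar.id = foo.bar_id JOIN baz ON baz.id = bar.baz_id
```

A real implementation would live in the planner and read the catalog directly (and handle chaining and join types), but the core rewrite is about this simple.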
But it exists and is adequate. And, as Gabriel’s famous essay says, Worse is Better.
A lot of the rest of it reads like the author started with this conclusion and then went looking for justification.
Example: the author states it's hard to return more than one column with a correlated subquery. That's what WITH clauses or joins against subqueries are for. The author later mentions WITH statements, so is aware of them.
As for JSON, I honestly don't think anybody needs that. Either return a JSON blob (generally a bad idea IMHO) or construct it in code.
The example of join verbosity has issues too. First, an abbreviated syntax would need to express what kind of join to do (eg inner vs outer). Second, I find this fairly natural:
SELECT ...
FROM a
JOIN b ON a.id = b.a_id
LEFT OUTER JOIN c ON b.id = c.b_id
The author instead used this syntax:

SELECT
FROM a, b, c
WHERE a.id = b.a_id
AND b.id = c.b_id
This also leaves the join type unexpressed. In some SQLs you say:

AND b.id = c.b_id (+)
But that's kind of ugly and old-fashioned. The first syntax is preferable and clear.

On "compressibility", SQL has this. They're called views. GraphQL has a notion called fragments that SQL doesn't. This is one of those things that sounds like a good idea but probably isn't. It makes queries much harder to read, and I've seen this reach the point where a fragment is so widely used that changing it is expensive (eg generated code) and removing anything is impossible. Plus a lot of users end up querying things they don't need.
Poor optimization and error messages of with clauses aren't really an argument against SQL. They're an argument against particular implementations. Extracting an anonymous query into a WITH clause should be a no-op to performance for any half-decent query optimizer/executor.
Writing extensions (eg functions) should be discouraged. It's harder to deploy and debug and the last thing you want is a badly written C function crashing your database.
Years ago we also had stored procedures (eg Oracle PL/SQL) and nobody does that anymore because it's terrible. You don't want that.
There's a lot in there about pathological corner cases that I honestly don't really care about.
I do agree that ORMs are generally a disaster.
Lastly, it's worth noting that SQL, unlike a lot of alternatives, has a solid theoretical basis, and that is relational algebra. SQL wasn't created in a vacuum. SQL is just a way to express those constructs.
I will say that SQL got the order of clauses wrong whereas LINQ got this right. SQL should actually look more like this:
FROM a
WHERE a.foo = 'bar'
SELECT id, col1, col2
Honestly though, SQL just isn't "broken". That's why it's endured so long despite the NoSQL fad and various efforts to replace it.

----
The comparison of SQL vs. Flink windowing ("kernel space" vs "user space") reminds me of this 2013 call to change the design of browser features:
https://extensiblewebmanifesto.org/
Basically there's a lot of stuff implemented in the C++ layer of the browser that's impossible to emulate in JavaScript, and that's a bad design.
It is indeed alarming how much syntax SQL has. It reminds me of shell, where every string manipulation function like stripping a prefix has custom syntax like ${x//pat/replace} or ${x%%prefix}. Oil (https://www.oilshell.org/) will simply have functions for this, like x.sub('pat', 'replace').
----
I also wonder if the author has worked with dplyr and the tidyverse at all? He mentions Pandas, but IMO it's a clunkier imitation of those ideas (and I'm saying that as a Python programmer).
Tidy data was my intro to the design of dplyr: http://vita.had.co.nz/papers/tidy-data.html
It's very inspired by the relational model, but it has a few more operations like "gather" and "spread" which turn "long" format into "wide" format and vice versa.
It has a clean and expressive API: https://www.rstudio.com/wp-content/uploads/2015/02/data-wran...
It composes like regular code, so you can write stuff like:
bin_sizes %>%
select(c(host_label, path, num_bytes)) %>%
left_join(bytecode_size, by = c('host_label')) %>%
mutate(native_code_size = num_bytes - bytecode_size) ->
sizes
Good comparison of the relational model and data frames: Is a Dataframe Just a Table? https://plateau-workshop.org/assets/papers-2019/10.pdf

I link all of these in What is a Data Frame? (In Python, R, and SQL) https://www.oilshell.org/blog/2018/11/30.html
Most likely because there isn't a cargo for SQL; everyone has to make do with what a default install offers, and most big-boy databases offer FFI to Java, .NET and C.
> This works for data modelling (although it's still clunky because you must try joins against each of the tables at every use site rather than just ask the value which table it refers to)
Only if one never learned what views are for, and the various flavours they come in.
> By far the most common case for joins is following foreign keys. SQL has no special syntax for this:
select foo.id, quux."value"
from foo
inner join bar on foo.bar_id = bar.id
inner join quux on bar.quux_id = quux.id
Really, how much time was spent learning SQL before complaining?
SELECT foo.id, quux.value
FROM foo, quux, bar
WHERE foo.bar_id = bar.id AND bar.quux_id = quux.id
I couldn't find anyone telling me the difference between those two ways to write a query; does someone know more about this?
The way you have written it (ANSI-89) used to be the only way joins could be written.
The second one (ANSI-92) was introduced to allow for composability since the entities being joined and the join condition are next to each other in the code and multiple joins can be generated one after the other.
IMO it also enhances development quality of life, since you can understand a new-to-you query faster (especially complex ones), you can comment out a join on one line when testing a replacement, cut and paste between queries more easily, etc.
An SO question on the topic
https://stackoverflow.com/questions/334201/why-isnt-sql-ansi...
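The two spellings really are equivalent for inner joins; here is a quick check against a toy schema (sqlite3, in-memory, table names borrowed from the thread):

```python
# Run the ANSI-89 (comma + WHERE) and ANSI-92 (explicit JOIN) forms of the
# same query and confirm they return identical rows. Schema is invented.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE quux (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE bar  (id INTEGER PRIMARY KEY, quux_id INTEGER);
    CREATE TABLE foo  (id INTEGER PRIMARY KEY, bar_id INTEGER);
    INSERT INTO quux VALUES (1, 'hello');
    INSERT INTO bar  VALUES (10, 1);
    INSERT INTO foo  VALUES (100, 10);
""")

ansi89 = db.execute("""
    SELECT foo.id, quux.value
    FROM foo, quux, bar
    WHERE foo.bar_id = bar.id AND bar.quux_id = quux.id
""").fetchall()

ansi92 = db.execute("""
    SELECT foo.id, quux.value
    FROM foo
    INNER JOIN bar  ON foo.bar_id  = bar.id
    INNER JOIN quux ON bar.quux_id = quux.id
""").fetchall()

assert ansi89 == ansi92  # same result set either way
print(ansi92)            # [(100, 'hello')]
```

The divergence shows up with outer joins, which the ANSI-89 style cannot express portably; that was a big motivation for the 92 syntax.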
I could write a point-by-point rebuttal, but I'll just pick one point, composability: VIEWs.
But just give up; you've hit the wall. Many people have tried for decades to make this argument, but every time it was raised you'd see a thread full of people who see no problems. There is a lot of powerful, unrepeatable-in-your-lifetime software sitting behind the stupidest frontends that you can't bypass, because most people write only straightforward code with no need for any abstractions beyond what was given to them.
But you need to separate the data and the index so you can compress the data while still searching the index, and none of the SQL databases do that, since they don't store one file per value (for obvious disk-space reasons).
We need to approach the database as files, and even add features to our filesystems to accommodate that.
In my distributed HTTP/JSON database I format ext4 with the "small" usage type so I don't run out of inodes before disk space.
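The data/index separation described above can be sketched in a few lines: values live compressed in a store, while a small uncompressed index stays searchable. Everything here (the dict layout, the field names) is illustrative, not any particular database's design:

```python
# Minimal sketch of "compress the data, keep the index searchable".
import zlib

store = {}   # key -> compressed payload (never scanned directly)
index = {}   # searchable field -> key (kept uncompressed, small)

def put(key, searchable_field, payload: bytes):
    store[key] = zlib.compress(payload)   # data tier: compressed at rest
    index[searchable_field] = key         # index tier: plain, searchable

def get_by_field(field):
    # Lookup touches only the index; decompression happens per hit.
    return zlib.decompress(store[index[field]])

put("doc1", "alice", b'{"user": "alice", "id": 1}')
print(get_by_field("alice"))
```

The point is architectural: queries never need to decompress anything except the rows they actually return.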
What happened?
What happened is that many of those NoSQL products started adding SQL syntax and features to their databases, others disappeared, and yet others specialized into niches where they don't compete with SQL RDBMS at all, which remains the primary database paradigm and language.
So those are the facts. If someone still believes they know better, put up or shut up.
In contrast, users of, for example, the lingo where object minus object equals NaN are terrified when suddenly exposed to a type zoo like https://www.postgresql.org/docs/9.5/datatype.html (Disclaimer: a relatively randomly chosen example; neither an endorsement nor a preference for a particular RDBMS/dialect). And let's keep in mind that the types above form structures, and these structures get manipulated en masse as intrinsically unordered sets (which are data types too!). That is, the leap from a barely existing concept of data types to the roughly 30% of DDL/DML they occupy keeps scripters out of SQL.
So the reason behind that endless «SQL bad» teeth gnashing turns out to be very simple.
What if someone told you: build an app, but only use b-trees? Then you start complaining about all the shortcomings of b-trees.
The point is that you have relational tables / SQL along with many other persistence, storage & indexing mechanisms: distributed hashtables, queues, lists, etc.
All the apps I've worked on have mixed SQL with all of the other data structures with consistent or inconsistent replication among them depending on the use-case.
One way to manage this is a key-value online tier and a relational offline tier, with inconsistent replication online to offline.
SQL & RDBMS are very powerful, but like any tool, limited to the designated use case. Stop trying to make it do everything.
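The online/offline split above can be sketched simply: a key-value tier takes the writes, and a relational tier receives them via periodic, possibly inconsistent, replication. The schema and names here are invented for the example:

```python
# Sketch: key-value "online" tier + relational "offline" tier, with
# batch replication between them (readers of the offline tier may lag).
import sqlite3

online = {}  # fast key-value tier, serves the hot path
offline = sqlite3.connect(":memory:")
offline.execute("CREATE TABLE events (key TEXT PRIMARY KEY, payload TEXT)")

def write(key, payload):
    online[key] = payload  # online write path: just a dict store

def replicate():
    # Offline path: batch-copy everything; between runs the two tiers
    # are inconsistent by design.
    offline.executemany(
        "INSERT OR REPLACE INTO events VALUES (?, ?)", online.items())
    offline.commit()

write("e1", "login")
replicate()
print(offline.execute(
    "SELECT payload FROM events WHERE key = 'e1'").fetchone())  # ('login',)
```

Ad-hoc analytical queries then run against the relational tier without ever touching the latency-sensitive online store.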
function(arg, arg, arg)
It is strange that "SELECT a, b, c FROM schema.table" keeps any aura of respectability. That is legitimately outdated syntax; people don't write languages that way any more. It was a 70s-era experiment, and what was learned from that experiment is that the style has no upside and comes with downsides. It should be 2 or 3 functions, with brackets.
With full knowledge of SQL, the successful languages that followed it (C, Python, Java, JavaScript) use lots of functions and a smattering of special syntax for control structures.
SELECT foo, bar FROM baz WHERE zig = 7
You can write db.baz.Where(baz => baz.zig == 7).Select(baz => new { baz.foo, baz.bar });
It does have some nice properties compared to SQL, but it also very quickly becomes incomprehensible. E.g. the join syntax in SQL:
SELECT baz.foo, wawa.bar FROM baz JOIN wawa ON baz.id = wawa.baz_id
looks like:
db.baz.Join(db.wawa, baz => baz.id, wawa => wawa.baz_id, (baz, wawa) => new { baz.foo, wawa.bar });
I don't think anybody would find this easier, and C# actually added additional custom query syntax, so you could use a more SQL-like form instead of the method-based syntax.
So they wanted it to be easy for non-programmers, more natural-language-like, so functions and brackets are quite the opposite.
CREATE [OR REPLACE] FUNCTION function_name (arg, arg, arg)
RETURN return_type
IS
BEGIN
---
END;
Then:
SELECT function_name(arg, arg, arg...) FROM dual;
SELECT columns FROM xpt WHERE xpt.id = function_name(arg, arg, arg...);
IF function_name(arg, arg, arg...) = ... THEN ...
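The same three usage patterns can be tried with SQLite's user-defined functions from Python; the function name, its logic, and the xpt table are all made up for the example:

```python
# Sketch: a user-defined function used in a bare SELECT, in a WHERE
# clause, and in host-language control flow (sqlite3, in-memory).
import sqlite3

def tax(amount):
    # Hypothetical business rule: flat 20% tax, rounded to cents.
    return round(amount * 0.2, 2)

db = sqlite3.connect(":memory:")
db.create_function("tax", 1, tax)  # register tax() for use inside SQL
db.execute("CREATE TABLE xpt (id INTEGER, amount REAL)")
db.execute("INSERT INTO xpt VALUES (1, 100.0)")

# SELECT function_name(...) -- SQLite needs no `dual`; a bare SELECT works
print(db.execute("SELECT tax(50.0)").fetchone())                   # (10.0,)

# ... in a WHERE clause
print(db.execute("SELECT id FROM xpt WHERE tax(amount) = 20.0").fetchall())

# ... in ordinary control flow on the host side
if tax(100.0) == 20.0:
    print("matched")
```

Unlike PL/SQL, the function body lives in the host language here, which is a different trade-off: simpler to write, but the logic no longer travels with the database.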