>> The core message [...] is that there is potentially a huge amount of value to be unlocked by replacing SQL
To me, a lot of people defend SQL by saying that "perfect is the enemy of good" and that SQL simply works. It's nobody's favourite, but everyone kinda accepts it.
And yeah, it's true. People use SQL because it's good enough, and trying to reinvent the wheel would take more work (individually speaking) than just dealing with SQL as it is right now. For large organizations where the effort could be justified, all your engineers already know SQL anyway, so it's not so great either.
But for something as relevant as relational databases, perfect is not the enemy of good. We do deserve better. We generally agree that SQL has many pitfalls and that it's not great for any kind of user: for non-technical users, a visual programming language would work better (more like what Airtable does, bridging the gap between spreadsheet and hardcore database), while for technical users it feels unwieldy and quirky. We should be more open to at least considering critiques and proposals for something better. We might find that people are, from time to time, making some good points.
I agree with Jamie. I think SQL is irritatingly non-composable, many operations require gymnastics to express, and I'd like a more expressive and regular language to write queries in.
I also suspect that such a language would be harder to optimize, and might be more practical to implement if it was less declarative and closer to the plan.
I also think that as you scale up, offloading computation to the database is a false economy. It's closer to the data, but the database itself is a bottleneck. As you scale, you want to limit the complexity of your queries. But this is actually an argument for things like Materialize, i.e. making copies of the data elsewhere that bake in some of the computation ahead of time.
One approach to radically simplify operations with data is to use mathematical functions (in addition to mathematical sets) which is implemented in Prosto data processing toolkit [0] and (new) Column-SQL [1].
[0] https://github.com/asavinov/prosto Prosto is a data processing toolkit - an alternative to map-reduce and join-groupby
[1] https://prosto.readthedocs.io/en/latest/text/column-sql.html Column-SQL (work in progress)
On paper, a more expressive language that spits out SQL queries sounds great, but I've never seen a single one not become a pain in the ass to use.
Speaking of which, what happened to Hadoop and HDFS? They used to be the flavor of the month about 10 years ago, but now I hardly ever hear people talk about them.
Trying to change to the Dvorak layout taught me a lot about enacting change on such a grand scale.
After a lot of hassle switching machines and OSes, typing on other users' computers, them typing on mine, and general headaches with internationalization and inaccessible key combinations, I switched back to QWERTY in the end.
As said in other threads, there is no shortage of attempts to replace SQL. A lot of them are pretty good. But having learned SQL the hard way, I feel zero urge to learn another language from scratch right now.
It's why nearly all big databases eventually switched to it despite the FORTRAN vibe and its general ugliness.
Anyway, probably time to SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED and call it a day ^^
For example, in iOS development (and, more in general, on Apple platforms), there has been a huge shift from Objective-C to Swift.
The same arguments should apply there. Swift is much better, but Objective-C got the work done, and many codebases were written in it, especially at Apple. And yet, the whole community switched pretty quickly.
One could argue that Swift was easier to pick up for newcomers. While that's true, I would then expect the argument to apply also to SQL alternatives.
So, what is the difference here?
On the other hand, the fact that many new time series databases (and other engines, including Materialize, which the author of the article worked on) wanted SQL (or fake SQL-like languages when they couldn't manage SQL) suggests that not being SQL can be a big hindrance.
Off-topic, but changing layouts is easy. Also, using other people's keyboards is not the best for hygiene even before covid.
Stop fighting SQL so much, and just focus on bringing a better solution. If potential users see it has significant benefits they'll start using it.
Many of these blog articles seem to be written by devs that have only really experienced the user-facing application side of things, and really don’t realise the sheer number of financial, analytic & business systems, that keep the world spinning round, that are all happily buzzing away using SQL.
Moaning about JSON manipulation in SQL is madness. A client should use appropriate joins to get the data from the database & then do the transformation. Storing JSON when you actually need to perform operations on said data makes 0 sense. It’s a huge overhead, an utter nightmare to index etc.
I think that's why NoSQL is where the gains are being made. You can't magically improve things with a query language without putting the effort into performance, but you can get huge gains by changing your assumptions about how data is structured.
There have been multiple optimizing compilers that can do a lot of these asks, allowing composable queries that return nested types while still producing SQL queries. I think the Pathfinder compiler had the most real-world use; it was meant to efficiently mix SQL and XQuery to query SQL+XML documents stored in Postgres. It had a C# LINQ frontend, but also fairly hefty compile times.
But in my world, SQL is much more of a Human Data Interface than Application Programming Interface. SQL is used for both ad-hoc and automated data querying and transformation. It is something manufacturing intelligence and supply chain experts and researchers and others learn to empower themselves to get to the data - and yes, you won't see any CEO running reports in SQL themselves, but it is not for programmers only.
Those people would not benefit from syntax with more brackets instead of words, easier way to define and call functions, or the ability to install libraries - in fact I think it would make it harder and less accessible for them.
The OP is right that for machine-to-machine interface, all the syntax baroqueness is not worth it. And of course, having more portability would be great.
But while machine-to-machine interfaces written by skilled developers who know exactly what they are doing might actually be the most common use of SQL (since half the internet runs on Wordpress and MySQL, and most smartphone apps probably have SQLite in them), it is not where the majority of complexity or time is spent - that lies with the people working with data.
In the event that nothing exists that is right for the job: don't worry about replacing SQL, worry about creating/using a matched DSL/API/(whatever is appropriate) and data store for the particular problem you're trying to solve that SQL isn't a good fit for. If you create something that solves its particular problem even half as well as SQL did for relational databases you'll probably have a winner in that space that people with the same problems would be happier using.
Scalable databases are just so difficult that we’re still driving a ‘64 IMPALA
Most of this opinion comes from “SQL” being vendor-specific. Is JSON vendor-specific? Is anything else, that we actually use by choice?
Mad at you too, Graph DBs, for sending us on another snipe hunt by adding vendor-imposed innovations, because it makes the enterprise marginally profitable. It’s how the world works, I’m still not happy about it.
(Disclaimer: I possibly just traveled back to 2002 and said this on slashdot)
Two things that come to mind are Markdown with all its flavors and Regex with multiple engines.
edit: and to a lesser extent maybe C/C++ compilers and JS engines.
edit2: also JVM, Python and Ruby runtimes
But both edits describe technologies with an official spec and slightly different implementations. Markdown/Regex are more comparable to SQL because they have vendor-specific syntax.
What type of database engine is used to execute that query is independent of the language. It could be highly scalable or it could be focussed on single user, single process.
What is the problem that needs solving?
Totally! SQL is like shell, C, and JS: somehow they all generate this "never ever try to improve them for real, let's stay suffering for all eternity!" attitude.
And the AMOUNT of energy spent masking them (transpilers, linters, "good practices", tricks, etc.) is a monumental waste of effort that costs far more than actually going in and fixing the core issues.
And in this case, SQL is the most trivial to fix. It could be simplified and streamlined far more easily than C could be replaced, for example.
And a lot of interactions with DBs already go through bridges. Creating a "good SQL" would be similar to creating WASM: for legacy reasons you keep supporting all the old SQL in end-of-life fashion, while everything new goes the new way.
But this means getting at least 2 or 3 major database vendors on board (I dream: SQLite & PostgreSQL).
P.S.: I'm exploring building a relational language, so I already have a simplified dialect for it: https://tablam.org/tutorial
In this you might be able to guess my objection to many of the OP's points. The key thing we look for when we decide to adopt tech isn't how good it is, but how easy it is to work with and, more importantly, how easy it is to find other people who can pick up the work when someone's career takes them forward in life.
SQL is the one technology that has been stable enough over the years that no matter who we hire, we can always count on them being capable of working with our databases. Yes, some of the more expansive functions, especially some of the recursive ones, take a little time to get accustomed to, and there will always be some back-and-forth over preferences regarding stored procedures, but on the whole, it's the one technology that never causes us any pain and never requires retraining.
I'm sure it's not efficient, and the OP is completely correct about the untapped potential, and I fully support people developing rational alternatives to SQL. I'm simply pointing out why SQL has been the workhorse it has been since, well, basically since we moved beyond punch cards, and why it'll likely still be so in 50 years.
Because at the end of the day, technology doesn't really matter to the business. Cost vs. benefit does, and as long as it's easier and therefore cheaper to have people work with SQL (in the big picture), it'll be what businesses work with.
Why you don't see something similar in declarative (or even functional) programming is beyond me, but it probably has to do with how things evolved and how much easier it is to change that part of the tech stack compared to your databases. If we get a new hire who can do magic with JavaScript/Python/whatever, we may consider allowing him/her to do that. We don't want to, again because the big picture is easier to manage with fewer technologies, but it's much easier to justify the cost-benefit of that than of someone wanting to use a different database than the one in our managed cluster. Right away you need more staff than that one person to maintain it, as developers don't run database server operations, aren't part of the contingency staff, and so on.
Like I said, I fully encourage the direction OP wants to go. Enterprise rarely invent things, we’d be happy with ball frames if that’s what made us the most money, so we need people like OP to invent things for us to adopt.
I also happen to really dislike how the author hasn't capitalized syntax like SELECT FROM WHERE or CREATE TABLE, which, to me, hurts legibility and therefore makes me less interested in reading the argument overall.
Unfortunately, building a new database is a huge project and there appears to be no party currently willing to sponsor it.
I would also guess that we could have a better SQL but I do not think it could and should look anything like a general purpose programming language because otherwise you might get in the way of efficient query execution. Maybe some kind of declarative data flow description with sufficient facilities to name and reuse bits and pieces.
And maybe you actually want two languages which SQL with its procedural parts already kind of has. One limited but well optimizable language to actually access the data, one more general language to perform additional processing without having to leave the server. Maybe the real problem of SQL lies mostly in its procedural part and how it interfaces and interacts with the query part.
Probably in exactly the opposite way: the limitations of SQL put a lot of work on the back of the query optimiser without allowing for said optimiser to easily reason about the queries, or for the writer to easily feed the optimiser (short of dedicated extensions e.g. Oracle's optimizer hints).
Or so I would think, I've never heard of SQL being praised for its optimisability.
An example here is how, sure, in theory, JITs can outpace AOT compilation because they have all the information the AOT compiler has plus runtime insights. But the ability to truly do that always seems to be a decade of compiler development away, with many giving up on the idea entirely.
It's also important to consider what we're comparing SQL's optimizability against. If it's against typical NoSQL databases, most of which seem to favour a lower-level query specification, I can defend SQL's optimizability to the end - with SQL databases having the freedom to dynamically switch algorithms to adapt to the actual contents of the database at the time of the query. Something which, ironically, a stale optimizer hint (i.e. written with the size & shape of the database as it was last year in mind) can actually get in the way of. Not that I'm saying that SQL planners never produce stupid query plans of course.
Maybe? But I kind of feel that the query optimizer often gets in the way of efficient query execution.
There’s so much stuff that’s semantically the same, but if you shuffle some strings around, the query is suddenly 10 times faster.
There are quite a lot of pretty basic things many programmers would want to do that are just way more stupid than they should be. One that irks me is things like "return me the top 100 items" or "join against the most recent event for item X in a history log", which end up requiring way more shit than they should just because there's no standards-compliant way to say "select first X rows from ... where ... order by ..." or "join first row where ... order by ...". In the case of top 100 you just wrap it in another query, but for more analytical stuff like "join the most recent X for this item" you have to use window functions, and the syntax fucking sucks.
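For reference, the window-function workaround being described looks something like this - a sketch against SQLite, with a made-up `event` table:

```python
import sqlite3

# Hypothetical history log: several events per item, want only the
# most recent one for each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (item_id INTEGER, ts INTEGER, payload TEXT);
    INSERT INTO event VALUES
        (1, 10, 'old'), (1, 20, 'new'),
        (2, 5,  'only');
""")

# The standard answer: a window function plus a wrapping subquery --
# exactly the extra ceremony being complained about.
rows = conn.execute("""
    SELECT item_id, ts, payload FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY item_id ORDER BY ts DESC) AS rn
        FROM event
    ) WHERE rn = 1
    ORDER BY item_id
""").fetchall()
print(rows)  # [(1, 20, 'new'), (2, 5, 'only')]
```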
Since you mention optimization, perhaps it would help to allow the abstraction to be peeled away and write clauses that express imperative commands. Like being able to write that what you want is an "index join table X on index Y", etc. That's sorta what hints do, but roll them into the language in a portable way. It could also allow the query planner to look at it and tell you "that's just not possible and you need to check your keys/indexes/queries/etc", rather than having to run it and guess what it's trying to do.
Because I kinda feel that's a lot of the tension in SQL queries (beyond the QOL stuff that everyone knows is stupid). It's a guessing game. You're trying to write queries that will be planned well, but the query planner is not a simple thing you can easily predict. The closest analogy is probably something like OpenGL/DirectX, where you're just guessing at what behaviors might trigger some heuristic in the driver, and there are a lot of unseen weights and flags. There are "equivalent" SQL expressions that are much heavier than other seemingly similar ones (like "select from ... join ... where exists (select ...)" vs "select where x in (select ...)"). There are operations that are mysteriously expensive because you're missing an index or the query isn't getting planned the way you expect.
The suggestion of a "procedural" expression is, I think, also probably correct for some situations. PL/SQL functions are extremely useful, just obnoxious and in some cases arcane to write (speaking as someone who never had formal DBA education). It would be even nicer if you could have an embedded Python-style thing that gives you "python" syntax with iteration and list comprehensions, representing DB queries via cursors, and perhaps defers execution until some final "execute" command while transforming it all into a SQL query. Like C#'s LINQ but for Python: instead of buffering streams of objects, transform it into a database query. Turn operations on fields into SQL statements that work on columns, etc.
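A minimal sketch of that deferred, LINQ-style idea - all names here (`Query`, `where`, `select`) are hypothetical, not a real library:

```python
import sqlite3

class Query:
    """Toy deferred query builder: chain operations, emit SQL on demand."""
    def __init__(self, table):
        self.table, self.filters, self.cols = table, [], ["*"]

    def where(self, clause):       # accumulate, don't execute
        self.filters.append(clause)
        return self

    def select(self, *cols):
        self.cols = list(cols)
        return self

    def to_sql(self):
        sql = f"SELECT {', '.join(self.cols)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return sql

    def execute(self, conn):       # only now does any SQL actually run
        return conn.execute(self.to_sql()).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2), (3, 4)])

q = Query("t").select("a").where("b > 2")
print(q.to_sql())       # SELECT a FROM t WHERE b > 2
print(q.execute(conn))  # [(3,)]
```

A real implementation would translate Python expressions rather than raw strings, but the shape - compose first, execute once at the end - is the point.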
Or Java, if you will. Imagine a Java Streams API that compiles down to planner ops. I know Hibernate Criteria and JPA exist, but skip that API and express it as streams: iterations and transforms and so on, mapped onto the DB. You could build subqueries, then attach them to larger queries, etc. That way they execute in bytecode rather than PL/Java.
The problem with query optimization is that it needs to be done at runtime; you can't easily optimize it ahead of time in some procedural language. The optimal way to retrieve data depends on the number of records in your tables, the where clauses you use, the actual values you are filtering by, the records already in cache, etc.
99% of all programmers would not be able to write better-performing queries with a procedural language or stream expressions, or it would take them way too long.
It uses Python expressions and generates SQL.
It also does static typing, so you can run a type checker on the code.
When you are sitting in a properly normalized database, it is a lot easier to write joins and views such that you can compose higher order queries on top.
If you are doing any sort of self-joins or other recursive/case madness, the SQL itself is typically not the problem. Whoever sat down with the business experts on day 1 in that conference room probably got the relational model wrong and they are ultimately to blame for your suffering.
If you have an opportunity to start over on a schema, don't try to do it in the database the first few times. Build it in Excel and kick it around with the stakeholders for a few weeks. Once 100% of the participants are comfortable with and understand why things are structured (related) the way they are, you can then proceed with the prototype implementation.
Achieving 3NF or better is usually a fundamental requirement for ensuring any meaningfully-complex schema doesn't go off the rails over time.
Only after you get it correct (facts/types/relations) should you even think about what performance issues might arise from what you just modeled. Premature optimization is how you end up screwing yourself really badly 99% of the time. Model it correctly, then consider an optimization pass if performance cannot be reconciled with basic indexing or application-level batching/caching.
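To illustrate the earlier point about composing higher-order queries on a normalized base, here is a sketch - table and view names are made up:

```python
import sqlite3

# Once the schema is normalized, views stack: the join is written once,
# and higher-order queries compose on top of it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(id),
                         total REAL);
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);

    -- first-order view: the join baked in once
    CREATE VIEW customer_orders AS
        SELECT c.name, o.total
        FROM customer c JOIN orders o ON o.customer_id = c.id;

    -- higher-order view composed on top of the first
    CREATE VIEW customer_spend AS
        SELECT name, SUM(total) AS spend
        FROM customer_orders GROUP BY name;
""")
rows = conn.execute(
    "SELECT * FROM customer_spend ORDER BY name").fetchall()
print(rows)  # [('Ada', 15.0), ('Lin', 7.5)]
```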
A not-shitty schema is always to be preferred over a shitty one, to be sure. But for any data schema, there will be queries which cut against unexpected joints (joins?).
And SQL is bad for this. The entire Fine Article is a detailed exploration of how it's bad for this. A decent query language would keep the easy things easy (SQL is ok at this) and wouldn't make the hard things pointlessly difficult or impossible-except-at-the-application-layer.
Sometimes it's just a weird or one-off query, and sometimes there are so many of them that migrating to a better schema is indicated. Since it's difficult to do the dozen or so things the article sketches out, it's also difficult to iterate your way to a better schema arrangement.
The sort of waterfall database design you advocate (which, yes, think long and hard about your data!) always breaks down, because building business logic is a process of discovery. There are always missing pieces.
Which one of the frustrations from the article boils down to fighting against shitty schema?
It is really easy to take a negative stance on a technology and poke holes in it without having some grounding in reality. If you are dealing with a problem domain that has hundreds of types/facts/relations, and the need to author logic spanning all of these things all at once, you are going to have a hell of a time managing this with any other technology. Looking at SQL through the lens of toy problems is a massive mistake. You have to look at it through the lens of the worst problems imaginable to begin seeing the light.
I used to think SQL was some ancient trash until I saw the glory of a 40 table join in a factory automation setting. If you need dynamic, hard answers right now, SQL is the only option that makes sense at this kind of scale.
I felt it would contribute more meaningfully to the overall discussion on HN if I were to sidestep a direct critique and present a view of the problem from a different perspective.
Personal experience incoming: At startups, it's usually a mess because hiring someone who knows databases seems to always come so late in the game. At bigger corporations, well, hopefully the developers and database people get along and talk - otherwise, one of those teams is going to be a bottleneck.
> Model it correctly, then consider an optimization pass if performance cannot be reconciled with basic indexing or application-level batching/caching
So true. This also extends into general purpose languages, everything is so much easier when you take the time to model things correctly.
Schema design always results in endless "are we gonna need this?" questions, and usually everything is postponed down the road because it adds needless upfront complexity.
All data are naturally triplets, and with the relational model we are forced to group the triplet attributes into tables. The difficulty arises because these groupings are often incorrect, and then difficult to change. There is always premature optimization, because the SQL query statements become insane with too much normalization.
A bigger problem is how sticky schemas are due to how difficult they are to change, and how difficult it is to update your code to support a schema modification.
I think these two problems need more attention.
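The triplet framing can be made concrete with an entity-attribute-value table - and the self-join "pivot" needed to reassemble even two columns gives a taste of why the groupings matter. A sketch:

```python
import sqlite3

# Everything as (entity, attribute, value) triples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triple (entity INTEGER, attr TEXT, value TEXT);
    INSERT INTO triple VALUES
        (1, 'name', 'Ada'), (1, 'city', 'London'),
        (2, 'name', 'Lin'), (2, 'city', 'Oslo');
""")

# One self-join per attribute, just to get one row per entity back.
rows = conn.execute("""
    SELECT n.entity, n.value AS name, c.value AS city
    FROM triple n JOIN triple c ON c.entity = n.entity
    WHERE n.attr = 'name' AND c.attr = 'city'
    ORDER BY n.entity
""").fetchall()
print(rows)  # [(1, 'Ada', 'London'), (2, 'Lin', 'Oslo')]
```

Each additional attribute means another self-join, which is the "insane" query growth the comment refers to.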
We've gone too far down the code-first / code-generation ORM approach which makes tinkering difficult.
I think all database design should happen visually. MS Access style. You should design your schema alongside your data (as you suggested by using spreadsheets). But instead of then transferring that to code, it should be a living schema that connects to your IDE code refactoring. The more declarative queries are and the closer they are to what needs to be finally rendered the more chance of reliable refactoring. The more imperative code modifying the queried data, the more complex and difficult to refactor things become. A lot of the cruft in a codebase is when schema changes are additive because people are too lazy or too risk averse to refactor.
E.g. think about a User having one address and then needing to support multiple addresses. There's a quick and dirty way: add more columns, or add to a JSON column. Or we add a new Address entity and keep the existing User address columns. Or we delete those columns and migrate the data to a new Address table - which is the cleanest way to represent this schema change. I think few would take the last option, though, and that's the problem. Hiding this behind an API also causes more problems down the line, because our frontend data model starts to drift from our actual DB data model.
We need better tools, and less code.
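For concreteness, the "cleanest" option from the address example above might look like this in SQLite - names are illustrative, and `DROP COLUMN` needs SQLite >= 3.35:

```python
import sqlite3

# Migrate a single embedded address column out to a new Address table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, street TEXT);
    INSERT INTO user VALUES (1, 'Ada', '1 Loop Rd');

    CREATE TABLE address (id INTEGER PRIMARY KEY,
                          user_id INTEGER REFERENCES user(id),
                          street TEXT);
    -- copy the existing single addresses into the new table
    INSERT INTO address (user_id, street)
        SELECT id, street FROM user WHERE street IS NOT NULL;
    -- then remove the now-duplicated column (SQLite >= 3.35)
    ALTER TABLE user DROP COLUMN street;
""")
rows = conn.execute(
    "SELECT u.name, a.street FROM user u "
    "JOIN address a ON a.user_id = u.id").fetchall()
print(rows)  # [('Ada', '1 Loop Rd')]
```

The migration itself is three statements; the painful part, as the comment says, is updating all the code that assumed the old shape.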
>create table json_value(id integer);
>create table json_bool(id integer, value bool);
>create table json_number(id integer, value double);
No, the usual response is "Don't do that!"
99% of the time, either you know the data types (so each JSON object becomes a row in a table whose column names are the keys) or you don't know the data types and store the whole object as a BLOB
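A sketch of the first case - known keys, one row per object (the `person` schema is hypothetical):

```python
import json
import sqlite3

# Each JSON object maps to one row; its keys become the columns.
docs = ['{"name": "Ada", "age": 36}', '{"name": "Lin", "age": 29}']

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (:name, :age)",
    [json.loads(d) for d in docs])   # dicts bind to named parameters
rows = conn.execute("SELECT * FROM person ORDER BY age").fetchall()
print(rows)  # [('Lin', 29), ('Ada', 36)]
```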
I'd be on board with adding unions to SQL, but I doubt I'd use that feature very often.
That said, I'm not averse to the idea if someone can provide a realistic use case. The JSON example in the article is misguided though - you should not save the structure of the syntax of a serialization format, you should save the data model which is represented by the syntax.
You're right, unions are everywhere. Right now a human has to think about each union and how to represent it in a database. It would be really cool if I could store capnp objects like the one below and still get optimal query performance and aesthetics without thinking about it:
struct Shape {
  area @0 :Float64;
  union {
    circle :group {
      radius @1 :Float64;
    }
    rectangle :group {
      width @2 :Float64;
      height @3 :Float64;
    }
  }
}

But I also feel that maybe they are asking a bit much from SQL. The complaint that complex subqueries are complex... well, then don't use them? I would use WITH constructs in that situation because I find them easier to read, but that's beside the point. I think it's perfectly fine to pull out multiple result sets from simple queries and then do the complex stuff in your host language.
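For what it's worth, a conventional relational encoding of the Shape union above is a tag column plus nullable variant fields - one possible mapping, not the only one:

```python
import sqlite3

# Tagged-union-as-table: 'kind' names the active variant, and CHECK
# constraints keep the variant fields consistent with the tag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shape (
        id     INTEGER PRIMARY KEY,
        area   REAL NOT NULL,
        kind   TEXT NOT NULL CHECK (kind IN ('circle', 'rectangle')),
        radius REAL,           -- circle only
        width  REAL,           -- rectangle only
        height REAL,           -- rectangle only
        CHECK ((kind = 'circle') = (radius IS NOT NULL)),
        CHECK ((kind = 'rectangle') =
               (width IS NOT NULL AND height IS NOT NULL))
    );
    INSERT INTO shape VALUES (1, 3.14, 'circle', 1.0, NULL, NULL);
    INSERT INTO shape VALUES (2, 6.0, 'rectangle', NULL, 2.0, 3.0);
""")
rows = conn.execute(
    "SELECT kind, radius, width FROM shape ORDER BY id").fetchall()
print(rows)  # [('circle', 1.0, None), ('rectangle', None, 2.0)]
```

This is exactly the "human has to think about each union" step: the mapping works, but nothing derives it for you.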
But this article is thought provoking to say the least. It follows the courtroom logic of holding the defendant SQL on trial for as much as possible. And SQL is guilty of a lot of crimes.
I do hope GraphQL and similar query languages become more prevalent and standardized, as it seems SQL could really use some stiffer competition.
Going to graph databases is certainly throwing the baby out with the bathwater. The relational model was invented to address shortcomings in the hierarchical and graph database models.
In GraphQL, you can select fields of JSON objects, optionally a field may have parameters which affect the values it returns in unspecified ways. That's it. Because of this design, unlike in SQL where you are concerned with modeling the structure of your data, GraphQL also requires you to think about the API all users will use to access the data. In SQL, I can write a table defining the structure of my dataset, and then users of that table can perform arbitrary queries when they know what data they need (aggregate with a month average grouped by account ID, filter to only rows where x = 7, JOIN this to grab some data from some other table etc.).
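The kind of ad-hoc query described - filter, group, aggregate - written once against a table defined once (schema and values are made up):

```python
import sqlite3

# Define the structure once; any consumer can then write arbitrary
# queries against it without touching the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ledger (account_id INTEGER, month TEXT,
                         amount REAL, x INTEGER);
    INSERT INTO ledger VALUES
        (1, '2021-01', 10.0, 7), (1, '2021-01', 20.0, 7),
        (1, '2021-02', 30.0, 7), (2, '2021-01', 40.0, 3);
""")

# Monthly average per account, filtered to x = 7 -- no API change needed.
rows = conn.execute("""
    SELECT account_id, month, AVG(amount)
    FROM ledger
    WHERE x = 7
    GROUP BY account_id, month
    ORDER BY account_id, month
""").fetchall()
print(rows)  # [(1, '2021-01', 15.0), (1, '2021-02', 30.0)]
```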
GraphQL has no aggregates (sum, average...), no grouping, no joins, no sorting, no filtering, other than what you manually design each using parameters at schema design time. Good luck anticipating the use cases of every future consumer of your data. Miss one? Better dig back into your implementation code & implement that use case each & every time a new one comes up.
The only part of GraphQL that is standardized is the query syntax. In SQL, the actual underlying relational data model exists and the syntax of queries exists within that context; not so in GraphQL land. In SQL, I define my data structures, and users can write queries and access the data. But GraphQL throws up its hands and says "not my problem, try one of these libraries that ask you to implement your own custom data access functionality for all your data types".
OK, so it's a rubbish query language, but even the graph part of the name is misleading. Assuming that you even have a domain that it makes sense to model with a graph of types, GraphQL provides you no tools for dealing with the backend complexity of such a design. Because the syntax is so simplified, there is no mechanism within the syntax to define rules about relationships between types. For example, imagine a simple parent/child relationship. There is no mechanism within the syntax to tell GraphQL that for parent X, parent.child = parent.child.parent . So you can't even think about writing a GraphQL query optimizer, because there isn't enough information about the structure of the data encoded into the schema or query to do so.
So in practice no GraphQL implementations that I know of have anything resembling a query optimizer - someone asks for cart.item, and then item.cart for a cart with 1000 items? Have fun re-requesting the cart from your DB 1000 times (yes you can cache the item by ID to save a DB lookup, but we shouldn't even need to hit cache here! Every programmer involved knows the data is the same, it's just dumb GraphQL has no clue about the details of the relationship).
It seems like having multiple vendors is only valuable if their products are to some degree differentiated, no?
As to SQL: it's a weird feeling to read all that. I've spent 15 years working with very large relational databases, and about 5 years ago I ditched it all in favor of key-value object stores, wasting some storage but saving a metric ton of man-hours of development work. Not looking back.
Mostly the NoSQL pattern is a mistake, because you almost certainly have columns that recur reasonably frequently. But if you do have high-dimensional data or want to store processed documents, you can represent them as a JSON/JSONB column in Postgres etc., and even run indexes or queries on them.
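A sketch of that hybrid approach using SQLite's JSON1 functions (PostgreSQL's jsonb operators are the analogue; the schema is illustrative):

```python
import sqlite3

# Irregular fields live in a JSON column, yet stay queryable -- and
# even indexable via an expression index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT);
    INSERT INTO doc VALUES
        (1, '{"kind": "invoice", "total": 12.5}'),
        (2, '{"kind": "receipt", "total": 3.0}');
    -- expression index over a field inside the JSON
    CREATE INDEX doc_kind ON doc (json_extract(body, '$.kind'));
""")
rows = conn.execute("""
    SELECT id, json_extract(body, '$.total')
    FROM doc WHERE json_extract(body, '$.kind') = 'invoice'
""").fetchall()
print(rows)  # [(1, 12.5)]
```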
> The original idea of relational databases was that they would be queried directly from the client. With the rise of the web this idea died - SQL is too complex to be easily secured against adversarial input, cache invalidation for SQL queries is too hard, and there is no way to easily spawn background tasks (eg resizing images) or to communicate with the rest of the world (eg sending email). And the SQL language itself was not an appealing target for adding these capabilities.
> So instead we added the 'application layer' - a process written in a reasonable programming language that would live between the database and the client and manage their communication. And we invented ORM to patch over the weaknesses of SQL, especially the lack of compressibility.
> When Ray and I were designing Sequel in 1974, we thought that the predominant use of the language would be for ad-hoc queries by planners and other professionals whose domain of expertise was not primarily database management. We wanted the language to be simple enough that ordinary people could "walk up and use it" with a minimum of training.
In that case it has been an abject failure. I have been using SQL since the mid 1980s (so pretty much since the start of its widespread adoption) and I have never met "ordinary people" (by which I assume is meant intelligent, business-oriented professionals) who could (or wanted to) cope with it.
I like it, but the idea of sets does not come naturally to most people (me included). But I once worked with a programmer who had never been exposed to SQL - I lent him an introductory book on it, and he came in the next day and said "Oh, of course, it's all sets, isn't it?" and then went on to write some of the most fiendish queries I have ever seen.
If SQL was designed "by engineers for engineers", you would be using esoteric Git commands just to blow off steam.
Any argument that users will "write their own" language is basically flawed. Users want results; if there's no alternative, they'll do it themselves, in the simplest but probably most inefficient way possible.
If that sounds interesting, you can find it here: https://github.com/erezsh/Preql
df{*, -column1}
df{column1: new_name1, column2, column3: new_name3}
I noticed that in working with pandas you often need to do lookups into other dataframes. It's partially solved by the assignment operator when the left field equals the right index, or by the .map method in the same way. But often you need a lookup that merges on an arbitrary column, then groups and aggregates by the left table's items. That's partially doable without special functions. But it could be a killer feature if someone made it work for spatial joins.
Very often you need to do the following:
gdf1 = geodataframe of points
gdf2 = geodataframe of points
need to make gdf1.geometry.buffer(500 m) and sjoin it with gdf2.geometry, then look up gdf2 fields and bring them to gdf1, keeping the original gdf1.geometry (points). This operation takes a dozen lines and leaves lots of variable garbage if not put into a separate function. But IMO it could be condensed to something more natively supported.
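The non-spatial core of that pattern (merge on an arbitrary column, aggregate, then bring the result back onto the left table) can at least be condensed into one chain in plain pandas; the column names below are invented for illustration:

```python
import pandas as pd

# Left table: the rows we want to enrich.
left = pd.DataFrame({"id": [1, 2, 3], "key": ["a", "b", "a"]})
# Right table: lookup values keyed by an arbitrary (non-index) column.
right = pd.DataFrame({"key": ["a", "a", "b"], "value": [10, 20, 30]})

# Merge on the arbitrary column, aggregate per left-table row,
# and attach the result as a new column - all in one chain.
left["value_sum"] = (
    left.merge(right, on="key", how="left")
        .groupby("id")["value"]
        .sum()
        .reindex(left["id"])
        .to_numpy()
)
print(left[["id", "value_sum"]])
```

The spatial version (buffer + sjoin) additionally needs geopandas, but the shape of the chain would be the same.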
Something like.. BQL http://intelligiblebabble.com/a-better-query-language-bql-la... which went nowhere https://github.com/lelandrichardson/BQL
My approach has been to design a very simple (in the lisp sense) syntax, kind of the opposite to SQL where everything is hard-coded into the parser. I've adopted a "pipeline programming"-like approach, where the operations are just (special) functions, which also helps with extensibility. Have you thought about this? From a cursory look, it seems Preql does rely on keywords. Admittedly fewer than SQL, but it also doesn't cover all of its features.
Having said that, the amount of keywords and operators that you see in Preql right now, isn't likely to grow by much. I have the basic and common operators covered, and for the most part, the rest can be done with regular functions.
I agree about introspection, which is why in Preql you can ask for the type of any value, list the columns of a table, get the resulting SQL of an expression as a string, and so on. And certainly more can and should be done towards that.
I think many of the comments here are missing the point by saying "Oh you can get around that issue in that example snippet by doing X Y Z". Sure there are workarounds for everything if you know the One Weird Trick with these 10 gotchas that I won't tell you about... but that just makes the author's point.
We can do better. We deserve better.
What could things look like if you could radically alter the SQL language, replace it altogether, or even move layers from databases or applications into each other?
Who knows if it will be better or worse, but I'd like to find out.
As someone who frequently used SQL for analytics and less frequently for app development, I would gladly use a language that would transparently translate to SQL while adding some syntactic niceties, like Coffeescript did to JS:
- Join / subquery / CTE shortcuts for common use cases (e.g. for the FK lookups that are mentioned in the article)
- More flexible treatment of whitespace (e.g. allow trailing commas, allow reordering of clauses etc.)
And for the language to be usable, it would probably need:
- First-class support for some extended SQL syntax commonly used in practice (e.g. Postgres's additions)
- integration with console tools (e.g. psql), common libraries (e.g. pandas, psycopg2) and schema introspection tools
- editor support / syntax highlighting.
It would probably be good to model the syntax of that language on some DSL-friendly general purpose language (like Scala, Kotlin or Ruby).
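As a sketch of how thin such a translation layer could start out, here is a hypothetical clause assembler (all names invented) that already delivers two of the niceties above - reordering of clauses and no trailing-comma worries - by accepting keyword arguments in any order and emitting plain SQL:

```python
def to_sql(table, select="*", where=None, order=None, limit=None):
    """Assemble clauses, given in any order at the call site, into one SQL string."""
    cols = ", ".join(select) if isinstance(select, (list, tuple)) else select
    parts = [f"SELECT {cols}", f"FROM {table}"]
    if where:
        parts.append("WHERE " + " AND ".join(where))
    if order:
        parts.append("ORDER BY " + ", ".join(order))
    if limit is not None:
        parts.append(f"LIMIT {int(limit)}")
    return "\n".join(parts)

# Clause order and trailing commas no longer matter at the call site.
print(to_sql("users", where=["age > 21"], select=["id", "name"], limit=10))
```

A real implementation would of course need parameter binding, dialect awareness, and the schema introspection mentioned below on top of this.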
It's a skill I used every working day, I'm pretty sure I will still use it in 20 years.
On the other side, it's very unlikely that the ORM 'du jour' will still exist three years from now.
Isn't a union type essentially a de-normalized field?
This seems like attacking arithmetic operators for their lousy character string support.
Weren't XML databases (briefly) a (marketing) thing some decades back?
One idea might be to have everyone integrate jq[1] into their database engines. My understanding is that one can make the JSON do back flips with jq. Then we can move to complaining about queries that appear to have been written in Klingon instead of boring ol' SQueaL.
No? You have to denormalize to emulate unions when they're missing. Sum types are a fundamental category of types, that SQL only supports product types is a problem you have to work around.
The argument for union types seems to get weak when one asks: how do we index their components?
Because there seems little middle ground between needing discrete fields and safely just parking the data as a memo field and deferring the management to the application.
Unix won by letting the system utilities specialize.
SQL need not be "one language to rule them all".
If you are working with unions and, for whatever reason, want to put unions into a data store, this may seem to be a capricious limitation, but this rule, together with the other principles Codd stipulated, are the basis of the semantic transparency of the relational model, in which each relation expresses an atomic fact about the world of discourse. This, in turn, is the basis of its desirable features, such as its openness to ad-hoc querying, and the applicability of referential integrity constraints.
To look at it in more concrete terms, suppose you had an attribute with a union type: the meaning of any particular bit-pattern would be ambiguous - it might depend on the value of a different datum, or, worse, be context-dependent in a more complex way. This is going to make querying more complicated, whether you are using SQL or some replacement for it, and while one or two cases may seem expedient and harmless, these are the sort of accommodations that, as they accumulate, lead to programs becoming hard to understand and brittle.
At this point, I am unsure whether it would be acceptable, within the relational model, to have types that are, structurally, a union together with a flag disambiguating it. On the one hand, this would avoid the problem of disambiguation I mentioned above, but if it were implemented in such a way that the flag value is independently queryable and/or settable, that would seem to open a back door to let in all the seems-expedient-but-ends-badly design choices that raw union types would facilitate.
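For what it's worth, the structurally-a-union-plus-flag option can be emulated in today's SQL with a tag column and CHECK constraints that force exactly one variant's columns to be populated. A minimal sqlite3 sketch (schema invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A sum type Payment = Card(card_no) | Cash(amount_tendered),
# encoded as a tag column plus nullable per-variant columns.
conn.execute("""
    CREATE TABLE payment (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL CHECK (kind IN ('card', 'cash')),
        card_no TEXT,
        amount_tendered REAL,
        -- a variant's columns are non-null iff that variant is active
        CHECK ((kind = 'card') = (card_no IS NOT NULL)),
        CHECK ((kind = 'cash') = (amount_tendered IS NOT NULL))
    )
""")
conn.execute("INSERT INTO payment VALUES (1, 'card', '4242', NULL)")
conn.execute("INSERT INTO payment VALUES (2, 'cash', NULL, 9.99)")
rows = conn.execute("SELECT kind FROM payment ORDER BY id").fetchall()
print(rows)  # [('card',), ('cash',)]
```

The flag is still independently queryable and settable, so this doesn't settle the semantic question raised above; it only shows the mechanics are cheap.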
Increment on SQL, write a translation layer, and see if people adopt it. Maybe 10 years from now your idea will be more popular than standard SQL. Most likely your idea sucks though and you will stay in the easy land of criticising things.
The front-end is infinitely more complex than SQL on the backend. I write fairly common web applications and the SQL part is maybe 10% of my time, and very easy. React is where I spend most of my time. I don't have any problem that really needs to be solved. SQL works for me even though it isn't perfect. Any imperfections can most likely be incrementally fixed. I use tagged templates in JavaScript to deal with parameters, composability, and reusability.
The fact that the author highlights GraphQL as supposedly the great alternative shows how ridiculous the proposition is. GraphQL does basically nothing. It is 10% of the functionality of SQL.
I'm curious how much work has been done on optimizers for Tutorial D or other D variants. It looks way nicer to use, but I wonder if it is easier to stumble into pathological cases.
https://clickhouse.tech/docs/en/sql-reference/data-types/nul...
This approach doesn't resolve all of the author's complaints but it does solve many.
Disclaimer: I'm the author of Opaleye. Rel8 is built on Opaleye. Other relational query EDSLs are available.
[1] https://github.com/tomjaguarpaw/haskell-opaleye/
[2] https://github.com/circuithub/rel8/
Truly it is blind men evaluating an elephant.
Given SQL's roots as a human-friendly declarative interface, the only thing I see completely replacing it in the near future is a Copilot-style neural implant where you just think of the results you want.
I agree with the desire for a data-based language, rather than a text-based one as SQL is. A classic example of this is MongoDB: you can add a new filter by just adding a new entry to a dict in Python or object in JS etc. I think 99% of the reason MongoDB was successful, at least in the early days, was its data-based API. (Polite request to all: please don't reply to this comment with pros/cons of MongoDB except its query language.)
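To make the "data-based" point concrete: because a Mongo-style filter is plain data, adding a condition is a key assignment rather than string splicing. No database is needed to show the shape:

```python
# A Mongo-style filter is just nested data.
query = {"status": "active"}

# Adding another condition is a plain dict update, not string surgery.
query["age"] = {"$gte": 21}

# Conditions can also be built programmatically, e.g. from user input.
for field, value in [("country", "NZ")]:
    query[field] = value

print(query)
# {'status': 'active', 'age': {'$gte': 21}, 'country': 'NZ'}
```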
I especially agree with the point about having to trick query planners into using the indices you wrote. I get that sometimes it's nice to let the database engine cleverly choose the best strategy (dynamically building queries with a data-based API would be a case in point). But in other situations you'll have carefully designed the tables and indices around one or more specific queries, and then it's frustrating not being able to directly express that design in the code.
I don't have any experience with live migration of production databases (thankfully!) so that was interesting, especially the conclusion that MySQL is best for this, which I didn't expect. The idea of separating out the type system into lower-level "storage types" and higher-level "semantic types" was also food for thought.
F1, Google's SQL database, uses Protocol Buffer in an interesting way: https://storage.googleapis.com/pub-tools-public-publication-...
MySQL is a strange choice, but I think I understand why the author picked it - from his other critiques he seems to look at databases as a building block of hyper-scalable applications, not as a tool for humans to do often ad-hoc things with data.
I would never recommend MySQL for "business data" - it had and possibly still has way too many footguns with regards to number behavior, character encodings and Unicode, date and timestamp handling, and so on - hell, it doesn't have a proper MERGE and it only got CTEs in the latest version.
But if you're using it as a persistence store that's barely more than a key-value store, why not? I have no problem believing the author that that kind of use is more common.
In most CRUD apps we currently have, on the backend, layers and layers of software with ORMs, frameworks etc., and it all boils down to "Writing/Generating the correct (good-enough) SQL".
We now have added stuff like GraphQL, which if you squint hard enough (ok, very hard) can be seen as a SQL alternative (a language to get the actual data).
Maybe SQL + "GraphQL-Like" Layers should "evolve" into ONE common "data scripting language" ?
Maybe we have something like "ClientSide-SQL" - which can be a subset of ServerSide-SQL ?
We need the "TypeScript" of "data-querying", which can run on the server, client, moon and my device, and where one can also define any "Types" only ONCE.
Anywhoo - I think there is still a lot to be done, researched and discovered in this section of CS :)
I'm not sure you need a full SQL implementation on the frontend though, as the data is not going to get all that large to need the optimizations it affords, but it would be nice to be able to use the same queries on your browser DB as your backend DB.
Anything SQL that can be made simpler via dynamic generation (which is safe as long as you use proper parameters for user inputs) is favored over creating logical branches in queries. Anything that can be processed further quickly in memory in the app (mapping operations, string ops, ordering/filtering predictably small data sets, etc.) we tend to offload from SQL into something more suitable.
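A minimal sqlite3 sketch of that pattern: the query text is generated dynamically from whichever filters are present, while user-supplied values only ever travel through bound parameters (table and column names here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, price REAL, category TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?, ?)",
                 [("pen", 1.5, "office"), ("mug", 8.0, "kitchen")])

def find_items(conn, category=None, max_price=None):
    """Generate WHERE clauses dynamically; user values go via placeholders."""
    clauses, params = [], []
    if category is not None:
        clauses.append("category = ?")
        params.append(category)
    if max_price is not None:
        clauses.append("price <= ?")
        params.append(max_price)
    sql = "SELECT name FROM item"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY name"
    return [r[0] for r in conn.execute(sql, params)]

print(find_items(conn, category="office"))   # ['pen']
print(find_items(conn, max_price=10.0))      # ['mug', 'pen']
```

The generation logic stays linear as the number of optional filters grows, instead of branching per combination.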
And we tend to solve a class of problem in our data layer and reuse those generalized patterns heavily. This makes our codebase predictable even when dealing with unfamiliar subject matter.
Of course there are always places where some complex query is necessary (especially when building reports), but if it's the status quo then you're doing something wrong - it's only a matter of time until you end up with a performance nightmare on your hands.
I do agree with this - after all SQL is supposed to be the data layer - why should we think that data processing shouldn't happen there?
Can you give an illustrative example of one? I suspect that framing it this way biases designs away from 'poor ones that use complex queries' into ones that forego other good aspects such as normalization. Sometimes the best design uses a complex query for something other than a report.
Design is not something that should be done by application of dogma and avoiding smells.
In reality most tables and queries start out simple enough, and poor schema choices are usually accompanied by poor architectural choices. It can be painful to come up with a decent migration scheme when the business needs change, especially if there’s fear/pressure involved, but often that’s going to be better than trying to keep the data layer the same/similar and shoehorning in data to represent new scenarios. This is what leads to a fragmented design IMO and allows the schema to diverge from the actual goal of efficient data storage/retrieval.
If we ever do get a replacement, I hope it retains the declarative set theory approach of SQL while addressing the warts.
You can use NATURAL JOIN
select * from foo natural join bar
Works as long as the keys are named the same. However, a lot of people have a habit of naming keys differently in the two tables.
A foreign key is effectively a reference to another column, but to de-reference it you have to tell the database which table and column it's a reference to. Every time. Even when this information is already specified in a foreign key constraint.
The author is talking about (not) being able to specify the "from" table and column without having specify the "to" table and column (i.e. tell the database how to de-reference it) on each query. A natural join removes the need to specify the columns, but still requires specifying both tables. So besides requiring a de facto single namespace for columns across tables and generally seeming like a footgun, it doesn't achieve the same thing.
This makes naming key columns differently a defence technique, so you stop people from using natural joins.
select * from foo join bar on (foo.x = bar.y) if the columns have a different name.
I tend to write my joins first, then use where clauses as filters. A select * from foo inner join bar on (foo.x = bar.y) is semantically equivalent to foo, bar where foo.x = bar.y, but keeping the joins separate from the filters makes the query clearer.
>the only solution is to change half of the lines in the query
How about adding a second subquery for the salary.
> You can use as to name scalar values anywhere they appear. Except in a group by.
-- can't name this value
> select x2 from foo group by x+1 as x2;
ERROR: syntax error at or near "as"
LINE 1: select x2 from foo group by x+1 as x2;
-- sprinkle some more select on it
> select x2 from (select x+1 as x2 from foo) group by x2;
?column?
----------
(0 rows)
Looking at that first one I'm just kinda like "well duh, there's nothing special there" - it doesn't work with ORDER BY either; you use that to rename columns (on SELECT) or tables (on FROM and JOIN). And then it goes on to show ways to work around that:
> Rather than fix this bizarre oversight, the SQL spec allows a novel form of variable naming - you can refer to a column by using an expression which produces the same parse tree as the one that produced the column.
Instead of just... using the renamed column?
select x+1 as x2 from foo group by x2;

Also agree that GraphQL is a pretty fantastic language for working with graphs. And that relational databases are essentially graphs. Hasura is neat.
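For what it's worth, grouping by the renamed column really does work in common implementations; a quick sqlite3 check:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (x INTEGER)")
conn.executemany("INSERT INTO foo VALUES (?)", [(1,), (1,), (2,)])

# Alias the expression in SELECT, then group by the alias.
rows = conn.execute(
    "SELECT x + 1 AS x2, COUNT(*) FROM foo GROUP BY x2 ORDER BY x2"
).fetchall()
print(rows)  # [(2, 2), (3, 1)]
```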
Of course they are incompatible. That's just par for the course when it comes to SQL.
20% longer to write than what alternative? And how is this being measured?
And.. am I missing something?
By far the most common case for joins is following foreign keys. SQL has no special syntax for this:
select foo.id, quux.value
from foo, bar, quux
where foo.bar_id = bar.id and bar.quux_id = quux.id
Why can't this be expressed as an INNER JOIN? And can't some of these subqueries be written using a WHERE EXISTS or a windowing function?
select foo_id, quux.value
from foo natural join bar natural join quux
Unfortunately this doesn't actually use the foreign key relation; it matches on the same element name. So you have to have `foo.bar_id` and `bar.bar_id`, as well as `bar.quux_id` and `quux.quux_id`. But I find that actually makes queries more readable.

`from foo, bar, quux` is an inner join; it's a shorthand syntax. He's lamenting that he has to keep specifying and matching ids, when the database could figure it out on its own from the foreign keys.
As someone who only uses SQL a couple of times a year, I feel that SQL shares the same fate as everything in IT: invented almost 50 years ago, not with today's world in mind, it has been blown up somewhat. Reminds me a bit of JavaScript: everything that can be done in JavaScript, will be done in JavaScript.
Just as C was followed by C++, and then Java and others, there will be new DSLs and techniques on top of SQL.
The article has its merits. Better abstractions for different use cases.
To get that level of applicability, of course, you have to make your problem match the form SQL needs. For applications on its home turf, for example a simple inventory system, this can be both easy to do and beneficial (since if you're on SQL's home turf and you want to do something you can't do, there's probably a Very Good Reason). Unfortunately, this is not always easy to do, or even possible at all, and even when it is possible you usually need to know some basic relational algebra. (Tangentially, I am convinced that many of SQL's critics would be quieter if they knew a bit of relational algebra themselves, though I don't think that applies to this article.)
As you say, though, trying to make a tool do something it just shouldn't is the road to madness. The article's discussion of JSON in SQL is a pretty decent indicator of how that goes wrong even when it goes right. For further snapshots of the road to madness, the interested reader might examine C++, JavaScript, and a competent psychiatrist. Sometimes it really is time to move on, or at least add on.
Most languages have something they are the best at. SQL is probably THE language with the strongest value proposition - relational databases are even more important and ubiquitous than web browsers. But why doesn't SQL have any competition?
I would love to see an alternative to SQL in the style Jamie suggests. Maybe SQL would immediately not be the best anymore?
The other half is just having no meaningful competition in that one domain. So I agree with the author (and you presumably) that building something better on top of the relational algebra should be a priority for the profession.
It is not that we didn't try to replace it, but just as other comments have said, SQL was good enough, and already has the biggest mind share.
More details about "Why functions and column-orientation" (as opposed to sets) can be found in [2]. In short, the problems with set-orientation and SQL arise because producing sets is not what we frequently need - we often need new columns, not a new table. Hence applying set operations is a kind of workaround for the absence of column operations.
This approach is implemented in the Prosto data processing toolkit [0] and Column-SQL[1] is a syntactic way to define its operations.
[0] https://github.com/asavinov/prosto Prosto is a data processing toolkit - an alternative to map-reduce and join-groupby
[1] https://prosto.readthedocs.io/en/latest/text/column-sql.html Column-SQL (work in progress)
[2] https://prosto.readthedocs.io/en/latest/text/why.html Why functions and column-orientation?
EdgeQL, indirectly linked at the end of the article, looks at a glance like it might score well. EdgeDB's blog post [1] criticizing SQL and introducing EdgeQL seems to cover the same concepts (inexpressive, incompressible, non-porous) with slightly differing language in some cases (e.g. system cohesion for porousness).
Noticed after posting this comment that there's a post today about EdgeQL. [2]
[1] https://www.edgedb.com/blog/we-can-do-better-than-sql [2] https://news.ycombinator.com/item?id=27793398
If you want to fix SQL, contribute to the next version of the standard, or provide example by implementing what you want to see out there.
Complaining about SQL is the easy part. Actually, it's the first skill most new SQL developers truly master.
I'm waiting for the viable alternative. There are a lot (a LOT) of solutions that handle some cases, but inevitably you need to get into the SQL anyway because that's the DBMS' native API (and now you also need to fight your way through the abstraction, oh and since there are a LOT of solutions a different one is used every chance someone gets, so you need to relearn how to fight through the abstraction all the time).
I doubt it's going to change. There's actually no significant reason. SQL (actually, the set of mutually incompatible SQL variants) is thoroughly entrenched and a small problem... that is, it's rarely the dominant reason a project/product succeeds or fails, or takes too long, or becomes unmaintainable, etc.
I've used and even written SQL databases for much of my career. SQL is pretty satisfactory for what it was designed to do. I view SQL like classic inheritance-based OOP; it works well for the problem domains for which it was originally designed, but is poor for efficiently expressing problem domains that are better expressed in a composition-based or functional way. Yet it worked so well in its original domain that we try to apply it everywhere. The diversity of data models and the kinds of operations we want to do with them today is far greater than was considered when SQL crystallized into its current form.
The limitation of most nominal SQL replacements I've seen is that they commit the same sin of SQL originally: overfitting for a problem domain that the designer was most interested in. There is an appetite for a really good SQL replacement if done well, and in principle anything SQL can do could be directly translated into a new language for compatibility.
That's because there can be more than one FK relationship between the same two tables. For example, if we model a binary tree, there could be references to left, right and parent nodes.
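Sketching that in sqlite3 makes the ambiguity obvious: with three foreign keys into the same table, every join has to say which relationship it follows (FK enforcement is left at SQLite's default of off here so the rows can be inserted in any order):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table, three foreign keys into itself: no single "natural"
# relationship exists, so the join column must be named each time.
conn.execute("""
    CREATE TABLE node (
        id INTEGER PRIMARY KEY,
        left_id   INTEGER REFERENCES node(id),
        right_id  INTEGER REFERENCES node(id),
        parent_id INTEGER REFERENCES node(id)
    )
""")
conn.execute("INSERT INTO node VALUES (1, 2, 3, NULL)")
conn.execute("INSERT INTO node VALUES (2, NULL, NULL, 1)")
conn.execute("INSERT INTO node VALUES (3, NULL, NULL, 1)")

# Each traversal must say which of the three FKs it follows.
kids = conn.execute("""
    SELECT child.id FROM node AS root
    JOIN node AS child ON child.parent_id = root.id
    WHERE root.id = 1 ORDER BY child.id
""").fetchall()
print(kids)  # [(2,), (3,)]
```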
shopify from pyspark -> sql
https://shopify.engineering/build-production-grade-workflow-...
Once that’s in place I don’t know which features I’d want first… but there’s a lot of them!
SQL is very well-established, but it's also old, and it shows its age. It's kinda weird how easily we jump from one programming language to another, and yet we can't seem to move on from our main relational query language.
There is internal software where I work where, to create a temp table, you just assign the result of a query to a variable. It is so much nicer. Creating a temp table becomes as simple as the below - no need to declare each column, do an insert, or drop the table at the end:
@t = select colA, colB from tbl
select top 10 * from @t order by colB

> There is an internal software where I work where to create a tmp table you just assign the result of the query to a variable. It is so much nicer. So for instance creating a tmp table becomes as simple as the below, no need to declare each columns, to do an insert, to drop the table in the end:
> @t = select colA, colB from tbl
> select top 10 * from @t order by colB
Unless I am misunderstanding what you are looking for, 'SELECT INTO' works the way you want: https://www.postgresql.org/docs/9.1/sql-selectinto.html
It's on every RDBMS I've used, IIRC.
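In most engines this is also spelled `CREATE TEMP TABLE ... AS SELECT ...`: one statement, columns inferred from the query, and the table dropped automatically at session end. A sqlite3 check:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (colA TEXT, colB INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)",
                 [("x", 3), ("y", 1), ("z", 2)])

# One statement: columns are inferred from the query, no INSERT needed.
conn.execute("CREATE TEMP TABLE t AS SELECT colA, colB FROM tbl")
rows = conn.execute("SELECT * FROM t ORDER BY colB LIMIT 2").fetchall()
print(rows)  # [('y', 1), ('z', 2)]
```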
I have other issues with SQL:
Resource usage grows linearly with the amount of data, with no built-in way to handle that.
That integer ids are way overused and basically locking every database to a specific environment.
The index tweaking.
The workarounds for write speed.
The fact that you can do anything in SQL and people know it.
Another example complaint hidden behind a ominous-sounding word boils down to "Using a table expression inside a scalar expression is generally not possible, unless the table expression returns only 1 column and either a) the table expression is guaranteed to return at most 1 row or b) your usage fits into one of the hard-coded patterns such as exists."
Uh, great, I've never needed to do that in my career, and if you care so much, make a PR; but suggesting that SQL itself is somehow the problem is laughable. It would be orders of magnitude more effort to standardize the industry on a new query language than to patch table expressions. I can scarcely imagine what a productivity loss it would be to the industry if SQL standardization were dropped; it would be much worse than the Python 2/3 debacle.
Also "incompressible" - Sounds like the author doesn't use views/materialized-views.
Finally the "fragile" example is just the author writing a bad query. The example here is performant and less fragile: https://stackoverflow.com/questions/612231/how-can-i-select-...
etc.
This example has the same number of semantic entities as the SQL. Also, there is USING. And why does the author need strict modeling over JSON when one can model in native types? It's a very strange article.
That said, in the case of wanting to abstract out or reuse joins, just write a view, I guess. And I get a lot of mileage in Postgres from just writing functions to abstract out predicates, because it allows you to write things like `select * from order where order.is_complete` instead of `where is_complete(order)`
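The reusable-join case is worth spelling out: the join is written once in the view, and every consumer "compresses" to a plain select against it. A sqlite3 sketch with an invented schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "order" (id INTEGER PRIMARY KEY,
                          customer_id INTEGER REFERENCES customer(id),
                          total REAL);
    -- The join is spelled out exactly once, in the view.
    CREATE VIEW order_with_customer AS
        SELECT o.id AS order_id, c.name AS customer_name, o.total
        FROM "order" o JOIN customer c ON c.id = o.customer_id;

    INSERT INTO customer VALUES (1, 'Ada');
    INSERT INTO "order" VALUES (10, 1, 99.0);
""")
# Consumers never repeat the join condition.
rows = conn.execute(
    "SELECT customer_name, total FROM order_with_customer"
).fetchall()
print(rows)  # [('Ada', 99.0)]
```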
Knowing your schema design is just as important as knowing SQL.
In PostgreSQL [2], foreign key constraint names only need to be unique per table, which allows using the foreign table name "as is" as the constraint name, which allows for nice short names. In other databases, the names will just need to be a little longer.
Given this schema:
CREATE TABLE baz (
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE bar (
id integer NOT NULL,
baz_id integer,
PRIMARY KEY (id),
CONSTRAINT baz FOREIGN KEY (baz_id) REFERENCES baz
);
CREATE TABLE foo (
id integer NOT NULL,
bar_id integer,
PRIMARY KEY (id),
CONSTRAINT bar FOREIGN KEY (bar_id) REFERENCES bar
);
We could write a normal SQL query like this:

SELECT
bar.id AS bar_id,
baz.id AS baz_id
FROM foo
JOIN bar ON bar.id = foo.bar_id
LEFT JOIN baz ON baz.id = bar.baz_id
WHERE foo.id = 123
I suggest adding a new binary operator, allowed anywhere a table name is expected, taking the table alias to join from as the left operand and the name of the foreign key constraint to follow as the right operand. Perhaps "->" could be used for this purpose, since it's currently not used by the SQL spec in the FROM clause.
This would allow rewriting the above query into this:
SELECT
bar.id AS bar_id,
baz.id AS baz_id
FROM foo
JOIN foo->bar
LEFT JOIN bar->baz
WHERE foo.id = 123
Where e.g. "foo->bar" means: follow the foreign key constraint named "bar" on the table/alias "foo"
If the same join type is desired for multiple joins, another idea is to allow chaining the operator:

SELECT
bar.id AS bar_id,
baz.id AS baz_id
FROM foo
LEFT JOIN foo->bar->baz
WHERE foo.id = 123
Which would cause both joins to be left joins.
[1] https://scattered-thoughts.net/writing/against-sql/

Alternatively there are still the NATURAL JOIN and USING syntaxes, which have been standard like forever.
I think it would be a nice feature to add to the SQL standard.
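A toy preprocessor suggests the proposal is mostly mechanical: given the foreign-key catalog, each `alias->constraint` can be macro-expanded into today's JOIN syntax. Everything below (the catalog structure, the names) is hypothetical:

```python
import re

# Hypothetical FK catalog: (table, constraint_name) -> (fk_column, target_table),
# mirroring the CONSTRAINT names in the schema above.
FKS = {
    ("foo", "bar"): ("bar_id", "bar"),
    ("bar", "baz"): ("baz_id", "baz"),
}

def expand(sql):
    """Rewrite 'JOIN a->name' into 'JOIN target ON target.id = a.fk_col'."""
    def repl(m):
        left, name = m.group(1), m.group(2)
        fk_col, target = FKS[(left, name)]
        return f"JOIN {target} ON {target}.id = {left}.{fk_col}"
    return re.sub(r"JOIN (\w+)->(\w+)", repl, sql)

print(expand("SELECT bar.id FROM foo JOIN foo->bar JOIN bar->baz"))
# SELECT bar.id FROM foo JOIN bar ON bar.id = foo.bar_id JOIN baz ON baz.id = bar.baz_id
```

A real implementation would live in the planner and read the catalog directly (and handle chaining and join types), but the core rewrite is about this simple.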
But it exists and is adequate. And, as Gabriel’s famous essay says, Worse is Better.
A lot of the rest of it reads like the author started with this conclusion and then went looking for justification.
Example: the author states it's hard to return more than one column with a correlated subquery. That's what WITH clauses or joins against subqueries are for. The author later mentions WITH statements, so is aware of them.
As for JSON, I honestly don't think anybody needs that. Either return a JSON blob (generally a bad idea IMHO) or construct it in code.
The example of join verbosity has issues too. First, an abbreviated syntax would need to express what kind of join to do (eg inner vs outer). Second, I find this fairly natural:
SELECT ...
FROM a
JOIN b ON a.id = b.a_id
LEFT OUTER JOIN c ON b.id = c.b_id
The author instead used this syntax:

SELECT
FROM a, b, c
WHERE a.id = b.a_id
AND b.id = c.b_id
This also leaves the join type unexpressed. In some SQLs you say:

AND b.id = c.b_id (+)
But that's kind of ugly and old-fashioned. The first syntax is preferable and clear.

On "compressibility", SQL has this. They're called views. GraphQL has a notion called fragments that SQL doesn't. This is one of those things that sounds like a good idea but probably isn't. It makes queries much harder to read, and I've seen this reach the point where a fragment is so widely used that changing it is expensive (eg generated code) and removing anything is impossible. Plus a lot of users end up querying things they don't need.
Poor optimization and error messages of with clauses aren't really an argument against SQL. They're an argument against particular implementations. Extracting an anonymous query into a WITH clause should be a no-op to performance for any half-decent query optimizer/executor.
Writing extensions (eg functions) should be discouraged. It's harder to deploy and debug and the last thing you want is a badly written C function crashing your database.
Years ago we also had stored procedures (eg Oracle PL/SQL) and nobody does that anymore because it's terrible. You don't want that.
There's a lot in there about pathological corner cases that I honestly don't really care about.
I do agree that ORMs are generally a disaster.
Lastly, it's worth noting that SQL, unlike a lot of alternatives, has a solid theoretical basis, and that is relational algebra. SQL wasn't created in a vacuum. SQL is just a way to express those constructs.
I will say that SQL got the order of clauses wrong whereas LINQ got this right. SQL should actually look more like this:
FROM a
WHERE a.foo = 'bar'
SELECT id, col1, col2
Honestly though, SQL just isn't "broken". That's why it's endured so long despite the NoSQL fad and various efforts to replace it.

----
The comparison of SQL vs. Flink windowing ("kernel space" vs "user space") reminds me of this 2013 call to change the design of browser features:
https://extensiblewebmanifesto.org/
Basically there's a lot of stuff implemented in the C++ layer of the browser that's impossible to emulate in JavaScript, and that's a bad design.
It is indeed alarming how much syntax SQL has. It reminds me of shell, where every string manipulation function like stripping a prefix has custom syntax like ${x//pat/replace} or ${x%%prefix}. Oil (https://www.oilshell.org/) will simply have functions for this, like x.sub('pat', 'replace').
----
I also wonder if the author has worked with dplyr and the tidyverse at all? He mentions Pandas, but IMO it's a clunkier imitation of those ideas (and I'm saying that as a Python programmer).
Tidy data was my intro to the design of dplyr: http://vita.had.co.nz/papers/tidy-data.html
It's very inspired by the relational model, but it has a few more operations like "gather" and "spread" which turn "long" format into "wide" format and vice versa.
It has a clean and expressive API: https://www.rstudio.com/wp-content/uploads/2015/02/data-wran...
It composes like regular code, so you can write stuff like:
bin_sizes %>%
select(c(host_label, path, num_bytes)) %>%
left_join(bytecode_size, by = c('host_label')) %>%
mutate(native_code_size = num_bytes - bytecode_size) ->
sizes
Good comparison of the relational model and data frames: Is a Dataframe Just a Table? https://plateau-workshop.org/assets/papers-2019/10.pdf

I link all of these in What is a Data Frame? (In Python, R, and SQL) https://www.oilshell.org/blog/2018/11/30.html
Most likely because there isn't a cargo for SQL; everyone has to make do with what a default install offers, and most big-boy databases offer FFI to Java, .NET and C.
> This works for data modelling (although it's still clunky because you must try joins against each of the tables at every use site rather than just ask the value which table it refers to)
Only if one never learned what views are for, and the various flavours they come in.
> By far the most common case for joins is following foreign keys. SQL has no special syntax for this:
select foo.id, quux."value"
from foo
inner join bar on foo.bar_id = bar.id
inner join quux on bar.quux_id = quux.id
Really, how much time was spent learning SQL before complaining?
SELECT foo.id, quux.value
FROM foo, quux, bar
WHERE foo.bar_id = bar.id AND bar.quux_id = quux.id
I couldn't find anyone telling me the difference between those two ways to write a query; does someone know more about this?
The way you have written it (ANSI-89) used to be the only way joins could be written.
The second one (ANSI-92) was introduced to allow for composability since the entities being joined and the join condition are next to each other in the code and multiple joins can be generated one after the other.
IMO it also enhances development quality of life, since you can understand a new-to-you query faster (especially complex ones), you can comment out a join on one line when testing a replacement, cut and paste between queries more easily, etc.
An SO question on the topic
https://stackoverflow.com/questions/334201/why-isnt-sql-ansi...
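The two spellings really are equivalent for inner joins; here is a quick check against a toy schema (sqlite3, in-memory, table names borrowed from the thread):

```python
# Run the ANSI-89 (comma + WHERE) and ANSI-92 (explicit JOIN) forms of the
# same query and confirm they return identical rows. Schema is invented.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE quux (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE bar  (id INTEGER PRIMARY KEY, quux_id INTEGER);
    CREATE TABLE foo  (id INTEGER PRIMARY KEY, bar_id INTEGER);
    INSERT INTO quux VALUES (1, 'hello');
    INSERT INTO bar  VALUES (10, 1);
    INSERT INTO foo  VALUES (100, 10);
""")

ansi89 = db.execute("""
    SELECT foo.id, quux.value
    FROM foo, quux, bar
    WHERE foo.bar_id = bar.id AND bar.quux_id = quux.id
""").fetchall()

ansi92 = db.execute("""
    SELECT foo.id, quux.value
    FROM foo
    INNER JOIN bar  ON foo.bar_id  = bar.id
    INNER JOIN quux ON bar.quux_id = quux.id
""").fetchall()

assert ansi89 == ansi92  # same result set either way
print(ansi92)            # [(100, 'hello')]
```

The divergence shows up with outer joins, which the ANSI-89 style cannot express portably; that was a big motivation for the 92 syntax.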
I could write a point-by-point rebuttal, but I'll just pick one point, composability: VIEWs.
But just give up; you've hit the wall. Many people have tried for decades to make this argument, but every time it was raised you'd see a thread full of people who see no problems. There is a lot of powerful, unrepeatable-in-your-lifetime software sitting behind the stupidest frontends that you can't bypass, because most people write only straightforward code with no need for any abstractions beyond what was given to them.
But you need to separate the data and the index so you can compress the data while still searching the index, and none of the SQL databases do that, since they don't store one file per value (for obvious disk-space reasons).
We need to approach the database as files, and even add features to our filesystems to accommodate that.
In my distributed HTTP/JSON database I format ext4 with the "small" usage type so I don't run out of inodes before disk space.
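The data/index separation described above can be sketched in a few lines: values live compressed in a store, while a small uncompressed index stays searchable. Everything here (the dict layout, the field names) is illustrative, not any particular database's design:

```python
# Minimal sketch of "compress the data, keep the index searchable".
import zlib

store = {}   # key -> compressed payload (never scanned directly)
index = {}   # searchable field -> key (kept uncompressed, small)

def put(key, searchable_field, payload: bytes):
    store[key] = zlib.compress(payload)   # data tier: compressed at rest
    index[searchable_field] = key         # index tier: plain, searchable

def get_by_field(field):
    # Lookup touches only the index; decompression happens per hit.
    return zlib.decompress(store[index[field]])

put("doc1", "alice", b'{"user": "alice", "id": 1}')
print(get_by_field("alice"))
```

The point is architectural: queries never need to decompress anything except the rows they actually return.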
What happened?
What happened is that many of those NoSQL products started adding SQL syntax and features to their databases, others disappeared, and yet others specialized into niches where they don't compete with SQL RDBMS at all, which remains the primary database paradigm and language.
So those are the facts. If someone still believes they know better, put up or shut up.
In contrast, users of, for example, the lingo where object minus object equals NaN are terrified when suddenly exposed to a type zoo like https://www.postgresql.org/docs/9.5/datatype.html (Disclaimer: a relatively randomly chosen example; neither an endorsement nor a preference for a particular RDBMS/dialect). And let's keep in mind that the types above form structures, and these structures get manipulated en masse as intrinsically unordered sets (which are data types too!). That is, the leap from a barely existing concept of data types to the roughly 30% of DDL/DML they occupy keeps scripters out of SQL.
So the reason behind that endless «SQL bad» teeth gnashing turns out to be very simple.
What if someone told you: build an app, but only use b-trees? Then you start complaining about all the shortcomings of b-trees.
The point is that you have relational tables / SQL along with many other persistence, storage & indexing mechanisms: distributed hashtables, queues, lists, etc.
All the apps I've worked on have mixed SQL with all of the other data structures with consistent or inconsistent replication among them depending on the use-case.
One way to manage this is a key-value online tier and a relational offline tier, with inconsistent replication online to offline.
SQL & RDBMS are very powerful, but like any tool, limited to the designated use case. Stop trying to make it do everything.
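The online/offline split above can be sketched simply: a key-value tier takes the writes, and a relational tier receives them via periodic, possibly inconsistent, replication. The schema and names here are invented for the example:

```python
# Sketch: key-value "online" tier + relational "offline" tier, with
# batch replication between them (readers of the offline tier may lag).
import sqlite3

online = {}  # fast key-value tier, serves the hot path
offline = sqlite3.connect(":memory:")
offline.execute("CREATE TABLE events (key TEXT PRIMARY KEY, payload TEXT)")

def write(key, payload):
    online[key] = payload  # online write path: just a dict store

def replicate():
    # Offline path: batch-copy everything; between runs the two tiers
    # are inconsistent by design.
    offline.executemany(
        "INSERT OR REPLACE INTO events VALUES (?, ?)", online.items())
    offline.commit()

write("e1", "login")
replicate()
print(offline.execute(
    "SELECT payload FROM events WHERE key = 'e1'").fetchone())  # ('login',)
```

Ad-hoc analytical queries then run against the relational tier without ever touching the latency-sensitive online store.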
function(arg, arg, arg)
It is strange that "SELECT a, b, c FROM schema.table" keeps any aura of respectability. That is legitimately outdated syntax; people don't write languages that way any more. It was a 70s-era experiment, and what was learned from that experiment is that the style has no upside and comes with downsides. It should be 2 or 3 functions, with brackets.
With full knowledge of SQL, the successful languages that followed it (C, Python, Java, JavaScript) use lots of functions and a smattering of special syntax for control structures.
SELECT foo, bar FROM baz WHERE zig = 7
You can write db.baz.Where(baz => baz.zig == 7).Select(baz => new { baz.foo, baz.bar });
It does have some nice properties compared to SQL, but it also very quickly becomes incomprehensible. E.g. the join syntax in SQL:
SELECT baz.foo, wawa.bar FROM baz JOIN wawa ON baz.id = wawa.baz_id
looks like:
db.baz.Join(db.wawa, baz => baz.id, wawa => wawa.baz_id, (baz, wawa) => new { baz.foo, wawa.bar });
I don't think anybody would find this easier, and C# actually added additional custom query syntax, so you could use a more SQL-like form instead of the method-based syntax.
So they wanted it to be easy for non-programmers, more natural-language-like, so functions and brackets are quite the opposite.
CREATE [OR REPLACE] FUNCTION function_name (arg, arg, arg)
RETURN return_type
IS
BEGIN
---
END;
Then:
SELECT function_name(arg, arg, arg...) FROM dual;
SELECT columns FROM xpt WHERE xpt.id = function_name(arg, arg, arg...);
IF function_name(arg, arg, arg...) = ... THEN ...
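The same three usage patterns can be tried with SQLite's user-defined functions from Python; the function name, its logic, and the xpt table are all made up for the example:

```python
# Sketch: a user-defined function used in a bare SELECT, in a WHERE
# clause, and in host-language control flow (sqlite3, in-memory).
import sqlite3

def tax(amount):
    # Hypothetical business rule: flat 20% tax, rounded to cents.
    return round(amount * 0.2, 2)

db = sqlite3.connect(":memory:")
db.create_function("tax", 1, tax)  # register tax() for use inside SQL
db.execute("CREATE TABLE xpt (id INTEGER, amount REAL)")
db.execute("INSERT INTO xpt VALUES (1, 100.0)")

# SELECT function_name(...) -- SQLite needs no `dual`; a bare SELECT works
print(db.execute("SELECT tax(50.0)").fetchone())                   # (10.0,)

# ... in a WHERE clause
print(db.execute("SELECT id FROM xpt WHERE tax(amount) = 20.0").fetchall())

# ... in ordinary control flow on the host side
if tax(100.0) == 20.0:
    print("matched")
```

Unlike PL/SQL, the function body lives in the host language here, which is a different trade-off: simpler to write, but the logic no longer travels with the database.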