Examples: GRANT TO, REVOKE FROM, DENY TO (oh yes, what's the difference between REVOKE and DENY? and did you know that you can REVOKE a DENY?), arbitrary requirements for punctuation (e.g., = in BACKUP DATABASE AdventureWorks2012 TO DISK = 'X:\SQLServerBackups\AdventureWorks1.bak'), dubious context-sensitive "magical identifiers" (e.g., SELECT DATEPART(year, @t)), non-composability (how do you dynamically give a part specification to DATEPART?), etc, etc, etc.
It's possible to write a novel about just how unstructured and irregular "structured" query language is.
In my experience <20% of developers are good enough to be dangerous with SQL, and maybe 5% what I'd consider adept.
The rest range from "SELECT * is as far as I'll go; where's the ORM" to "why the hell are you using cursors to aggregate"
SQL is powerful, elegant, and reliable. With modern DB support of JSON, ARRAY, statistical functions, and much more, SQL is probably the #1 most underutilized/improperly leveraged tool today. SQL-killers have been coming out for 40 years now and (for its domain!) SQL's lead is pulling farther away if anything.
*yes there are some questionable implementations, so please replace "SQL" with "PostgreSQL" if nonstandard SQL implementations have caused nightmares for you in the past.
Even for use-cases like graph based models you still find Twitter and Facebook using MySQL to build the graph on top of. It’s simply heavily abstracted for the majority of engineers at those companies (where I'd wager the <20% proficiency in set-based thinking holds true) but it still fundamentally relies on SQL.
SQL does implement the set operations of UNION, INTERSECT, and EXCEPT, but I meant "thinking in sets" more colloquially.
edit: this is more practical than theoretical but the author here actually does a nice job of discussing "set-based" thinking vs. procedural https://www.itprotoday.com/open-source-sql/fangraphs-tags-cl...
Fundamentals of Database Systems by Elmasri & Navathe. Part 2.
When it comes to prototyping, I'm not going to fuck with something like Java-- I'm going to reach for Python. If I don't know what I'm doing to begin with, I don't need something telling me all the ways in which I'm doing it wrong.
Same goes for SQL/NoSQL. I loosely know what I need to model and may revise it arbitrarily. SQL does not lend itself to this. NoSQL was designed for it.
NoSQL is the "fuck you dad you can't tell me what to do" of the database world. SQL is your annoying dad that tries to institute curfews and won't let you go to underage drinking parties. In the end, it's the latter you're going to be calling from a holding cell, but there's a lot more fun you can have with the former.
Additionally, I think drafting a DDL schema is a great way to learn about a new application domain. It forces me to ask hard questions. And that improves my understanding of the domain. I guess that is similar to some people who like to prototype using Haskell type signatures.
[1] Unless you mean specifically a document store without schemas.
This is unrelated to this conversation, but this is my main beef with Rust. I love Rust (like a lot), but it's just not good for prototyping. End of non-pertinent rant.
But then I'm trying to do relatively simple things in Pandas or R, like changing column values based on multiple field value conditions, and it is a struggle requiring at least 17 brackets, while in SQL it would be a simple statement that anyone can understand after reading it once.
Doing regex is horrendous, but doing it on SQL in a modern database is no more difficult than in a full-fledged programming language. Most modern DBs have strong JSON support and built-in regex functions
1. JSON_CAST/JSON_PARSE your data
2. REGEXP_EXTRACT() on the result, here's several date validator regex from a SO post (https://stackoverflow.com/questions/15491894/regex-to-valida...)
And that's it. In fact in many cases it's probably faster to do it natively in SQL than to export it to python or R and parse there.
With dplyr, wouldn’t this be mutate(x = case_when(… ~ y, TRUE ~ x)) or the same but with ifelse?
CASE WHEN
SUM(daily_revenue) OVER (PARTITION BY department, TRIM(SUBSTR(region, 5)) IN ('North','West','Misc')) >
AVG(revenue) OVER (ORDER BY sale_time ASC rows BETWEEN 28 PRECEDING AND CURRENT ROW)
AND NOT COALSECE(had_prev_month_party, FALSE)
THEN pizza_party_points + 5
WHEN <above> AND had_prev_month_party THEN pizza_party_points + 3
WHEN MIN(sale_time) over (PARTITION BY department) = DATE_TRUNC('month', current_date) then 5
ELSE GREATEST(pizza_party_points - 1, 0)
END as pizza_party_performance_points_current
this example may be a bit esoteric but it actually draws from lots of real-world cases (comparing a partitioned sum with an overall average, checking conditions on other columns, messy data parsing and manipulation, implicit waterfall nature of CASE WHEN vs. explicit elseif, etc)For example, one of the biggest gripes I have with SQL is the fact that the table is the only output format, it doesn't allow outputting data with different cardinalities in a single query. The internal relational model of a RDBMS is extremely rich and powerful, but when extracting that data using SQL you have to leave all the richness behind and flatten the data into a simple dumb table. So as result we have to do much back and forth with separate queries based on results of previous ones (or even ugly hacks like concatenate higher cardinality data into a single field to then re-separate it in the application). A suitable successor for SQL should have a way to output a subset of the relational model with the relational part intact.
SQL is essentially the assembly language of the database. That makes it very powerful, but sometimes you just want a language that gives you a "garbage collector" for free, while still being able to drop down to assembly when you need additional power.
There is no technical reason why an RDBMS can't support different modes for accessing data. We are just so used to putting that work into the application that it has become a hard sell to want to move it to the right place.
With MSSQL at least you can return multiple result sets. Not sure about other database vendors.
Our layers of abstractions atop SQL along with approaches taken by DBMSes outside of the common relational names have shown that there is room for improvement within those tasks, able to be done natively by the DBMS, and it does not have to come at the expense of losing SQL when you need to be able to describe lower level questions. Different tools for different jobs, but little will within the RDMBS space to explore those other tools.