SELECT a.id, COUNT(b.*) FROM a JOIN b ON b.a_id = a.id GROUP BY a.id
is not permitted in Postgres. Sure, I could just use COUNT(b.a_id), since that's what I join on, but a more complicated example might not allow for that. For instance, if it were a virtual table.
SELECT a.id, COUNT(b.*), COUNT(c.*)
FROM a JOIN b ON b.a_id = a.id JOIN c ON c.a_id = a.id
GROUP BY a.id
I want to know how many occurrences a_id has in both table b and c. Again, in this simple example I could just count b.a_id and c.a_id, respectively, but imagine if b and c were complex virtual tables:

JOIN (SELECT NULL AS foo, 1 AS bar
      UNION SELECT 1 AS foo, NULL AS bar) b ON b.foo = a.id OR b.bar = a.id
This would be useful when we are aggregating data together where, essentially, there are two ways to join the data with the main table, and both columns can be NULL. Of course, in this example you could count by going COUNT(b.foo) + COUNT(b.bar), but that's a bit awkward; or you could count a column in table b that you know is never NULL. But what if you don't know of one? And still have table c next to it?
Yes, in all cases there would be a way out. In the extreme case, you could wrap it in a virtual table where you add a column that is always 0 (not NULL), so you can count on that. It would just be neat if b.* were possible.
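For what it's worth, that wrapping workaround might look something like this (a sketch; `b_src`, `foo`, and `bar` stand in for the real underlying table and join columns):

```sql
-- Wrap b in a derived table that adds a constant, never-NULL marker
-- column, then count that instead of b.*.
SELECT a.id, COUNT(b.one) AS b_matches
FROM a
LEFT JOIN (
    SELECT foo, bar, 0 AS one   -- 0 is arbitrary; it just can't be NULL
    FROM b_src
) b ON b.foo = a.id OR b.bar = a.id
GROUP BY a.id;
```

COUNT(b.one) only counts rows where a match exists, because the NULL-extended rows from the LEFT JOIN produce NULL in `one`.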
SELECT a.id, count() FROM ...
On a related note: is the MCSE the gold standard for SQL education? I've been looking for a way to brush up and formalize my SQL skills.
for reference: https://sudonull.com/posts/998-Important-changes-in-the-CTE-...
Generally speaking, some clarification would be helpful!
Before Postgres 12, CTEs were always materialized, so you did not get any of the query-optimization benefits of CTEs acting like inline subqueries.
As of Postgres 12, a CTE defaults to NOT MATERIALIZED if it is non-recursive, has no side effects, and is only referenced once; otherwise it defaults to MATERIALIZED. You can override this via MATERIALIZED or NOT MATERIALIZED when defining the CTE.
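For illustration, the override syntax looks like this (a sketch, assuming a `sale` table with a `created_by_id` column):

```sql
-- Postgres 12+: force the old always-materialize behavior
WITH s AS MATERIALIZED (
    SELECT * FROM sale
)
SELECT * FROM s WHERE created_by_id = 1;

-- Or force inlining, even when the CTE is referenced more than once
WITH s AS NOT MATERIALIZED (
    SELECT * FROM sale
)
SELECT * FROM s WHERE created_by_id = 1;
```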
Their example is showing that you can let Postgres (before 12) optimize a CTE for you by writing it as an inline subquery instead of a CTE:
SELECT *
FROM (
SELECT *
FROM sale
) AS inlined
WHERE created_by_id = 1
But with Postgres 12, their "don't" example would result in an index scan without refactoring to the "do" example. Basically, their advice on do vs. don't applies before Postgres 12.

https://www.postgresql.org/docs/12/queries-with.html is pretty thorough on this
I seriously take issue with "Reference Column Position in GROUP BY and ORDER BY", though. If it were restricted to ad-hoc (AKA messing-about) queries I'd be fine with it, but it won't be. Just don't do it.
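The positional style in question looks like this (a sketch, reusing the account/cust_id example from the thread):

```sql
-- Ordinal references: fragile, since reordering the select list
-- silently changes the grouping and the sort.
SELECT cust_id, SUM(avail_balance)
FROM account
GROUP BY 1
ORDER BY 2;
```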
# select cust_id as c, sum(avail_balance) as b from account group by c order by b;

Not indexing: most often, tables are either not indexed or poorly indexed.
Implicit conversion can generate a lot of I/O, lead to poor performance, or just prevent indexes from being used.
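A classic case looks something like this (a sketch; the `orders` table and varchar `order_ref` column are assumptions for illustration):

```sql
-- order_ref is VARCHAR. Comparing it to an integer literal forces an
-- implicit conversion of the column on every row, so an index on
-- order_ref can't be used for a seek:
SELECT * FROM orders WHERE order_ref = 12345;

-- Matching the column's type keeps the index usable:
SELECT * FROM orders WHERE order_ref = '12345';
```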
SQL functions: sorry, but they are most often crap and useless; better to inline the logic or use a TVF. And no, it's not code-logic duplication.
READ UNCOMMITTED: avoid it, unless you enjoy not reading rows, reading rows multiple times, or reading half of a value (page splits and/or LOB values).
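For reference, this is the isolation level (and its SQL Server table-hint shorthand) being warned against; a sketch with a hypothetical `orders` table:

```sql
-- Session-wide dirty reads:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- Per-table equivalent via a hint (SQL Server):
SELECT o.id, o.total
FROM orders AS o WITH (NOLOCK);  -- may miss rows, double-read rows,
                                 -- or see torn LOB values mid-write
```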
Functions have helped me tremendously in SQL server, but you do have to know the issues, which can be taken advantage of to some degree.
Code reuse is the obvious use case, but the lack of inlining before SQL Server 2019 meant you could get worse performance than with a hand-inlined CASE expression or whatever. Hopefully, now that functions can be inlined in 2019, this will be a non-issue going forward.
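A sketch of the kind of scalar UDF that pays this penalty (names are hypothetical):

```sql
-- Before SQL Server 2019, a scalar UDF like this runs row-by-row and
-- inhibits parallelism; 2019's scalar UDF inlining can fold it into
-- the calling query's plan instead.
CREATE FUNCTION dbo.FullName (@first nvarchar(50), @last nvarchar(50))
RETURNS nvarchar(101)
AS
BEGIN
    RETURN @first + N' ' + @last;
END;
GO

SELECT dbo.FullName(first_name, last_name) AS full_name
FROM person;
```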
They are an optimization barrier, which can be a good thing. I have used this to my advantage to stabilize tricky queries that were using views for code reuse. The performance becomes consistent and predictable, rather than going pathological on some databases, even though it may be slightly slower on others.
But I agree that implicit conversion is the root of a lot of evil
Please elaborate.
The first time I experienced the LOB issue was in 2009. At that time I didn't know how to really reproduce the issue, due to my lack of knowledge (which also triggered a desire to deep-dive into SQL Server internals). Not my article, nor his, but more generally: Paul White (SQL Kiwi) has published a load of great articles on internals; maybe some of the best technical articles I have read.
I'm not familiar with MS SQL (I've only worked with MySQL / PostgreSQL); can someone explain to me how it works?
If you are asking about the timezone shift applied to a date, I think the engine converts the date to a 00:00:00 timestamp and then does the timezone conversion.
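In SQL Server terms, that would look roughly like this (a sketch; the exact internal steps are an assumption):

```sql
-- A bare date has no time or offset, so it is first widened to a
-- midnight timestamp, and only then shifted between zones:
SELECT CAST(CAST('2019-06-01' AS date) AS datetime2)
       AT TIME ZONE 'UTC'
       AT TIME ZONE 'Central European Standard Time';
-- i.e. 2019-06-01 -> 2019-06-01 00:00:00 +00:00 -> 2019-06-01 02:00:00 +02:00
```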
SELECT
col1
,col2
,COUNT(col3)
FROM t1
JOIN t2 ON t1.pk = t2.fk
WHERE col1 = col2
AND col3 > col4
GROUP BY col1
,col2
HAVING COUNT(col3) > 1

This "fix" only helps with the case where new items are inserted at the end. By using alphabetic ordering, you increase the chances that a new item will be inserted at the beginning or the middle, in which case it makes no difference where you put the commas.
It only really helps at all because inserting new items at the end is common, whereas removing items from the start is rare, at least in SQL (all it really does is displace the dangling comma problem from the former case to the latter). However, always inserting new items at the end tends to lead to unintuitive ordering, which in turn leads to additional VCS churn when that becomes seen as technical debt.
Commas at the start does give objectively better readability in one sense: it ensures they are aligned vertically. That makes it easier to spot errors at a glance. You might call this "ease of formal reviewability".
However, in practice it seems to be worse for general readability, for the entirely subjective reason you already pointed to. Since code is typically read more often than it is written, it's important to optimise for that first.
Since an error here will always cause a hard fail, it falls into a different category than say, omitting braces in a C-style if statement, which can introduce subtle bugs through bad merges. In the latter case, ease of formal reviewability has to take precedence over subjective aesthetics.
WHERE foo
AND bar
Or with long, complex ones:

WHERE
foo
AND
bar

I did not learn SQL this way. I was shown these tricks, and their adoption was near-immediate BECAUSE it just made more sense to my brain.
This isn't to devalue/argue against your comment. I just find it interesting that the common denominator for OCD-ish compulsions is that the "right" way for an individual OCD brain is usually specific to that individual brain.
SELECT A.value, B.value
FROM tableA A
JOIN tableB B ON B.id = A.bId
WHERE A.id = 77
What abomination of a date format is this? I can only assume this is a bug, a typo, or an easter egg for those paying attention. Please let it be one of those. The last thing the world needs is people pushing yet another crazy date format into use.
[1] https://www.ibm.com/support/knowledgecenter/en/SSS28S_8.1.0/...
An ORM only works up to the point where you need to join tables. As soon as that's needed, the ORM just causes you endless trouble.