All DBs have their warts, and while MySQL has an overabundance of them, they tend to be quite well documented. The warts Postgres has tend to be quite buried: its documentation is very good for syntax comprehension but rather light when it comes to deeper learning.
1. https://stackoverflow.com/questions/50364475/how-to-force-po...
One which bit me recently, and is still utterly baffling to me, is that a column defined as an array type will accept values of that array's type in any number of dimensions greater than that specified for the column. In other words, `{{{{{{text}}}}}}` can be inserted into columns of the following types:
- `TEXT[]`
- `TEXT[][]`
- `TEXT[][][]`
- `TEXT[][][][]`
- `TEXT[][][][][]`
- `TEXT[][][][][][]`
The inverse is true as well! A column specified `TEXT[][]` (and so on) will accept `{text}`. Of course, none of this (as far as I've been able to find) is documented.
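A quick sketch of the behavior (table and column names are my own invention, but this matches what I saw):

```sql
CREATE TABLE demo (vals TEXT[][]);

-- All of these succeed, despite the declared two dimensions:
INSERT INTO demo VALUES ('{a}');
INSERT INTO demo VALUES ('{{a,b},{c,d}}');
INSERT INTO demo VALUES ('{{{{{{a}}}}}}');
```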
But wait, there's more!
`UNNEST` does not allow you to specify depth, it always unnests to the deepest dimension. This, too, is undocumented. In fact, it's anti-documented. The documents provide an example function to unnest a two-dimensional array that is wholly unnecessary (and likely performs worse than the built-in `UNNEST`, but I'm just guessing). Said documentation would seem to imply that the depth of `UNNEST` is 1, but of course that's not the case.
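For instance (my own minimal example):

```sql
SELECT unnest('{{1,2},{3,4}}'::int[]);
-- four rows (1, 2, 3, 4), not two rows of one-dimensional arrays
```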
But wait, there's more still!
What if you want to get at a nested array? Idk, I'm sure it's possible, but if you thought `SELECT that_array[1]` is the way to do it, look under your seat because you're getting a `NULL`!
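A sketch of what I mean (the subscript and slice forms are standard Postgres syntax):

```sql
SELECT ('{{1,2},{3,4}}'::int[])[1];     -- NULL, not {1,2}
SELECT ('{{1,2},{3,4}}'::int[])[1][2];  -- 2: a full subscript works
SELECT ('{{1,2},{3,4}}'::int[])[1:1];   -- {{1,2}}: slicing keeps dimensions
```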
- - -
Postscript: I discovered the first part of this in a production system where a migration had incorrectly nested some data, and where that data was in turn causing certain requests to unexpectedly fail. Of course, given that this was in production, I didn't have a lot of time to research the issue. Found the problem, fixed it, moved on with my day. In the course of fixing it, I discovered the `UNNEST` issue, which... okay, fine, I fixed it in a slightly different way than I'd expected.
So in the course of verifying the particulars to write this comment, I played around with some things, and discovered the `NULL` issue.
At least when Postgres has wildly unexpected behavior, it's exceptionally unexpected behavior.
https://www.postgresql.org/docs/13/arrays.html
> The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.
Similar text extends at least all the way back to the documentation for 7.1.
[1]: https://stackoverflow.com/questions/715432/why-is-sql-server...
Of course it is; the documentation is where TFAA got the information in the 4th paragraph of the story, out of 15 or so.
The range itself is what nerd-sniped the author and led them to try to find out why MySQL had such an odd yet specific range.
I'm going to have to disagree with you there. This issue is quite well documented in the "SQL Functions Returning Sets" section [0]. The relevant bit starts thusly:
> ...Set-returning functions can be nested in a select list, although that is not allowed in FROM-clause items. In such cases, each level of nesting is treated separately, as though it were a separate LATERAL ROWS FROM( ... ) item...
And there's even a note about the crazy behavior pre-PostgreSQL 10:
> Before PostgreSQL 10, putting more than one set-returning function in the same select list did not behave very sensibly unless they always produced equal numbers of rows. Otherwise, what you got was a number of output rows equal to the least common multiple of the numbers of rows produced by the set-returning functions. Also, nested set-returning functions did not work as described above; instead, a set-returning function could have at most one set-returning argument, and each nest of set-returning functions was run independently. Also, conditional execution (set-returning functions inside CASE etc) was previously allowed, complicating things even more. Use of the LATERAL syntax is recommended when writing queries that need to work in older PostgreSQL versions, because that will give consistent results across different versions.
I agree that allowing SRFs in the SELECT clause is a wart that should never have been permitted, but I think the PostgreSQL docs do a pretty great job describing both the old behavior and the new behavior that has to balance backwards compatibility with sensibility.
(And, indeed, the 9.6 docs have this to say on the behavior of SRFs in the SELECT list: "The key problem with using set-returning functions in the select list, rather than the FROM clause, is that putting more than one set-returning function in the same select list does not behave very sensibly.")
I do think one notable defect with the PostgreSQL docs is that they were designed in a time before modern search engines. They are better understood as a written manual in electronic form. Almost always the information you need is there, but possibly not in the chapter that Google will surface. But there are all sorts of tricks you can use if you update your mental model of how to read the PostgreSQL docs. For example, there's an old-style index! [1]
[0]: https://www.postgresql.org/docs/current/xfunc-sql.html#XFUNC... [1]: https://www.postgresql.org/docs/current/bookindex.html
I think of the Postgres docs as significantly better than most other documentation for this reason. The information is there and it's organized in a way that makes sense.
In terms of documentation quality, I agree that it's documented, but not obviously. You mentioned that it's in the "SQL Functions Returning Sets" section; however, that section isn't linked from the array functions and operators page. And while it's pretty easy to find if you already know you're dealing with an issue related to SRFs, trying to get at that information via Google won't turn up anything unless you specifically hone in on set-returning functions. I ended up finding that doc page after finding an SO answer that mentioned SRFs while searching for "postgres unnest cartesian product".
The information is in the documentation, but the documentation isn't always super good at linking to other relevant portions of the documentation and, honestly, reading the documentation about the feature you need is reasonable, but I don't expect most people are reading the full postgres docs before starting to play around with it. So I don't disagree that the information is there, but I do think it is mostly inaccessible due to the structure of the documentation.
Not necessarily an odd choice in the Olden Days; after all, BCD representation used to be pretty popular. By modern standards it's insane, but at a time when binary-to-decimal conversions could be a serious performance concern it might have made sense. For instance, if you had a time in "hours, minutes, seconds" and wanted to add or subtract one of these TIME values, you could do it without a single multiply or divide.
Now I was 8 when MySQL first released in 1995, so I can't really comment on whether that choice really made sense back then. 1995 does seem a bit late for BCD shenanigans, but maybe they based their design on existing applications and de-facto standards that could easily go back to the 80's.
Edit: plenty of things still stored dates as strings where the emphasis of the app was on displaying information. Int and float types carried the day whenever any kind of math was going to be used, or when you wanted to output the data in multiple formats.
Earlier than the 60s. Had to work on a system in the 90s to “transpile” IBM minicomputer code to C.
Useful for accounting since what you see is what you get as far as cents and other decimals go. 1/5, and thus 1/10, can only be approximated in floating point.
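To illustrate (a minimal Python sketch; the same point holds for binary floating point in any language):

```python
from decimal import Decimal

# 1/10 has no exact binary floating-point representation,
# so additions drift:
print(0.1 + 0.2)                          # 0.30000000000000004
# A decimal type keeps cents exact:
print(Decimal("0.10") + Decimal("0.20"))  # 0.30
```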
Not that I’m defending MySQL
The reason I reach for Postgres over MySQL isn't features or technical superiority, although those result from the reason, which is that the PG devs consistently have "taste": they have good style and they make good choices. The MySQL devs are not consistently strong in these areas. I'm guessing that MySQL is now so full of tech and design debt (like the OP issue) that they're simply stuck, with no choice.
And not in a desirable way.
"postgre" feels and sounds like a plate full of very wet, oily rice and the subsequent addition of clear vomit to the mix.
These three points have driven me raving mad when working with MySQL:
- The default 'latin1' character set is in fact cp1252, not ISO-8859-1, meaning it contains the extra characters in the Windows codepage. 'latin2', however, is ISO-8859-2.
- The 'utf8' character set is limited to Unicode characters that encode to 1-3 bytes in UTF-8. 'utf8mb4' was added in MySQL 5.5.3 and supports up to 4-byte encoded characters. UTF-8 has been defined to encode characters to up to 4 bytes since 2003.
- Neither the 'utf8' nor 'utf8mb4' character sets have any case-sensitive collation other than 'utf8_bin' and 'utf8mb4_bin', which sort characters by their numeric codepoint.
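The practical consequence of the 'utf8' limit is easy to see from the encoded lengths (a small Python check):

```python
# '€' encodes to 3 bytes, so it fits MySQL's 3-byte 'utf8';
# an emoji needs 4 bytes, which only 'utf8mb4' can store.
print(len("€".encode("utf-8")))   # 3
print(len("😀".encode("utf-8")))  # 4
```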
utf8 effectively being an alias of utf8mb3 has cost us so much work it's not even funny.
An extra warning about that mess: mysqldump in many configurations will silently convert utf8mb4 down to utf8mb3. So when you're testing your backups or migrations, do an extra check to make sure that emoji and rarer characters didn't get eaten!
Most weirdly, the fact that the default collation is SWEDISH.
It is a complete freak show. The users kinda got used to it, butchering our language (Portuguese) to use only characters valid in English, hoping MySQL won't barf spectacularly on them.
Oh you sweet summer child. No it isn't. It's somewhat like Windows CP1252, but it also defines 8 other extra characters that are not in cp1252.
Actually, it's generally saner to assume that people mean Windows-1252 when they say ISO-8859-1. Charset labeling is frequently incorrect, and C1 characters are so infrequently used that seeing one pop up probably means you actually wanted Windows-1252 instead.
So the correct query is "SELECT id, clmn FROM tbl WHERE clmn = '999999999999999999'"
This list is missing the WTF that cascaded deletes/updates don't cause triggers to fire on child tables:
The problem is the weird non-bit packing they did before.
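As I understand it, the old TIME format stored the value in 3 bytes as the decimal number HHMMSS (hours*10000 + minutes*100 + seconds), signed. A signed 24-bit field tops out at 2^23 - 1 = 8388607, and the largest well-formed HHMMSS below that is 8385959, i.e. 838:59:59. A sketch of that reconstruction:

```python
# Hypothetical reconstruction of the pre-5.6 TIME packing:
# the value is the decimal number HHMMSS in a signed 24-bit field.
max_signed_24bit = 2**23 - 1            # 8388607
packed = 838 * 10000 + 59 * 100 + 59    # 8385959
assert packed <= max_signed_24bit       # 838:59:59 fits...
assert 839 * 10000 > max_signed_24bit   # ...but 839:00:00 would not
hours, rest = divmod(packed, 10000)
minutes, seconds = divmod(rest, 100)
print(f"{hours}:{minutes}:{seconds}")   # 838:59:59
```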
DOS is different, partly because there are fields that fit worse, and partly because 32 bits is just barely enough to store seconds in the first place. If you did a DOS-style packing with 34 or 38 bits it would work fine. And it would be able to represent leap seconds, making it arguably better than unix timestamps!
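For comparison, the classic FAT/DOS timestamp packs a date and a time into two 16-bit words (the 5-bit seconds field holds 2-second units, which is why odd seconds are lost); a sketch:

```python
def dos_pack(year, month, day, hour, minute, second):
    """Pack a timestamp the way FAT/DOS does: two 16-bit words."""
    date = ((year - 1980) << 9) | (month << 5) | day
    time = (hour << 11) | (minute << 5) | (second // 2)
    return date, time

date, time = dos_pack(1995, 5, 23, 13, 45, 30)
print(hex(date), hex(time))
```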
https://people.cs.nctu.edu.tw/~tsaiwn/sisc/runtime_error_200...
Just saying - I have no horse in this race.
First time I ever saw a number where the leading sign bit has to be set to 1 to indicate non-negative.
(from https://dev.mysql.com/doc/refman/8.0/en/time.html):
> but also elapsed time or a time interval between two events (which may be much greater than 24 hours, or even negative).
For representing temporal intervals, the specification defines two kinds of INTERVAL types (year-month and day-time). Year-month intervals can represent intervals in terms of years, months, or a combination of the two. Similarly, day-time intervals can represent intervals in terms of days, hours, minutes, or seconds, or combinations of them (e.g., days+hours, days+hours+minutes, hours+minutes, etc.).
As a sidenote, the TIME and DATE types are related to the TIMESTAMP type in that TIMESTAMP can be thought of as combination of a DATE part (year, month, day) and a TIME (hour, minute, second) part.
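Both standard interval classes can be written as literals; in Postgres syntax, for example:

```sql
SELECT INTERVAL '2-6' YEAR TO MONTH;        -- 2 years 6 months
SELECT INTERVAL '3 12:30:00' DAY TO SECOND; -- 3 days 12:30:00
```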
```sql
select sum(y.a), count(y.a)
from (select distinct x.a
      from (select 1 as a
            union all select 2 as a
            union all select 1 as a) x) y
```

     sum | count
    -----+-------
       4 |     3
Sqlite3 returns the correct results of sum of 3 count of 2.
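That claim is easy to check (a small Python/sqlite3 sketch; DISTINCT collapses {1, 2, 1} to {1, 2} before the aggregates run):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("""
    SELECT sum(y.a), count(y.a)
    FROM (SELECT DISTINCT x.a
          FROM (SELECT 1 AS a
                UNION ALL SELECT 2 AS a
                UNION ALL SELECT 1 AS a) x) y
""").fetchone()
print(row)  # (3, 2)
```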
To fix this, don't use subqueries.
https://dbfiddle.uk/?rdbms=postgres_12&fiddle=dc4ce40bd52695...
In Postgres, it just works out of the box.
I was using MSSQL prior to 2010 so I have no idea of MySQL unicode handling before that
Less than 10 years ago; MySQL 5.5 went GA in December 2010.
If only I had a penny for every time I heard this argument and we ended up breaking regression tests or something really obscure in the QA or customer setup.
And yes, I have done it so many times myself too
But, seriously? Who cares? It's not even close to an extra order of magnitude of range. The type is obviously meant to be used for time values that have a context of hours within a day, supporting a few days as headroom... so supporting 1,024 instead of 838 is pointless -- if you're getting anywhere even close to the max value, you probably shouldn't be using this type in the first place.
And yes, it's probably best not to change it for backwards compatibility. Can I imagine a case where it could break something? No, not off the top of my head. But it probably would break some application somewhere. And for such a widely deployed piece of critical foundational infrastructure, being conservative is the way to go.
Nothing about this seems WTF at all, except for the author's seeming opinion that elegant, power-of-two ranges ought to trump backwards compatibility with things that probably made sense at the time.
I think the workaround was to pass some parameter in the connection string.
So the answer is just it's for backwards compatibility with MySQL 3?
I was kind of hoping for more.
"I think this is stupid" is a really poor reason to break backwards compatibility, despite how many other software projects use this reasoning.
But of course, MySQL bad, PostgreSQL good.
True. But his argument seems to be in the direction of "this is highly unexpected behaviour", and I tend to agree. The number of applications broken by extending the range is probably dwarfed by the number of bugs avoided by not having the time span break at such a strange length.
The behavior in question is having an upper and lower bound on a time interval. This strikes me as highly expected behavior.
The maximum on the interval is lower than the author expected; and reading the docs quickly cleared up what the interval maximum is.
The entire rant boils down to "MySQL's choice to keep backwards compatibility is stupid, because I think this interval limit should be larger."
I was lucky and could simply redeploy my application, but I have never used Mysql since.
With Postgres, you can just drop the binary table data files in, restart the database and pg rebuilds the indexes and relationships.
From a quick google, it looks like MySQL is now more robust in this regard.