This is a bold claim and doesn't match my experience at all. UUIDv4 is all I see, everywhere, every day.
That's also a big enough caveat to put in the title: if you have a beef with UUIDv1, say UUIDv1 is obsolete.
Postgres core only offers random UUID generation, via gen_random_uuid() (https://www.postgresql.org/docs/15/functions-uuid.html); the other versions require the uuid-ossp extension.
The `uuidgen` CLI tool, at least in modern versions (I have not checked historically), says (from https://man7.org/linux/man-pages/man1/uuidgen.1.html): "By default uuidgen will generate a random-based UUID if a high-quality random number generator is present." It later lists /dev/random as such a generator, and that is present on almost all systems.
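If you want to check what your own tools produce, the version is encoded in the UUID itself: it is the first hex digit of the third group (xxxxxxxx-xxxx-Vxxx-...). A minimal sketch using Python's standard uuid module; `uuid_version` is just an illustrative helper name, and you could feed it `uuidgen` output the same way:

```python
import uuid

def uuid_version(s: str) -> int:
    """Return the version number stored in a UUID string."""
    u = uuid.UUID(s)
    # Bits 76..79 of the 128-bit value (the top nibble of time_hi_and_version)
    # hold the version, i.e. the first hex digit of the third group.
    return (u.int >> 76) & 0xF

print(uuid_version(str(uuid.uuid4())))  # 4
print(uuid_version(str(uuid.uuid1())))  # 1
```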
What's an example of a system that generates v1 uuids by default?
Although plenty of UUIDs are passed as strings in, e.g., JSON, I was under the impression that where performance really matters (like DB indexes) they are stored and compared as 128-bit values. To be fair, the points about word sizes and ordering make sense.
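A small illustration of the 128-bit view with Python's uuid module: the raw form is 16 bytes, and because `UUID.bytes` is big-endian, ordering by those bytes matches ordering by the integer value. How any particular database lays out its native uuid column is an assumption this sketch does not cover.

```python
import uuid

# A UUID is a 128-bit value; a native uuid type can keep just the 16 raw bytes.
ids = [uuid.uuid4() for _ in range(1000)]

# UUID.bytes is big-endian, so comparing raw bytes gives the same order
# as comparing the 128-bit integers.
by_bytes = sorted(ids, key=lambda u: u.bytes)
by_int = sorted(ids, key=lambda u: u.int)
assert by_bytes == by_int

print(len(ids[0].bytes), len(str(ids[0])))  # 16 36
```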
MySQL. One of many reasons to avoid it.
Kludgy, non-cryptographically-safe UUID4 implementation in MySQL: https://stackoverflow.com/a/32965744
2. UUID strings are awful for storage: don't use them. Yes, there are databases that support UUIDs natively, but why is whether or not a UUID fits into a machine word relevant? You use UUIDs for their other properties, which 64-bit integers cannot offer. KSUIDs are touted as fixing all the aforementioned issues, but they're even bigger than UUIDs.
3. Both KSUIDs and UUIDs are hard for humans to read compared to 64-bit integers.
4. You don't have to encode UUIDs as hexadecimal digits plus dashes. You can choose any binary-to-text encoding you want. I am partial to Crockford Base32 because of how general-purpose it is (no vulgarities, and case-insensitive, so it works on Windows filesystems).
5. I still consider time-sortable UUID alternatives (like ULID) to be UUIDs. This article should have explicitly mentioned UUIDv1 and UUIDv4 in the title, and then it wouldn't have been such flamebait.
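Point 4 sketched in Python: Crockford Base32 packs 5 bits per character, so a 128-bit UUID becomes 26 characters, the same alphabet and length as ULID's text form. `encode_crockford`/`decode_crockford` are hypothetical helper names; the decoder follows Crockford's rule of accepting lowercase and reading I/L as 1 and O as 0.

```python
import uuid

# Crockford's Base32 alphabet: digits and letters minus I, L, O and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def encode_crockford(u: uuid.UUID) -> str:
    """Encode the 128-bit value as 26 characters of 5 bits each (top 2 bits are zero)."""
    n = u.int
    chars = []
    for _ in range(26):  # 26 * 5 = 130 bits >= 128
        chars.append(CROCKFORD[n & 0x1F])
        n >>= 5
    return "".join(reversed(chars))

def decode_crockford(s: str) -> uuid.UUID:
    """Decode case-insensitively, treating I/L as 1 and O as 0."""
    s = s.upper().translate(str.maketrans("ILO", "110"))
    n = 0
    for ch in s:
        n = (n << 5) | CROCKFORD.index(ch)
    return uuid.UUID(int=n)

print(encode_crockford(uuid.UUID(int=2**128 - 1)))  # 7ZZZZZZZZZZZZZZZZZZZZZZZZZ
```

Round-tripping is case-insensitive, e.g. `decode_crockford(encode_crockford(u).lower()) == u`.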
9,223,372,036,854,775,807 (2^63 - 1, the largest signed 64-bit integer) is as nasty as a UUID to remember and type.
Is my knee-jerk judgement that this advice borders on nonsense unwarranted?
No, the advice is nonsense. URIs in what scheme?
I mean, URN is a URI scheme and UUID is a URN namespace, so urn:uuid:<uuid-value> is a URI. “Use a URI” is therefore not really a mutually exclusive alternative to using a UUID; it's just much less specific.
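Python's standard uuid module happens to expose that URN form directly, which shows the two are just different spellings of the same 128-bit value (the UUID below is the RFC 4122 DNS namespace ID, chosen only because it is well known):

```python
import uuid

u = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")  # same as uuid.NAMESPACE_DNS
print(u.urn)  # urn:uuid:6ba7b810-9dad-11d1-80b4-00c04fd430c8

# The constructor also accepts the URN form and yields the same value.
assert uuid.UUID(u.urn) == u
```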
I’m just a bit confused: a UUID is a number usually written as hexadecimal digits, so why would it be stored as a string? It’s also 128 bits long, so it should fit into two 64-bit words, excluding whatever overhead the DBMS puts on the data type, which is really their problem to worry about.
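A sketch of the two-word view in Python (`to_words`/`from_words` are illustrative names, not a library API). Comparing the (high, low) pairs gives the same order as comparing the full 128-bit values; how a specific DBMS implements its uuid type internally is not shown here.

```python
import uuid

MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def to_words(u: uuid.UUID) -> tuple[int, int]:
    """Split a UUID into (high, low) unsigned 64-bit halves."""
    return u.int >> 64, u.int & MASK64

def from_words(hi: int, lo: int) -> uuid.UUID:
    """Reassemble a UUID from its two 64-bit halves."""
    return uuid.UUID(int=(hi << 64) | lo)

u = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")
hi, lo = to_words(u)
print(hex(hi), hex(lo))  # 0x6ba7b8109dad11d1 0x80b400c04fd430c8
assert from_words(hi, lo) == u
```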
You are correct that a UUID is a 128 bit identifier, and so, fits in 128 bits.
It's unlikely to happen, but still possible, and it has brought down part of our parallel worker pool: once you have a collision, you are bound to keep generating the same ID sequence until you restart the whole process to re-randomize the counter.
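The parent comment isn't quoted here, so this assumes a generator that draws a random starting counter and then increments it per ID, which is how I read "randomize the counter". `CounterIdGen` is hypothetical. It shows why one collision turns into many: once worker B's counter lands in the range worker A has already used, every following ID collides too.

```python
import secrets

class CounterIdGen:
    """Hypothetical generator: random 64-bit starting counter, +1 per ID."""

    def __init__(self, start=None):
        self.counter = secrets.randbits(64) if start is None else start

    def next_id(self) -> int:
        self.counter = (self.counter + 1) % 2**64
        return self.counter

# Pin the starts to simulate an unlucky draw: B starts 3 steps behind A.
a = CounterIdGen(start=1_000)
b = CounterIdGen(start=997)

issued_by_a = {a.next_id() for _ in range(10)}  # 1001..1010
issued_by_b = [b.next_id() for _ in range(10)]  # 998..1007
collisions = [i for i in issued_by_b if i in issued_by_a]
print(collisions)  # [1001, 1002, 1003, 1004, 1005, 1006, 1007]
```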
However, with everything that already supports UUIDs, I also don't see any reason to switch from UUIDv4 to anything else. I don't see how UUIDs in general are obsolete, given the support they have across libraries and databases.
If you use incremental numbers, every table has 1, 2, 3.
Okay, so, not all UUIDs, just v1. And, for some anecdata, I've actually only interacted with UUID v4 in my entire career; I don't know what the actual norm is, but I'm surprised to hear that it might still be v1.
> The only other practical option is version 4 – the random UUID – but random is intuitively worse, right? Read on to find out.
Oh… how is it worse?
> * They are awful as keys – being strings, comparisons are dramatically slower than with integers. And even if your database has a UUID type, it’s still worse because the identifier doesn’t fit into a machine word.
> * They are excessively long – each character of a UUID only encodes 3.5 bits of information if you count the dashes. That’s twice as less compared to 6 bits of Base64.
Sorry, but UUIDs are not strings; they're 128-bit integers. They have a standardized string representation, but if you're storing a UUID as a string, either your language/DB/tools/etc. don't support UUIDs properly and you're forced to, or you're doing it wrong.
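Some rough numbers for the quoted size claim, using Python's stdlib: the same value as 16 raw bytes, as the 36-character canonical string, and as unpadded URL-safe Base64. A v4 UUID has only 122 random bits, so the per-character figures (computed over all 128 bits) are slightly generous.

```python
import base64
import uuid

u = uuid.uuid4()

raw = u.bytes                                               # 16 bytes = 128 bits
canonical = str(u)                                          # 36 chars: hex plus dashes
b64 = base64.urlsafe_b64encode(raw).rstrip(b"=").decode()   # 22 chars

for name, text_len in [("canonical", len(canonical)), ("base64", len(b64))]:
    print(name, text_len, round(128 / text_len, 2), "bits/char")
# canonical 36 3.56 bits/char
# base64 22 5.82 bits/char
```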
> * They are not time-ordered – despite containing a timestamp, its bits are mixed up within the UUID: the top bytes of the UUID contain the bottom bytes of the timestamp. Databases do not like an unordered primary key – it means that freshly inserted rows can go anywhere in the index. And you can’t use UUIDs for ad-hoc time sorting by time, either.
This is definitely a drawback when using a UUID as a primary key, and there are alternatives for this specific use case. However, the best solution I've seen is to use a typical 64-bit integer for the primary key and a UUID for the user-visible ID (so that you don't leak information about the primary keys to users); this keeps joins and indexes fast while avoiding the leak to the end user.
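To make the quoted v1 ordering complaint concrete: v1 puts the low 32 bits of the timestamp first, so when those bits wrap around, a later UUID sorts before an earlier one. `v1_from_timestamp` is a hypothetical helper that pins the clock sequence and node to placeholder values; it relies only on the `uuid.UUID(fields=...)` constructor. (Versions 6 and 7 in RFC 9562 put the most significant time bits first instead.)

```python
import uuid

def v1_from_timestamp(ts: int) -> uuid.UUID:
    """Build a version-1 UUID for a 60-bit timestamp (100-ns ticks since 1582).

    Clock sequence and node are fixed placeholders, for illustration only.
    """
    time_low = ts & 0xFFFF_FFFF                           # low 32 bits go FIRST
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | (1 << 12)   # version 1
    clock_seq_hi_variant = 0x80                           # RFC 4122 variant, clock_seq 0
    clock_seq_low = 0x00
    node = 0x0000_0000_0001
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))

earlier = v1_from_timestamp(0x1_FFFF_FFFF)  # one tick before...
later = v1_from_timestamp(0x2_0000_0000)    # ...this one

print(earlier)  # ffffffff-0001-1000-8000-000000000001
print(later)    # 00000000-0002-1000-8000-000000000001
assert earlier.time < later.time
assert str(later) < str(earlier)  # string/byte order disagrees with time order
```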
> * They are bad for human comprehension – UUIDs tend to look alike, and it’s hard to visually seek and compare them. This comes from experience.
This is exactly why they shouldn't be used as an ID anywhere a human needs to interact with one. In the solution I mentioned above, the most common ID for which you'd want a UUID is the user's ID, and the user has no reason to ever refer to their own or anyone else's ID; they'll use the human-readable username/handle instead. Developers don't need to care about UUIDs either, because inside the DB you'd have the integer primary key that you use for joins. This seems to solve all the problems?
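A minimal sketch of that dual-key layout, using SQLite through Python's sqlite3 so it runs anywhere; the table and column names are made up, and in Postgres you would more likely use bigint plus a native uuid column. The integer key carries the joins, and only public_id ever leaves the database.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY,   -- internal key, used for joins
        public_id BLOB NOT NULL UNIQUE,  -- 16-byte UUID shown to clients
        username  TEXT NOT NULL UNIQUE
    );
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        body    TEXT NOT NULL
    );
""")

def create_user(username: str) -> uuid.UUID:
    public_id = uuid.uuid4()
    conn.execute("INSERT INTO users (public_id, username) VALUES (?, ?)",
                 (public_id.bytes, username))
    return public_id

def posts_for(public_id: uuid.UUID) -> list[str]:
    # The external UUID is matched once; the join runs on integer keys.
    rows = conn.execute(
        """SELECT p.body FROM posts p
           JOIN users u ON u.id = p.user_id
           WHERE u.public_id = ? ORDER BY p.id""",
        (public_id.bytes,))
    return [body for (body,) in rows]

alice = create_user("alice")
alice_id = conn.execute("SELECT id FROM users WHERE username = 'alice'").fetchone()[0]
conn.execute("INSERT INTO posts (user_id, body) VALUES (?, ?)", (alice_id, "hello"))
print(posts_for(alice))  # ['hello']
```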
> I kindly suggest that UUIDs are never the right answer.
Honestly, I think you've only convinced me that UUID v1 is never the right answer… and I think that's mostly been true since v4 came about.
All the best,
-HG
TLDR on the article: don't use UUIDv1.
Lastly, even the best and most randomized generation doesn't protect you from copy-pasting: https://news.ycombinator.com/item?id=22354449
TLDR: Don't use UUID v1, since its uniqueness depends on the MAC address, which breaks down if your cloud provider hands out the same MAC address to all your containers.
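Python's `uuid.uuid1()` makes the dependence easy to see: whatever node value it gets (normally the MAC address) is copied verbatim into the last 48 bits. The MAC below is a made-up example, and clock_seq is pinned to simulate containers that also happen to agree on it.

```python
import uuid

# Hypothetical MAC address that every container was given (illustration only).
shared_mac = 0x02_42_AC_11_00_02

a = uuid.uuid1(node=shared_mac, clock_seq=0x1234)
b = uuid.uuid1(node=shared_mac, clock_seq=0x1234)

print(a.node == b.node == shared_mac)  # True
print(a.clock_seq == b.clock_seq)      # True
# Only the 60-bit timestamp tells them apart.
print(a.time < b.time)                 # True
```

Within one process the stdlib bumps the timestamp so it keeps increasing; separate containers sharing a MAC get no such coordination.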
Saying not to use UUIDs makes no sense. Use UUIDv7, and use them in Postgres: https://github.com/fboulnois/pg_uuidv7. Have fun :)
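If you'd rather generate v7 in application code, the layout is simple enough to sketch. This is a minimal version assuming the RFC 9562 layout (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). It has no per-millisecond counter, so IDs created within the same millisecond are not ordered relative to each other. Recent Python releases also ship a `uuid.uuid7()` of their own, so check your version first.

```python
import secrets
import time
import uuid
from typing import Optional

def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Minimal UUIDv7 sketch: time-ordered prefix, random suffix."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & 0xFFFF_FFFF_FFFF) << 80  # unix_ts_ms, bits 80..127
    value |= 0x7 << 76                          # version 7, bits 76..79
    value |= secrets.randbits(12) << 64         # rand_a, bits 64..75
    value |= 0b10 << 62                         # RFC variant, bits 62..63
    value |= secrets.randbits(62)               # rand_b, bits 0..61
    return uuid.UUID(int=value)

a = uuid7(1_700_000_000_000)
b = uuid7(1_700_000_000_001)
print(a.version, a < b, str(a) < str(b))  # 7 True True
```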
(It's fine to make a new format; it's not a terrible approach for making a random ident. Still, you might want to peek at, e.g., ksuid from the OP for some interesting points about why you might not want to do that, plus some advice about getrandom() over /dev/random.)
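For the getrandom() point, a minimal sketch in Python: the secrets module draws from os.urandom(), which on Linux uses the getrandom() syscall when it is available, so there is no reason to read /dev/random by hand. `random_id` is a hypothetical helper name; 16 bytes gives 128 random bits.

```python
import os
import secrets

def random_id(nbytes: int = 16) -> str:
    """Random identifier (128 bits by default) as unpadded URL-safe Base64.

    secrets.token_urlsafe() reads from os.urandom(), which is backed by
    getrandom() on Linux when the syscall exists.
    """
    return secrets.token_urlsafe(nbytes)

# On Linux the syscall is also exposed directly; os.getrandom is not on every platform.
if hasattr(os, "getrandom"):
    raw = os.getrandom(16)
    assert len(raw) == 16

print(random_id())  # 22 URL-safe characters, different on every call
```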