ULIDs are sortable (time component), short (26 chars), nearly human readable, and have good enough entropy/randomness for everything I'd ever be working on.
Does anyone have any criticisms of ULIDs? I can't see how they don't take over general-purpose use of unique IDs in the future, except where a stronger guarantee of uniqueness is needed (i.e., a bajillion unique records a second...).
https://github.com/segmentio/ksuid
https://segment.com/blog/a-brief-history-of-the-uuid/
Same advantages as ULIDs, but I prefer the base62 to the base32 encoding (more compact; no need to bikeshed about upper versus lower case), it's been tested at scale, and the decisions made are sensible.
[1]: Specifically, they try to guarantee absolute monotonicity. The way they do this is that if you ever try to generate more than one ULID per millisecond, you increment the least significant bit of the random component. In other words, we have a key that's basically <timestamp>-<random-int>, and if you generate more than one key per timestamp, you just increment the random number. If the random number would overflow, by the spec, you have to just throw an exception; no wraparound.
There are a lot of issues here. For one thing, none of this can possibly work if you're generating your IDs in a distributed fashion; it assumes a single, central, consistent key generator. For another, our key generator now has state, since it needs to know if any keys have been generated earlier, and if so, what they were. Doable, but...potentially a lot of work depending on your environment.
Also, why are we even trying to force strict monotonicity? What does that possibly gain you? Why would we want a spec that, by design, has a chance of sometimes not letting you generate a key? The whole thing feels like the result of someone really wanting an auto-incrementing primary key, hearing that UUIDs were cool, and trying to make an auto-incrementing primary key that looks like a UUID, ending up without the advantages of either.
Of course, you could ignore the spec (and several implementations do), but at this point it's worth asking what you're gaining from ULID. It's a weird feature that basically only works if you don't need it (since realistically, anyone generating many keys per millisecond would of course need to generate the keys in a distributed fashion).
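To make the statefulness concrete, here is a minimal sketch of the increment-on-same-millisecond behaviour described above. The class name `MonotonicUlidGenerator`, the `now_ms` parameter, and the choice of `OverflowError` are assumptions made for illustration, not taken from any particular ULID library; the sketch also does nothing special if the clock moves backwards.

```python
import os
import time

# Crockford base32 alphabet (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
RANDOM_BITS = 80
MAX_RANDOM = (1 << RANDOM_BITS) - 1


def encode_ulid(timestamp_ms: int, randomness: int) -> str:
    """Encode a 48-bit timestamp and an 80-bit random value as 26 characters."""
    value = (timestamp_ms << RANDOM_BITS) | randomness
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


class MonotonicUlidGenerator:
    """Hypothetical generator; note that it has to remember the previous key."""

    def __init__(self):
        self._last_ms = -1
        self._last_random = 0

    def generate(self, now_ms=None) -> str:
        if now_ms is None:
            now_ms = time.time_ns() // 1_000_000
        if now_ms == self._last_ms:
            # Same millisecond: increment instead of drawing fresh randomness.
            if self._last_random == MAX_RANDOM:
                # No wraparound: the generator refuses to produce a key.
                raise OverflowError("random component exhausted for this millisecond")
            self._last_random += 1
        else:
            self._last_ms = now_ms
            self._last_random = int.from_bytes(os.urandom(10), "big")
        return encode_ulid(now_ms, self._last_random)
```

Two instances of this class on different machines share no state, so nothing orders their keys within the same millisecond, which is the distributed-generation objection above.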
I find the argument that ULID won't work under extreme and harsh conditions proof that it's just fine for many of us that simply do not work on systems with that kind of load/requirements.
I appreciate seeing the weaknesses of ULID, as this helps me choose whether or not I can live with them. (which I can)
Again, thanks for the detailed reply, it was very helpful.
Thus the quest for the perfect identifier continues.
NB: generating many unique IDs per millisecond for a long period may be a hallmark of a large distributed system, but even a small application may want this for a brief time e.g. importing bulk customer data.
So, for me, your criticism boils down to "this is weird because it has a feature I don't need". Why would you care?
They're much easier for humans to differentiate than the usual long string of hex characters (even 26 characters is too long to reliably compare when a single mismatched character might make all the difference).
Examples of randomart:
Generating public/private rsa key pair.
The key fingerprint is:
05:1e:1e:c1:ac:b9:d1:1c:6a:60:ce:0f:77:6c:78:47 you@i
The key's randomart image is:
+--[ RSA 2048]----+
| o=. |
| o o++E |
| + . Ooo. |
| + O B.. |
| = *S. |
| o |
| |
| |
| |
+-----------------+
Generating public/private dsa key pair.
The key fingerprint is:
b6:dd:b7:1f:bc:25:31:d3:12:f4:92:1c:0b:93:5f:4b you@i
The key's randomart image is:
+--[ DSA 1024]----+
| o.o |
| .= E.|
| .B.o|
| .= |
| S = .|
| . o . .= |
| . . . oo.|
| . o+|
| .o.|
+-----------------+
[1] - http://www.dirk-loss.de/sshvis/drunken_bishop.pdf
[2] - https://www.man7.org/linux/man-pages/man1/ssh.1.html
[3] - https://superuser.com/questions/22535/what-is-randomart-prod...
ssh user@host -o VisualHostKey=yes
To see the randomart of your own key, or your known hosts:
ssh-keygen -lv -f ~/.ssh/mykey
ssh-keygen -lv -f ~/.ssh/known_hosts
I'm not a fan of the Crockford encoding, since it supports noncanonical forms that'll silently trash the lexicographic sorting assertion when present, and the exclusion of "U" as some kind of profanity filter is both prissy and ineffective.
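A small sketch of the noncanonical-form problem with Crockford base32: the decoder accepts lowercase letters and treats I/L as 1 and O as 0, so two strings can decode to the same value while sorting differently as raw text. The helper names `decode` and `canonicalize` are made up for this example, and hyphen handling is left out.

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
# Applied after uppercasing: I and L are read as 1, O is read as 0.
ALIASES = str.maketrans("ILO", "110")


def canonicalize(text: str) -> str:
    """Map any accepted spelling to the canonical uppercase form."""
    return text.upper().translate(ALIASES)


def decode(text: str) -> int:
    """Decode a Crockford base32 string; raises ValueError on e.g. 'U'."""
    value = 0
    for ch in canonicalize(text):
        value = value * 32 + CROCKFORD.index(ch)
    return value


# "lZ" is an accepted spelling of "1Z", which decodes to less than "20" ...
assert decode("lZ") == decode("1Z") < decode("20")
# ... but as raw strings the order is reversed, because 'l' sorts after '2'.
assert sorted(["lZ", "20"]) == ["20", "lZ"]
# Sorting is only trustworthy once everything has been canonicalized.
assert sorted(map(canonicalize, ["lZ", "20"])) == ["1Z", "20"]
```

So the sortability claim only holds if every writer emits the canonical form, or if IDs are canonicalized before they are stored or compared.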
base58 seems better, vs my own fat fingers at any rate.
Excluding ILO doesn't seem a bad idea - but leaving out U for that particular reason seems downright weird:
In general, if you want to encode info in your IDs, you can do that. Just make sure you want to do that, and that you don't run out of entropy.
I view random UUIDs as a silver-bullet type solution to assigning IDs, without overlap. (8 bytes should be enough to just assign them at random)
[1] https://tools.ietf.org/html/draft-peabody-dispatch-new-uuid-...
[2] http://gh.peabody.io/uuidv6/
Also, this sounds like a non-starter for UUID v6:
"Like version 1 UUIDs, version 6 UUIDs use a MAC address from a local hardware network interface."
Because sometimes (and by sometimes I mean "surprisingly often"), you don't care about exact sortability but simply want roughly-correct ordering.
Examples for most social media sites:
"Give me the latest 100 tweets/videos/posts for this user" => Milliseconds differences won't matter, because users don't tweet/post videos that often.
"Give me the latest 1000 tweets/videos/posts for this search" => Even a significant drift won't matter, because you don't care if things aren't exactly in the correct order, you just care about showing recent stuff.
And at that scale (even long before that scale), having a single oracle for auto-incrementing IDs is a hassle. So this is a nice solution any time you need globally unique IDs, need to support sharding, and your default sort is time-based (or whatever-based if there's another piece of info you want to put in that timestamp portion of the ID, as long as "drift" is a non-issue).
BTW, not just theoretical: I believe this exact reasoning is why twitter came up with time-sortable snowflake IDs.
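For reference, a rough sketch of a Snowflake-style ID. The bit widths used here (41-bit millisecond timestamp, 10-bit worker ID, 12-bit per-worker sequence) follow the layout commonly described for Twitter's scheme, but treat them, and the function names, as assumptions for illustration; the custom epoch is left to the caller.

```python
TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12


def compose_id(ms_since_epoch: int, worker_id: int, sequence: int) -> int:
    """Pack timestamp, worker and sequence into one 63-bit integer."""
    if not 0 <= ms_since_epoch < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (ms_since_epoch << (WORKER_BITS + SEQUENCE_BITS))
        | (worker_id << SEQUENCE_BITS)
        | sequence
    )


def timestamp_of(snowflake_id: int) -> int:
    """Recover the millisecond timestamp from an ID."""
    return snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)
```

Each worker only needs its own ID and a local counter, so there is no central oracle. IDs from different workers within the same millisecond are unique but ordered by worker rather than by true arrival time, which is the "roughly-correct ordering" trade-off described above.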
It's not super uncommon for people to use a normal UUID (usually v1, NOT v4; you need a timestamp), then restructure the fields so the timestamp is in front on save, then flip it back on load; this gives you a "proper" UUID, but (in theory) gives you better performance. See, eg, here: https://www.percona.com/blog/2014/12/19/store-uuid-optimized...
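A minimal sketch of that field shuffle for a version 1 UUID, using Python's standard `uuid` module. The function names are invented here, and the exact storage layout in the linked post may differ; the idea is only to move the high timestamp bits to the front on save and restore the standard layout on load.

```python
import uuid


def to_ordered_bytes(u: uuid.UUID) -> bytes:
    """Reorder a v1 UUID so the most significant timestamp bits come first.

    Standard layout: time_low (4 bytes), time_mid (2), time_hi_and_version (2),
    then clock sequence and node (8).
    """
    b = u.bytes
    return b[6:8] + b[4:6] + b[0:4] + b[8:]


def from_ordered_bytes(stored: bytes) -> uuid.UUID:
    """Invert to_ordered_bytes, giving back a normal UUID."""
    return uuid.UUID(bytes=stored[4:8] + stored[2:4] + stored[0:2] + stored[8:])


original = uuid.uuid1()
assert from_ordered_bytes(to_ordered_bytes(original)) == original
```

Because the version nibble is constant for v1, the reordered bytes compare in timestamp order, which the standard layout does not (its leading field is the low 32 bits of the timestamp and wraps every few minutes).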
Now, ULID makes an odd attempt at making keys strictly sortable by time, which 1) doesn't work and 2) is pointless. :-) In that case, you really would be better off with an auto-incrementing ID. But while their implementation is questionable, the idea isn't absurd.
You can easily prepend a unix millisecond timestamp to a UUID if you want time-sortability, i.e.
174e1da9377-d739f4ed-6aa1-4c99-a0bf-3b0b68e9ce77
where 174e1da9377 is a unix timestamp in milliseconds.
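Roughly, in Python (the function name and the `now_ms` parameter are just for this example). The timestamp is zero-padded to a fixed 11 hex digits, which is an added assumption so that plain string comparison keeps matching time order:

```python
import time
import uuid


def timestamped_uuid(now_ms=None) -> str:
    """Return '<unix ms in hex>-<random UUIDv4>'."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    # Fixed width keeps lexicographic order equal to numeric order.
    return f"{now_ms:011x}-{uuid.uuid4()}"


print(timestamped_uuid())
```

The result is 48 characters long, so it is sortable but noticeably longer than a ULID or KSUID.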
> short (26 chars)
Short == increased possibility of a collision. If you're okay with this possibility, you can just use UUIDv4 and truncate it.
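To put rough numbers on that trade-off, the usual birthday approximation can be computed directly. This is a sketch: `collision_probability` is a made-up helper, and it assumes the IDs are uniformly random over `random_bits` bits.

```python
import math


def collision_probability(n_ids: int, random_bits: int) -> float:
    """Approximate chance of at least one collision among n_ids random values.

    Uses p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    """
    pairs = n_ids * (n_ids - 1) / 2
    return -math.expm1(-pairs / 2.0 ** random_bits)


# A random UUIDv4 carries 122 random bits; a ULID carries 80, but those 80 bits
# only need to be unique among IDs generated in the same millisecond.
for bits in (64, 80, 122):
    print(bits, collision_probability(10**9, bits))
```

Truncating a UUIDv4 just lowers `random_bits`, so the same function covers that case too.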
> and nearly human readable
Convert your UUID into base26 or whatever you wish for human readability.
> and good enough entropy/randomness
Subjective, if you are okay with "good enough entropy" you could arguably just use a random string of the number of bits you'd like.
Also, for most applications you can just add a timestamp record and sort by that.
Only that they aren’t more widely adopted or supported.
I use UUIDs in things like OBSERVER support[0]. They just need to be unique within the context of the runtime, so there’s no need for anything more.
However, for things like Bluetooth Attributes, I think ULIDs would be a good thing. That ship has probably sailed, though.
[0] https://github.com/RiftValleySoftware/RVS_GeneralObserver#ob...
Combine that with KV stores like badger or rocks that store things in lexicographic order - ULIDs really come in handy when you want a random ID but still want to be able to do a scan in sorted order.
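A toy illustration of that scan, with a sorted Python list standing in for the ordered key space of a KV store. The key layout (6-byte big-endian millisecond timestamp followed by 10 random bytes) mirrors the ULID binary layout; the function names are invented for the example.

```python
import bisect
import os


def make_key(timestamp_ms: int) -> bytes:
    """16-byte key: 48-bit big-endian timestamp, then 80 random bits."""
    return timestamp_ms.to_bytes(6, "big") + os.urandom(10)


def scan_time_range(sorted_keys, start_ms: int, end_ms: int):
    """Return keys with start_ms <= timestamp < end_ms from a sorted key list."""
    # A bare 6-byte prefix sorts before every full key sharing that prefix.
    lo = bisect.bisect_left(sorted_keys, start_ms.to_bytes(6, "big"))
    hi = bisect.bisect_left(sorted_keys, end_ms.to_bytes(6, "big"))
    return sorted_keys[lo:hi]


keys = sorted(make_key(t) for t in (1000, 3000, 2000, 2000, 5000))
hits = scan_time_range(keys, 2000, 5000)
print([int.from_bytes(k[:6], "big") for k in hits])
```

A real store would do the same thing with a seek to the start prefix followed by iteration until the end prefix.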
Not a criticism, but something to be aware of, is that SQL Server doesn't sort the uniqueidentifier data type as one would expect. Instead of being ordered by bytes 0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15, uniqueidentifier values are ordered by bytes 10-11-12-13-14-15-8-9-7-6-5-4-3-2-1-0. So you want the timestamp in the last 48 bits instead of the first 48 bits.
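A sketch that takes the byte order above at face value and emulates it as a Python sort key. It assumes the indices refer to bytes in the order they appear in the textual form of the GUID, and that byte 0 is compared last; the helper names are hypothetical, and the generated values are not valid RFC-versioned UUIDs, only 16-byte illustrations.

```python
import os
import uuid

# Comparison order as described above (most significant byte first).
SQL_SERVER_BYTE_ORDER = [10, 11, 12, 13, 14, 15, 8, 9, 7, 6, 5, 4, 3, 2, 1, 0]


def sql_server_sort_key(u: uuid.UUID) -> bytes:
    """Rearrange the bytes so plain bytes comparison mimics that ordering."""
    raw = u.bytes
    return bytes(raw[i] for i in SQL_SERVER_BYTE_ORDER)


def uuid_with_trailing_timestamp(timestamp_ms: int) -> uuid.UUID:
    """Illustrative ID: 10 random bytes, then a 48-bit big-endian timestamp."""
    return uuid.UUID(bytes=os.urandom(10) + timestamp_ms.to_bytes(6, "big"))


ids = [uuid_with_trailing_timestamp(t) for t in (3000, 1000, 2000)]
ordered = sorted(ids, key=sql_server_sort_key)
print([int.from_bytes(u.bytes[10:], "big") for u in ordered])
```

With the timestamp in the last six bytes, this ordering follows time; with the timestamp in the first six bytes it would be dominated by the random tail.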
But even without the need to sort by id, one advantage is that sequential ids make it easier for databases to fetch data, as the stored data ends up being more sequential on disk as well.
More reading on that:
- https://eager.io/blog/how-long-does-an-id-need-to-be/#locali...
- https://www.percona.com/blog/2019/11/22/uuids-are-popular-bu...
Default sorting out of the box is kind of nice anyways.
Functionally, they're similar, but I think a lot more work has gone into KSUIDs in terms of usability and ensuring they solve the intended problem. The only advantage ULIDs have is they are the same length as vanilla UUIDs, if such is useful to you.
It's cool that I can make deterministic UUIDs, but it seems silly that the spec should require me to expose the fact that it's deterministic. It's not like my deterministic UUID is any more likely to collide with someone else's random one than two deterministic or two random are to collide with each other.
From another angle, if you look at the difference between version 1 and version 2 time+node UUIDs, they're actually quite similar except for version 2 having a UID/GID-like field replacing some of the least significant bits of the other data. So collisions between v1 and v2 would be much more possible without the version bits. And at that point there's no reason not to use similar flags to refer to other generation methods; even if in theory you could decide that they're unlikely to create collisions, that just feels like it leaves the door open for edge cases and having to remember which of those decisions were made. Why not be consistent?
From the opposite angle, in a case where you're using random UUIDs generated within a trust boundary as lookup keys, the knowledge that one of them is random doesn't tell an adversary on the outside anything useful about the others, because they're all random, so it's a few harmless “wasted” bits. Of course, since UUIDs didn't have “unguessable” as a selling point to start with, if that's something you're leaning on, then go ahead and use something else…
In addition, those "drag a file to upload" dialogs are just awful. I don't have a file explorer installed that supports dragging (I just use midnight commander). If you give me a 'browse' button, I can use the gtk file-picker to get something, but "drag and drop" doesn't work. It's also an accessibility issue: screenreaders can click buttons, but they're much worse at dragging files into specific webpage areas.