But also, this sounds like a premature optimisation. Most applications will never reach a level where their performance is actually impacted by string comparison, and by the time you reach that stage, you've likely already thrown out a lot of other common-sense stuff like db normalisation to get there, and we shouldn't judge "regular people" advice because it doesn't usually apply to you anyway.
Out of curiosity, have you ever seen an application that was meaningfully impacted by this? How gigantic was it?
----
Scratch that. I've actually thought about it some more, and now I'm not 100% sure it's premature; I'd have to investigate further to be sure. Question still stands though.
Due to the type of aggregate queries that typify analytics workloads, almost everything turns into a scan, whether it be of a table, field, or index. Strings occupy more space on disk or in RAM, so scanning a whole column or table simply takes longer, because you have to shovel more bytes through the CPU. This doesn't even take into account the relative CPU time to actually do the comparisons.
I've never personally worked with a system that has string keys shorter than 10 [1][2] characters. At that point, regardless of how you pack characters into a register, you're occupying more bits with two strings of character data than you would with two 64-bit integers[3]. This shows through in join time.
[0]: Even modestly sized companies tend to have at least a few tables that get into the millions of records.
[1]: I've heard of systems with shorter string keys.
[2]: Most systems with string keys I've encountered have more than 10 characters.
[3]: The vast majority of systems I've seen since the mid-2010s use 64-bit integers for keys for analytics. 32-bit integers seemed to phase out for new systems I've seen since ~2015, but were more common prior to that.
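To put rough numbers on the "more bits" point, here's a minimal sketch. The key value `CUST000042` is made up for illustration, and it assumes plain ASCII keys and ignores length prefixes, padding, and collation overhead, all of which would only add to the string side.

```python
import struct


def key_bytes_as_string(key: str) -> int:
    """Bytes needed for the raw UTF-8 characters, ignoring any length prefix."""
    return len(key.encode("utf-8"))


def key_bytes_as_int64() -> int:
    """Bytes needed for a signed 64-bit integer key."""
    return struct.calcsize("<q")


string_key = "CUST000042"  # hypothetical 10-character key

# A join comparison touches two keys, one from each side.
pair_string_bits = 2 * key_bytes_as_string(string_key) * 8
pair_int_bits = 2 * key_bytes_as_int64() * 8

print(pair_string_bits, pair_int_bits)  # 160 128
```

Ten characters is already past the break-even point, and every extra character in the key widens the gap per row scanned.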
Don't most databases set a length limit on ID strings?
If you're setting a length limit, and it's made out of digits with no leading zeroes, then you might as well store it as a number. Is there a downside?
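For what it's worth, the "digits with no leading zeroes" condition is easy to check before committing to a numeric column. A rough sketch, where the helper name `round_trips_as_int` and the signed 64-bit bound are my own assumptions rather than anything a particular database enforces:

```python
def round_trips_as_int(identifier: str) -> bool:
    """True if the ID survives str -> int -> str unchanged and fits in a signed 64-bit integer."""
    # isascii() guards against Unicode digit characters that int() would reject.
    if not (identifier.isascii() and identifier.isdigit()):
        return False
    value = int(identifier)
    return str(value) == identifier and value < 2**63


print(round_trips_as_int("12345"))  # True
print(round_trips_as_int("00123"))  # False: the leading zeroes would be lost
print(round_trips_as_int("12a45"))  # False: not purely digits
```

If any existing ID fails this check, storing it as a number silently changes it, which is the main downside I can think of.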
A numeric identity is an identity and so is a string.
If you want to math it, it is a number, otherwise... string.
"Will you ever want the 95th percentile PID? Then it is not a number. Move on."
If you need to do math with the thing, use an appropriate type of number, of course.
If you're using 64-bit numbers as a high-cardinality identity that can be randomly generated without concern for collision (like a MAC address with more noise) -- well, that's an identity and doesn't need to have math applied to it. For example: "What's the mean IP address that's connected to Cloudflare in the last 10 minutes?" or "What's the sum of every MAC address in this subnet?" are both nonsense questions, because these "numbers" are identities, not numbers, and using a data type that treats them as numbers invites surprising, sometimes unpleasantly so, results.
Of course, because these are computers, all strings are ultimately numbers but their numberness is without real meaning.
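A quick illustration of the "mean IP address" problem, using Python's standard `ipaddress` module. The two addresses are arbitrary examples I picked; the point is only that the arithmetic runs without complaint and hands back something that looks like a real address.

```python
import ipaddress

# Two arbitrary private addresses, chosen purely for illustration.
addrs = [
    ipaddress.IPv4Address("10.0.0.1"),
    ipaddress.IPv4Address("192.168.0.1"),
]

# Treat the identities as plain integers and average them.
mean_as_int = sum(int(a) for a in addrs) // len(addrs)
mean_ip = ipaddress.IPv4Address(mean_as_int)

print(mean_ip)  # 101.84.0.1
```

The result is a perfectly valid address that has nothing to do with either input, which is the kind of surprise you sign up for when an identity lives in a numeric type.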