> takes twice as many instructions
What is your preferred system? How does it affect other needs, like collation, or testing if something is upper-case vs. lower-case, or ease of supporting case-insensitivity?
Have you measured the performance difference? https://johnnylee-sde.github.io/Fast-unsigned-integer-to-hex... shows a branchless UlongToHexString which is essentially as fast as a lookup table and faster than the "naive" implementation.
> Bounds-checking for the English alphabet
In the following it goes from two assembly instructions to three:
  int is_letter(char c) {
    c |= 0x20; // normalize to lowercase
    return ('a' <= c) && (c <= 'z');
  }
Yes, that's 50% more assembly, to add a single bit-wise or, when testing a single character.
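For anyone wondering where the two-vs-three count comes from: optimizing compilers commonly turn a two-sided range check into one subtract plus one unsigned compare, and the case fold adds a single OR on top. A rough sketch of that form written out by hand (the function names are mine, made up for illustration, and the exact instructions will depend on compiler and target):

```c
/* Range check written as one subtract and one unsigned compare.
 * Anything outside 'a'..'z' wraps around to a value >= 26. */
int is_lower_ascii(char c) {
    return (unsigned char)(c - 'a') < 26u;
}

/* Same check with the case fold: one extra bitwise OR. */
int is_letter_ascii(char c) {
    return (unsigned char)((c | 0x20) - 'a') < 26u;
}
```

Here '@' and '[' fold to '`' and '{', which sit just outside 'a'..'z', so the fold does not let any non-letters through.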
But, seriously, when is this useful? English words include an apostrophe, names like the English author Brontë use diacritics, and æ is still (rarely) used, like in the "Endowed Chair for Orthopædic Investigation" at https://orthop.washington.edu/research/ourlabs/collagen/peop... .
And when testing multiple characters at a time, there are clever optimizations like those used in UlongToHexString. SIMD within a register (SWAR) is quite powerful, e.g., 8 characters could be ORed at once in 64 bits, and of course the CPU can do a lot of work to pipeline things, so 50% more single-clock-tick instructions does not mean 50% more work.
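To make the SWAR point concrete, here is a minimal sketch that checks 8 characters at once. It is my own illustration, not code from the linked article, and the name all_letters8 is hypothetical. The case fold is one OR for all 8 bytes, and the two range tests are per-byte additions that push the answer into each byte's high bit:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if all 8 bytes at p are ASCII letters (A-Z or a-z), else 0.
 * The caller must guarantee that 8 bytes are readable at p. */
int all_letters8(const char *p) {
    const uint64_t hi = 0x8080808080808080ULL;
    uint64_t w;
    memcpy(&w, p, sizeof w);                          /* safe unaligned load */

    uint64_t folded = w | 0x2020202020202020ULL;      /* fold case, 8 bytes at once */
    /* For a byte below 0x80: high bit becomes set iff byte >= 'a' (0x61). */
    uint64_t ge_a = folded + 0x1F1F1F1F1F1F1F1FULL;
    /* For a byte below 0x80: high bit becomes set iff byte > 'z' (0x7A). */
    uint64_t gt_z = folded + 0x0505050505050505ULL;

    /* A byte passes if it is >= 'a', not > 'z', and was ASCII to begin with.
     * A non-ASCII byte can cause carries into its neighbor, but it also
     * clears its own bit via ~w, so the overall result is still 0. */
    return ((ge_a & ~gt_z & ~w) & hi) == hi;
}
```

So all_letters8("Abcdefgh") returns 1, while "abc'defg" fails on the apostrophe. The per-byte logic is symmetric, so byte order does not matter.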
> like front and back braces/(angle)brackets/parens not being convertible
I have never needed that operation. Why do you need it?
Usually when I find a "(" I know I need a ")", and if I also allow a "[" then I need an if-statement anyway since A(8) and A[8] are different things, and both paths implicitly know what to expect.
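A minimal sketch of what I mean, with made-up names (parse_postfix, KIND_CALL, KIND_INDEX) and a toy grammar of just digits between the delimiters. The branch that distinguishes the two constructs already knows which closer to expect, so nothing ever needs to compute ")" from "(":

```c
#include <ctype.h>

/* Hypothetical result kinds for this sketch. */
enum kind { KIND_CALL, KIND_INDEX, KIND_ERROR };

/* s points at the opening delimiter of "(digits)" or "[digits]". */
enum kind parse_postfix(const char *s) {
    char close;
    enum kind k;

    if (*s == '(') {            /* A(8): a call */
        close = ')';
        k = KIND_CALL;
    } else if (*s == '[') {     /* A[8]: an index */
        close = ']';
        k = KIND_INDEX;
    } else {
        return KIND_ERROR;
    }

    s++;
    while (isdigit((unsigned char)*s)) {
        s++;                    /* skip the digits between the delimiters */
    }
    return (*s == close) ? k : KIND_ERROR;
}
```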
> and saved a few instructions in common parsing loops.
Parsing needs to know what specific character comes next, and the possibilities are very rarely limited to only those characters. The parsers I've looked at use a DFA, e.g., via a switch statement or lookup table.
I can't figure out what advantage there is to that ordering, that is, I can't see why there would be any overall savings.
Especially in a language like C++ with > and >> and >>= and A<B<int>> and -> where only some of them are balanced.
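As a rough illustration of the switch-statement style, here is a sketch of a scanner for just the ">" and "-" operator families. The token names and next_tok are hypothetical, not from any real lexer. Every decision is a comparison against a specific character, so the relative code points of "<" and ">" never come into it:

```c
#include <stddef.h>

/* Hypothetical token kinds for this sketch. */
enum tok { TOK_END, TOK_GT, TOK_GE, TOK_SHR, TOK_SHR_ASSIGN,
           TOK_MINUS, TOK_ARROW, TOK_OTHER };

/* Scan one token at the start of the NUL-terminated string s,
 * using longest match, and store its length in *len. */
enum tok next_tok(const char *s, size_t *len) {
    switch (s[0]) {
    case '\0':
        *len = 0;
        return TOK_END;
    case '>':
        if (s[1] == '>') {
            if (s[2] == '=') { *len = 3; return TOK_SHR_ASSIGN; }  /* >>= */
            *len = 2;
            return TOK_SHR;                                        /* >>  */
        }
        if (s[1] == '=') { *len = 2; return TOK_GE; }              /* >=  */
        *len = 1;
        return TOK_GT;                                             /* >   */
    case '-':
        if (s[1] == '>') { *len = 2; return TOK_ARROW; }           /* ->  */
        *len = 1;
        return TOK_MINUS;
    default:
        *len = 1;
        return TOK_OTHER;
    }
}
```

Note that this still cannot tell whether the ">>" in A<B<int>> is a shift or two closing angle brackets. That depends on context the scanner does not have, which is part of why I don't see the character ordering helping.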