I look at it the other way: I've hard-coded the reading and writing routines inside the tokenization logic.
Being able to do that is exactly why it's so much simpler to avoid a silly API such as strcspn (or, god forbid, strtok).
> non-ASCII
Yeah, I know... Do you prefer strcspn("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")? Do you think it's faster?
If you want to be pedantic, you could lex on code points directly, e.g. (0x41 <= c && c <= 0x5A) for 'A' to 'Z'. That way you at least consistently read ASCII, even on non-ASCII implementations. But I don't care, and it's less readable.
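For what it's worth, here is a minimal sketch of what that pedantic variant could look like. The names is_ascii_letter and scan_word are made up for illustration, and it only handles ASCII letters, nothing else:

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if c is an ASCII letter. The test uses numeric code points
   instead of character constants, so the result does not depend on
   the execution character set or on the current locale. */
int is_ascii_letter(unsigned char c)
{
    return (0x41 <= c && c <= 0x5A)     /* 'A'..'Z' in ASCII */
        || (0x61 <= c && c <= 0x7A);    /* 'a'..'z' in ASCII */
}

/* Length of the leading run of ASCII letters in the string s.
   The terminating NUL is not a letter, so the loop always stops. */
size_t scan_word(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (is_ascii_letter((unsigned char)s[n]))
        n++;
    return n;
}
```

The scanning loop is just a loop, so anything else the tokenizer wants to do per character can go right inside it. That is the part you can't do with a prepackaged span function.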
> I suppose you consider isalpha() to have a weird API?
Yes. I do not even understand what it does.
>> isalpha()
checks for an alphabetic character; in the standard "C" locale, it is equivalent to (isupper(c) || islower(c)). In some locales, there may be additional characters for which isalpha() is true: letters which are neither upper case nor lower case.
Well, in any case, I'm sure that's not what I wanted... By the way, locale is super hard to use as well. Locale is a process-global property. I'm not aware of any way to pass explicit locales to library functions.
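To illustrate the awkwardness: with only the ISO C interface (setlocale and isalpha), the closest thing to "isalpha under an explicit locale" that I can come up with is to swap the global locale and swap it back. This is a rough sketch, and isalpha_in_locale is a hypothetical helper, not an existing library function:

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: evaluate isalpha(c) under the named locale by
   switching the process-wide LC_CTYPE category and restoring it.
   Returns 1 or 0, or -1 if the locale cannot be queried or set.
   Not thread-safe, precisely because the locale is global state. */
int isalpha_in_locale(const char *name, unsigned char c)
{
    const char *cur;
    char *saved;
    int result;

    assert(name != NULL);

    /* setlocale(category, NULL) only queries. The returned string may
       be overwritten by a later setlocale call, so keep a copy. */
    cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL)
        return -1;
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, cur);

    if (setlocale(LC_CTYPE, name) == NULL) {
        free(saved);
        return -1;
    }
    result = isalpha(c) != 0;

    setlocale(LC_CTYPE, saved);     /* restore the previous locale */
    free(saved);
    return result;
}
```

Every other thread that calls a ctype function while the locale is swapped sees the temporary setting, which is exactly the problem with a process-global property.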