jstimpfle 7y ago (0 points)

    Token tok;
    start_token(&tok);
    for (;;) {
        int c = look_next_char();  /* peek without consuming */
        if (('A' <= c && c <= 'Z') ||
            ('a' <= c && c <= 'z')) {  /* or whatever test */
            consume_char();
            add_to_token(&tok, c);
        } else {
            break;  /* first non-letter ends the token */
        }
    }
    end_token(&tok);

Done. There's no point in going through a weird API.

torstenvl 7y ago

You've not only hard-coded your tokenization rules inside your logic, but you've managed to make it break on anything non-ASCII. I suppose you consider isalpha() to have a weird API?
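For reference, the isalpha() variant alluded to here might look roughly like the sketch below. The helper name scan_word is made up for illustration and is not from the thread. Note that isalpha() classifies single bytes according to the current locale, so on its own it still does not handle multi-byte encodings such as UTF-8.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the leading run of alphabetic
   characters in s, as judged by isalpha() in the current locale. */
size_t scan_word(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    /* The cast matters: passing a negative char value to isalpha()
       is undefined behavior. isalpha('\0') is false, so the loop
       stops at the terminator. */
    while (isalpha((unsigned char)s[n])) {
        n++;
    }
    return n;
}
```

In the default "C" locale, scan_word("hello world") returns 5 and scan_word("42abc") returns 0.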

jstimpfle (OP) 7y ago

I look at it the other way: I've hard-coded the reading and writing routines inside the tokenization logic.

Being able to do that is exactly why it's so much simpler to avoid a silly API such as strcspn (or, God forbid, strtok).

> non-ASCII

Yeah, I know... Do you prefer strcspn("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")? Do you think it's faster?
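For comparison, here is a sketch of what the library-call version could look like. One detail worth noting: measuring a run of letters needs strspn, which counts leading characters that are in the set, while strcspn counts leading characters that are not in the set. The helper names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char LETTERS[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Hypothetical helper: length of the leading run of letters. */
size_t word_len_strspn(const char *s)
{
    assert(s != NULL);
    return strspn(s, LETTERS);
}

/* Hypothetical helper: length of the leading run of non-letters,
   i.e. how far to skip before the next word starts. */
size_t non_word_len_strcspn(const char *s)
{
    assert(s != NULL);
    return strcspn(s, LETTERS);
}
```

With these, word_len_strspn("foo+bar") is 3 and non_word_len_strcspn("  +foo") is 3. Whether this is faster than the hand-written loop depends on the libc implementation and is not something this sketch settles.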

If you're pedantic, you could lex (0x41 <= c && c <= 0x5A). That way at least you consistently read ASCII, even on non-ASCII implementations. But I don't care and it's less readable.

> I suppose you consider isalpha() to have a weird API?

Yes. I do not even understand what it does.

>> isalpha() checks for an alphabetic character; in the standard "C" locale, it is equivalent to (isupper(c) || islower(c)). In some locales, there may be additional characters for which isalpha() is true: letters which are neither upper case nor lower case.

Well, in any case I'm sure that's not what I wanted... By the way, locale is super hard to use as well. Locale is a process-global property. I'm not aware of any way to pass explicit locales to library functions.
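On explicit locales: POSIX.1-2008 (not ISO C) does define locale objects and _l variants of the ctype functions, so on systems that implement it something like the sketch below should be possible. The helper name is_alpha_in is hypothetical, and some platforms may need an extra header (macOS, for example, declares these in <xlocale.h>), so treat this as an assumption-laden illustration rather than portable code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: classify c under the named locale without
   touching the process-global locale.
   Returns 1 or 0, or -1 if the locale cannot be created. */
int is_alpha_in(const char *locale_name, int c)
{
    assert(locale_name != NULL);
    locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0) {
        return -1;  /* unknown or unavailable locale */
    }
    int result = isalpha_l((unsigned char)c, loc) != 0;
    freelocale(loc);
    return result;
}
```

Creating and freeing a locale object per call is wasteful; real code would presumably create it once and reuse it.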

avar 7y ago

> If you're pedantic, you could lex (0x41 <= c && c <= 0x5A)

'A' vs. 0x41 makes no difference for portability. The thing that's unportable about that is that it assumes that the characters A..Z are contiguous in your character encoding, which isn't portable C.

Although admittedly having to deal with EBCDIC these days is rare in anything except highly portable programs like C compilers or popular script interpreters.

This is why ctype.h functions exist. Just use them.
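If the goal is specifically "the 52 unaccented Latin letters, regardless of locale", one way to avoid the contiguity assumption without going through isalpha() is a lookup in a string of the letters themselves. This is only a sketch; is_basic_letter and scan_basic_letters are names invented for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true for A-Z and a-z, with no assumption that
   their codes are contiguous in the execution character set. */
int is_basic_letter(int c)
{
    static const char letters[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz";
    /* Reject EOF, out-of-range values, and '\0' (strchr would
       otherwise match the string terminator). */
    if (c <= 0 || c > UCHAR_MAX) {
        return 0;
    }
    return strchr(letters, c) != NULL;
}

/* Hypothetical helper: length of the leading run of basic letters. */
size_t scan_basic_letters(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (is_basic_letter((unsigned char)s[n])) {
        n++;
    }
    return n;
}
```

The linear search is slower than a range comparison, which is part of the trade-off being argued about in this thread.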

tptacek 7y ago

strcspn is ANSI C90.
