
vidarh 2y ago

The BASIC code is only in its full textual form on screen. The moment you press return on it, it's tokenized, and it's stored tokenized both in memory and when saved. Unlike modern systems, the full textual representation of the code is never stored anywhere.


dep_b 2y ago

When I SAVE a program in C64 BASIC and LOAD it again, the syntax doesn't change no matter what I do: add spaces or not, use shorthand or not, colons, et cetera. So I get the feeling that my whole program gets saved as a string and then parsed, not tokenized and saved.

Also, there is a line-length limit in C64 BASIC that would overflow if certain shorthand were expanded, and for beginners, seeing their fully written keywords transformed into shorthand after loading would be even more confusing.

pgeorgi 2y ago

The keywords are tokenized, the line number is converted to a 16-bit integer, leading spaces are stripped (which is why some "formatted" BASIC uses ":" as the first character in a line, like the following), and everything else is kept intact.

10 for i = 1 to 10

20 : (arbitrary number of spaces) print "hello"

30 next
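A rough Python sketch of the crunching described above, in case it helps to see it spelled out. It is a simplification under stated assumptions: the token table is only a small subset, operators such as "=" are left alone here, plain ASCII stands in for PETSCII, and the next-line link pointer that the real in-memory format carries is omitted. The name tokenize_line is made up for illustration.

```python
# Small subset of Commodore BASIC V2 keyword tokens (illustrative, not the full table).
TOKENS = {"FOR": 0x81, "NEXT": 0x82, "PRINT": 0x99, "TO": 0xA4}


def tokenize_line(text):
    """Crunch one typed line into a simplified stored form.

    Layout used by this sketch (an assumption, not the exact ROM format):
    16-bit little-endian line number, crunched body, 0x00 terminator.
    """
    text = text.lstrip()
    i = 0
    while i < len(text) and text[i].isdigit():
        i += 1
    line_no = int(text[:i])
    # Spaces between the line number and the first character are dropped.
    body = text[i:].lstrip(" ")

    out = bytearray(line_no.to_bytes(2, "little"))
    in_string = False
    j = 0
    while j < len(body):
        ch = body[j]
        if ch == '"':
            in_string = not in_string
        if not in_string:
            if ch == "?":  # shorthand for PRINT
                out.append(TOKENS["PRINT"])
                j += 1
                continue
            for keyword, token in TOKENS.items():
                if body[j:j + len(keyword)].upper() == keyword:
                    out.append(token)
                    j += len(keyword)
                    break
            else:
                out.append(ord(ch))  # spaces, variables, digits kept as typed
                j += 1
            continue
        out.append(ord(ch))  # inside quotes nothing is tokenized
        j += 1
    out.append(0)
    return bytes(out)
```

With this sketch, "10 for i = 1 to 10" and the same line with extra spaces after the line number crunch to identical bytes, while the spaces after a leading ":" survive, which is the trick the example listing relies on.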

The shorthand issue is real, too:

1?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?

expands into six lines of "1 print:print:....:print" that you can't simply edit, because the limit is 80 characters (two screen lines).
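The arithmetic behind that can be sketched in a few lines of Python. The statement count of 39 is an assumption picked for illustration (it keeps the typed line just under the limit), and listed_form is a hypothetical helper that only handles lines without string literals.

```python
import math

SCREEN_COLUMNS = 40  # C64 text screen width
EDIT_LIMIT = 80      # longest line the screen editor accepts (two screen rows)


def listed_form(typed):
    """Approximate what LIST shows: a space after the line number, '?' spelled out."""
    rest = typed.lstrip("0123456789")
    digits = typed[:len(typed) - len(rest)]
    # Naive replace: fine here because the line contains no quoted strings.
    return digits + " " + rest.replace("?", "PRINT")


typed = "1" + ":".join("?" * 39)   # assumed count of 39 PRINT statements
listed = listed_form(typed)
rows = math.ceil(len(listed) / SCREEN_COLUMNS)
fits_when_typed = len(typed) <= EDIT_LIMIT
fits_when_listed = len(listed) <= EDIT_LIMIT
```

Under these assumptions the typed line fits the editor, but the listed form is roughly three times as long and spans six screen rows, so pressing return on it again cannot reproduce the stored line.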

actionfromafar 2y ago

It's an accident of history this didn't continue. So many code style wars could have been avoided over the eons.

vidarh (OP) 2y ago

It wasn't that it didn't continue, but that this was unique to a branch of languages that largely were sidelined.

And the tokenization didn't prevent style differences anyway: as the article points out, it keeps spaces, for example. It only tokenized a few things, like keywords and line numbers.

(EDIT: in the late '90s I worked on a project written in Word BASIC... It was also tokenized, and that was used as an opportunity to translate the keywords in the localised versions of Word. But someone had managed to write a bunch of code in the Danish version and somehow exported it as text and imported it into the Norwegian version. The languages are similar enough that it was really hard to tell (no syntax highlighting), and they'd edited a bunch before realising, so I had the fun job of untangling it... Yay...)

BayesianDice 2y ago

I think I read about a tool for the Acorn BBC micros which you could apply to your program to remove unnecessary spaces etc. in the tokenised form and shave off a few bytes.

This had the side-effect that you could still display and (presumably with a bit more mental effort, read) the program listing fine, but re-entering a line as shown in that listing would fail because the computer depended on the spaces to do the parsing, even if they were redundant after the tokenisation happened.

cvcount 2y ago

On the ZX Spectrum, numeric values were saved both as text and in a five-byte floating-point format. So making lines shorter often involved using keywords to avoid that: NOT PI, SGN PI, VAL "2" etc.
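A back-of-the-envelope sketch of why those tricks save space, in Python. It assumes (from memory, so treat it as an assumption) that each numeric literal is stored as its digits, then a one-byte marker, then the five-byte value, and that every keyword is a single-byte token. The function names are invented for illustration.

```python
FLOAT_BYTES = 5    # hidden binary copy of the number
MARKER_BYTES = 1   # assumed one-byte "number follows" marker before the hidden copy
TOKEN_BYTES = 1    # each keyword (NOT, SGN, PI, VAL) is a single-byte token


def literal_cost(text):
    """Bytes used by a plain numeric literal such as '0' or '2'."""
    return len(text) + MARKER_BYTES + FLOAT_BYTES


def keyword_cost(keywords):
    """Bytes used by an expression built only from keyword tokens, e.g. NOT PI."""
    return TOKEN_BYTES * len(keywords)


def val_cost(text):
    """Bytes used by VAL "text": one token plus the quoted digits, no hidden float."""
    return TOKEN_BYTES + len(text) + 2  # two quote characters


savings = {
    "0 -> NOT PI": literal_cost("0") - keyword_cost(["NOT", "PI"]),
    "1 -> SGN PI": literal_cost("1") - keyword_cost(["SGN", "PI"]),
    '2 -> VAL "2"': literal_cost("2") - val_cost("2"),
}
```

Under these assumptions the keyword forms save a handful of bytes per constant, at the price of evaluating an expression at run time instead of reading a ready-made value.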

kbelder 2y ago

Atari BASIC tokenized source a bit more thoroughly... eliminated spaces, parsed constants into their floating point representation, etc. It did enforce formatting to a degree.

I was always surprised that it did all that, but wasn't any faster than Commodore BASIC.
