Base64, on the other hand, was carefully designed to survive everything from whitespace corruption to being passed through non-ASCII character sets. And then it became widely used as part of MIME.
Still more robust than uuencode though.
.-_ would have been a better choice than +/=
There was also an extended period when people used uux much as they used shar: both invite somebody else's hands into your execution state and filestore.
We were also obsessed with efficiency. base64 was "sold" as a denser encoding. I can't say if it was true overall, but just as we discussed Lempel-Ziv and gzip tuning on Usenet news, we discussed uuencode/base64 and other text wrappings.
Ned Freed, Nathaniel Borenstein, Patrik Falstrom and Robert Elz amongst others come to mind as people who worked on the baseXX encoding and discussed this on the lists at the time. Other alphabets were discussed.
uu* was the product of Mike Lesk a decade before, who was a lot quieter on the lists: He'd moved into different circles, was doing other things and not really that interested in the chatter around line encoding issues.
1) https://www.usenetarchives.com/view.php?id=comp.mail.mime&mi...
> Some of the characters used by uuencode cannot be represented in some of the mail systems used to carry rfc 822 (and therefore MIME) mail messages. Using uuencode in these environments causes corruption of encoded data. The working group that developed MIME felt that reliability of the encoding scheme was more important than compatibility with uuencode.
In a followup (same link):
> "The only character translation problem I have encountered is that the back-quote (`) does not make it through all mailers and becomes a space ( )."
A followup from that at https://www.usenetarchives.com/view.php?id=comp.mail.mime&mi... says:
> The back-quote problem is only one of many. Several of the characters used by uuencode are not present in (for example) the EBCDIC character set. So a message transmitted over BITNET could get mangled -- especially for traffic between two different countries where they use different versions of EBCDIC, and therefore different translate tables between EBCDIC and ASCII. There are other character sets used by 822-based mail systems that impose similar restrictions, but EBCDIC is the most obvious one.
> We didn't use uuencode because several members of our working group had experience with cases where uuencoded files were garbaged in transit. It works fine for some people, but not for "everybody" (or even "nearly everybody").
> The "no standards for uuencode" wasn't really a problem. If we had wanted to use uuencode, we would have documented the format in the MIME RFC.
That last comment was from Keith Moore, "the author and co-author of several IETF RFCs related to the MIME and SMTP protocols for electronic mail, among others", per https://en.wikipedia.org/wiki/Keith_Moore .
uuencode has file headers/footers, like MIME. But the actual content encoding is basically base64 with a different alphabet; both add precisely 1/3 overhead (plus up to 2 padding bytes at the end).
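For illustration, the shared 3-bytes-to-4-characters structure can be seen by encoding the same input with Python's standard library (a sketch; note that `binascii.b2a_uu` also adds uuencode's per-line length byte and trailing newline):

```python
import base64
import binascii

data = b"Man"  # the classic 3-byte example

# base64: four 6-bit groups mapped through the alphabet A-Z a-z 0-9 + /
print(base64.b64encode(data))   # b'TWFu'

# uuencode: the same four 6-bit groups, but each mapped to chr(32 + n),
# prefixed with a length character and terminated with a newline
print(binascii.b2a_uu(data))    # b'#36%N\n'
```

Same 6-bit grouping in both cases; only the output alphabet (and the per-line framing) differs.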
Can anyone explain why BinHex remained "popular" in online Mac communities through to the early 2000s? Why couldn't Macs download "real" binary files back then?
So a common hack was to binhex the .sit file. Binhex was originally designed to make files 7-bit clean, but had the side effect that it bundled the resource fork and the data fork together.
Later versions of StuffIt could open .sit files which lacked the resource fork just fine, but by then .zip was starting to become more common.
I don't really understand why macOS users like this "simple" installation, because when you "uninstall" the app, it leaves all the trash in your system without a chance to clean up. And implying that macOS application somehow will not do "who-knows-what" to your system is just wrong. Docker Desktop is "simple", yet the first thing it does after launch is installing "who-knows-what".
Whereas on macOS, installation is trivial, but then the application sets up stuff upon first run, and that is really opaque, with no way of properly uninstalling the app unless there is a dedicated uninstaller.
But yeah, the simple case is quite nice.
It's redundant since this info can be fully inferred from the length of the stream.
Even for concatenations it is not necessary to require it, since you must still know the length of each substream (and "=" does not always appear, so it is not usable as a separator).
There's no way that using "=" instead of per-byte length checking gains any speed: to prevent reading out of bounds you must check the length for every byte anyway, since you can't trust the input to be a multiple of 4 in length.
It could only make sense if you were somehow required to read 4 bytes at once and couldn't possibly read less, but what platform is like that?
The padding character is not essential for decoding, since the number of missing bytes can be inferred from the length of the encoded text. In some implementations the padding character is mandatory, while in others it is not used. An exception in which padding characters are required is when multiple Base64-encoded files have been concatenated.

This shows the binary, base64 without padding, and base64 with padding:
NULL --> AA --> AA==
NULL NULL --> AAA --> AAA=
NULL NULL NULL --> AAAA --> AAAA
As you can see, all the padding does is make the base64 length a multiple of 4. You already get uniquely distinguishable encodings for the 3 cases (one, two or three NULL bytes) without the ='s, so they are unnecessary.
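That length-based inference is easy to sketch (helper name is hypothetical): re-pad the input from its length before handing it to a stock decoder.

```python
import base64

def b64decode_unpadded(s: str) -> bytes:
    # The number of missing padding chars is implied by len(s) % 4:
    # 0 -> none, 2 -> "==", 3 -> "="; a remainder of 1 is never valid.
    if len(s) % 4 == 1:
        raise ValueError("invalid base64 length")
    return base64.b64decode(s + "=" * (-len(s) % 4))

print(b64decode_unpadded("AA"))    # b'\x00'
print(b64decode_unpadded("AAA"))   # b'\x00\x00'
print(b64decode_unpadded("AAAA"))  # b'\x00\x00\x00'
```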
Refer to the "Examples" section of the Wikipedia page.
But I think it's likely just poor design taste.
I'm not sure I understand this part. You can decode aGVsbG8=IHdvcmxk, what do you need to know?
I only mentioned the concatenation because Wikipedia claims this use case requires padding while in reality it doesn't.
Now, 25+ years later, I have some answers - thanks!
If you escape any disallowed character in the usual way for a string ("\0", "\r", "\n", "\\", "\"", "\uD800") then there is no decoding process, all the data in the string will be correct.
If you throw data that is compressed in there, you're unlikely to get very many zeroes, so you can just hope that there aren't too many unpaired surrogates in your binary data, because those get inflated to 6 times their size.
Note that this operates on 16-bit values. In order to see a null, \r, \n, \\ and ", the most significant byte must also be zero, and in order for your data to contain a surrogate pair, you're looking at the two bytes taken together. When the data is compressed, the patterns are less likely.
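A rough sketch of that scheme (names hypothetical; for simplicity this version escapes every surrogate code unit, whereas the scheme described above would let valid surrogate pairs pass through):

```python
# Pass 16-bit code units through a string mostly verbatim, escaping
# only the handful of disallowed values. Escaped surrogates are the
# worst case: one 2-byte unit becomes 6 characters.
ESCAPES = {0x00: "\\0", 0x0D: "\\r", 0x0A: "\\n",
           0x5C: "\\\\", 0x22: "\\\""}

def escape_units(units):
    out = []
    for u in units:
        if u in ESCAPES:
            out.append(ESCAPES[u])
        elif 0xD800 <= u <= 0xDFFF:    # surrogate range
            out.append("\\u%04X" % u)  # 2 bytes -> 6 characters
        else:
            out.append(chr(u))
    return "".join(out)

# Most values cost a single character; only the escapes expand.
print(escape_units([0x0041, 0x000A, 0xD800]))
```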
Using the array-indexing method, the noncontiguity of the characters doesn’t matter, and the processing is also independent of the character encoding (e.g. works exactly the same way in EBCDIC).
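A sketch of that array-indexing approach: a 256-entry table indexed by the code unit itself, so neither the alphabet's contiguity nor the host character encoding matters (padding handling omitted).

```python
# Build a 256-entry lookup table mapping code unit -> 6-bit value.
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
DECODE = [-1] * 256
for i, c in enumerate(ALPHABET):
    DECODE[c] = i

def decode_quad(q: bytes) -> bytes:
    # Decode one full 4-char group into 3 bytes
    v = 0
    for c in q:
        v = (v << 6) | DECODE[c]
    return v.to_bytes(3, "big")

print(decode_quad(b"TWFu"))  # b'Man'
```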
https://datatracker.ietf.org/doc/html/rfc2045#section-6.8 says:
This subset has the important property that it is represented
identically in all versions of ISO 646, including US-ASCII, and all
characters in the subset are also represented identically in all
versions of EBCDIC. Other popular encodings, such as the encoding
used by the uuencode utility, Macintosh binhex 4.0 [RFC-1741], and
the base85 encoding specified as part of Level 2 PostScript, do not
share these properties, and thus do not fulfill the portability
requirements a binary transport encoding for mail must meet.
If you want to learn why ASCII is the way it is, try "The Evolution of Character Codes, 1874-1968" at https://archive.org/details/enf-ascii/mode/2up by Eric Fischer (an HN'er). My reading is contiguous A-Z was meant for better compatibility with 6-bit use.

Considerably stranger in regard to contiguity was EBCDIC, but it too made sense in terms of its technological requirements, which centered around Hollerith punch cards. https://en.wikipedia.org/wiki/EBCDIC
There are numerous other examples where a lack of knowledge of the technological landscape of the past leads some people to project unwarranted assumptions of incompetence onto the engineers who lived under those constraints.
(Hmmm ... perhaps I should have read this person's profile before commenting.)
And the performance claims are absurd, e.g.,
"A simple and extremely common int->hex string conversion takes twice as many instructions as it would if ASCII was optimized for computability."
WHICH conversion, uppercase hex or lowercase hex? You can't have both. And it's ridiculous to think that the character set encoding should have been optimized for either one or that it would have made a measurable net difference if it had been. And instruction counts don't determine speed on modern hardware. And if this were such a big deal, the conversion could be microcoded. But it's not--there's no critical path with significant amounts of binary to ASCII hex conversion.
"There are also inconsistencies like front and back braces/(angle)brackets/parens not being convertible like the alphabet is."
That is not a usable conversion. Anyone who has actually written parsers knows that the encodings of these characters are not relevant ... nothing would have been saved in parsing "loops". Notably, programming language parsers consume tokens produced by the lexer, and the lexer processes each punctuation character separately. Anything that could be gained by grouping punctuation encodings can be done via the lexer's mapping from ASCII to token values. (I have actually done this to reduce the size of bit masks that determine whether any member of a set of tokens has been encountered. I've even, in my weaker moments, hacked the encodings so that <>, {}, [], and () are paired--but this is pointless premature optimization.)
Again, this fellow's profile is accurate.
Hardware has advanced, but software depends on standards and conventions formulated for far less capable hardware, and that's a problem.
The efficiency of string processing/generation is hugely important in terms of global energy consumption.
A simple and extremely common int->hex string conversion takes twice as many instructions as it would if ASCII was optimized for computability.
Bounds-checking for the English alphabet requires either an upfront normalization or twice the checking, so 50-100% more instructions for that.
There are also inconsistencies like front and back braces/(angle)brackets/parens not being convertible like the alphabet is.
[({< <-> >})] would have been just as or more useful than the alphabet being convertible and saved a few instructions in common parsing loops.
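For what it's worth, the branch in question looks something like this (uppercase variant; a 16-entry lookup table is the usual branch-free alternative):

```python
# Nibble -> hex digit. Because ASCII places '0'..'9' (0x30-0x39) and
# 'A'..'F' (0x41-0x46) in non-adjacent ranges, a branch or a lookup
# table is needed; a gap-free layout would allow a single add.
def nibble_to_hex(n: int) -> str:
    return chr(ord("0") + n) if n < 10 else chr(ord("A") + n - 10)

print("".join(nibble_to_hex(n) for n in (0xC, 0xA, 0xF, 0xE)))  # CAFE
```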
> I never questioned the competence of past engineers
False just based on your opening volley of toxic spew. Backwards compatibility is an engineering decision and it was made by very competent people to interoperate with a large number of systems. The future has never been fucked over.
You seem to not understand how ASCII is encoded. It is primarily based on bit-groups where the numeric ranges for character groupings can be easily determined using very simple (and fast) bit-wise operations. All of the basic C functions to test single-byte characters such as `isalpha()`, `isdigit()`, `islower()`, `isupper()`, etc. use this fact. You can then optimize these into grouped instructions and pipeline them. Pull up `man ascii` and pay attention to the hex encodings at the start of all the major symbol groups. This is still useful today!
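For example, a sketch of the kind of bit tests meant here (not the actual libc implementations, which typically go through a classification table):

```python
def is_upper(c: str) -> bool:
    return ord("A") <= ord(c) <= ord("Z")

def to_lower(c: str) -> str:
    # Setting bit 0x20 maps 'A'..'Z' onto 'a'..'z'
    return chr(ord(c) | 0x20) if is_upper(c) else c

def is_digit(c: str) -> bool:
    # '0'..'9' occupy 0x30..0x39: high nibble 3, low nibble <= 9
    b = ord(c)
    return (b & 0xF0) == 0x30 and (b & 0x0F) <= 9

print(to_lower("M"), is_digit("7"), is_digit(":"))  # m True False
```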
No, the biggest fuckage of the internet age has been Unicode which absolutely destroys this mapping. We no longer have any semblance of a 1:1 translation between any set of input bytes and any other set of character attributes. And this is just required to get simple language idioms correct. The best you can do is use bit-groupings to determine encoding errors (ala UTF-8) or stick with a larger translation table that includes surrogates (UTF-16, UTF-32, etc). They will all suffer the same "performance" problem called the "real world".