Almost a perfect standard, but the prepended one-byte header is a mistake IMHO. It makes it impossible to encode when the input size is unknown. Better to encode whether the last chunk is one byte or two at the end of the stream.
Please whoever is involved with this, revise the standard to not have a header and call this existing spec a beta. Otherwise, good work.
Edit: I have opened an issue for this: https://github.com/kevinAlbs/Base122/issues/3#issue-19188159...
This was true on Vax 8200 hardware back in the day. In the same software, with Huffman decoding of JPEGs it was also fastest to create a finite state machine with an 8 bit symbol size. I suspect that is no longer true since it would kill your L1 cache and be well into your L2 cache on modern x86 machines. It is probably better to take the instruction count hit and process as bits or nibbles.
Still useful for Javascript, as the bit shift operators work on 32 bit "registers".
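A quick illustration of that coercion (runnable in Node or a browser console):

```javascript
// Bitwise operators coerce JS numbers to 32-bit integers, so bit-packing
// code effectively works with 32-bit "registers" even though JS numbers
// are 64-bit floats.
console.log(2 ** 31 | 0);        // -2147483648: wraps to signed 32-bit
console.log(2 ** 32 | 0);        // 0: bits above the 32nd are discarded
console.log((0xff << 24) >>> 0); // 4278190080: >>> reads the result as unsigned
```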
Stick with the characters that nearly everyone assumes could legitimately come up in a document, and you lower your chances of running afoul of some "creative genius" who decided, "Hey, it's unprintable so no one will try to print it, but when I do print it I want this thing to happen..."
[1] http://grepcode.com/file/repository.grepcode.com/java/root/j...
http://www.emergencevector.com/
It's pretty easy to write the decode for the 0-91 integer in Javascript.
    // Map a base 92 character back to its 0-91 integer value.
    // '!' stands in for digit 57, whose natural character would be '\'.
    function decodeBase92Digit(ch) {
      if (ch === "!") {
        return 57;
      }
      return ch.charCodeAt(0) - 35;
    }
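The inverse is equally short. This encoder is my reconstruction from the decode logic above, not the parent's code: the special case exists because digit 57 would otherwise map to backslash (char code 92), which is painful inside string literals.

```javascript
// Hypothetical inverse of the decode snippet: digit + 35 covers '#'..'~',
// except digit 57 (which would be '\', char code 92) is remapped to '!'.
function encodeBase92Digit(d) {
  return d === 57 ? "!" : String.fromCharCode(d + 35);
}

console.log(encodeBase92Digit(0));  // "#"
console.log(encodeBase92Digit(57)); // "!"
console.log(encodeBase92Digit(91)); // "~"
```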
It doesn't give you that much usable compactness over base 64, though you can easily encode a 360 degree angle with only two bits of precision lost. Also, 5 base 92 characters can fully encode 32 bits of binary data (though that's no better than base 85, which can also do it in 5 characters). I'm probably going to move to typed arrays of 32 bit values. Currently, I can encode an entire ship's data in 18 bytes, of which 4 characters are a hash id.
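The "5 characters for 32 bits" claim checks out for both alphabets; five digits suffice whenever base^5 >= 2^32:

```javascript
// Both 92^5 and 85^5 exceed 2^32, so 5 characters cover a 32-bit word.
console.log(92 ** 5);            // 6590815232
console.log(85 ** 5);            // 4437053125
console.log(2 ** 32);            // 4294967296
console.log(92 ** 5 >= 2 ** 32); // true
console.log(85 ** 5 >= 2 ** 32); // true
```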
The most efficient one is yEnc[1]. Still, the simplest ones such as base64 or good old hex may actually work better once compression comes into the picture.
Besides that, the Z85 encoding is the next runner up as a compact "string safe" encoding: https://rfc.zeromq.org/spec:32/Z85/
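For the curious, here is a sketch of Z85 encoding per the linked ZeroMQ spec: every 4 bytes become 5 characters drawn from an 85-character alphabet chosen to avoid quotes and backslash. The "HelloWorld" test vector is from the spec itself.

```javascript
// Z85 alphabet from the spec: digits, lowercase, uppercase, then punctuation
// that is safe inside string literals (no quotes, no backslash).
const Z85 =
  "0123456789abcdefghijklmnopqrstuvwxyz" +
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#";

function z85Encode(bytes) {
  if (bytes.length % 4 !== 0) {
    throw new Error("Z85 input must be a multiple of 4 bytes");
  }
  let out = "";
  for (let i = 0; i < bytes.length; i += 4) {
    // Accumulate 4 bytes as an unsigned 32-bit value (exact in a JS double).
    let value = 0;
    for (let j = 0; j < 4; j++) value = value * 256 + bytes[i + j];
    // Emit 5 base-85 digits, most significant first.
    let divisor = 85 ** 4;
    for (let j = 0; j < 5; j++) {
      out += Z85[Math.floor(value / divisor) % 85];
      divisor /= 85;
    }
  }
  return out;
}

// Test vector from the spec:
console.log(z85Encode([0x86, 0x4f, 0xd2, 0x6f, 0xb5, 0x59, 0xf7, 0x5b])); // "HelloWorld"
```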
Using an alternative to base64 encoded data for web pages and node is a horrible idea. If I came across a code base using that I would scream. A lot. None of my tools work with it, and now my browser has to run a bunch of js for something that's normally native and very fast. It's a 1-2kb savings per page that's going to make somebody jump off a bridge one day.
More terrifying is that the JavaScript community is so fascinated by shiny objects that thousands of people are going to use this. I'm not sure if it's more funny or terrifying.
Anyway, this encoding is still useful if you need to pass data between legacy systems. I've used HTML escapes and ASCII85 many times to get around annoying old stuff that doesn't support Unicode.
It's not that I don't agree with the general idea of your post, but I disagree with this part. If nobody ever tries something new, we'd still be in the stone age. I think base122 is a bit silly, but if it had its use and adoption became widespread, then browsers and tooling etc. would follow, just as they now support base64.
> Base-122 was created with the web in mind.

and

> As §3 shows, base-122 is not recommended to be used on gzip compressed pages, which is the majority of served web pages.

occur just a few lines from each other? I get that there are more use cases like email and such, but if you're going to create something for the web and it can't be used on the majority of web pages, that seems like a fairly large oversight/caveat.
> Base-122 encoded strings contain characters which did not seem to play well with copy-pasting.
A very important part of web development is being able to manipulate text documents. It seems that using UTF-8 in more places can reveal cracks in implementations in browsers/DEs/editors/terminals/etc.
It's very easy to write a cache-timing-safe version of base{16,32,64} encoding for use in encoding/decoding cryptographic keys in configuration files. To wit: https://github.com/paragonie/constant_time_encoding
Base-122? Not sure if it's even possible.
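The linked library is PHP, but the branchless idea behind it translates directly; here is a JS sketch of the technique (my own translation, not the library's code): each nibble is mapped to '0'-'9' or 'a'-'f' with arithmetic instead of a secret-indexed table lookup that could leak via cache timing. (Caveat: a JS engine gives no hard constant-time guarantees; this only illustrates the approach.)

```javascript
// Branchless hex encoding: no table lookups or branches indexed by secret data.
function hexEncode(bytes) {
  let out = "";
  for (const b of bytes) {
    for (const n of [b >> 4, b & 0x0f]) {
      // 87 + n gives 'a'..'f' for n >= 10; for n < 10, (n - 10) >> 8 is -1,
      // so we subtract 39 and land on '0'..'9' instead.
      out += String.fromCharCode(87 + n + (((n - 10) >> 8) & -39));
    }
  }
  return out;
}

console.log(hexEncode([0xde, 0xad, 0xbe, 0xef])); // "deadbeef"
```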
https://github.com/deckar01/digit-array/blob/master/README.m...
Each byte of base64 produces 6 bits of data, so the byte boundary aligns at 24 bits. LCM(6,8) = (6•8)/2 = 24
Each byte of base122 produces 7 bits of data, so the byte boundary aligns at 56 bits. LCM(7,8) = (7•8)/1 = 56
Edit: Due to the variable length encoding, there is no guarantee of byte alignment.
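The alignment arithmetic is easy to check directly: LCM(bits-per-char, 8) gives the smallest span that lands on both a character boundary and a byte boundary.

```javascript
// LCM(a, b) = a*b / GCD(a, b); GCD via Euclid's algorithm.
const gcd = (a, b) => (b === 0 ? a : gcd(b, a % b));
const lcm = (a, b) => (a * b) / gcd(a, b);

console.log(lcm(6, 8)); // 24 -> 4 base64 chars encode exactly 3 bytes
console.log(lcm(7, 8)); // 56 -> 8 seven-bit chars encode exactly 7 bytes
```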
From the posted article: "This leaves us with 122 legal one-byte UTF-8 characters to use"
Seems legit to me.
> Base-122? Not sure if it's even possible.
Parent is clearly referring to his previous statement, not that Base-122 itself is not possible.
If the goal is to reduce latency for small images, wouldn't it make it more sense to extend data URIs so the same base64 string can be referenced in multiple places?
Actually, as HTTP2 can effectively return multiple resources in the answer of one request, do we still need embedded images for latency reduction at all?
Also, there are still cases where compression cannot be applied, e.g. if a script naively queries innerHTML. (This wouldn't affect loading time but it could inflate the page's RAM unnecessarily)
Base122 does not have that property - as far as I could tell.
Whilst we are all, basically, living in an 8-bit world, I suspect it will be some time before people feel comfortable assuming that an 8-bit transport is viable over email.
[edit: spelling ]
Raw size: 37409
gzip(raw): 6170
gzip(base16): 7482
gzip(base64): 10675
gzip(base85): 10549
So it works better than base64, and it has the advantage of working 4 bytes at a time rather than 3. Also note that when you're feeding in binary data, the gzip sizes for raw and base-xx data get a lot closer together.
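Numbers like the ones above can be approximated for any file with a pipeline like this (the sample file and `xxd` for base16 are placeholders; exact sizes depend on the input and gzip settings):

```shell
# Generate a sample file, then compare gzip sizes of raw vs. re-encoded data.
head -c 16384 /dev/urandom > sample.bin
wc -c < sample.bin                                 # raw size
gzip -c sample.bin | wc -c                         # gzip(raw)
xxd -p sample.bin | tr -d '\n' | gzip -c | wc -c   # gzip(base16)
base64 sample.bin | gzip -c | wc -c                # gzip(base64)
```

For incompressible (random) input, expect gzip(raw) to sit just above the raw size, with the base-xx variants a little higher still.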
https://github.com/diafygi/Offset248
Pros: simple, no ratio or ending calculation, easy copy/paste, same character length as original bytes
Cons: has to be percent encoded in urls, some fonts might not render all characters
Base122 requires UTF-8, and while that's pretty common, it's not universal, so base64 can't ever go away in favor of base122.
Compressed base64 is more efficient than base122 (with or without compression).
Conclusion: Big nope.
There is an explicit call out for "double quote == bad", but single quotes are also valid property delimiters in HTML.
Still, single quotes are somewhat asking for trouble.