Protocol Buffers (and I think Thrift, and maybe Avro) are sort of like C or C++: you declare your types ahead of time, and then you take some binary payload and "cast" it (parse it, actually) into your predefined type. If those bytes weren't actually serialized as that type, you'll get garbage. On the plus side, the fact that you declared your types statically means that you get lots of useful compile-time checking and everything is really efficient. It's also nice because you can use the schema file (i.e., .proto files) to declare your schema formally and document everything.
JSON and Ion are more like a Python/JavaScript object/dict. Objects are just attribute-value bags. If you say it has field fooBar at runtime, now it does! When you parse, you don't have to know what message type you are expecting, because the key names are all encoded on the wire. On the downside, if you misspell a key name, nothing is going to warn you about it. And things aren't quite as efficient, because the general representation has to be a hash map where every value is dynamically typed. On the plus side, you never have to worry about losing your schema file.
I think this is a case where "strongly typed" isn't the clearest way to think about it. It's "statically typed" vs. "dynamically typed" that is the useful distinction.
{"start": "2007-03-01"}
Is that a timestamp? Maybe! Does it support a time within the day? Perhaps I can write "2007-03-01T13:00:00" in ISO 8601 format if we're lucky. Can I supply a time zone? Who knows for sure? It's weakly typed data. The actual specification of that field's type lives in a layer on top of JSON, if it's specified at all. It might be "specified" only in terms of what the applications that handle it can parse and generate. I could drop that value into Excel and treat it as all sorts of different things.

Ion, by comparison, has a specific data type for timestamps defined in the spec [1]. The timestamp has a canonical representation in both text and binary form. For this reason, I know that "2007-02-23T20:14:33.Z" and "2007-02-23T12:14:33.079-08:00" are valid Ion timestamp text values. In this instance I would describe Ion as strongly typed and JSON as weakly typed. Or, as the Ion documentation puts it, "richly typed".
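To make the ambiguity concrete, here is a minimal Python sketch (reusing the key name from the example above): the JSON parser hands the application a plain string, and each consumer layers its own interpretation on top.

```python
import json
from datetime import date, datetime

doc = json.loads('{"start": "2007-03-01"}')

# JSON hands the application a plain string; the "type" lives elsewhere.
assert isinstance(doc["start"], str)

# Each consumer picks its own interpretation:
as_date = date.fromisoformat(doc["start"])          # a calendar date
as_midnight = datetime.fromisoformat(doc["start"])  # midnight on that day
```

Both interpretations are "correct", which is exactly the problem: nothing in the payload says which one the producer meant.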
To make an analogy, weakly typed is the Excel cell that can store whatever value you put in it, or the PHP integer 1 which is considered equal to "1" (loose equality). Strongly typed is the relational database row with a column described precisely by the table schema. Weakly typed is the CSV file; strongly typed is the Ion document.
However I don't think it's accurate to say that the typing of Ion is any "stronger." Both Ion and JSON are fully dynamically typed, which means that types are attached to every value on the wire. It's just that without an actual timestamp type in JSON, you have to encode timestamp data into a more generic type.
              poorly typed <-------------> richly typed
    dynamic   CSV, INI          JSON       YAML, Ion
    static    Bencode           ASN.1      Protobuf
What I mean by "richly typed" is that you would never read a timestamp off the wire and not know that it's a timestamp. By comparison, with CSV or INI files, you just have strings everywhere. Formats on the richly typed side have separate, explicit types for binary blobs and text, for example.

I officially propose the terms "accidentally typed" or "eventually typed".
Some of the benefits over JSON:
* Real date type
* Real binary type - no need to base64 encode
* Real decimal type - invaluable when working with currency
* Annotations - You can tag an Ion field in a map with an annotation that says, e.g. its compression ("csv", "snappy") or its serialized type ('com.example.Foo').
* Text and binary format
* Symbol tables - this is like automated jsonpack.
* It's self-describing - meaning, unlike Avro, you don't need the schema ahead of time to read or write the data.
Its binary format was introduced in 2002!
Edit: Property lists only support integers up to 128 bits in size and double-precision floating point numbers. On top of those, Ion also supports infinite precision decimals.
(plutil "supports" a json format, but it's not capable of expressing the complete feature set of the XML or binary formats.)
What does JavaScript do with this though, just cast it to a float?
"price": {
"amount": "1500",
"scale": 2,
"symbol": "GBP",
}
Currency has three properties: the amount, scale, and symbol. Amount is a string; it holds a bigint. Yes, it's a string.
The value of Scale can be up to 5 but is usually 2 or 3.
Symbol is the ISO code.
Whenever I see a financial system that uses "amount": 15.00 I know that the system is ill-conceived.
I believe that the proper way to handle money is to use Integer values plus a pre-defined precision.
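A minimal Python sketch of that integer-plus-scale scheme, reusing the field names from the price example above (the helper functions are mine, not from any standard money library):

```python
from decimal import Decimal

def to_decimal(amount: str, scale: int) -> Decimal:
    """Turn an integer amount string plus a scale into an exact decimal value."""
    return Decimal(int(amount)).scaleb(-scale)

def from_decimal(value: Decimal, scale: int) -> str:
    """Store the value back as an integer string at the given scale."""
    return str(int(value.scaleb(scale)))

price = {"amount": "1500", "scale": 2, "symbol": "GBP"}
value = to_decimal(price["amount"], price["scale"])
assert value == Decimal("15.00")            # exactly 15 pounds, no float drift
assert from_decimal(value, price["scale"]) == "1500"
```

Because the wire format only ever carries integers, no reader can accidentally introduce binary floating-point rounding.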
There is no need to have a null which is fragmented into null.timestamp, null.string, and whatever. It will complicate processing: even once you know the type of some element is timestamp, you must still worry about whether it is null and what that means.
There should be just one null value, which is its own type. A given datum is either permitted to be null OR something else like a string. Or it isn't; it is expected to be a string, which is distinct from the null value; no string is a null value.
It's good to have a read notation for a timestamp, but it's not an elementary type; a timestamp is clearly an aggregate and should be understood as corresponding to some structure type. A timestamp should be expressible using that structure, not only as a special token.
This monstrosity is not exhibiting good typing; it is not good static typing, and not good dynamic typing either. Under static typing we can have some "maybe" type instead of null.string: in some representations we definitely have a string. In some other places we have a "maybe string", a derived type which gives us the possibility that a string is there, or isn't. Under dynamic typing, we can superimpose objects of different type in the same places; we don't need a null version of string since we can have "the" one and only null object there.
This looks like it was invented by people who live and breathe Java and do not know any other way of structuring data. Java uses statically typed references to dynamic objects, and each such reference type has a null in its domain so that "object not there" can be represented. But just because you're working on a reference implementation in such a language doesn't mean you cannot transcend the semantics of the implementation language. If you want to propose some broad interoperability standard, you practically must.
In practice, it doesn't. If you want to know if an IonValue is null, ask it with #isNull. If you don't care about the null's type, ignore it. On the other hand, the type is an additional form of metadata which allows overloading the meaning of a value.
nulls can also be annotated, so Ion doesn't really have the concept of a singular shared null sentinel.
More so than JSON, Ion often uses nulls to differentiate presence from value (that is, the absence of a field in a struct has a different meaning than the presence of that field with a null value). Since nulls are objects, they can be tested separately from the lack of a field definition.
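The presence-versus-null distinction can be sketched with a plain Python dict: get() alone conflates the two cases, while a membership check recovers the difference, much as Ion's model does.

```python
# A record where "middle_name" is present-but-null, and "nickname" is absent.
record = {"first_name": "Ada", "middle_name": None}

# dict.get() alone can't tell the two cases apart:
assert record.get("middle_name") is None
assert record.get("nickname") is None

# Checking membership first recovers the distinction:
assert "middle_name" in record      # field present, value is null
assert "nickname" not in record     # field absent altogether
```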
> a timestamp is clearly an aggregate and should be understood as corresponding to some structure type.
Timestamps are structured types with a literal representation that is explicitly modeled in the specification. You're free to ignore it and use a custom schema for representing time, but you've moved any validation into your application at that point and are no better off than JSON.
It recalls the nullability arguments between the ML family and the C/Java family.
kazinator is asking for safer document semantics and a type-safe API.
Source: I was there.
Source: I am one of the primary authors.
https://avro.apache.org/docs/current/
They both have self-describing schemas, support for binary values, JSON-interoperability, basic type systems (Ion seems to support a few more field types), field annotations, support for schema evolution, code generation not necessary, etc.
I think Avro has the additional advantages of being production-tested in many different companies, a fully-JSON schema, support for many languages, RPC baked into the spec, and solid performance numbers found across the web.
I can't really see why I'd prefer Ion. It looks like an excellent piece of software with plenty of tests, no doubt, but I think I could do without "clobs", "sexprs", and "symbols" at this level of representation, and it might actually be better if I do. Am I missing something?
Ion is designed to be self-describing, meaning that no schema is necessary to deserialize and interact with Ion structures. It's consequently possible to interact with Ion in a dynamic and reflective way, for example, in the same way that you can with JSON and XML. It's possible to write a pretty-printer for a binary Ion structure coming off the wire without having any idea of or schema for what's inside. Ion's advantage over those formats is that it's strongly typed (or richly typed, if you prefer). For example, Ion has types for timestamps, arbitrary-precision decimals like for currency, and can embed binary data directly (without base64 encoding), etc.
I wouldn't try to say that one or the other is better across the board. Rather, they have tradeoffs and relative strengths in different circumstances. Ion is in part designed to tackle scenarios like where your data might live a really long time, and needs to be comprehensible decades from now (whether you kept track of the schema or not, or remember which one it was); and needs to be comprehensible in a large distributed environment where not every application might possess the latest schema or where coordinating a single compile-time schema is a challenge (maybe each app only cares about some part of the data), and so on. Ion is well-suited to long-lived, document-type data that's stored at rest and interacted with in a variety of potentially complex ways over time. Data data. In the case of a simple RPC relationship between a single client and service, where the data being exchanged is ephemeral and won't stick around, and it's easy to definitively coordinate a schema across both applications, a typical serialization framework is a fine choice.
"Avro data is always serialized with its schema. Files that store Avro data should always also include the schema for that data in the same file. Avro-based remote procedure call (RPC) systems must also guarantee that remote recipients of data have a copy of the schema used to write that data."
https://avro.apache.org/docs/current/spec.html#Data+Serializ...
The timing of open-sourcing it mystifies me a bit. Maybe Amazon is trying to become more open-source friendly, like Microsoft did?
Perhaps more likely: they're planning on making public some internal APIs that use Ion heavily?
It's a core feature of Ion that the text and binary representations are isomorphic. You can take any Ion binary document and pretty-print it as an Ion text document that is exactly equivalent. You can edit that document and send it into your application, which will be guaranteed to be able to read it. Or you can take your hand-authored text data and transcode it into binary, and know that any Ion application can handle it without any extra effort.
Also, field order matters in Avro but not in JSON. That bit me pretty hard once... fortunately, I found out that Python's JSON library lets you read a JSON file into an OrderedDict instead of a plain dict, so I was able to get around it.
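For reference, the trick mentioned above is the object_pairs_hook parameter of Python's json module, which hands the parser's (key, value) pairs to any mapping constructor you like:

```python
import json
from collections import OrderedDict

text = '{"b": 1, "a": 2, "c": 3}'

# object_pairs_hook builds the mapping from pairs in wire order.
doc = json.loads(text, object_pairs_hook=OrderedDict)
assert list(doc) == ["b", "a", "c"]  # key order preserved
```

(On Python 3.7+ the plain dict returned by default also preserves insertion order, but object_pairs_hook makes the intent explicit and works on older versions.)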
I'm by no means a real Lisp programmer, but even I find S-Expressions more natural to write and process. And the simplicity of it allows for great editing tools too. This may be personal, but I always found JSON clunky.
Several years ago, I wouldn't have imagined this possible and I'm a little bummed that I left before it happened.
Like leef said above, I'm glad to have Ion as an option again.
It's particularly interesting to see the fixes and improvements from the actual open source cleanup effort getting to (many) Internal production services.
Amazon doesn't open source things, as a general rule. It can be done but it is a lot of jumping through hoops and they generally need good reasons to do it (as opposed to a lack of good reasons not to).
So now not only do we have the problem of redundant and mutually incompatible protocols (cue obligatory xkcd), but that we have so many such protocols that name collision is becoming an extra problem.
No need for a new protocol when doing it that way for basic things; if you need something more binary (busy messaging/real-time), there are plenty of alternatives to JSON.
I love the simplicity of JSON, so do others and it is successful so many try to attach on to that success. The success part was that it was so damn simple though, most attachments just complicate and add verbosity, echoes back to XML and SOAP wars which spawned the plain and simple JSON. Adding complexity is easy and anyone can do it, good engineers take complexity and make it simple, that is damn difficult.
But in JSON you'd encode that base64 as a string, and the application must know that the data isn't really a string but a blob in some type of encoding. That probably means wrapping it in another struct to provide that metadata. Ion provides a terse method of doing the same while maintaining data integrity:
'image/gif'::{{ R0lGODlhAQABAIABAP8AAP///yH5BAEAAAEALAAAAAABAAEAAAICRAEAOw== }}
The 'image/gif' annotation is application specific, but all consumers know that the contents of that value are binary. In the binary Ion representation, those 43 bytes are encoded as a 45-byte value (one byte for the type marker and a second for the length in this case; as little as 47 bytes with the annotation and a shared symbol table), making the binary representation very efficient for transferring binary data.

Since Ion is a superset of JSON, it's by definition more complex, but the complexity isn't unapproachable. Most of the engineers I worked with assumed it was JSON until coming across timestamps, annotations, or bare word symbols.
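A quick back-of-the-envelope check of those sizes in Python (using 43 placeholder bytes rather than the actual GIF):

```python
import base64

payload = bytes(43)  # stand-in for the 43-byte GIF above

# JSON must carry the bytes as base64 text, plus the surrounding quotes:
b64 = base64.b64encode(payload)
json_cost = len(b64) + 2           # 60 chars of base64 + two quote characters
assert json_cost == 62

# A length-prefixed binary encoding needs only a small fixed header,
# e.g. the type marker and length bytes cited above:
binary_cost = len(payload) + 2
assert binary_cost == 45
```

And that is before counting the wrapper struct JSON would need to say "this string is actually a GIF".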
JSON's string literals come from JavaScript, and JavaScript only sort of has a Unicode string type. So the \u escape in both languages encodes a UTF-16 code unit, not a code point. That means in JSON, the single code point U+1F4A9 "Pile of Poo" is encoded thusly:
"\ud83d\udca9"
JSON specifically says this, too:

    Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point. The hexadecimal letters A though F can be upper or lowercase. So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".

    [… snip …]

    To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".
Now, Ion's spec says only:

    U+HHHH    \uHHHH    4-digit hexadecimal Unicode code point

But if we take it to mean code point, then if the value is a surrogate… what should happen? Looking at the code, it looks like the above JSON will parse:
1. Main parsing of \u here:
https://github.com/amznlabs/ion-java/blob/1ca3cbe249848517fc6d91394bb493383d69eb61/src/software/amazon/ion/impl/IonReaderTextRawTokensX.java#L2429-L2434
2. which is called from here, and just appended to a StringBuilder:
https://github.com/amznlabs/ion-java/blob/1ca3cbe249848517fc6d91394bb493383d69eb61/src/software/amazon/ion/impl/IonReaderTextRawTokensX.java#L1975
My Java isn't that great though, so I'm speculating. But I'm not sure what should happen.

This is just one of those things that, the first time I saw it in JSON/JS… a part of my brain melted. This is all a technicality, of course, and most JSON values should work just fine.
Surrogates are code points. The spec does not say what should happen if the surrogate is invalid (for example, if only the first surrogate of a surrogate pair is present), but neither does the JSON spec.
Java internally also represents non-BMP code points using surrogates. So, simply appending the surrogates to the string should yield a valid Java string if the surrogates in the input are valid.
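Python's json module handles the pair the same way, which makes the round trip easy to verify:

```python
import json

# Escaped as a UTF-16 surrogate pair on the wire...
wire = '"\\ud83d\\udca9"'
assert json.loads(wire) == "\U0001F4A9"  # ...but decodes to one code point

# The default encoder emits the same pair going the other way:
assert json.dumps("\U0001F4A9") == '"\\ud83d\\udca9"'
```

A lone surrogate (say, \ud83d with no trailing \udca9) is the genuinely underspecified case; decoders differ on whether to reject it or pass it through.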
And where does Ion fit here?
Edit: There is a benchmark script that tests a few serializers and validators in Ruby in my [employer's] ClassyHash gem: https://github.com/deseretbook/classy_hash/. It would be easy to add more serializers to the benchmark: https://github.com/deseretbook/classy_hash/blob/master/bench...
Data formats like JSON and XML can be somewhat self-describing, but they aren't always completely. Both tend to need to embed more complex data types as either strings with implied formats, or nested structures. (Consider: How would you represent a timestamp in JSON such that an application could unambiguously read it? An arbitrary-precision decimal? A byte array?) I'm not familiar with EDN, but it appears to be in a similar position as JSON in this regard. ProtocolBuffers, Thrift, and Avro require a schema to be defined in advance, and only work with schema-described data as serialization layers. Ion is designed to work with self-describing data that might be fairly complex, and have no compiled-ahead-of-time schema.
Ion makes it easy to pass data around with high fidelity even if intermediate systems through which the data passes understand only part of the data but not all of it. A classic weakness of traditional RPC systems is that, during an upgrade where an existing structure gains an additional field, that structure might pass through an application that doesn't know about the field yet. Thus when the structure gets deserialized and serialized again, the field is missing. The Ion structure by comparison can be passed from the wire to the application and back without that kind of loss. (Some serialization-based frameworks have solutions to this problem too.)
One downside is that its performance tends to be worse than schema-based serialization frameworks like Thrift/ProtoBuf/Avro where the payload is generally known in advance, and code can be generated that will read and deserialize it. Another downside is that it's difficult to isolate Ion-aware code from the more general purpose "business logic" in an application, due to the absence of a serialization layer producing/consuming POJOs; instead it's common to read an Ion structure from the wire and access it directly from application logic.
However, it doesn't support blobs. I'm conflicted about this point. On one hand, small blobs can occasionally be useful to send within a larger payload. On the other hand, small blobs almost always become large blobs, and so I'd rather plan for out-of-band (preferably even content addressable) representations of blobs.
This is indeed a common pitfall, especially since traversing Ion is slow and expensive. I've squeezed up to 30% performance gain by converting Ion data to POJOs up front and just using those.
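That convert-up-front pattern looks roughly like this in Python, with a frozen dataclass standing in for a POJO (the document shape and field names here are hypothetical):

```python
from dataclasses import dataclass

# Hypothetical dynamic document, standing in for a parsed Ion struct.
raw = {"sku": "B000123", "price": "1500", "scale": 2}

@dataclass(frozen=True)
class Item:
    sku: str
    price_minor_units: int
    scale: int

def materialize(doc: dict) -> Item:
    """Pay the traversal/validation cost once, then use plain typed fields."""
    return Item(
        sku=str(doc["sku"]),
        price_minor_units=int(doc["price"]),
        scale=int(doc["scale"]),
    )

item = materialize(raw)
assert item.price_minor_units == 1500
```

After materialize(), the hot path touches only cheap attribute reads instead of repeatedly walking the dynamic structure.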
* It doesn't have "true" types in the sense that Ion does. It's basically just a binary serialization of JSON, with extra stuff.
* Despite being a binary format, it's actually bulkier than JSON in most situations.
* It removes any semblance of canonicity from many representations. A number, for instance, can potentially be represented by any of at least 3 types (double, int32, and int64).
* It has signed 32-bit length limits all over the place. Not that I'd want to be storing 2GB of data in a single JSON document either, but it's not even possible to do so with BSON!
* It requires redundant null bytes in unpredictable places. For instance, all strings must be stored with a trailing null byte, which is included in their length. There's also a trailing null byte at the end of a document for no reason at all.
* It is unabashedly Javascript-specific, containing types like "JavaScript code with scope" which are meaningless to other languages.
* It also contains some MongoDB-specific cruft, such as the "ObjectID" and "timestamp" types (the latter of which, despite its name, cannot actually be used to store time values).
* It contains numerous "deprecated" and "old" features (in version 1.0!) with no guidance as to how implementations should handle them.
See e.g. https://metacpan.org/pod/Cpanel::JSON::XS#SECURITY-CONSIDERA... I need to add ion to this security matrix.
YAML does most of those and more, and can be made quite secure by limiting the allowed types to the absolute, trusted minimum; that limiting is, however, implemented only in the Python backend, not the Perl one. By default YAML is extremely insecure.
There are more new readable and typed JSON variants out there. E.g. jzon-c should be faster than ion, but there are also Hjson and SJSON. See https://github.com/KarlZylinski/jzon-c
I've no clue about the trailing NUL on the record itself, perhaps a safety feature?
What? This means their "arbitrary-precision decimals" are actually isomorphic to (Rational x Natural).
e.g. in Python:
>>> from decimal import Decimal as D
>>> 2 * D("1.0")
Decimal('2.0')
>>> 2 * D("1.000")
Decimal('2.000')
>>> D("1.0") == D("1.000")
True

The Ion value 0.0 has one digit of precision (after the decimal point), while the value 0.00 has two. In the Ion data model, those are two distinct values, and conforming implementations must maintain the distinction.
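Python's Decimal carries the same coefficient-and-exponent pair internally, so the distinction Ion's data model preserves is visible via as_tuple() even though == collapses it:

```python
from decimal import Decimal

a, b = Decimal("0.0"), Decimal("0.00")

assert a == b                          # numerically equal...
assert a.as_tuple() != b.as_tuple()    # ...but distinct representations
assert a.as_tuple().exponent == -1     # one digit after the point
assert b.as_tuple().exponent == -2     # two digits after the point
```

That exponent field is exactly the "Natural" half of the (Rational x Natural) characterization above.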
(Insert joke here about Google engineers just copying around protobufs.)
https://www.reddit.com/r/haskell/comments/4fhuw3/json_for_ad...
ASN.1 also has a million baroque types (VideotexString, anyone?) where most people just need "string", "small int", "big int", etc.
Some more on BER parsing hell here: https://mirage.io/blog/introducing-asn1
...unless you're Fabrice Bellard, who apparently wrote one just because it was one of the minor obstacles on the way to writing a full LTE base station:
1) Open types - typically applications consuming Ion data do not restrict the fields included (that is, they gracefully ignore, and often even pass along, additional fields). Schemas may grow while remaining backwards compatible with existing software.

2) Type annotations allow embedding schema information into a datagram without the need to agree on special fields. Datagrams may have multiple values at the top level, so it's possible to provide multiple representations without introducing a new top-level container.

3) The only data that might need to be shared between a producer and a consumer is a SymbolTable, which may be applicable to several schemas and may be shared inline if necessary. Otherwise, objects in a datagram are always inspectable and discoverable without additional metadata.

It has isomorphic text and binary representations as part of the standard, making debugging or optimized transport a config option.
The type system is significantly richer than JSON and maps well to several languages (internally Amazon uses it with C, C++, Perl, Java, Ruby, etc.).
S-Expressions.
Then how is the client supposed to handle the data? Guessing?
> backwards compatible schemas
> text and binary representations
> type system
> maps well to several languages
Protos have all these.
> S-Expressions
Okay? Is that useful?
This really looks like a NIH specification.
Basically: Ion == JSON + extra features + binary format spec. But Ion ~= YAML + binary format spec. You're going to write a new serializer/deserializer in both cases anyway, but in the second one, at least you get the text part for free in almost any language available.
- IonValues are mutable by default. I saw bugs where cached IonValues were accidentally changed, which is easy to do: IonSequence.extract clears the sequence [1], adding an IonValue to a container mutates the value (!) [2], etc.
- IonValues are not thread-safe [3]. You can call makeReadOnly() to make them immutable, but then you'll be calling clone since doing anything useful (like adding it to a list) will need to mutate the value. While it says IonValues are not even thread-safe for reading, I believe this is not strictly true. There was an internal implementation that would lazily materialize values on read, but it doesn't look like it's included in the open source version.
- IonStruct can have multiple fields with the same name, which means it can't implement Map. I've never seen anyone use this (mis)feature in practice, and I don't know where it would be useful.
- Since IonStruct can't implement Map, you don't get the Java 8 default methods like forEach, getOrDefault, etc.
- IonStruct doesn't implement keySet, values, spliterator, or stream, and thus doesn't play well with the Java 8 Stream API.
- Calling get(fieldName) on an IonStruct returns null if the field isn't present. But the value might also be there and be null, so you end up having to do a null check AND call isNullValue(). I'm not convinced it's a worthwhile distinction, and would have preferred a single way of doing it. You can already call containsKey to check for the presence of a field.
- In practice most code that dealt with Ion was nearly as tedious and verbose as pulling values out of an old-school JSONObject. Every project seemed to have a slightly different IonUtils class for doing mundane things like pulling values out of structs, doing all the null checks, casting, etc. There was some kind of adapter for Jackson that would allow you to deserialize to a POJO, but it didn't seem like it was widely used.
[1] https://github.com/amznlabs/ion-java/blob/master/src/softwar...
[2] https://github.com/amznlabs/ion-java/blob/master/src/softwar...
[3] https://github.com/amznlabs/ion-java/blob/master/src/softwar...
Why not "com.amazon.ion", like thousands of other existing packages?