Is the space between the key and the type necessary? If not, how do you distinguish between objects and types?
Does the validation offer some form of unions or mutual exclusion?
[0]: https://hitchdev.com/strictyaml/why/implicit-typing-removed/
Every time I read about new formats, they seem to get either the 1-n relations or the n-n relations implemented well, but not both. I guess that's what's so hard about map/reduce...
Regarding YAML: somebody on HN mentioned his project DIXY a couple years ago, and it's much much _much_ easier to parse than YAML. [1] I'm using this over YAML pretty much everywhere now.
TOML is better, but it still has more gotchas than necessary. So much so that I find it easier to just edit a Python file.
I'm thinking of giving cue a try. Any feedback?
It's also [string:dictionary] and [string:?] where ? means nil. White space matters, and tab is fixed at 4 spaces wide. When creating text from a dictionary it adds "# Dixy 1.0\n\n" which means loading and saving will change the file every time! Not sure what other issues there are, but I noticed this line:
// TODO: if key is numeric, parse as Array
It does look simple though. It'd be nice if someone made strict rules and addressed the corner cases.

Agreed. YAML does have some use cases. I find it useful when I want to manually write lots of JSON data for test scripts. But the format, because it tries to be concise, ends up being hard to manually parse.
I don't consider YAML a good serialisation format.
If a compact columnar representation is what you're after, to avoid having to repeat every field name in an array of objects (which CSV is good for), but you don't want to give up the ability to include metadata in your JSON, there are a ton of different ways to structure your document to solve this issue without inventing new document formats.
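For example, one common trick is a columnar envelope inside ordinary JSON. This is a sketch; the key names here ("meta", "columns", "rows") are my own invention, not a standard:

```python
# Sketch: a columnar layout inside plain JSON, so each field name
# appears only once, while metadata still has a home.
import json

records = [
    {"name": "Alice", "age": 34},
    {"name": "Bob", "age": 29},
]

def to_columnar(recs):
    cols = list(recs[0].keys())
    return {
        "meta": {"count": len(recs)},   # room for count/page metadata
        "columns": cols,                # field names stated once
        "rows": [[r[c] for c in cols] for r in recs],
    }

def from_columnar(doc):
    return [dict(zip(doc["columns"], row)) for row in doc["rows"]]

doc = to_columnar(records)
assert from_columnar(doc) == records    # lossless round trip
print(json.dumps(doc))
```

This stays 100% stock JSON, so every existing parser, validator, and tool keeps working.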
Also this example is unclear (possibly ambiguous?); how is "int" as a type for the "age" column distinguished from "street", "city", etc as what I assume are field names?
Plus, as I wrote elsewhere, gzipping your JSON will result in essentially "avoiding having to repeat every field name" by dictionary coding it. The only case in which that wouldn't be true is when dealing with extremely unusual and heteromorphic data, but then this format doesn't seem to support such data at all.
I'm also mystified that the author claims this is readable. It looks eminently unreadable compared with JSON, if you have anything beyond one row of very simple data with all optional fields present. And, in that case, it's basically just 'JSON with the keys on a different row'.
(Congrats to the author, but this is more of a fun personal project rather than something to seriously present as a 'JSON killer'. If you do present it as a JSON killer, then you have to expect a rigorous review.)
Gzipping indeed helps recover most of the space taken by the field names, but a parser still has to parse those strings. On a large document, this might have a performance impact.
One good side of having the field names, however, is that one can reorder them ad lib.
age:{int, min:20},
address: {street, city, state}
Alternatively, there may be a set of forbidden field names, including bool, int and string.

Of these two, I like neither, but would opt for the latter.
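The "forbidden field names" option boils down to a reserved-word check; a minimal sketch, assuming a type list that is my guess and not from the spec:

```python
# Hypothetical disambiguation rule: a bare token is a type if and only
# if it appears in a reserved list; otherwise it is a field name.
RESERVED_TYPES = {"int", "string", "bool", "float", "number"}  # assumed

def classify(token: str) -> str:
    return "type" if token in RESERVED_TYPES else "field"

assert classify("int") == "type"
assert classify("street") == "field"
# The cost: no object may ever use "int", "bool", etc. as a field name.
```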
I also considered that min:20 implied the previous had to be a type, but I don’t see how that’s consistent with
active?:bool
and tags?:[string]

Plus JSON's exceptionally wide support means you can benefit from SIMD-assisted decoders which will absolutely blow this out of the water – and much, much more besides. I wish people would devote their time to something more useful than 'yet another competing standard'.
Edit: Sorry, I want to be clear, this is an impressive and cool personal project. I hope it's a step on an exciting journey for the person who wrote it. It just doesn't actually have enough strengths to replace JSON - which would be a tall order for any new format.
Without that you would need to either
1. store multiple non-nested (tabular, e.g. CSV) files and join them at the time of use.
2. denormalize all these CSVs into a single big CSV, duplicating the same values over and over. Compression should handle this at storage time, but you still pay the cost when reading.
3. store values by columns, not by rows, adding various RLE and dict encodings to compress repeated values in columns, making the files not human friendly
4. once you store it in columns and make it unreadable, just store it as binary instead of text. You get parquet
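A toy illustration of options 1 and 2 above, with two tiny "tables" and invented field names:

```python
# Option 1: keep tables separate and join at read time.
# Option 2: store the joined rows directly (denormalized).
people = [
    {"id": 1, "name": "Alice", "city_id": 10},
    {"id": 2, "name": "Bob",   "city_id": 10},
]
cities = {10: {"city": "Springfield", "state": "IL"}}

# Option 1: the join happens every time the data is used.
joined = [{**p, **cities[p["city_id"]]} for p in people]

# Option 2: "Springfield"/"IL" now repeat in every row. Compression
# absorbs the duplication on disk, but not the parse cost when reading.
denormalized = joined
assert denormalized[0]["city"] == denormalized[1]["city"] == "Springfield"
```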
JSON and CSV are simple, and for that reason they won and will stay with us no matter how hard you try to add features to them.

That said, I think adding trailing commas and comments to JSON wouldn't be a big stretch.
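As a sketch of how small that extension would be, here's a naive string-aware pre-pass (not any standard, just an illustration) that strips // comments and trailing commas before handing the text to a stock JSON parser:

```python
import json

def strip_extras(text: str) -> str:
    """Remove // comments and trailing commas, respecting string literals."""
    out, i, n, in_str = [], 0, len(text), False
    while i < n:
        c = text[i]
        if in_str:
            out.append(c)
            if c == "\\" and i + 1 < n:   # keep escape pairs intact
                out.append(text[i + 1])
                i += 1
            elif c == '"':
                in_str = False
        elif c == '"':
            in_str = True
            out.append(c)
        elif text.startswith("//", i):     # line comment: skip to newline
            while i < n and text[i] != "\n":
                i += 1
            continue
        elif c in "}]":                    # closer: drop a trailing comma
            ws = []
            while out and out[-1] in " \t\r\n":
                ws.append(out.pop())
            if out and out[-1] == ",":
                out.pop()
            out.extend(reversed(ws))
            out.append(c)
        else:
            out.append(c)
        i += 1
    return "".join(out)

src = '{"a": 1, // a comment\n "b": [1, 2,], "url": "http://x",}'
assert json.loads(strip_extras(src)) == {"a": 1, "b": [1, 2], "url": "http://x"}
```

Note that `//` inside string values (like URLs) survives because the scanner tracks string state.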
The battle will be for the best columnar binary format. Parquet is the closest to a standard, but it seems to be used only as a standard for storage. Big data systems still uncompress it and work with their own representation. The holy grail is a columnar format which is good enough that big data systems use it as their underlying data representation instead of coming up with their own. I suspect such a format will come from something like an open-sourced Snowflake, Clickhouse, Chaossearch or the like, which has battle-tested, performant algorithms on it, rather than being designed by committee, such as Parquet.
Sadly, JSON's designers suffered from the same hubris as the designers of Markdown and Gemini when they decided not to include a version number in the file format. So you are kind of hosed if you want to make a change like that.
Before json there was xml (ugh), but before xml there were Lisp S-expressions, which seem to have handled all these issues perfectly well 50 years ago. Yet we keep re-inventing them. Greenspun's tenth law is still with us.
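For comparison, the same sort of record renders naturally as an S-expression. This tiny emitter and its layout conventions are my own sketch, not any standard:

```python
# Minimal S-expression emitter for nested dict/list/scalar data.
# The rendering conventions here are an assumption for illustration.
def to_sexp(value) -> str:
    if isinstance(value, dict):
        inner = " ".join(f"({k} {to_sexp(v)})" for k, v in value.items())
        return f"({inner})"
    if isinstance(value, list):
        return "(" + " ".join(to_sexp(v) for v in value) + ")"
    if isinstance(value, str):
        return '"' + value.replace('"', '\\"') + '"'
    return str(value)

doc = {"name": "Alice", "tags": ["a", "b"], "age": 34}
assert to_sexp(doc) == '((name "Alice") (tags ("a" "b")) (age 34))'
```

Nesting, heterogeneous lists, and symbols all fall out of one uniform syntax, which is roughly the point being made.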
The problem with Apache Arrow and Parquet is that you have two - one for storage and one for computation - but in the end you only want one for both. You want to run fast algorithms on memory mapped compressed columns. Not doing this stupid deserialization from parquet to arrow.
Parquet and Arrow are designed by committee and try to accomplish too much, for that matter. While that's good for some cases, my prediction is that there will be a data processing system in the future whose file format supports this and is good enough for most data-intensive applications. It will not be feature complete, but, like JSON, it will be good enough. Some devs will then complain about adding this or that feature to the format, but the majority will be as happy as they are now with JSON. Such a format can only come from industry, not from a committee.
JSON Alternative – Internet Object - https://news.ycombinator.com/item?id=21220405 - Oct 2019 (12 comments)
Show HN: Internet Object – a thin, robust and schema oriented JSON alternative - https://news.ycombinator.com/item?id=20982180 - Sept 2019 (8 comments)
Unless the space after the colon is significant it seems we have to just "know" that int introduces a type definition instead of a structure.
Also
> Schema Details JSON doesn't have built-in schema support!
seems a little disingenuous. JSON provides a name for each type of value, so there is mostly no need for the schema when viewing the data. There is a JSON Schema definition.
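For instance, the age constraint from the article's example could be written in standard JSON Schema along these lines (a sketch, not taken from the article):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "age": { "type": "integer", "minimum": 20 },
    "address": {
      "type": "object",
      "properties": {
        "street": { "type": "string" },
        "city":   { "type": "string" },
        "state":  { "type": "string" }
      }
    }
  }
}
```

The schema lives in a separate document rather than inline, which is arguably a feature: the data stays self-describing either way.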
I am the creator of Internet Object. I have been quietly working on the specs, but due to my busy schedule I was not very active during the past couple of months. It is good to see you all discussing the pre-release format! However, I see many people have presumed things in the wrong context. I want to share the draft of the in-progress specs; it will probably bring more clarity. I have recently resumed working on this project. If anyone would like to contribute to Internet Object, please join the Discord channel (just created).
Specs Draft - https://docs.internetobject.org/ Discord Channel - https://discord.gg/kZ6CD3hF
Thanks and Regards - Aamir
In the Zed project, we've been thinking about and iterating on a better data model for serialization for a few years, and have concluded that schemas kind of get in the way (e.g., the way Parquet, Avro, and JSON Schema define a schema then have a set of values that adhere to the schema). In Zed, a modern and fine-grained type system allows for a structure that is a superset of both the JSON and the relational models, where a schema is simply a special case of the type system (i.e., a named record type).
If you're interested, you can check out the Zed formats here... https://github.com/brimdata/zed/tree/main/docs/formats
- JSON - TOML - CSON - INI - ENO - XML
I like CSV for tabular data obviously. This looks, as others have mentioned, like CSV with better metadata.
I like INI for its simplicity. JSON is good for more complicated data, but I have to say I like CSON.
For a broader take on an alternative, there is Concise Encoding [1][2], which I believe addresses a few more issues with existing encodings (clear spec, schema not an afterthought, native support for a variety of data structures, security, ...).
[1] https://concise-encoding.org/ [2] The author gave a presentation on it here: https://www.youtube.com/watch?v=_dIHq4GJE14
A smaller data format requires less compression time and power and you can fit more of it in memory at either end.
I still kind of like classic NeXT (and pre-XML OS X) property lists.
GNUstep seems to address some of their limitations:
http://wiki.gnustep.org/index.php/Property_Lists https://everything.explained.today/Property_list/
I think Apple probably erred in switching to XML.
I'd be more interested to know about serialisation and deserialisation time.
>If you look closely, this JSON document mixes the data employees with other non-data keys (headers) such as count, currentPage, and pageSize in the same response.
But they don't explain at all how changing the data format fixes the underlying issue of mixing concerns in one data object.
> Name , Email > Remain updated, we'll email you when it is available.
Why do this? Should I read this as meaning the format isn't ready? Is there going to be a mailing list of format enthusiasts? Are you planning on releasing a v2022 next year and every year after? More use-case-specific derivatives?
All a format needs is 3 short examples, a language definition, and a link to an implementation.
Everything else lowers my expectation and its appeal.
> However, this time, something felt wrong; I realized that with the JSON, we were exchanging a huge amount of unnecessary information to and from the server
b) Text size really isn't an issue given that we're typically talking about just a few kB on gzipped protocols over hundreds-of-Mbps connections. Compactness sounds like a bad argument to me.
c) "JSON doesn't have schema built in" is a really dubious argument. If you want schemas you can still get them using JSON Schema, and if you don't, you can still understand the message using the field names, which make for a degraded schema; that fallback doesn't exist in the case of Internet Object. If you don't have the schema, go figure out what's in there.
What really gives it away for me is the comparison at the bottom between Internet Object and JSON; JSON looks better to me.
Looks like it's an idea executed on a bad premise.
Human readability is one of the most important aspects of JSON. Without that requirement you could use a binary serialization.
Edit: 50 -> 60
Everything is a trade off. So what do we get in trade for those rather large costs?
40% bandwidth savings might be worth it. But what are the gzipped comparisons?
> age:{int, min:20}
Why would a data serialization format bother with data validation like the minimum value here?

Addresses are so varied in implementation and meaning that it's frankly ridiculous.
I figured that because I need to describe the tag, it was just as easy to not use tags and describe the elements that would make one up.
Also, field names which don't contain whitespace should not need to be quoted.