Good intro: http://cognitect.github.io/transit-tour/
GitHub: https://github.com/cognitect/transit-js
Note that in the introduction they provide a simple benchmark where Transit is both more compact and faster to parse than JSON with custom hydration.
Unreadable format, as mentioned in this thread.
{"key:A<A<s>>":[["values"],["here"]]}
This doesn't mean anything to me as a developer unless I've seen the spec. It's kludgy, and it isn't backward compatible: without a TJSON parser installed, those keys are just noise.
Two solutions immediately strike me as better, one has been mentioned here.
(1) Not optimal, but workable: actually spell out words in the key names. There's no reason "A" has to mean "array"; on its own it tells me nothing. If I'm seeing this for the first time and have no idea what TJSON is, the very next value could be "key2:B<B<t>>".
(2) Far better: as in the "date" example already given, nest an object as the value for any extended type. That keeps the spec completely backward compatible with plain JSON, and as a developer I don't have to worry about parsing key names.
e.g.
{
  "some_nested_array": {
    "type": "array.array.string",
    "value": [
      ["values"],
      ["here"]
    ]
  }
}
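For what it's worth, hydrating that shape needs no special parser at all: a plain JSON decode plus a small post-processing pass. A rough sketch (the "type"/"value" convention and the dotted type names are just the hypothetical ones from the example above):

```python
import json

# Hypothetical hydrator for the nested-object proposal above. Any object
# shaped exactly like {"type": ..., "value": ...} is treated as a tagged
# value; everything else is ordinary JSON and passes through untouched.

def hydrate(node):
    if isinstance(node, dict):
        if set(node) == {"type", "value"}:
            return hydrate_tagged(node["type"], node["value"])
        return {k: hydrate(v) for k, v in node.items()}
    if isinstance(node, list):
        return [hydrate(v) for v in node]
    return node

def hydrate_tagged(type_name, value):
    # "array.array.string" peels off one "array" segment per level of
    # nesting, then converts the innermost scalar type.
    head, _, rest = type_name.partition(".")
    if head == "array":
        return [hydrate_tagged(rest, v) for v in value]
    if head == "string":
        return str(value)
    if head == "int":
        return int(value)
    raise ValueError(f"unknown type tag: {type_name!r}")

doc = json.loads("""
{
  "some_nested_array": {
    "type": "array.array.string",
    "value": [["values"], ["here"]]
  }
}
""")
print(hydrate(doc))  # {'some_nested_array': [['values'], ['here']]}
```

Any consumer without the hydrator still gets valid, readable JSON, which is the whole point.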
Extremely easy to implement, and not reliant on a governing body.

I have certainly studied XML, and I think XML Schema did fantastic work specifying datatypes:
https://www.w3.org/TR/xmlschema11-2/#built-in-datatypes
I briefly considered adopting this work wholesale:
https://github.com/tjson/tjson-spec/issues/37
If you'd like to see that happen, please make a note of it in the issue. Thanks!
Also note: I'm not a JS hipster, I'm part of the Rust Evangelism Strike Force.
Second, XML is extraordinarily complicated. Flipping through the XML 1.0 spec (https://www.w3.org/TR/xml/) doesn't really convince me that all of that complexity is there for a reason. I'd love to be proven wrong, though!
In contrast, RFC 7159 is incredibly short and readable: https://tools.ietf.org/html/rfc7159. The TJSON spec isn't bad either: https://www.tjson.org/spec/. Even combining the two, the result is still far shorter and clearer than XML.
{"hello-world:s": "Hello, world!"} → (hello-world "Hello, world!")
{"hello-base-sixteen:d16": "48656c6c6f2c20776f726c6421"} → (hello-base-sixteen #48656c6c6f2c20776f726c6421#)
{"base-sixty-four-is-default:d": "SGVsbG8sIHdvcmxkIQ"} → (base-sixty-four |SGVsbG8sIHdvcmxkIQ|)
{"hello-signed-int:i": "42"} → (hello-signed-int 42)
Ø → (some-big-int [bigint]|GY0+kwq94p4QRs2j4rHisQLgEN3zsFSZNJrgK+ZFcV0s1ShyMkMFOHip0oRuG7v+TAC7qmDaYSojFbZjNV5dSA==|)
{"hello-timestamp:t": "2016-10-02T07:31:51Z"} → (hello-timestamp [timestamp]2016-10-02T07:31:51Z)
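The key-splitting those examples rely on is simple enough to sketch. This covers only the tags shown above, not the full TJSON spec, and the choice of unpadded base64url for "d" is my assumption about the default binary encoding:

```python
import base64
from datetime import datetime, timezone

# Per-member decoding sketch for the tags shown above only;
# not a full TJSON implementation.
def decode_member(key, value):
    name, _, tag = key.rpartition(":")
    if tag == "s":    # UTF-8 string
        return name, value
    if tag == "i":    # signed integer, carried as a string
        return name, int(value)
    if tag == "t":    # RFC 3339 timestamp (UTC "Z" form only, here)
        ts = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return name, ts.replace(tzinfo=timezone.utc)
    if tag == "d16":  # hex-encoded binary
        return name, bytes.fromhex(value)
    if tag == "d":    # assumed: unpadded base64url binary
        return name, base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))
    raise ValueError(f"unrecognized tag: {tag!r}")

print(decode_member("hello-base-sixteen:d16", "48656c6c6f2c20776f726c6421"))
# ('hello-base-sixteen', b'Hello, world!')
```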
Seriously, this is IMHO so clearly good that I'm surprised more folks don't agree.

The main inspiration for this format is SPKI/SDSI, which was based on S-expressions. As beautiful as you may find the S-expression version compared to the (T)JSON, I personally blame the use of S-expressions as one of several reasons SPKI/SDSI failed to gain widespread traction, and I think something like TJSON is far more likely to catch on than the second coming of S-expressions. This is, of course, a debatable point, but you won't find me working on Sexp-based formats any time soon.
ASN.1, of course, has a sordid history in the credential space as well: it is often reviled by security experts as the source of frequent vulnerabilities, with particularly problematic encodings like BER. I'll admit OER is nice, but nobody uses OER, and the IETF prefers that things be standardized in terms of DER.
"Research things": yes, been there, done that.
I assume the TJSON libraries throw errors when invalid types or formats are provided, which is good, but that makes this a validator. Developers have been representing non-standard formats in JSON for years.
Google's response to JSON's limitations was Protocol Buffers [1]; as I understand it, they're used fairly extensively internally, but there hasn't been much adoption outside of Google. JSON is just the right mix of simple and robust for the majority of use cases.
So it feels more like a machine format, but in that case why not use a more efficient one, like a binary format?
> TJSON documents are amenable to "content-aware hashing" where different encodings of the same data (including both TJSON and binary formats like Protocol Buffers, MessagePack, BSON, etc) can share the same content hash and therefore the same cryptographic signature.
TJSON is designed to facilitate documents that retain the same content hash when transcoded to/from binary formats.
http://json-schema.org/examples.html
It's been there for almost a decade, and it's already supported by all the major JSON libraries in all the major languages.
{ "date": "1937-01-01T12:00:27.87+00:20" }
As you can see, JSON doesn't stop anyone from using RFC3339 to encode dates.
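A consumer that knows the field is a date can handle it with nothing but the standard library. A quick sketch (the field name is just the one from the example above):

```python
import json
from datetime import datetime

doc = json.loads('{ "date": "1937-01-01T12:00:27.87+00:20" }')

# Plain JSON delivers the timestamp as a string; the consumer parses it
# because it knows, out of band, that this field is RFC 3339.
when = datetime.strptime(doc["date"], "%Y-%m-%dT%H:%M:%S.%f%z")
print(when.year, when.utcoffset())  # 1937 0:20:00
```

That out-of-band knowledge is exactly the schema dependence being discussed here.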
What I'm getting at is that a date gets serialized into JSON as either a string or a number, depending on who wrote the toJSON method, and the consumer of that JSON needs knowledge of the data's schema in order to deserialize it properly.
Does {"foo:O":{}} really tell you more than {"foo":{}}?

The ability to encode sets, integers, binary data, and timestamps is useful. But why tag things that are what they look like? It's a waste of space.
Or, a more mundane explanation: the parser will silently clobber the name because it contains a ":"
Leaving any names untagged is ambiguous.
Besides that, in JSON Schema the schema is not bundled with the data. That's a feature for input validation: the receiver must know what it allows, not just what it receives. It's also a feature for readability (a great strength of JSON), since the data isn't encumbered with the schema. A receiver is free to use a schema or not, while TJSON forces every receiver to recognize its dirty format.
So TJSON brings nothing new, except interoperability problems.