I wrote this because I wanted a robust and ergonomic way to create ad-hoc JSON data from the command line and scripts. I wanted errors to not pass silently, not coerce data types, not put secrets into argv. I wanted to leverage shell features/patterns like process substitution, environment variables, reading/streaming from files and null-terminated data.
If you know of the jo program, jb is similar, but type-safe by default and more flexible. jo coerces types, using flags like -n to coerce to a specific type (number for -n), without failing if the input is invalid. jb encodes values as strings by default, and requires type annotations to parse & encode values as a specific type, failing if the value is invalid.
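To make that contrast concrete, here's a toy plain-bash sketch of "validate and fail" versus "coerce" — hypothetical code of my own, not taken from either jo or jb:

```shell
# Hypothetical sketch, not jo's or jb's actual code: a strict encoder
# validates the input against the JSON number grammar and fails loudly,
# instead of coercing bad input into something.
strict_number() {
  [[ $1 =~ ^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$ ]] \
    || { echo "not a number: '$1'" >&2; return 1; }
  printf '%s\n' "$1"
}

strict_number 42                         # → 42
strict_number oops || echo "rejected"    # error on stderr, then → rejected
```

The coercing alternative would be to print whatever the input happens to look like and exit 0 — which is exactly the silent failure mode jb is designed to avoid.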
If you know jq, jb is complementary in that jq is great at transforming data already in JSON format, but it's fiddly to get non-JSON data into jq. In contrast, jb is good at getting unstructured data from arguments, environment variables and files into JSON (so that jq could use it), but jb cannot do any transformation of data, only parsing & encoding into JSON types.
I feel rather guilty about having written this in bash. It's something of a boiled frog story. I started out just wanting to encode JSON strings from a shell script, without dependencies, with the intention of piping them into jq. After a few trials I was able to encode JSON strings in bash with surprising performance, using array operations to encode multiple strings at once. It grew from there into a complete tool. I'd certainly not choose bash if I was starting from scratch now...
https://github.com/h4l/json.bash/blob/main/json.bash
You've boiled it down to a set of very elegant constructs. Respect. Thank you @h4l, this is badass.
I hope you follow up with a golang or rust implementation, that would really be something else.
p.s. I noticed the following odd behaviors with escaping delimiters (e.g. "="), is there a way to get an un-escaped equal sign as the trailing part of a key or leading part of a value?
$ docker container run --rm ghcr.io/h4l/json.bash/jb msg=Hi
{"msg":"Hi"}
$ docker container run --rm ghcr.io/h4l/json.bash/jb msg=\=Hi
{"msg=Hi":"msg=Hi"}
$ docker container run --rm ghcr.io/h4l/json.bash/jb "msg=\=Hi"
{"msg":"\\=Hi"}
$ docker container run --rm ghcr.io/h4l/json.bash/jb "msg\==\=Hi"
{"msg\\=\\":"Hi"}
$ docker container run --rm ghcr.io/h4l/json.bash/jb "msg\\==\=Hi"
{"msg\\=\\":"Hi"}
$ docker container run --rm ghcr.io/h4l/json.bash/jb "msg\\===Hi"
{"msg\\=":"Hi"}
I definitely like the idea of a golang/rust implementation, there are certainly things I could improve.
So the argument syntax escapes by repeating a character rather than with a backslash. I chose this because with backslash escapes it would be unclear whether a backslash was part of the shell syntax or the jb syntax, and users may end up needing to double-escape backslashes, which is no fun! Whereas a shell will always pass two copies of a character like =:@ through unchanged.
The downside of escaping by doubling is that the syntax can be ambiguous, so sometimes you need to include the middle type marker to disambiguate the key from the value. But the type can be empty, so just : works:
$ jb ===msg==:==hi=
{"=msg=":"=hi="}
In the key part, the first = begins the key, and the == following it is an escaped =. The first = following the : marks the value, and everything after it is not parsed, so =hi= is literal.
When you have reserved characters in keys/values (especially if they're dynamic), it's easiest to store the values in variables and reference them with @var syntax:
$ k='=msg=' v='=hi=' jb @k@v
{"=msg=":"=hi="}
Normally, detecting errors on the other end of a pipe requires care in a shell environment (e.g. retrospectively checking PIPESTATUS). I used an approach I've called Stream Poisoning. It takes advantage of the fact that control characters are never present in valid JSON. When jb fails to encode JSON, it emits a Cancel control character[1] on stdout. When jb encounters such a character in an input, it can tell that the input it's reading from is truncated/erroneous. This avoids the typical problem of a pipe silently being read as an empty file.
I've got a page explaining this with some examples here: https://github.com/h4l/json.bash/blob/main/docs/stream-poiso... I can imagine using control characters in a text stream being rather controversial, but I feel it works quite well in practice.
For example `jb | jq`, where jq or a similar program discards the cancel character.
(Away from pc, unable to check right now.)
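As a consumer-side illustration, here's a plain-bash sketch of refusing poisoned output by checking for the Cancel byte (0x18). This is my own stand-in, not jb's code, and the "poisoned" stream is hand-made for the example:

```shell
# Plain-bash sketch of the consumer side of stream poisoning (my own
# stand-in, not jb's code): refuse output that contains the Cancel
# control byte (0x18) that jb emits on stdout when encoding fails.
out=$(printf '{"size":\x18')   # hand-made truncated, poisoned stream
if [[ $out == *$'\x18'* ]]; then
  echo "poisoned: upstream encoder failed, discard this output"
else
  echo "ok: $out"
fi
```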
$ jb size:number=oops; echo $?
json.encode_number(): not all inputs are numbers: 'oops'
json(): Could not encode the value of argument 'size:number=oops' as a 'number' value. Read from inline value.
␘
1
If you pipe the jb error into jq, jq fails to parse the JSON (because of the Cancel ctrl char) and also errors:
$ jb size:number=oops | jq
json.encode_number(): not all inputs are numbers: 'oops'
json(): Could not encode the value of argument 'size:number=oops' as a 'number' value. Read from inline value.
parse error: Invalid numeric literal at line 2, column 0
$ declare -p PIPESTATUS
declare -a PIPESTATUS=([0]="1" [1]="4")
So jq exits with status 4 here.
However, I think in your case the rationale in the performance section of your Readme totally makes sense, and every single use case I can think of for this would prioritise minimal latency over increased throughput. I've seen init containers that would execute probably 100x faster with this for the exact reasons you point out. I'm quite curious: what would you choose instead of bash if you were starting from scratch now?
FYI Shellcheck has a couple of superficial nits that you might wanna address (happy to send a PR). And your Readme is great.
If I started from scratch now I'd use a compiled language that could produce a single static binary and start with really low latency. I'm pretty sure jo must not be tuned for startup time; if they optimised that, they must be able to get it way faster than bash can start and parse json.bash. I was pretty surprised that bash can start up faster!
The codebase is basically at the limit of what I'd want to do with bash, but there are features I could add if it was in a proper programming language. e.g. validating :int number types, pretty-printing output, not needing the :raw type to stream JSON input.
Thanks for the heads up on Shellcheck, I'd be happy to take a PR if you'd like to send one.
jb id=42 size:number=42 surname=null data:null
=> {"id":"42","size":42,"surname":"null","data":null}
I've never needed typed arguments in bash, but if I ever do, this might be the syntax I'd use.
In fact, I was thinking about such a syntax recently. I am writing a tool that lets you call functions in Python modules from the command line. At first, I thought I'd need to define the argument types on the command line. But then I decided it's more convenient to use inspection and auto-convert the values to the needed types.
The same using jo would be like this, which I find harder to type and remember:
jo -- -s id=42 -n size=42 -s surname=null data=null
{"id":"42","size":42,"surname":"","data":null}
Notice that surname comes out as the empty string though; I think this must be a bug in jo!
Incidentally, this syntax shows a notation that is clearly superior to json (at least for non-nested stuff). If all you need is this, you'd be better off avoiding json altogether.
[Rant: if json is so unergonomic that people keep inventing alternatives like this syntax and stuff like "gron" to de-jsonise their lives, maybe using json was always a bad idea, after all... I guess in a decade everybody will look at json with the same disdain as we do XML today.]
I don’t see that at all? Why is `n:number=1` superior to `{n:1}`? If anything, CLI commands are awful for anything other than strings.
A man of culture I see.
This looks really useful where you don't want to introduce another scripting VM just to spit out some JSON, e.g. I've used Ruby a lot for this in the past.
I can see myself using this in container init scripts and other very low dep environments to format config files from env vars etc.
As a user, I’m fine with embedding a reasonably small VM to handle the configs; disk space is cheap. Better yet would be a compiled binary that handles it, but that feels like asking a lot of maintainers.
There’s a lot of surface area for someone to mis-quote stuff in their environment and generate unintelligible bash errors.
Or that may be just me; I hate bash in general, so maybe it’s just that bleeding over.
This is just the kind of use case I had in mind. Something I've considered is publishing a mini version with only the json.encode_string function, as that's enough to create an array of JSON-encoded strings and use a hard-coded template with printf to insert the JSON string values.
That would be a fraction of the overall json.bash file size.
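To show the idea, here's a rough sketch of that pattern. The encoder below is a hand-rolled stand-in of mine (it only handles the common escapes), NOT json.bash's real json.encode_string, but it shows how pre-encoded JSON strings slot into a hard-coded printf template:

```shell
# Hand-rolled stand-in for a string-only encoder (NOT json.bash's real
# json.encode_string -- common escapes only): escape a value as a JSON
# string, then drop pre-encoded strings into a printf template.
encode_string() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes aren't doubled
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  printf '"%s"' "$s"
}

printf '{"name":%s,"note":%s}\n' \
  "$(encode_string 'Ada')" \
  "$(encode_string $'line one\nline two')"
# → {"name":"Ada","note":"line one\nline two"}
```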
$ jb dependencies:[,]=Bash,Grep
{"dependencies":["Bash","Grep"]}
One possible alternative would be to accept JSON literal snippets, like this:
$ jb dependencies='["Bash", "Grep"]'
This should support all forms of nested JSON objects. You could have a rule that if an argument does NOT parse as a valid JSON value it is treated as a raw string, so this would work:
$ jb foo=bar bar='"this is a well formed string"'
{"foo": "bar", "bar": "this is a well formed string"}
You could even then nest jb calls like this:
$ jb foo=$(jb bar=baz)
{"foo": {"bar": "baz"}}
$ jb dependencies:json='["Bash","Grep"]'
{"dependencies":["Bash","Grep"]}
$ jb foo=bar bar:json='"this is a well formed string"'
{"foo":"bar","bar":"this is a well formed string"}
And then you can indeed use command substitution to nest calls:
$ jb foo:json=$(jb bar=baz)
{"foo":{"bar":"baz"}}
It works even better to use process substitution; this way the shell gives jb a file path to read from, and so you don't need to quote the $() to avoid whitespace breaking things:
$ jb foo:json@<(jb msg=$'no need\nto quote this!')
{"foo":{"msg":"no need\nto quote this!"}}
Another option is to use jb-array to generate arrays (jb-array is best for tuple-like arrays with varying types):
$ jb dependencies:json@<(jb-array Bash Grep)
{"dependencies":["Bash","Grep"]}
And if you use it from bash as a function, you can put values into a bash array and reference it:
$ source json.bash
$ dependencies=(Bash Grep)
$ json @dependencies:[]
{"dependencies":["Bash","Grep"]}
This example uses the pattern of setting out=varname when calling a json function; the encoded JSON goes into the $varname variable. This pattern avoids the overhead of forking processes (e.g. subshells) when generating JSON.
Otherwise you can use the more normal approach of jb writing to stdout, and capturing the output stream.
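For anyone unfamiliar with why writing into a variable avoids forks, here's the generic plain-bash pattern (my own sketch, similar in spirit to json.bash's out=var convention but not its code), using a nameref and printf -v, both bash 4.3+:

```shell
# Generic no-fork pattern in plain bash (4.3+): the function writes into
# a caller-named variable via a nameref, so no subshell fork from $(...)
# is needed to capture the result. make_entry is a hypothetical helper.
make_entry() {
  local -n _dest=$1               # nameref to the caller's variable
  printf -v _dest '{"msg":"%s"}' "$2"
}

make_entry result hello
echo "$result"                    # → {"msg":"hello"}
```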
$ abc=123 jq -cn '$ENV | {abc, othername: .abc}'
{"abc":"123","othername":"123"}
$ jq -cn --arg abc 123 '{$abc, othername: $abc}'
{"abc":"123","othername":"123"}
$ jq -cn --argjson abc 123 '{$abc, othername: $abc}'
{"abc":123,"othername":123}
> jshn (JSON SHell Notation), a small utility and shell library for parsing and generating JSON data
Can you give a concrete example of when this is the sanest option?
Second is situations where you'd rather not add an additional dependency, but bash is pretty much a given. For example, CI environments, scripts in dev environments, container entrypoints. Or things that are already written in bash.
I don't advocate writing massive programs in bash, for sure it's better to turn to a proper language before things get hairy. But bash is just really ubiquitous, and most people who do any UNIX work will be able to deal with a bit of shell script.
Is this tool not an additional dependency?
> But bash is just really ubiquitous
Biggest crime of the Unix world probably.
But for when you don't want an extra dependency, awk and perl are better than bash and just about as ubiquitous. (I might dare to say more ubiquitous, since MacOS in particular ships with an ancient version of bash that can't even use this jb tool. But the versions of awk and perl it comes with are fine.)
> @{ hello = 'world' } | ConvertTo-Json
> { "hello": "world" }
@{ Hello = 'world'; array = 1..10; object = @{ date = Get-Date } } | ConvertTo-Json
{
"array": [
1,
2,
3,
4,
5,
6,
7,
8,
9,
10
],
"object": {
"date": "2024-07-03T21:07:21.6562053+02:00"
},
"Hello": "world"
}
For good measure, this is how you might do the same with jb:
$ jb Hello=world array:number[]@<(seq 10) object:json@<(date=$(date -Iseconds) jb @date)
{"Hello":"world","array":[1,2,3,4,5,6,7,8,9,10],"object":{"date":"2024-07-03T19:26:36+00:00"}}
Alternatively, using the :{} object entry syntax:
$ jb Hello=world array:number[]@<(seq 10) object:{}=date=$(date -Iseconds)
{"Hello":"world","array":[1,2,3,4,5,6,7,8,9,10],"object":{"date":"2024-07-03T19:30:26+00:00"}}
Still, bash can try to keep up using json.bash. :)
$ source json.bash
$ declare -A greeting=([Hello]=World)
$ json ...@greeting:{}
{"Hello":"World"}
... is splatting the greeting associative array entries into the object created by the json call.
Without the ... the greeting would be a nested object. Probably clearer with multiple entries:
$ declare -A greeting=([Hello]=World [How]="are you?")
$ json @greeting:{}
{"greeting":{"Hello":"World","How":"are you?"}}
Vs:
$ json ...@greeting:{}
{"Hello":"World","How":"are you?"}
$h = @{x=1; y=2}; $h + @{z=3} | ConvertTo-Json
{
"y": 2,
"z": 3,
"x": 1
}
You can even use [ordered]@{ ... } to keep the keys from landing in random places.
Linux: what if everything was a file?
Soon we might have...
Mong/Os: what if everything was JSON?
YiAM/OS: YiAM/OS is ANOTHER MARKUP OPERATING SYSTEM... would come out shortly thereafter...
I like JSON, and generating it in the terminal is a challenge - GOOD JOB!