Zq: An easier and faster alternative to jq (opens in new tab)

(brimdata.io)

462 pointsmccanne4y ago223 comments

223 comments

Hi, all. Author here. Thanks for all the great feedback.

I've learned a lot from your comments and pointers.

The Zed project is broader than "a jq alternative" and my bad for trying out this initial positioning. I do know there are a lot of people out there who find jq really confusing, but it's clear if you become an expert, my arguments don't hold water.

We've had great feedback from many of our users who are really productive with the blend of search, analytics, and data discovery in the Zed language, and who find manipulating eclectic data in the ZNG format to be really easy.

Anyway, we'll write more about these other aspects of the Zed project in the coming weeks and months, and in the meantime, if you find any of this intriguing and want to kick the tires, feel free to hop on our slack with questions/feedback or file GitHub issues if you have ideas for improvements or find bugs.

Thanks a million!

https://github.com/brimdata/zed https://www.brimdata.io/join-slack/

preferjq4y ago

"cobbled-together" jq as it often appears in the wild will often compare badly with crafted solutions because the writer's goal is usually GSD and not write pretty code.

People with the time and inclination to slow down and think a little more about how the tools work will produce cleaner solutions.

In your example to convert

    {"name":"foo","vals":[1,2,3]}

    {"name":"foo","val":1}
    {"name":"foo","val":2}
    {"name":"foo","val":3}

All you need is this jq filter

    {name:.name, val:.vals[]}

To me this is much better than the proposed zq or jq solution you're using as a basis for comparison. You could almost use the shorter

    .vals = .vals[]

if the name in the output didn't change.

These filters takes advantage of how jq's [] operator converts a single result into separate results. For people new to jq this behavior is often confusing unless they've seen things like Cartesian products.

.[] - https://stedolan.github.io/jq/manual/#Array/ObjectValueItera...

MarkMarine4y ago

counter point: I reach for jq probably twice a year. It's a slog every time, but way way less work than diving into the terse syntax and understanding the inner workings of jq. A good abstraction is the border of my understanding, a leaky abstraction means I have to have mastery of the internals to be successful. jq is a leaky abstraction.

hyperpallium24y ago

can also use name instead of name:.name

I think jq is very elegant - genius even - but whenever I use it, I have to look up the docs for syntax. But I guess that's true for any infrequently used tool.

1 more reply

1vuio0pswjnm74y ago

Thank you for your work on tcpdump, (original) bpf and the pcap library. I benefit from those projects everyday.

ZSON looks way better than JSON. I pray that the Zed project becomes more popular.

mccanneOP4y ago

Wow, thanks.

Coincidentally, after hearing of a friend's woes dealing with massive amounts of CSV coming from a BPF-instrumental kernel, I played around a bit with integrating Zed and BPF. Just an experimental toy (and the repo is already out of date)...

https://github.com/brimdata/zbpf

The nice thing about Zed here is any value can be a group-by key so it's easy, for example, to use kernel stacks (an array of strings) in a grouping aggregate.

(p.s. for the record, the only thing I have to do with the modern linux BPF system is the tiny vestige of origin story it shares with the original work I did in the BSD kernel around 1990)

rienko4y ago

Ever since my team started using Splunk (circa 2012), we claimed for a more open version we could tinker with and not cost an arm and a leg to ingest multiple terabytes of daily data.

Positioning as an opensource Splunk would be an interesting play. Going through your docs the union() function looks like it returns a set, akin to splunk values(), is there the equivalent to list()?

Elastic is great in its lane, but it requires more resources and has a monolith weight, that has left a sour taste from our internal testing. Doing a minimal ElasticSearch compatible API would open up your target audience, are there any plans to do you it in a short term horizon (< 1 year)?

mccanneOP4y ago

That's a cool idea. We've had many collaborators using Zed lakes for search at smallish scale and we are still building the breadth of features needed for a serious search platform, but I think we have a nice architecture that holds the promise to blend the best of both worlds of warehouses and search.

As for list() and values() functions, Zed has native arrays and sets so there's no need for a "multi-value" concept as in splunk. If you want to turn a set into an array, a cast will do the trick, e.g.,

echo '1 2 2 3 3' | zq 'u:=union(this) | cast(u,<[int64]>) ' -

[1,2,3]

(Note that <[int64]> is a type value that represents array of int64.)

gauravphoenix4y ago

there is Dassana[1] if someone wants to try out json native,index-free, schema-less solution built on top of ClickHouse.

ShowHN post(FAQ)[2]

disclaimer- I'm founder/CEO of Dassana.

[1] https://lake.dassana.io/

[2] https://news.ycombinator.com/item?id=31111432

noborus4y ago

I wrote about how to solve with SQL. https://noborus.github.io/blog/jqsql/

weinzierl4y ago

jq is incredibly powerful and I'm using it more and more. Even better, there is a whole ecosystem of tools that are similar or work in conjunction with jq:

* jq (a great JSON-wrangling tool)

* jc (convert various tools’ output into JSON)

* jo (create JSON objects)

* yq (like jq, but for YAML)

* fq (like jq, but for binary)

* htmlq (like jq, but for HTML)

List shamelessly stolen from Julia Evans[1]. For live links see her page.

Just a few days ago I needed to quickly extract all JWT token expiration dates from a network capture. This is what I came up with:

    fq 'grep("Authorization: Bearer.*" ) | print' server.pcap | grep -o 'ey.*$' | sort | uniq | \
    jq -R '[split(".") | select(length > 0) | .[0],.[1] | gsub("-";"+") | gsub("_";"/") | @base64d | fromjson]' | \
   jq '.[1]' | jq '.exp' | xargs -n1 -I! date '+%Y-%m-%d %H:%M:%S' -d @!

It's not a beauty but I find the fact that you can do it in one line, with proper parsing and no regex trickery, remarkable.

[1] https://jvns.ca/blog/2022/04/12/a-list-of-new-ish--command-l...

kitd4y ago

Also highly recommended is gron [0], to make json easily searchable

[0] https://github.com/TomNomNom/gron

spudlyo4y ago

Most of the time I can get what I need with gron and traditional UNIX tools, without needing to reach for jq, and without having to re-learn its somewhat arcane syntax.

zikduruqe4y ago

I came here looking for a gron recommendation. I use this very often.

toxik4y ago

Is your example not easier to write and read as a 10-something line Python script? I never understood the appeal of jq etc because of this very reason.

1 more reply

hoherd4y ago

I would definitely add dasel to that list. It's become my de facto serialized data converter, and regularly use it to convert between csv, toml, yaml, json, and xml using jq-ish syntaxes.

https://github.com/tomwright/dasel

chriswarbo4y ago

The yq tool also provides 'xq', which works on XML :)

stormbrew4y ago

tbh my biggest problem with all these tools is that I really don't want to have to learn one for each of the json-y formats I have to use every day. If jq supported toml and yaml natively I'd be much much much happier to learn its kind of obtuse syntax.

samcal4y ago

Have you checked out https://www.nushell.sh/? It seems like exactly what you're describing. Although I know of people who are happily using it as their main shell, I only really use it when I need to read and manipulate data in files.

wwader4y ago

Hi, lover of jq and author of fq here! Just wanted to mention that fq is jq so you can do things like `fq 'grep("Authorization: Bearer.*") | split("\n") | ...' file.pcap`.

Also i'm working and prototyping some kind of http decoding support that will make things like select on headers and automatic/manual decoding body possible.

chris378794y ago

I need someone to make a `Q` wrapper that amalgamates all of them. And if that's already taken by a common utility, I vote we name it deLancie, instead.

msluyter4y ago

Whenever jq comes up I feel obligated to mention 'gron'[1]. If all you're doing is trying to grep some deeply nested field, it's way easier with gron, IMHO.

[1] https://github.com/tomnomnom/gron

RulerOf4y ago

Gron and jq are complementary tools IMO. I frequently use gron to trim down large json files such that I can determine what my ultimate jq query is going to look like.

radicality4y ago

For a moment I thought that this is `glom`, which is also a tool I can recommend if you need to be doing any json processing in python (comes with a cli too). It does have a relatively steep learning curve for the advanced features, but does allow you to do interesting things like concisely write recursive parsers in the mini-dsl Glom provides.

https://glom.readthedocs.io/en/latest/

zimpenfish4y ago

Used it only this morning to find out if/where the JSON for a tweet mentioned the verification status of the poster and/or retweetee[1]. Quick and easy to dump it through `gron | grep verif` to find out the paths.

[1] "the person who was retweeted" in lieu of a better word.

psacawa4y ago

Since no one seems to know about it, jq is described in great detail on the github wiki page [0]. That flattens the learning curve a lot. It's not as arcane as it seems.

The touted claim that is fundamentally stateless is not true. jq is also stateful in the sense that it has variables. If you want, you can write regular procedural code this way. Some examples [1]

The real problem of jq is that it is currently lacking a maintainer to assess a number of PRs that have accumulated since 2018.

[0] https://github.com/stedolan/jq/wiki/jq-Language-Description

[1] https://github.com/fadado/JBOL/blob/master/fadado.github.io/...

Beltalowda4y ago

> It's not as arcane as it seems.

The issue with jq is that I use it maybe once a month, or even less. The syntax is "arcane enough" that I keep forgetting how to use it because I use it so sporadically.

In comparison awk – which I also don't use that often – has a much easier syntax that I can mostly remember.

Not entirely convinced by the zq syntax either though; it also seems "arcane enough" that I would keep forgetting it.

hiram1124y ago

Bingo.

There are at least a dozen tools and languages and syntaxes that I've used sporadically over the years - awk, sed, bash, Mongo, perl, etc. I don't use them often enough to remember exactly how they work, and so I always have to spend a few hours reviewing manuals or old code repos or an O'Reilly book.

But if I do end up using it for a few days in a row, it starts to make sense, and I improve each time I use it.

But not with jq.

It just does not make sense to my brain, no matter how many times I've had to use it. Every single time I need to use it, it requires finding some Stack Exchange or blog and just copying and pasting. Even after seeing the solution, rarely do I then really understand why or how it works. Nor can I often take that knowledge and apply it to similar problems.

About the only other syntax or language that gives me such problems is Elastic Search DSL.

1 more reply

laurent1234564y ago

I wonder if someone tried to use plain JS as a filtering language? It would be more verbose but it would be easy to remember. For example:

   [1,2,3] | js "out = 0; for (const n of this) out += n"

That would print "6". `out` would be a special variable you write to to print the result, and `this` would be the input.

5 more replies

ts00004y ago

Interesting, for me it's the exact opposite.

I've tried a couple of times to get into awk, but still find the syntax arcane.

1 more reply

taude4y ago

Same issue. However, I do successfully rely on using ctrl-r a lot to search prior invoked commands. And have a few core aliases that I've cobbled together....

1 more reply

zeroimpl4y ago

I use both awk and jq infrequently enough that I tend to struggle with anything non-trivial. I think zq would be the same.

> Not entirely convinced by the zq syntax either though; it also seems "arcane enough" that I would keep forgetting it.

I think this is the main thing. I’d prefer a streamlined CLI tool where you passed in some JS code and it’d just run it on the input (with the same slurp/raw args as jq). Could just be npm with underscore.js autoimported.

ar_lan4y ago

This is ironic - I use `awk` so infrequently, I have no idea how to use it without reading its man page or using Google. But I use `jq` often and find it simple.

j1elo4y ago

Sadly very few authors seem to acknowledge or even know that github wiki pages are not indexed by search engines so if it wasn't for third-party sites like github-wiki-see.page (which could stop working at any time) their contents would be undiscoverable by the very same people they are usually intended...

oblio4y ago

What? That's crazy! Does Github block indexing?

2 more replies

jdnier4y ago

Here's podcast interview with the creator of jq about what he's been working on at Jane Street: https://signalsandthreads.com/memory-management/

klysm4y ago

I didn't realize jq was missing a maintainer, it's one of my most used CLI tools.

ethanwillis4y ago

It really is a fundamental problem where lots of these important projects aren't maintained simply because the reality is the maintainers can't beat the economics of a lot of rich freeloaders having no real short term incentive to compensate these maintainers..

3 more replies

skybrian4y ago

In this case it doesn't seem too critical? It means jq remains stable, which is probably what should happen once a tool like this gets a lot of users.

adamgordonbell4y ago

I found it hard to approach at first, but I think it was just the lack of material that worked through simple examples step by step.

I ended up writing my own guide to it, that in my unbiased opinion makes it easier to get the point where in-depth examples and language descriptions are easier to understand.

Edit: Oh, wow, it's even mentioned in this article. Maybe I should read before commenting.

https://earthly.dev/blog/jq-select/

sfink4y ago

I discovered jq after I wrote my own (extremely limited) version of it. I need it quite often, and yet I've never managed to get up the activation energy to learn enough for it to be useful. I need to have some notion of the computation model before anything is going to make sense to me. I hate learning things in completely disparate pieces that I need to memorize in hopes that someday it will just click together and I'll derive the underlying principles.

Your guide was great for this. It stepped me through enough of the bare basics in a way that the underlying model was obvious. It didn't get me nearly far enough for many of the tasks that I need jq for, but it got me started and that's all I really needed. Everything additional that I need to learn becomes obvious in retrospect—"of course there's an operator for this, there kind of has to be!".

Thank you!

dilap4y ago

From that page:

> The jq documentation is written in a style that hides a lot of important detail because the hope is that the language feels intuitive.

Yeah, not so much boys! Also, that disclaimer should really be at the top of the manual, with a link to the wiki, rather than vice-versa, as it is now.

The wiki is like secret information -- "oh, hey, here's the page that actually tells you how it works!"

eatonphil4y ago

If jq is getting too slow for you (that's never happened for me), it really seems like it's time to put your data in a database like sqlite or duckdb at least.

Incidentally there are many tools that help you do this like dsq [0] (which I develop), q [1], textql [2], etc.

[0] https://github.com/multiprocessio/dsq

[1] https://github.com/harelba/q

[2] https://github.com/dinedal/textql

jeffbee4y ago

I don’t agree. There is a great deal of room for improvement in jq performance. I profiled one invocation and it spent the majority of its time asserting that the stack depth was lower than some amount, which is crazy. I rebuilt it with NDEBUG defined and it was seriously ten times faster, but it’s not safe to run it that way because it has asserts with side effects, which is also crazy.

Rewriting all or parts of it in C++ would make it dramatically faster. I would start by ripping out the asserts and using a different strtod which they spend an awful lot of time in.

eatonphil4y ago

Fair point! I don't mean to say jq performance can't or shouldn't be improved.

Just that jq does two things: 1) ingest and 2) query.

If you're doing a bunch of exploration on a single dataset in one period of time or if the dataset is large enough and you're selecting subsets of it, you can ingest the data into a database (and optionally toggle indexes).

Then you can query as many times as you want and not worry about ingest again until your data changes.

All three of the tools I listed have variations of this sort of caching of data built in. For dsq and q with caching turned on, repeat queries against files with the same hashsum only do queries against data already in SQLite, no ingestion.

1 more reply

algesten4y ago

I don't get it. "Instead of learning jq DSL, learn zq DSL".

To me they look similarly complicated and the examples stresses certain aggregation operations that are harder to do in jq (due to it being stateless).

loeg4y ago

> "Instead of learning jq DSL, learn zq DSL"

I think you got it — that’s exactly the idea. They claim (reasonably?) that it’s a more intuitive DSL; and it supports state. They also make some performance claims towards the end of the article.

jerrysievert4y ago

> They also make some performance claims towards the end of the article.

essentially a marginal speed increase they think on json, but a much bigger speed increase (5x-100x they claim) if you switch to their native format ZNG.

if I'm switching formats completely, I'm not sure why I care about jq vs zq in json performance ...

1 more reply

p5a0u9l4y ago

Yes, but fortunately, your efforts will pay dividends when parsing all the 'z*' boutique formats that it supports, zson, zst, zng, the list goes on. /s

mattnibs4y ago

Not sure if this came across in the article, but all the "boutique" z* formats are all representations of the same zed model https://zed.brimdata.io/docs/formats/zed/

enriquto4y ago

> "Instead of learning jq DSL, learn zq DSL".

A saner approach is to gron the damn json and just use regular unix tools on the data.

knome4y ago

These guys must really hate functional programming.

I can see where jq might confuse someone new to it, but their replacement is irregular, stateful, still difficult, and I don't even see variable binding or anything.

jq requires you to understand that `hello|world` will run world for each hello, passing the world out values to either the next piped expression, the wrapping value-collecting list, or printing them to stdout.

it's a bit unintuitive if you come in thinking of them as regular pipelines, but it's a constant in the language that once learned always applies.

this zed thing has what appears to be a series of workarounds for its own awkwardness, where they kept tacking on new forms to try to bandaid those that came before.

additionally, since they made attribute selectors barewords where jq would require a preceding reference to a variable or the current value (.), I'm not sure where they'll go for variables should they add them.

johnday4y ago

No kidding!

This part in particular jumped out at me:

> To work around this statelessness, you can wrap a sequence of independent values into an array, iterate over the array, then wrap that result back up into another array so you can pass the entire sequence as a single value downstream to the “next filter”.

This is literally just describing a map. A technique so generally applicable and useful that it's made its way into every modern imperative/procedural programming language I can think of. The idea that this person fails to recognise such a common multiparadigmatic programming idiom doesn't fill me with confidence about the design of zq.

aarchi4y ago

In fact, jq already has `map`, which would replace the article's pattern of `[.[]|add]` with `map(add)`. It is defined as such:

    def map(f): [.[] | f];

Many built-in functions in jq are implemented in jq, in terms of a small set of core primitives. The implementations can be inspected in builtin.jq.

https://github.com/stedolan/jq/blob/master/src/builtin.jq#L3

thaliaarchi4y ago

I find the stateless streaming paradigm in jq very pleasing.

Results can be emitted iteratively using generators, which are implemented as tail-recursive streams [0]. Combined with the `input` built-in filter, which yields the next item in the input stream, and jq can handle real-time I/O and function as a more general-purpose programming language.

I built an interpreter for the Whitespace programming language in jq using these concepts and it's easily one of the most complex jq programs out there.

[0]: https://stedolan.github.io/jq/manual/#Generatorsanditerators

[1]: https://github.com/andrewarchi/wsjq

qmacro4y ago

Wow, I've been on the lookout for larger jq programs from which to learn. I'm going to enjoy learning from wsjq, thank you!

mattnibs4y ago

Variables exist in zq, "this" is a reserved word: echo {x:1} | zq 'x := x+1' -

thayne4y ago

I think their main complaint is that you can't iteratively operate on a stream as a whole without first converting it to an array, which besides sometimes requiring awkward syntax, can require a lot of memory for large datasets.

micimize4y ago

Their syntax comparison under "So you like chocolate or vanilla?" is disingenuous. You can do variable assignment and array expansion in jq:

  expand_vals_into_independent_records='
    .name as $name | .vals[] | { name: $name, val: . }
  '
  echo '{"name":"foo","vals":[1,2,3]} {"name":"bar","vals":[4,5]}' |
    jq "$expand_vals_into_independent_records"

Also, generally, not a fan of the tone of this article.

lilyball4y ago

Your `.name as $name` was my immediate attempt too, but it turns out you can go even simpler with

  jq '{name, val: .vals[]}'

diehunde4y ago

Pardon my ignorance, but would I spend time learning something like jq or zq when it only takes me a couple of minutes to develop a script using some high-level language? I've had to process complex JSON files in the past, and a simple Python script gets the job done, and the syntax is much more familiar and easier to memorize. Is there a use case I'm missing?

meowface4y ago

If you're doing a lot of JSON munging every day and have good mastery of something like jq or zq, you can probably get things done faster.

Like you, I almost always just write Python scripts for such tasks because it's a lot easier for me to reason through it and debug it, but it's definitely slower-going than what I might do if I were very adept in a terse language like jq. I don't do this too often, so it makes little difference to me, but if someone is doing this multiple times a day, every day, it'll add up. As you say, it takes a few minutes; with jq, it could be a few seconds.

meepmorp4y ago

The same thing could be said for grep, or really any other utility that can have its functionality reproduced in a programing language.

eru4y ago

Indeed! Jq is basically something like grep for JSON.

It might actually make sense to embed jq functionality into your favourite language (as a library or so), as it is quite a nice and well-chosen set of functionality.

1 more reply

folkrav4y ago

Honestly, I've only really used `jq` to quickly parse JSON structures in interactive sessions e.g

  curl -s http://foo.bar | jq .some.nested.value

Anything more complicated I would indeed go for writing a proper script.

eru4y ago

Don't tell anyone, but jq is secretly a pretty well thought out functional programming language.

johnthuss4y ago

There is certainly a learning curve with jq that can put people off. The attraction is that the end result is a very small amount of code that does only one thing: parse a JSON file, rather than invoking an external script that might send many HTTP requests or launch a missile.

As the complexity of the input JSON grows or the complexity of your processing, it does makes sense to leave jq behind for a higher level language.

eru4y ago

I agree with most of what you say.

I disagree with 'leaving for a higher level language'. Jq is an extremely high level language.

What it is _not_ is a general purpose language.

aftbit4y ago

This is how I felt about regular expression when I was first learning them. Now I feel that they're one of the most powerful text-processing tools that I know. I also felt similarly about SQL at the very beginning. IMO if you find yourself doing a _lot_ of JSON processing, learning at least basic jq gives you superpowers.

ris4y ago

1. The High Level Language of your choice may not be the flavour liked by other members of your team. Ruby? ew please use Python - unnecessary discussion ensues... 2. Your High Level Language of choice would probably require a non-trivial container image, which requires extra decisions to be made about sourcing, which is something you'd rather not think about if this is just e.g. a step in a CD pipeline. jq is tiny and a very simple addition to an existing image. It's even present by default in GitHub Actions' `ubuntu-latest`. 3. Your High Level Language of choice may require dependencies to do the same job. How are those dependencies going to be defined, pinned, who's going to be responsible for bumping them...?

I used to 100% agree with you, but these days I understand why so much stuff ends up being bash and jq.

orthecreedence4y ago

You can spend a few days getting to know jq or you can happily live with your 100+ purpose-built scripts. I know which one I prefer.

I don't even process complex JSON...it's usually pretty basic. But being able to quickly select parts out of streams of JSON data on the CLI is incredibly useful to me, and learning even just the basics of jq has paid for itself a hundred times over by now.

Granted, a lot of my job right now is data forensics stuff, so I breath this kind of stuff. You might never need jq.

pantulis4y ago

I am also on the same side of the discussion, but I'm a programmer by trade. Most of the cases I've seen non trivial jq uses is by people doing command line or shell script magic. In this context I guess it's easier to write the jq expression language than to whip up a fully fledged Python/Ruby/Perl script without having to debug pretty basic stuff once you know the syntax. Pretty much like awk.

eatonphil4y ago

I'm a programmer by trade. I use jq. :)

vlunkr4y ago

jq is great for shell scripts. Say your script hits an API that returns JSON, and you want to retrieve a single field. This can be difficult to do correctly with grep or other text matching tools, but is trivial with jq. You just pipe it in like "curl XYZ | jq '.path.to.your.data'"

I imagine this is how it's used 90% of the time, but can do lots more advanced stuff as described in the article.

johnday4y ago

Suppose you write a shell script which is intended for use among colleagues as part of a pipeline.

In many cases, the most appropriate and useful tool for the job would be jq - one line in the shell script corresponding to the required data transform, calling out to `jq`, which already has a reasonable user base and documentation, and could be trivially replaced by anyone if the business needs change.

brushfoot4y ago

The name of its corporate progenitor may leave a bad taste in some mouths, but I highly recommend PowerShell for this sort of thing. It's cross platform, MIT licensed, and comes with excellent JSON parsing and querying capabilities. Reading, parsing, and querying JSON to return all red cars:

  Get-Content cars.json | ConvertFrom-Json | ? { $_.color -eq 'red' }

The beauty of this is that the query syntax applies not just to JSON but to every type of collection, so you don't have to learn a specific syntax for JSON and another for another data type. You can use Get-Process on Linux to get running processes and filter them in the same way. The same for files, HTML tags, etc. I think nushell is doing something similar, though I haven't tried it yet.

I prefer this approach to another domain-specific language, as interesting as jq's and zq's are.

ptx4y ago

PowerShell "sends basic telemetry data to Microsoft [...] about the host running PowerShell, and information about how PowerShell is used" [1].

And since it relies on .NET, that also requires its own separate opt-out for its telemetry. There might be other components, now or in the future, that also send data to Microsoft by default and would have to be separately discovered and disabled.

[1] https://docs.microsoft.com/en-us/powershell/module/microsoft...

sandyarmstrong4y ago

> And since it relies on .NET, that also requires its own separate opt-out for its telemetry.

Building a program with .NET does NOT cause that program to send telemetry to Microsoft.

You're thinking of the .NET SDK itself. Using PowerShell does not trigger any use of the .NET SDK.

Disclaimer: I work for Microsoft.

1 more reply

brushfoot4y ago

To me a telemetry opt-out is a small price to pay for what PowerShell brings to the table, but to each their own.

> There might be other components, now or in the future, that also send data to Microsoft

Of course. Do your due diligence on whatever you install. No tool should be exempt from that.

1 more reply

vips7L4y ago

> The beauty of this is that the query syntax applies not just to JSON but to every type of collection,

This is the best part of pwsh. Everything is standardized, you're not guessing at the idioms of each command, and you're working with objects instead of parsing strings!

My second favorite part is having access to the entire C# standard library.

bblb4y ago

PowerShell is "Python interactive done right". It's too bad it has a bad rap in open source community and it might never get the traction it really deserves. Sure it has it's downsides, which tech doesn't, but PowerShell has solved so many issues and annoyances with the shells that we've been used to, that it still comes out as the winner.

I've been using it since day one from 2006, every single day. It has come a long way and the current PS7 is the best shell experience there is. Hands down no contest.

Snover's passionate early presentation about the PS pipeline is a pretty cool tech video. https://www.youtube.com/watch?v=325kY2Umgw8

spiralx4y ago

> PowerShell is "Python interactive done right".

Actually PowerShell is "Perl interactive done right" if you read what the designers say about their influences - the automatic variable $_ is straight from Perl and the array creation syntax @(a, b, c) is also a Perl-ism from @arr = (a, b, c). Which is funny as I dislike Perl intensely but really like PowerShell :)

To be fair there's not much Perl in PS, it's as much influenced by KSH, Awk, cmd.exe and VBScript as Perl. Thankfully "influenced by" isn't "a melange of", because a combination of all of those sounds like an abomination lol, and PS is wonderful in being about as consistent and simple as a proper shell can get.

klysm4y ago

I want to learn powershell, but I have an internal ick bias because I've been using bash for so many years. The tab behavior is the exact opposite of what I expect and it short circuits my brain every single time I press it. Having structured data in the pipes seems very useful and powerful though so I should probably just bite the bullet.

vips7L4y ago

Tab behavior is configurable. I have mine set to menu expansion.

    Set-PSReadLineKeyHandler -Key Tab -Function MenuComplete

1 more reply

ilyash4y ago

.. or you can try Next Generation Shell (author here):

fetch("cars.json").filter({"color": "red"})

# or

echo(fetch("cars.json").filter({"color": "red"}))

tl4y ago

Powershell's object pipes are more inspectable than any of Bourne's text-based decendants. But the tool itself occupies a niche between "write shell, dealing with esoteria of grep/sed/awk/jq/etc" and "write python getting constructs than handle complexity better than pipes".

Looking at the popularity of VSCode, I don't think Microsoft hatred blocks its adoption.

ComputerGuru4y ago

> Looking at the popularity of VSCode, I don't think Microsoft hatred blocks its adoption.

In-apt comparison. The people using VS Code are more likely to be migrating from proprietary tools like PyCharm, Sublime Text, etc or bloated offerings like NetBeans or roughly equivalent offerings like Atom.

The people that would use PowerShell would be migrating from the likes of Zsh, Bash, Fish, and other “hard core free” software.

mdaniel4y ago

I conceptually like pwsh, but even as your example shows, I don't have the RSI budget left to spend on typing that extremely verbose expression every day

jq and its unix-y friends allow me to trade off expressiveness against having to memorize arcane invocations

brushfoot4y ago

I hear that, I use and like *nix too. PowerShell aliases help a lot. It comes with some predefined, like `gc` for `Get-Content`. The above example could be rewritten:

  gc cars.json | ConvertFrom-Json | ? color -eq 'red'

`ConvertFrom-Json` doesn't have a default alias, but you can define one in your PowerShell profile. I do that for commands I find myself using frequently. Say we pick convjson:

  gc cars.json | convjson | ? color -eq 'red'

That's more like what my typical pipelines look like.

The nice thing about aliases is you can always switch back to the verbose names when clarity is more important than brevity, like in long-term scripts.

Edit: Seems I've been using too many braces and dollar signs all these years. Thanks to majkinetor for the tip.

1 more reply

Arnavion4y ago

jq can not only process JSON input but also emit JSON output. So on that note, has ConvertTo-Json stopped mangling your JSON yet? https://news.ycombinator.com/item?id=25500632

pxc4y ago

Agreed— PowerShell is really nice for this, as are some of the other shells it has inspired.

AcerbicZero4y ago

I'm pretty new to jq (maybe 2 years of exposure) but from my perspective - on some level, jq does to json what powershell does to everything windows, except powershell gives me the get-member cmdlet, so when I don't know what is even in my object, I can explore.

Sometimes jq -r '.[]' works, but its all just trial and error. I use plenty of jq in my scripts, but I can never seem to visualize how jq looks at the data. I just have to toss variations of '.[whateveriwant].whatever[.want.]' until something works....I suppose the root of my complaint is that jq does not do a good job of teaching you to use jq. It either works, or gives you nothing, and while I've learned to work around that, I'll try anything that claims to be even 1% better than jq.

anitil4y ago

I use jless to manually find what I'm looking for and then using the result as a starting point. Unfortunately I don't know how to get that query in to the paste buffer yet so there's a manual step in the middle

abledon4y ago

There is also "JP" https://github.com/jmespath/jp

which follows the jmespath standard

mdaniel4y ago

My heartburn with jmespath is that it lacks pipelines, only projections, so doing _crazy_ stuff to the input structure is damn near impossible

NateEag4y ago

I suspect the JMESPath people would argue that if you want to do major transformations to the input, you should write a proper program, and that a CLI query tool should focus on, well, querying.

I'm personally trying to move away from jq and towards jp, because

- there's a standard defining it, not just an implementation, decreasing the odds of being stuck with an unmaintained tool

- there are libraries supporting the syntax for most of the major programming languages

- JMESPath's relative simplicity compared to jq is a good thing, IMO - Turing-completeness is a two-edged sword

- JMESPath is the AWS CLI query language, which is a convenient bonus

1 more reply

remram4y ago

From a computer science point of view, what kind of transformations are impossible to express in jmespath but are possible in jq?

1 more reply

hbbio4y ago

jq is awesome, last time I used it is... today :)

Or rather the pure Go rewrite https://github.com/itchyny/gojq which is a better faster implementation, with bugs fixed

kitd4y ago

The better error messages alone make this an improvement over jq IMHO.

mdaniel4y ago

And if it's maintained, that's also a plus, since I didn't realize jq was unmaintained, I thought it just didn't have any bugs to fix

politelemon4y ago

> HomeBrew for Mac or Linux

Please do not recommend HomeBrew for Linux. A binary download is safer compared to how HomeBrew clobbers a Linux machine. If you do not wish to use a Linux package manager, simply point at the binary download. It is much safer and less intrusive.

xenophonf4y ago

Homebrew isn't any better on macOS. Why people use it instead of MacPorts is beyond me.

sfink4y ago

The thing that I find myself wanting, which is lacking in both jq and zq afaik, is interactive exploration. I want to move around in a large JSON file, narrow my context to the portion I'm interested in, and do specialized queries and transformations on just the data I care about.

I wrote a tool to do this -- https://github.com/hotsphink/sfink-tools/blob/master/bin/jso... -- but I do not recommend it to anyone other than as perhaps a source of inspiration. It's slow and buggy, the syntax is cryptic and just matches whatever I came up with when I had a new need, etc. It probably wouldn't exist if I had heard of jq sooner.

But for what it does, it's awesome. I can do things like:

  % json somefile.json
  > ls
    0/
    1/
    2/
  > cd 0
  > ls
    info/
    files/
    timings/
    version
  > cat version
  1.2b
  > cat timings/*/mean
  timings/firstPaint/mean = 51
  timings/loadEventEnd/mean = 103
  timings/timeToContentfulPaint/mean = 68
  timings/timeToDomContentFlushed/mean = 67
  timings/timeToFirstInteractive/mean = 658
  timings/ttfb/mean = 6

There are commands for searching, modifying data, aggregating, etc., but those would be better done in a more principled, full-featured syntax like jq's.

I see ijq, and it looks really nice. But it doesn't have the context and restriction of focus that I'm looking for.

eloh4y ago

You could take a look at jless [1], it allows interactive selections/browsing in JSON documents.

[1] https://jless.io/

anitil4y ago

Do you know a way to copy the current selection in jless to the paste buffer? I find myself narrowing down a query for jq using jless but then having to manually remember the query to then jump to my terminal.

1 more reply

ratorx4y ago

I really like fx (https://github.com/antonmedv/fx) for interactive stuff. It does exactly what I think you want. You can expand individual fields and explore the schema.

However, I really do like jq for queries and scripting, so I keep both around.

ggm4y ago

This is almost exactly how I think about the problem of deciding how to deep-key to a specific field of a nested json structure.

If you can emit the syntactic form as a Python or perl ref, or a jq array ref, then I could use your tool to find the structure and the other ones to stream.

Great example! Thanks for posting this.

lichtenberger4y ago

That's one of the main steps forward for Brackit, a retargetable JSONiq query engine/compiler (http://brackit.io) and the append-only data store SirixDB (https://sirix.io) and a new web frontend. My vision is not only to explore the most recent revision but also any other older revisions, to display the diffs, to display the results of time travel queries... help is highly welcome as I'm myself a backend engineer and working on the query engine and the data store itself :-)

Detect changes of a specific node and the whole subtree/subtree:

    let $node := jn:doc('mycol.jn','mydoc.jn')=>fieldName[[1]]
    let $result := for $node-in-rev in jn:all-times($node)
                   return
                     if ((not(exists(jn:previous($node-in-rev))))
                          or (sdb:hash($node-in-rev) ne sdb:hash(jn:previous($node-in-rev)))) then
                       $node-in-rev
                     else
                       ()
    return [
      for $jsonItem in $result
      return { "node": $jsonItem, "revision": sdb:revision($jsonItem) }
    ]

Get all diffs between all revisions and serialize the output in an array:

    let $maxRevision := sdb:revision(jn:doc('mycol.jn','mydoc.jn'))
    let $result := for $i in (1 to $maxRevision)
                   return
                     if ($i > 1) then
                       jn:diff('mycol.jn','mydoc.jn',$i - 1, $i)
                     else
                       ()
    return [
      for $diff at $pos in $result
      return {"diffRev" || $pos || "toRev" || $pos + 1: jn:parse($diff)=>diffs}
    ]

Open a specific revision

By datetime:

    jn:open('mycol.jn','mydoc.jn',xs:dateTime('2022-03-01T00:00:00Z'))

By revision number:

    jn:doc('mycol.jn','mydoc.jn',5)

And a view of an outdated frontend:

https://github.com/sirixdb/sirix/raw/master/Screenshot%20fro...

dan-robertson4y ago

One solution I’ve seen is basically to hijack fzf to interactively input a jq query, add closing brackets in a naive way, run jq -C … | head on an input file, and display the result as a fzf ‘preview’. fzf ends up handling things like the preview command and display and line-editing logic but it may be slow if you don’t get early results.

endgame4y ago

It feels a lot like the FP idea of a zipper coupled to an interactive shell.

lichtenberger4y ago

I'm working on a JSONiq based implementation to jointly process JSON data and XML. The compiler uses set-oriented processing (and thus uses hash joins for instance wherever applicable) and is meant to provide a base for JSON based database systems with shared common optimizations (but can also be used as a standalone in-memory query processor):

http://brackit.io

The language itself borrows a lot of concepts from functional languages as higher order functions, closures... you can also develop modules with functions for easy reuse...

A simple join for instance looks like this:

        let $stores :=
        [
          { "store number" : 1, "state" : "MA" },
          { "store number" : 2, "state" : "MA" },
          { "store number" : 3, "state" : "CA" },
          { "store number" : 4, "state" : "CA" }
        ]
        let $sales := [
           { "product" : "broiler", "store number" : 1, "quantity" : 20  },
           { "product" : "toaster", "store number" : 2, "quantity" : 100 },
           { "product" : "toaster", "store number" : 2, "quantity" : 50 },
           { "product" : "toaster", "store number" : 3, "quantity" : 50 },
           { "product" : "blender", "store number" : 3, "quantity" : 100 },
           { "product" : "blender", "store number" : 3, "quantity" : 150 },
           { "product" : "socks", "store number" : 1, "quantity" : 500 },
           { "product" : "socks", "store number" : 2, "quantity" : 10 },
           { "product" : "shirt", "store number" : 3, "quantity" : 10 }
        ]
        let $join :=
          for $store in $stores, $sale in $sales
          where $store=>"store number" = $sale=>"store number"
          return {
            "nb" : $store=>"store number",
            "state" : $store=>state,
            "sold" : $sale=>product
          }
        return [$join]

Of course you can also group by, count, order by, nest FLWOR clauses...

preferjq4y ago

Here is a straightforward jq translation

    def stores:
      [
        { "store number" : 1, "state" : "MA" },
        { "store number" : 2, "state" : "MA" },
        { "store number" : 3, "state" : "CA" },
        { "store number" : 4, "state" : "CA" }
      ];
    def sales:
      [
        { "product" : "broiler", "store number" : 1, "quantity" : 20  },
        { "product" : "toaster", "store number" : 2, "quantity" : 100 },
        { "product" : "toaster", "store number" : 2, "quantity" : 50 },
        { "product" : "toaster", "store number" : 3, "quantity" : 50 },
        { "product" : "blender", "store number" : 3, "quantity" : 100 },
        { "product" : "blender", "store number" : 3, "quantity" : 150 },
        { "product" : "socks", "store number" : 1, "quantity" : 500 },
        { "product" : "socks", "store number" : 2, "quantity" : 10 },
        { "product" : "shirt", "store number" : 3, "quantity" : 10 }
      ];
    
    [
        {store: stores[], sale: sales[]}
      | select(.store."store number" == .sale."store number")
      | { nb:    .store."store number",
          state: .store.state,
          sold: .sale.product
        }
    ]

Try it online - https://tio.run/##rZPPUsMgEMbP5Sl2ctIZmklbe6HTg@PZJ8jkkD84Rh...

lichtenberger4y ago

The difference might be that Brackit uses sophisticated join algorithms for these kinds of implicit joins as known from relational query processing.

arwineap4y ago

I've never found jq to be particularly hard, or slow

anitil4y ago

I've generally found it a bit hard to get an initial query going, but then it is as fast as anything I've thrown at it. Generally something else (probably my code) falls over before I even notice jq's impact.

cosmiccatnap4y ago

I would love to see what jq looks like on something like a 1mil line Json vs this. In my experience jq syntax is fine and I've not ran into a performance issue on any one file but I seem to see a jq clone every few months on here so someone seems to need that, or maybe it's just the new volume slider problem who knows.

justinsaccount4y ago

jq performance is pretty terrible. Here I'm going to do something super simple like pull out a single field out of a large log file:

  $ wc -l big.log 
    979400 big.log

  $ du -hs big.log 
  570M big.log

`count` is a small program that counts lines on stdin. like `sort|uniq -c |sort -n`

jq takes 12 seconds:

  $ time cat big.log |jq -cr .method |~/bin/count 
  848000 GET
  94800 POST
  34000 HEAD
  2400 OPTIONS
  200 null

  real 0m12.381s
  user 0m12.427s
  sys 0m0.333s

my tool takes .5 seconds

  $ time cat big.log |~/bin/jj method |~/bin/count 
  848000 GET
  94800 POST
  34000 HEAD
  2400 OPTIONS
  200 

  real 0m0.466s
  user 0m0.512s
  sys 0m0.198s

`jj` is a little tool I wrote that uses https://github.com/buger/jsonparser

xg154y ago

A bit OT:

The post links to the tutorial "An Introduction to JQ" at [1].

Somewhere inside the tutorial, array operators are introduced like this:

> jq lets you select the whole array [], a specific element [3], or ranges [2:5] and combine these with the object index if needed.

This is not supposed to be criticism on this particular tutorial (I've seen this kind of description quite often), but I could imagine this to be a typical "eyes glaze over" moment, where people subtly lose track of what is happening.

It appears to make sense on first glance, but leaves open the question what "selecting the whole array" actually means - especially, since you can write both ".myarray" and ".myarray[]" and both will select the whole array in a sense.

I think this is the point where one would really need to learn about sequences and about jq's processing model to not get frustrated later.

[1] https://earthly.dev/blog/jq-select/

adamgordonbell4y ago

Oh, I wrote that. I think I get what you mean. There are two different things, and they aren't being delineated. How would you explain it?

I don't know how jq works internally and in my mental model [] maps into the json array and also can wrap things back into an array. So that [.[]] unwraps and then rewraps a JSON array, sort of like how [.[].title] is the same as map(.title).

knowsuchagency4y ago

jq is a great tool, but my favorite alternative, by far, is jello and the libraries the author has created around it https://blog.kellybrazil.com/2020/03/25/jello-the-jq-alterna...

qmacro4y ago

There's a lot of references here to jq being 'arcane'. For me, one of the challenges in improving my jq fu has been to find examples of larger programs, from which to learn.

One thing that seems to be perhaps a misconception amongst some is that jq invocations are short and only 'one-liners', and that a 'real script' (in a 'real language') would be better in many cases. I think this lack of larger program examples probably helps to perpetuate this misunderstanding too.

Anyway, I was inspired enough by the article in question to write up some of my own thoughts on jq and statelessness: https://qmacro.org/blog/posts/2022/05/02/some-thoughts-on-jq...

29athrowaway4y ago

"Easier" is subjective. For simple use-cases, zq is harder to understand than jq.

I also have never seen jq as a performance bottleneck.

jq is stable, I have never encountered a bug with it and I have never seen it getting stuck after years of usage. It is dependable and practical.

jq has helped me put out countless fires throughout my career. I should donate to it one day.

pm904y ago

It took me a while to grok jq, but now that I do I kinda like it? I don't think I want to learn yet another thing.

I do like tools that complement/supplement jq though, like jid: https://github.com/simeji/jid

ilyash4y ago

While we are at it, I have a list of JSON tools for command line here - https://ilya-sher.org/2018/04/10/list-of-json-tools-for-comm...

eru4y ago

Jq being secretly a sort-of functional programming language is part of what makes it great.

Why would you change that?

gcmeplz4y ago

I like using `jq` to create line-delimited JSON and then using a language I know well (Node) to process it after that point. I find `jq '.[] | select(.location=="Stockholm")'` less readable than something like `nq --filter '({location}) => location === "Stockholm"'` because I'm much more used to Node syntax.

- https://github.com/thisredone/rb is a widely used ruby version of this idea

- https://github.com/KelWill/nq#readme is something similar that I wrote for my own use

eru4y ago

By Node, you mean JavaScript?

If yes, it's fascinating to me, that jq is so powerful, it's even useful when handling JavaScript Object Notation in JavaScript.

kaliszad4y ago

For me, transforming JSON on the command line was a pain, another DSL to learn. Now, I can just use Babashka/ Clojure + one or two functions from Cheshire https://github.com/dakrone/cheshire where I need to. If I needed a standalone tool, I would perhaps reach for https://github.com/borkdude/jet by the same author, Michiel Borkent, as Babashka or use jq that everybody else would find more familiar.

ducaale4y ago

In the theme of jq alternatives, there is fx[1] which has an interactive view and supports querying JSON in Javascript, Python and Ruby. It used to be a node CLI but was recently rewritten in golang[2]

[1] https://github.com/antonmedv/fx

[2] https://twitter.com/antonmedv/status/1515429017582809090

phibz4y ago

I think of

echo '1 2 3' | jq ....

as creating three separate json documents, each with a single number as their top level "document" , body, or content.

So of course you can't sum them. They are fed as separate documents to the jq pipeline as if you processed three separate jq commands.

Perhaps by stateless you mean no mutuable global state? But it certainly maintains state from the location in the input document to the output of each selector/functor.

IMO it helps if you have a background in some of the concepts of functional programming.

ilyash4y ago

In Next Generation Shell (author here), it is not as ergonomic (yet?) but on the other hand it's a fully fledged no-nonsense programming language... and I claim quite a readable.

good_data = fetch("openlibrary.json").docs.filter({"author_name": Arr, "publish_year":Arr})

good_data.map({{"title": A.title, "author_name": A.author_name[0], "publish_year": A.publish_year[0]}}).group("author_name").mapv(len).sortv((>=)).limit(3)

taude4y ago

I'm surprised no one mentioned rq [1] yet. It's come up before in older HN threads [2] whenever the discussion on jq comes up...

[1] https://github.com/dflemstr/rq [2] https://news.ycombinator.com/item?id=13090604

gzapp4y ago

I'm sure I'm not the only person that got fed up with occasionally needing to do something more advanced and just finding the JQ incantations inscrutable.

Also prob not the first to create a project for personal use that just wraps evals in another language haha: https://www.npmjs.com/package/jsling

bradwood4y ago

Nothing beats gron in my view.

That plus good old fashioned sed/grep/awk give me everything I need to do on the cli.

If I want more, it's python or node.

quotemstr4y ago

As an aside --- isn't the traditional flat namespace of unix command names getting a bit crowded nowadays?

anitil4y ago

We've got space for 26^2 2-letter commands...

> for d in $(echo $PATH | tr ":" "\n") ; do ls $d | grep "^..$"; done | sort -u | wc -l

> 52

I can fit a few more in

(edit: I can't work out how to put code in a comment)

henrydark4y ago

I have recently started to use jq massively, and I love it.

Zq looks cool, but the fact that this piece doesn't contain a single instance of the word "map" tells me the authors still haven't gotten jq. Especially with the running strawman example of adding numbers.

stblack4y ago

Why all the hate HN?

I feel the author makes his case clearly, then presents an alternative. Underneath all this is a ton of work, for which I applaud OP.

It may not scratch your particular itch, but come on!

Being an ass on HN is a choice. It happens far too often, and I wish everyone would just dial it back.

dimitrios14y ago

Do not confuse critique with hate.

This place has a high standard for new tools and libraries, particularly one that claims to be better in any stretch ("faster" and "easier"). If this was say, a college student learning programming and presenting it as "hey I made a jq alternative and I believe it's easier and faster" I imagine it would solicit more softened feedback.

Come prepared, and ready to defend your stance. If you can't take the heat, don't come in the kitchen.

eatonphil4y ago

I don't see hate for the project here.

I see criticism for the way they're trying to position it as easier than jq when it's just different than jq.

It looks like a cool project on its own and doesn't need to describe jq as confusing to make that point.

skybrian4y ago

But it is easier, for them.

Easier, as a universal claim, is hard to establish - you'd need to do user studies. Easier in the author's opinion is normal usage, and their opinion is as good as anyone else's. They gave a reasonable justification.

I kind of think you'd need to use both tools to have an informed opinion about which you think is easier. But most of us aren't going to do that, which is fine.

I think having strong opinions about which is easier without trying them both is weird, though.

1 more reply

dimensionc1324y ago

Simple json tasks .... read from, write to, read a value and save it as a variable in BASH .... where are those examples?

The question for is this; can I do with json files what i can do with Python using Zq?

jrm44y ago

Okay, so I'm a big scripter and not much of a programmer and I definitely have found jq to be mostly worthless to me; but it also looks like zq doesn't much help?

Seems to me that if you're in a shell, then you should be "shell-like." There should not be much of a learning curve at all, and when in doubt, try to behave like other shell tools, in a Unix way way. Make pipe behavior generally predictable, especially for those who aren't deep into json et al.

And if you're not going to do that, say so on "the box?"

(Disclaimer, it could be that I'm an idiot when it comes to all of this and I'm missing something big. Kind of feels that way, and I welcome correction)

dymk4y ago

Could you help me understand what your usecases are, and where jq/zq fall short? I find the tools useful for e.g. curl'ing a request with a JSON format into, and then mapping/filtering/reducing the content into what I want. It seems pretty unix-y to me, but I'm curious what the shortcomings are. For instance, could you give an example where the pipe behavior is unpredictable?

caymanjim4y ago

I almost gave up before I got to the first mention of zq, and then wished I had.

pygar4y ago

I really wish that a jq type program was included in coreutils (or similar). I have wanted to to use it a few times but could never be sure it was going to be installed.

tus6664y ago

The worst thing about JQ is printing out several values from an object at once. The syntax is so bad I have to look it up on SO every time.

spiralx4y ago

Looks interesting, and I like the analysis of JQ's model in order to compare and contrast it with ZQ. It reminds me somewhat of the difference between XPath 1 to XPath 2 - the former worked great if you were selecting stuff and using the built-in functions to do some simple transformations but if you tried to go further it was a hot mess of incompatible types each with their own constraints and quirks. In XPath 2 they completely revised the data model and type system, making everything an atomic value, node or a sequence of values and/or nodes, allowing types to be queried and operated on, and added a ton of useful built-in functions and operators, making impossible queries in XPath 1 possible.

https://docs.microsoft.com/en-us/archive/msdn-magazine/2003/...

Anyway, I've installed ZQ and will look to use it, even my simple usage of JQ had already led to thoughts of writing my own, better version :)

Quick bug report: On the Aggregate Functions page the link to _countdistinct_ goes to the page for _count_, and there actually isn't a page at https://zed.brimdata.io/docs/language/aggregates/countdistin....

kryptozinc4y ago

Is there a universal json normalizer (to csv for example) that doesn't require learning a terse language?

omaranto4y ago

There is gron [1], which prints json as a series of assignment statements that recreate the json value. It's pretty handy.

[1] https://github.com/TomNomNom/gron

harbor110124y ago

btw, in case you don't know, you can actually run jq using a curl command:

https://xbin.io/w/tool/jq

tzury4y ago

yq uses jq like syntax but works with YAML, JSON and XML.

https://github.com/mikefarah/yq

Aeolun4y ago

Both zq and jq seem like black magic to me.

xg154y ago

So, admitted jq fanboy here, but I found a lot of the criticism from the articale really sensible.

I think jq has a pretty elegant data model, but the syntax is often very clunky to work with.

So here is a half thought-out idea how you might improve the syntax for the "stateful operations" usecase the OP outlined:

I think it's not quite true that different elements of a sequence can never interact. The OP mentioned reduce/foreach, but it's also what any function that takes argument does:

If you have an expression 'foo | bar', then bar is called once for every element foo emits. However, foo could also a function that takes arguments. Then you can specify bar as an argument of foo like this: 'foo(bar)'. In this situation, execution of bar is completely controlled by foo. In particular, foo gets to see all elements that foo emits, not just one each. I believe this is how e.g. [x] can collect all elements of x into an array.

In the same way, you could write a function 'add_all(x)' which calls x and adds up all emitted elements to a sum.

However, this wouldn't help you with collecting all input lines, as there is nothing for you function to "wrap around". Or at least, there used to be nothing, but I think in one of the recent build, an "inputs" function was added, which emits all remaining inputs. So now, you can write e.g. '[., inputs]' to reimplement slurp. In the same way, you could sum up all input lines by writing 'add_all(., inputs)'.

However, this is still ugly and unintuitive to write, so I think introducting some syntactic sugar for this would be useful. E.g., you could imagine a "collect operator", e.g. '>>' which treats everything left of it as the first argument to the function to the right of it.

e.g., writing 'a >> b' would desugar to 'b(a)'.

Writing 'a | b >> c' would desugar to 'c(a | b)'.

Any steps further to the right are not affected:

'a | b >> c | d' would desugar to 'c(a | b) | d'.

Scope to the left could be controlled with parantheses:

'a | (b >> c)' would desugar to 'a | c(b)'.

To make this more useful for aggregating on input lines, you could add a special rule that, if the operator is used with no parantheses, it will implicitly prepend '(., inputs)' as the first step.

So if the entire top-level expression is 'a | b >> c', it would desugar to 'c((., inputs) | a | b)'.

This would make many usecases that require keeping state much more straight-forward. E.g. collecting all the "baz" fields into an array could be written as '.baz >> []' which would desugar to '[(., inputs) | .baz]'

Summing up all the bazzes could be written as '.baz >> add_all' which would desugar to 'add_all((., inputs) | .baz)'

...and so on.

On the other hand, this could also lead to new confusion, as you could also write stuff like '... | (.baz >> map) | ...' which would really mean 'map(.baz)' or 'foo >> bar >> baz' which would desugar to the extremely cryptic expression 'baz((., inputs) | bar((., inputs) | foo))'. So I'm not quite sure.

Any thoughts about the idea?

spiralx4y ago

My thought is that having | as the pipe operator in the shell means that it's okay to use | for the or operator in your query syntax as the people using your XGQ will be familiar with that usage, but to then introduce a pipe operator (which >> pretty much is) alongside using | as something that isn't a pipe is just going to fuck with your muscle memory for reading code. Especially as >> is already the right-shift operator, and so you expect the precedence for 'a | b >> c' to mean 'a | c(b)'.

The pipe operator that's in its final stages of approval for JavaScript uses '|>' as its sigil, which is a decent compromise between not conflicting with existing operators, being compatible with developers' existing pattern matching, and representing what it does somewhat. And 'a | b |> c' is ok.

You could just have '@c a | b' mean "everything from @c onwards to the end of the string is the argument to c" i.e. 'c(a | b)' and have 'c a | b' be 'c(a) | b', then anything more complicated just requires using the parentheses operator to enclose an expression i.e. 'c (a | b)' or just 'c(a | b)' if your tokenizer is a bit cleverer :) Actually I like this idea, because '@' is syntactic sugar for () around the rest of the query, and a function then operates on the value of the expression following it.

marmada4y ago

I see a lot of JQ experts on this thread, so I'll bite the bullet here as a novice.

The purpose of life is not to know JQ. I just want to process the JSON so I can move on and do whatever is actually important. Ideally, I'd just be able to tell GPT-codex to do what I want to do to the JSON in English.

We're not there yet, but in the meantime if there's another tool that allows me to know less in exchange for doing more, I'll gladly use it.

preferjq4y ago

I completely agree when your goal is GSD just use the tools you have.

When you have time to sharpen the saw come back and dig into the details of how jq and tools like it work and where their limits are. Looking at the jq builtins[1] can be very enlightening

If you get to the point where your goal is to increase your jq skills I'd recommend looking at the jq questions on Stack Overflow and posting your own solution. Contributing a solution to https://rosettacode.org/wiki/Category:Jq is also good.

1- https://github.com/stedolan/jq/blob/master/src/builtin.jq

boyter4y ago

Same boat here. I ended up finding gron https://github.com/tomnomnom/gron which resolved that issue for me. Now I don't have to look up how to use jq each time I want to quickly find something in some JSON.

phil2944y ago

This is one of the purposes I think Deno should have been built for: Use JavaScript for oneliners in the command line. We had

    ... | deno xeval '...stdin processing code using special var $'

which was close to xargs in terms of conciseness. Unfortunately, it was removed as being considered "too niche" [1].

[1] https://github.com/denoland/deno/issues/3230

dotopotoro4y ago

> know less in exchange for doing more

That is very rare event with established tooling.

Most of the time complexity is just shifted around.

nixpulvis4y ago

No, not ideally.

English descriptions will never be completely unambiguous and unique keys into a JSON data structure. There is a very good reason programming languages (and other forms of languages) exist.

ctur4y ago

It takes a while to get to the point so I’ll save others some time and tldr this very lengthy and agenda-driven blog post:

jq had a tough learning curve so you should switch to zq which is a (closed source?) wrapper around an obscure language you’ve never heard of that we promise is easier because reasons. Also coincidentally it’s the language of an ecosystem we were funded to build.

Edit: mea culpa, turns out you can download the source (revealed half way through the article).

loeg4y ago

Closed source? https://github.com/brimdata/zed/blob/main/runtime/query.go

Yes, it’s an obscure query language. But if you were interested in jq, that clearly wasn’t a barrier to entry.

I agree the author is happy to show off their tool, but disagree that that is somehow disqualifying. They made a cool thing, they’re allowed to be proud about it.

mdaniel4y ago

And it's BSD 3 Clause, for those interested: https://github.com/brimdata/zed/blob/v1.0.0/LICENSE.txt

j / k navigate · click thread line to collapse

223 comments

mccanneOP4y ago

Hi, all. Author here. Thanks for all the great feedback.

I've learned a lot from your comments and pointers.

Thanks a million!

https://github.com/brimdata/zed https://www.brimdata.io/join-slack/

preferjq4y ago

"cobbled-together" jq as it often appears in the wild will often compare badly with crafted solutions because the writer's goal is usually GSD and not write pretty code.

People with the time and inclination to slow down and think a little more about how the tools work will produce cleaner solutions.

In your example to convert

    {"name":"foo","vals":[1,2,3]}

    {"name":"foo","val":1}
    {"name":"foo","val":2}
    {"name":"foo","val":3}

All you need is this jq filter

    {name:.name, val:.vals[]}

To me this is much better than the proposed zq or jq solution you're using as a basis for comparison. You could almost use the shorter

    .vals = .vals[]

if the name in the output didn't change.

.[] - https://stedolan.github.io/jq/manual/#Array/ObjectValueItera...

MarkMarine4y ago

hyperpallium24y ago

can also use name instead of name:.name

I think jq is very elegant - genius even - but whenever I use it, I have to look up the docs for syntax. But I guess that's true for any infrequently used tool.

1 more reply

1vuio0pswjnm74y ago

Thank you for your work on tcpdump, (original) bpf and the pcap library. I benefit from those projects everyday.

ZSON looks way better than JSON. I pray that the Zed project becomes more popular.

mccanneOP4y ago

Wow, thanks.

https://github.com/brimdata/zbpf

The nice thing about Zed here is any value can be a group-by key so it's easy, for example, to use kernel stacks (an array of strings) in a grouping aggregate.

(p.s. for the record, the only thing I have to do with the modern linux BPF system is the tiny vestige of origin story it shares with the original work I did in the BSD kernel around 1990)

rienko4y ago

Ever since my team started using Splunk (circa 2012), we claimed for a more open version we could tinker with and not cost an arm and a leg to ingest multiple terabytes of daily data.

Positioning as an opensource Splunk would be an interesting play. Going through your docs the union() function looks like it returns a set, akin to splunk values(), is there the equivalent to list()?

mccanneOP4y ago

echo '1 2 2 3 3' | zq 'u:=union(this) | cast(u,<[int64]>) ' -

[1,2,3]

(Note that <[int64]> is a type value that represents array of int64.)

gauravphoenix4y ago

there is Dassana[1] if someone wants to try out json native,index-free, schema-less solution built on top of ClickHouse.

ShowHN post(FAQ)[2]

disclaimer- I'm founder/CEO of Dassana.

[1] https://lake.dassana.io/

[2] https://news.ycombinator.com/item?id=31111432

noborus4y ago

I wrote about how to solve with SQL. https://noborus.github.io/blog/jqsql/

weinzierl4y ago

jq is incredibly powerful and I'm using it more and more. Even better, there is a whole ecosystem of tools that are similar or work in conjunction with jq:

* jq (a great JSON-wrangling tool)

* jc (convert various tools’ output into JSON)

* jo (create JSON objects)

* yq (like jq, but for YAML)

* fq (like jq, but for binary)

* htmlq (like jq, but for HTML)

List shamelessly stolen from Julia Evans[1]. For live links see her page.

Just a few days ago I needed to quickly extract all JWT token expiration dates from a network capture. This is what I came up with:

    fq 'grep("Authorization: Bearer.*" ) | print' server.pcap | grep -o 'ey.*$' | sort | uniq | \
    jq -R '[split(".") | select(length > 0) | .[0],.[1] | gsub("-";"+") | gsub("_";"/") | @base64d | fromjson]' | \
   jq '.[1]' | jq '.exp' | xargs -n1 -I! date '+%Y-%m-%d %H:%M:%S' -d @!

It's not a beauty but I find the fact that you can do it in one line, with proper parsing and no regex trickery, remarkable.

[1] https://jvns.ca/blog/2022/04/12/a-list-of-new-ish--command-l...

kitd4y ago

Also highly recommended is gron [0], to make json easily searchable

[0] https://github.com/TomNomNom/gron

spudlyo4y ago

Most of the time I can get what I need with gron and traditional UNIX tools, without needing to reach for jq, and without having to re-learn its somewhat arcane syntax.

zikduruqe4y ago

I came here looking for a gron recommendation. I use this very often.

toxik4y ago

Is your example not easier to write and read as a 10-something line Python script? I never understood the appeal of jq etc because of this very reason.

1 more reply

hoherd4y ago

I would definitely add dasel to that list. It's become my de facto serialized data converter, and regularly use it to convert between csv, toml, yaml, json, and xml using jq-ish syntaxes.

https://github.com/tomwright/dasel

chriswarbo4y ago

The yq tool also provides 'xq', which works on XML :)

stormbrew4y ago

samcal4y ago

wwader4y ago

Hi, lover of jq and author of fq here! Just wanted to mention that fq is jq so you can do things like `fq 'grep("Authorization: Bearer.*") | split("\n") | ...' file.pcap`.

Also i'm working and prototyping some kind of http decoding support that will make things like select on headers and automatic/manual decoding body possible.

chris378794y ago

I need someone to make a `Q` wrapper that amalgamates all of them. And if that's already taken by a common utility, I vote we name it deLancie, instead.

msluyter4y ago

Whenever jq comes up I feel obligated to mention 'gron'[1]. If all you're doing is trying to grep some deeply nested field, it's way easier with gron, IMHO.

[1] https://github.com/tomnomnom/gron

RulerOf4y ago

Gron and jq are complementary tools IMO. I frequently use gron to trim down large json files such that I can determine what my ultimate jq query is going to look like.

radicality4y ago

https://glom.readthedocs.io/en/latest/

zimpenfish4y ago

[1] "the person who was retweeted" in lieu of a better word.

psacawa4y ago

Since no one seems to know about it, jq is described in great detail on the github wiki page [0]. That flattens the learning curve a lot. It's not as arcane as it seems.

The touted claim that is fundamentally stateless is not true. jq is also stateful in the sense that it has variables. If you want, you can write regular procedural code this way. Some examples [1]

The real problem of jq is that it is currently lacking a maintainer to assess a number of PRs that have accumulated since 2018.

[0] https://github.com/stedolan/jq/wiki/jq-Language-Description

[1] https://github.com/fadado/JBOL/blob/master/fadado.github.io/...

Beltalowda4y ago

> It's not as arcane as it seems.

The issue with jq is that I use it maybe once a month, or even less. The syntax is "arcane enough" that I keep forgetting how to use it because I use it so sporadically.

In comparison awk – which I also don't use that often – has a much easier syntax that I can mostly remember.

Not entirely convinced by the zq syntax either though; it also seems "arcane enough" that I would keep forgetting it.

hiram1124y ago

Bingo.

But if I do end up using it for a few days in a row, it starts to make sense, and I improve each time I use it.

But not with jq.

About the only other syntax or language that gives me such problems is Elastic Search DSL.

1 more reply

laurent1234564y ago

I wonder if someone tried to use plain JS as a filtering language? It would be more verbose but it would be easy to remember. For example:

   [1,2,3] | js "out = 0; for (const n of this) out += n"

That would print "6". `out` would be a special variable you write to to print the result, and `this` would be the input.

5 more replies

ts00004y ago

Interesting, for me it's the exact opposite.

I've tried a couple of times to get into awk, but still find the syntax arcane.

1 more reply

taude4y ago

Same issue. However, I do successfully rely on using ctrl-r a lot to search prior invoked commands. And have a few core aliases that I've cobbled together....

1 more reply

zeroimpl4y ago

I use both awk and jq infrequently enough that I tend to struggle with anything non-trivial. I think zq would be the same.

> Not entirely convinced by the zq syntax either though; it also seems "arcane enough" that I would keep forgetting it.

ar_lan4y ago

This is ironic - I use `awk` so infrequently, I have no idea how to use it without reading its man page or using Google. But I use `jq` often and find it simple.

j1elo4y ago

oblio4y ago

What? That's crazy! Does Github block indexing?

2 more replies

jdnier4y ago

Here's podcast interview with the creator of jq about what he's been working on at Jane Street: https://signalsandthreads.com/memory-management/

klysm4y ago

I didn't realize jq was missing a maintainer, it's one of my most used CLI tools.

ethanwillis4y ago

3 more replies

skybrian4y ago

In this case it doesn't seem too critical? It means jq remains stable, which is probably what should happen once a tool like this gets a lot of users.

adamgordonbell4y ago

I found it hard to approach at first, but I think it was just the lack of material that worked through simple examples step by step.

I ended up writing my own guide to it, that in my unbiased opinion makes it easier to get the point where in-depth examples and language descriptions are easier to understand.

Edit: Oh, wow, it's even mentioned in this article. Maybe I should read before commenting.

https://earthly.dev/blog/jq-select/

sfink4y ago

Thank you!

dilap4y ago

From that page:

> The jq documentation is written in a style that hides a lot of important detail because the hope is that the language feels intuitive.

Yeah, not so much boys! Also, that disclaimer should really be at the top of the manual, with a link to the wiki, rather than vice-versa, as it is now.

The wiki is like secret information -- "oh, hey, here's the page that actually tells you how it works!"

eatonphil4y ago

If jq is getting too slow for you (that's never happened for me), it really seems like it's time to put your data in a database like sqlite or duckdb at least.

Incidentally there are many tools that help you do this like dsq [0] (which I develop), q [1], textql [2], etc.

[0] https://github.com/multiprocessio/dsq

[1] https://github.com/harelba/q

[2] https://github.com/dinedal/textql

jeffbee4y ago

Rewriting all or parts of it in C++ would make it dramatically faster. I would start by ripping out the asserts and using a different strtod which they spend an awful lot of time in.

eatonphil4y ago

Fair point! I don't mean to say jq performance can't or shouldn't be improved.

Just that jq does two things: 1) ingest and 2) query.

Then you can query as many times as you want and not worry about ingest again until your data changes.

1 more reply

algesten4y ago

I don't get it. "Instead of learning jq DSL, learn zq DSL".

To me they look similarly complicated and the examples stresses certain aggregation operations that are harder to do in jq (due to it being stateless).

loeg4y ago

> "Instead of learning jq DSL, learn zq DSL"

jerrysievert4y ago

> They also make some performance claims towards the end of the article.

essentially a marginal speed increase they think on json, but a much bigger speed increase (5x-100x they claim) if you switch to their native format ZNG.

if I'm switching formats completely, I'm not sure why I care about jq vs zq in json performance ...

1 more reply

p5a0u9l4y ago

Yes, but fortunately, your efforts will pay dividends when parsing all the 'z*' boutique formats that it supports, zson, zst, zng, the list goes on. /s

mattnibs4y ago

Not sure if this came across in the article, but all the "boutique" z* formats are all representations of the same zed model https://zed.brimdata.io/docs/formats/zed/

enriquto4y ago

> "Instead of learning jq DSL, learn zq DSL".

A saner approach is to gron the damn json and just use regular unix tools on the data.

knome4y ago

These guys must really hate functional programming.

I can see where jq might confuse someone new to it, but their replacement is irregular, stateful, still difficult, and I don't even see variable binding or anything.

it's a bit unintuitive if you come in thinking of them as regular pipelines, but it's a constant in the language that once learned always applies.

this zed thing has what appears to be a series of workarounds for its own awkwardness, where they kept tacking on new forms to try to bandaid those that came before.

johnday4y ago

No kidding!

This part in particular jumped out at me:

aarchi4y ago

In fact, jq already has `map`, which would replace the article's pattern of `[.[]|add]` with `map(add)`. It is defined as such:

    def map(f): [.[] | f];

Many built-in functions in jq are implemented in jq, in terms of a small set of core primitives. The implementations can be inspected in builtin.jq.

https://github.com/stedolan/jq/blob/master/src/builtin.jq#L3

thaliaarchi4y ago

I find the stateless streaming paradigm in jq very pleasing.

I built an interpreter for the Whitespace programming language in jq using these concepts and it's easily one of the most complex jq programs out there.

[0]: https://stedolan.github.io/jq/manual/#Generatorsanditerators

[1]: https://github.com/andrewarchi/wsjq

qmacro4y ago

Wow, I've been on the lookout for larger jq programs from which to learn. I'm going to enjoy learning from wsjq, thank you!

mattnibs4y ago

Variables exist in zq, "this" is a reserved word: echo {x:1} | zq 'x := x+1' -

thayne4y ago

micimize4y ago

Their syntax comparison under "So you like chocolate or vanilla?" is disingenuous. You can do variable assignment and array expansion in jq:

  expand_vals_into_independent_records='
    .name as $name | .vals[] | { name: $name, val: . }
  '
  echo '{"name":"foo","vals":[1,2,3]} {"name":"bar","vals":[4,5]}' |
    jq "$expand_vals_into_independent_records"

Also, generally, not a fan of the tone of this article.

lilyball4y ago

Your `.name as $name` was my immediate attempt too, but it turns out you can go even simpler with

  jq '{name, val: .vals[]}'

diehunde4y ago

meowface4y ago

If you're doing a lot of JSON munging every day and have good mastery of something like jq or zq, you can probably get things done faster.

meepmorp4y ago

The same thing could be said for grep, or really any other utility that can have its functionality reproduced in a programing language.

eru4y ago

Indeed! Jq is basically something like grep for JSON.

It might actually make sense to embed jq functionality into your favourite language (as a library or so), as it is quite a nice and well-chosen set of functionality.

1 more reply

folkrav4y ago

Honestly, I've only really used `jq` to quickly parse JSON structures in interactive sessions e.g

  curl -s http://foo.bar | jq .some.nested.value

Anything more complicated I would indeed go for writing a proper script.

eru4y ago

Don't tell anyone, but jq is secretly a pretty well thought out functional programming language.

johnthuss4y ago

As the complexity of the input JSON grows or the complexity of your processing, it does makes sense to leave jq behind for a higher level language.

eru4y ago

I agree with most of what you say.

I disagree with 'leaving for a higher level language'. Jq is an extremely high level language.

What it is _not_ is a general purpose language.

aftbit4y ago

ris4y ago

I used to 100% agree with you, but these days I understand why so much stuff ends up being bash and jq.

orthecreedence4y ago

You can spend a few days getting to know jq or you can happily live with your 100+ purpose-built scripts. I know which one I prefer.

Granted, a lot of my job right now is data forensics stuff, so I breath this kind of stuff. You might never need jq.

pantulis4y ago

eatonphil4y ago

I'm a programmer by trade. I use jq. :)

vlunkr4y ago

I imagine this is how it's used 90% of the time, but can do lots more advanced stuff as described in the article.

johnday4y ago

Suppose you write a shell script which is intended for use among colleagues as part of a pipeline.

brushfoot4y ago

  Get-Content cars.json | ConvertFrom-Json | ? { $_.color -eq 'red' }

I prefer this approach to another domain-specific language, as interesting as jq's and zq's are.

ptx4y ago

PowerShell "sends basic telemetry data to Microsoft [...] about the host running PowerShell, and information about how PowerShell is used" [1].

[1] https://docs.microsoft.com/en-us/powershell/module/microsoft...

sandyarmstrong4y ago

> And since it relies on .NET, that also requires its own separate opt-out for its telemetry.

Building a program with .NET does NOT cause that program to send telemetry to Microsoft.

You're thinking of the .NET SDK itself. Using PowerShell does not trigger any use of the .NET SDK.

Disclaimer: I work for Microsoft.

1 more reply

brushfoot4y ago

To me a telemetry opt-out is a small price to pay for what PowerShell brings to the table, but to each their own.

> There might be other components, now or in the future, that also send data to Microsoft

Of course. Do your due diligence on whatever you install. No tool should be exempt from that.

1 more reply

vips7L4y ago

> The beauty of this is that the query syntax applies not just to JSON but to every type of collection,

This is the best part of pwsh. Everything is standardized, you're not guessing at the idioms of each command, and you're working with objects instead of parsing strings!

My second favorite part is having access to the entire C# standard library.

bblb4y ago

I've been using it since day one from 2006, every single day. It has come a long way and the current PS7 is the best shell experience there is. Hands down no contest.

Snover's passionate early presentation about the PS pipeline is a pretty cool tech video. https://www.youtube.com/watch?v=325kY2Umgw8

spiralx4y ago

> PowerShell is "Python interactive done right".

klysm4y ago

vips7L4y ago

Tab behavior is configurable. I have mine set to menu expansion.

    Set-PSReadLineKeyHandler -Key Tab -Function MenuComplete

1 more reply

ilyash4y ago

.. or you can try Next Generation Shell (author here):

fetch("cars.json").filter({"color": "red"})

# or

echo(fetch("cars.json").filter({"color": "red"}))

tl4y ago

Looking at the popularity of VSCode, I don't think Microsoft hatred blocks its adoption.

ComputerGuru4y ago

> Looking at the popularity of VSCode, I don't think Microsoft hatred blocks its adoption.

The people that would use PowerShell would be migrating from the likes of Zsh, Bash, Fish, and other “hard core free” software.

mdaniel4y ago

I conceptually like pwsh, but even as your example shows, I don't have the RSI budget left to spend on typing that extremely verbose expression every day

jq and its unix-y friends allow me to trade off expressiveness against having to memorize arcane invocations

brushfoot4y ago

I hear that, I use and like *nix too. PowerShell aliases help a lot. It comes with some predefined, like `gc` for `Get-Content`. The above example could be rewritten:

  gc cars.json | ConvertFrom-Json | ? color -eq 'red'

`ConvertFrom-Json` doesn't have a default alias, but you can define one in your PowerShell profile. I do that for commands I find myself using frequently. Say we pick convjson:

  gc cars.json | convjson | ? color -eq 'red'

That's more like what my typical pipelines look like.

The nice thing about aliases is you can always switch back to the verbose names when clarity is more important than brevity, like in long-term scripts.

Edit: Seems I've been using too many braces and dollar signs all these years. Thanks to majkinetor for the tip.

1 more reply

Arnavion4y ago

jq can not only process JSON input but also emit JSON output. So on that note, has ConvertTo-Json stopped mangling your JSON yet? https://news.ycombinator.com/item?id=25500632

pxc4y ago

Agreed— PowerShell is really nice for this, as are some of the other shells it has inspired.

AcerbicZero4y ago

anitil4y ago

abledon4y ago

There is also "JP" https://github.com/jmespath/jp

which follows the jmespath standard

mdaniel4y ago

My heartburn with jmespath is that it lacks pipelines, only projections, so doing _crazy_ stuff to the input structure is damn near impossible

NateEag4y ago

I suspect the JMESPath people would argue that if you want to do major transformations to the input, you should write a proper program, and that a CLI query tool should focus on, well, querying.

I'm personally trying to move away from jq and towards jp, because

- there's a standard defining it, not just an implementation, decreasing the odds of being stuck with an unmaintained tool

- there are libraries supporting the syntax for most of the major programming languages

- JMESPath's relative simplicity compared to jq is a good thing, IMO - Turing-completeness is a two-edged sword

- JMESPath is the AWS CLI query language, which is a convenient bonus

1 more reply

remram4y ago

From a computer science point of view, what kind of transformations are impossible to express in jmespath but are possible in jq?

1 more reply

hbbio4y ago

jq is awesome, last time I used it is... today :)

Or rather the pure Go rewrite https://github.com/itchyny/gojq which is a better faster implementation, with bugs fixed

kitd4y ago

The better error messages alone make this an improvement over jq IMHO.

mdaniel4y ago

And if it's maintained, that's also a plus, since I didn't realize jq was unmaintained, I thought it just didn't have any bugs to fix

politelemon4y ago

> HomeBrew for Mac or Linux

xenophonf4y ago

Homebrew isn't any better on macOS. Why people use it instead of MacPorts is beyond me.

sfink4y ago

But for what it does, it's awesome. I can do things like:

  % json somefile.json
  > ls
    0/
    1/
    2/
  > cd 0
  > ls
    info/
    files/
    timings/
    version
  > cat version
  1.2b
  > cat timings/*/mean
  timings/firstPaint/mean = 51
  timings/loadEventEnd/mean = 103
  timings/timeToContentfulPaint/mean = 68
  timings/timeToDomContentFlushed/mean = 67
  timings/timeToFirstInteractive/mean = 658
  timings/ttfb/mean = 6

There are commands for searching, modifying data, aggregating, etc., but those would be better done in a more principled, full-featured syntax like jq's.

I see ijq, and it looks really nice. But it doesn't have the context and restriction of focus that I'm looking for.

eloh4y ago

You could take a look at jless [1], it allows interactive selections/browsing in JSON documents.

[1] https://jless.io/

anitil4y ago

1 more reply

ratorx4y ago

I really like fx (https://github.com/antonmedv/fx) for interactive stuff. It does exactly what I think you want. You can expand individual fields and explore the schema.

However, I really do like jq for queries and scripting, so I keep both around.

ggm4y ago

This is almost exactly how I think about the problem of deciding how to deep-key to a specific field of a nested json structure.

If you can emit the syntactic form as a Python or perl ref, or a jq array ref, then I could use your tool to find the structure and the other ones to stream.

Great example! Thanks for posting this.

lichtenberger4y ago

Detect changes of a specific node and the whole subtree/subtree:

    let $node := jn:doc('mycol.jn','mydoc.jn')=>fieldName[[1]]
    let $result := for $node-in-rev in jn:all-times($node)
                   return
                     if ((not(exists(jn:previous($node-in-rev))))
                          or (sdb:hash($node-in-rev) ne sdb:hash(jn:previous($node-in-rev)))) then
                       $node-in-rev
                     else
                       ()
    return [
      for $jsonItem in $result
      return { "node": $jsonItem, "revision": sdb:revision($jsonItem) }
    ]

Get all diffs between all revisions and serialize the output in an array:

    let $maxRevision := sdb:revision(jn:doc('mycol.jn','mydoc.jn'))
    let $result := for $i in (1 to $maxRevision)
                   return
                     if ($i > 1) then
                       jn:diff('mycol.jn','mydoc.jn',$i - 1, $i)
                     else
                       ()
    return [
      for $diff at $pos in $result
      return {"diffRev" || $pos || "toRev" || $pos + 1: jn:parse($diff)=>diffs}
    ]

Open a specific revision

By datetime:

    jn:open('mycol.jn','mydoc.jn',xs:dateTime('2022-03-01T00:00:00Z'))

By revision number:

    jn:doc('mycol.jn','mydoc.jn',5)

And a view of an outdated frontend:

https://github.com/sirixdb/sirix/raw/master/Screenshot%20fro...

dan-robertson4y ago

endgame4y ago

It feels a lot like the FP idea of a zipper coupled to an interactive shell.

lichtenberger4y ago

http://brackit.io

The language itself borrows a lot of concepts from functional languages as higher order functions, closures... you can also develop modules with functions for easy reuse...

A simple join for instance looks like this:

        let $stores :=
        [
          { "store number" : 1, "state" : "MA" },
          { "store number" : 2, "state" : "MA" },
          { "store number" : 3, "state" : "CA" },
          { "store number" : 4, "state" : "CA" }
        ]
        let $sales := [
           { "product" : "broiler", "store number" : 1, "quantity" : 20  },
           { "product" : "toaster", "store number" : 2, "quantity" : 100 },
           { "product" : "toaster", "store number" : 2, "quantity" : 50 },
           { "product" : "toaster", "store number" : 3, "quantity" : 50 },
           { "product" : "blender", "store number" : 3, "quantity" : 100 },
           { "product" : "blender", "store number" : 3, "quantity" : 150 },
           { "product" : "socks", "store number" : 1, "quantity" : 500 },
           { "product" : "socks", "store number" : 2, "quantity" : 10 },
           { "product" : "shirt", "store number" : 3, "quantity" : 10 }
        ]
        let $join :=
          for $store in $stores, $sale in $sales
          where $store=>"store number" = $sale=>"store number"
          return {
            "nb" : $store=>"store number",
            "state" : $store=>state,
            "sold" : $sale=>product
          }
        return [$join]

Of course you can also group by, count, order by, nest FLWOR clauses...

preferjq4y ago

Here is a straightforward jq translation

    def stores:
      [
        { "store number" : 1, "state" : "MA" },
        { "store number" : 2, "state" : "MA" },
        { "store number" : 3, "state" : "CA" },
        { "store number" : 4, "state" : "CA" }
      ];
    def sales:
      [
        { "product" : "broiler", "store number" : 1, "quantity" : 20  },
        { "product" : "toaster", "store number" : 2, "quantity" : 100 },
        { "product" : "toaster", "store number" : 2, "quantity" : 50 },
        { "product" : "toaster", "store number" : 3, "quantity" : 50 },
        { "product" : "blender", "store number" : 3, "quantity" : 100 },
        { "product" : "blender", "store number" : 3, "quantity" : 150 },
        { "product" : "socks", "store number" : 1, "quantity" : 500 },
        { "product" : "socks", "store number" : 2, "quantity" : 10 },
        { "product" : "shirt", "store number" : 3, "quantity" : 10 }
      ];
    
    [
        {store: stores[], sale: sales[]}
      | select(.store."store number" == .sale."store number")
      | { nb:    .store."store number",
          state: .store.state,
          sold: .sale.product
        }
    ]

Try it online - https://tio.run/##rZPPUsMgEMbP5Sl2ctIZmklbe6HTg@PZJ8jkkD84Rh...

lichtenberger4y ago

The difference might be that Brackit uses sophisticated join algorithms for these kinds of implicit joins as known from relational query processing.

arwineap4y ago

I've never found jq to be particularly hard, or slow

anitil4y ago

cosmiccatnap4y ago

justinsaccount4y ago

jq performance is pretty terrible. Here I'm going to do something super simple like pull out a single field out of a large log file:

  $ wc -l big.log 
    979400 big.log

  $ du -hs big.log 
  570M big.log

`count` is a small program that counts lines on stdin. like `sort|uniq -c |sort -n`

jq takes 12 seconds:

  $ time cat big.log |jq -cr .method |~/bin/count 
  848000 GET
  94800 POST
  34000 HEAD
  2400 OPTIONS
  200 null

  real 0m12.381s
  user 0m12.427s
  sys 0m0.333s

my tool takes .5 seconds

  $ time cat big.log |~/bin/jj method |~/bin/count 
  848000 GET
  94800 POST
  34000 HEAD
  2400 OPTIONS
  200 

  real 0m0.466s
  user 0m0.512s
  sys 0m0.198s

`jj` is a little tool I wrote that uses https://github.com/buger/jsonparser

xg154y ago

A bit OT:

The post links to the tutorial "An Introduction to JQ" at [1].

Somewhere inside the tutorial, array operators are introduced like this:

> jq lets you select the whole array [], a specific element [3], or ranges [2:5] and combine these with the object index if needed.

I think this is the point where one would really need to learn about sequences and about jq's processing model to not get frustrated later.

[1] https://earthly.dev/blog/jq-select/

adamgordonbell4y ago

Oh, I wrote that. I think I get what you mean. There are two different things, and they aren't being delineated. How would you explain it?

knowsuchagency4y ago

jq is a great tool, but my favorite alternative, by far, is jello and the libraries the author has created around it https://blog.kellybrazil.com/2020/03/25/jello-the-jq-alterna...

qmacro4y ago

There's a lot of references here to jq being 'arcane'. For me, one of the challenges in improving my jq fu has been to find examples of larger programs, from which to learn.

Anyway, I was inspired enough by the article in question to write up some of my own thoughts on jq and statelessness: https://qmacro.org/blog/posts/2022/05/02/some-thoughts-on-jq...

29athrowaway4y ago

"Easier" is subjective. For simple use-cases, zq is harder to understand than jq.

I also have never seen jq as a performance bottleneck.

jq is stable, I have never encountered a bug with it and I have never seen it getting stuck after years of usage. It is dependable and practical.

jq has helped me put out countless fires throughout my career. I should donate to it one day.

pm904y ago

It took me a while to grok jq, but now that I do I kinda like it? I don't think I want to learn yet another thing.

I do like tools that complement/supplement jq though, like jid: https://github.com/simeji/jid

ilyash4y ago

While we are at it, I have a list of JSON tools for command line here - https://ilya-sher.org/2018/04/10/list-of-json-tools-for-comm...

eru4y ago

Jq being secretly a sort-of functional programming language is part of what makes it great.

Why would you change that?

gcmeplz4y ago

- https://github.com/thisredone/rb is a widely used ruby version of this idea

- https://github.com/KelWill/nq#readme is something similar that I wrote for my own use

eru4y ago

By Node, you mean JavaScript?

If yes, it's fascinating to me, that jq is so powerful, it's even useful when handling JavaScript Object Notation in JavaScript.

kaliszad4y ago

ducaale4y ago

[1] https://github.com/antonmedv/fx

[2] https://twitter.com/antonmedv/status/1515429017582809090

phibz4y ago

I think of

echo '1 2 3' | jq ....

as creating three separate json documents, each with a single number as their top level "document" , body, or content.

So of course you can't sum them. They are fed as separate documents to the jq pipeline as if you processed three separate jq commands.

Perhaps by stateless you mean no mutuable global state? But it certainly maintains state from the location in the input document to the output of each selector/functor.

IMO it helps if you have a background in some of the concepts of functional programming.

ilyash4y ago

In Next Generation Shell (author here), it is not as ergonomic (yet?) but on the other hand it's a fully fledged no-nonsense programming language... and I claim quite a readable.

good_data = fetch("openlibrary.json").docs.filter({"author_name": Arr, "publish_year":Arr})

good_data.map({{"title": A.title, "author_name": A.author_name[0], "publish_year": A.publish_year[0]}}).group("author_name").mapv(len).sortv((>=)).limit(3)

taude4y ago

I'm surprised no one mentioned rq [1] yet. It's come up before in older HN threads [2] whenever the discussion on jq comes up...

[1] https://github.com/dflemstr/rq [2] https://news.ycombinator.com/item?id=13090604

gzapp4y ago

I'm sure I'm not the only person that got fed up with occasionally needing to do something more advanced and just finding the JQ incantations inscrutable.

Also prob not the first to create a project for personal use that just wraps evals in another language haha: https://www.npmjs.com/package/jsling

bradwood4y ago

Nothing beats gron in my view.

That plus good old fashioned sed/grep/awk give me everything I need to do on the cli.

If I want more, it's python or node.

quotemstr4y ago

As an aside --- isn't the traditional flat namespace of unix command names getting a bit crowded nowadays?

anitil4y ago

We've got space for 26^2 2-letter commands...

> for d in $(echo $PATH | tr ":" "\n") ; do ls $d | grep "^..$"; done | sort -u | wc -l

> 52

I can fit a few more in

(edit: I can't work out how to put code in a comment)

henrydark4y ago

I have recently started to use jq massively, and I love it.

stblack4y ago

Why all the hate HN?

I feel the author makes his case clearly, then presents an alternative. Underneath all this is a ton of work, for which I applaud OP.

It may not scratch your particular itch, but come on!

Being an ass on HN is a choice. It happens far too often, and I wish everyone would just dial it back.

dimitrios14y ago

Do not confuse critique with hate.

Come prepared, and ready to defend your stance. If you can't take the heat, don't come in the kitchen.

eatonphil4y ago

I don't see hate for the project here.

I see criticism for the way they're trying to position it as easier than jq when it's just different than jq.

It looks like a cool project on its own and doesn't need to describe jq as confusing to make that point.

skybrian4y ago

But it is easier, for them.

I kind of think you'd need to use both tools to have an informed opinion about which you think is easier. But most of us aren't going to do that, which is fine.

I think having strong opinions about which is easier without trying them both is weird, though.

1 more reply

dimensionc1324y ago

Simple json tasks .... read from, write to, read a value and save it as a variable in BASH .... where are those examples?

The question for is this; can I do with json files what i can do with Python using Zq?

jrm44y ago

Okay, so I'm a big scripter and not much of a programmer and I definitely have found jq to be mostly worthless to me; but it also looks like zq doesn't much help?

And if you're not going to do that, say so on "the box?"

(Disclaimer, it could be that I'm an idiot when it comes to all of this and I'm missing something big. Kind of feels that way, and I welcome correction)

dymk4y ago

caymanjim4y ago

I almost gave up before I got to the first mention of zq, and then wished I had.

pygar4y ago

I really wish that a jq type program was included in coreutils (or similar). I have wanted to to use it a few times but could never be sure it was going to be installed.

tus6664y ago

The worst thing about JQ is printing out several values from an object at once. The syntax is so bad I have to look it up on SO every time.

spiralx4y ago

https://docs.microsoft.com/en-us/archive/msdn-magazine/2003/...

Anyway, I've installed ZQ and will look to use it, even my simple usage of JQ had already led to thoughts of writing my own, better version :)

kryptozinc4y ago

Is there a universal json normalizer (to csv for example) that doesn't require learning a terse language?

omaranto4y ago

There is gron [1], which prints json as a series of assignment statements that recreate the json value. It's pretty handy.

[1] https://github.com/TomNomNom/gron

harbor110124y ago

btw, in case you don't know, you can actually run jq using a curl command:

https://xbin.io/w/tool/jq

tzury4y ago

yq uses jq like syntax but works with YAML, JSON and XML.

https://github.com/mikefarah/yq

Aeolun4y ago

Both zq and jq seem like black magic to me.

xg154y ago

So, admitted jq fanboy here, but I found a lot of the criticism from the articale really sensible.

I think jq has a pretty elegant data model, but the syntax is often very clunky to work with.

So here is a half thought-out idea how you might improve the syntax for the "stateful operations" usecase the OP outlined:

I think it's not quite true that different elements of a sequence can never interact. The OP mentioned reduce/foreach, but it's also what any function that takes argument does:

In the same way, you could write a function 'add_all(x)' which calls x and adds up all emitted elements to a sum.

e.g., writing 'a >> b' would desugar to 'b(a)'.

Writing 'a | b >> c' would desugar to 'c(a | b)'.

Any steps further to the right are not affected:

'a | b >> c | d' would desugar to 'c(a | b) | d'.

Scope to the left could be controlled with parantheses:

'a | (b >> c)' would desugar to 'a | c(b)'.

To make this more useful for aggregating on input lines, you could add a special rule that, if the operator is used with no parantheses, it will implicitly prepend '(., inputs)' as the first step.

So if the entire top-level expression is 'a | b >> c', it would desugar to 'c((., inputs) | a | b)'.

Summing up all the bazzes could be written as '.baz >> add_all' which would desugar to 'add_all((., inputs) | .baz)'

...and so on.

Any thoughts about the idea?

spiralx4y ago

marmada4y ago

I see a lot of JQ experts on this thread, so I'll bite the bullet here as a novice.

We're not there yet, but in the meantime if there's another tool that allows me to know less in exchange for doing more, I'll gladly use it.

preferjq4y ago

I completely agree when your goal is GSD just use the tools you have.

When you have time to sharpen the saw come back and dig into the details of how jq and tools like it work and where their limits are. Looking at the jq builtins[1] can be very enlightening

1- https://github.com/stedolan/jq/blob/master/src/builtin.jq

boyter4y ago

phil2944y ago

This is one of the purposes I think Deno should have been built for: Use JavaScript for oneliners in the command line. We had

    ... | deno xeval '...stdin processing code using special var $'

which was close to xargs in terms of conciseness. Unfortunately, it was removed as being considered "too niche" [1].

[1] https://github.com/denoland/deno/issues/3230

dotopotoro4y ago

> know less in exchange for doing more

That is very rare event with established tooling.

Most of the time complexity is just shifted around.

nixpulvis4y ago

No, not ideally.

English descriptions will never be completely unambiguous and unique keys into a JSON data structure. There is a very good reason programming languages (and other forms of languages) exist.

ctur4y ago

It takes a while to get to the point so I’ll save others some time and tldr this very lengthy and agenda-driven blog post:

Edit: mea culpa, turns out you can download the source (revealed half way through the article).

loeg4y ago

Closed source? https://github.com/brimdata/zed/blob/main/runtime/query.go

Yes, it’s an obscure query language. But if you were interested in jq, that clearly wasn’t a barrier to entry.

I agree the author is happy to show off their tool, but disagree that that is somehow disqualifying. They made a cool thing, they’re allowed to be proud about it.

mdaniel4y ago

And it's BSD 3 Clause, for those interested: https://github.com/brimdata/zed/blob/v1.0.0/LICENSE.txt

j / k navigate · click thread line to collapse