Probably onto something though. Try different diacritical symbols and see what sticks. Given how '"' looks, maybe combining above or below needs to vary by character. Above probably looks awful for '.' and ','.
Really I think the strikethrough might suffice. The only way to know for sure is to take away the color highlighting, so my brain doesn't use it as a crutch, and see if people can still read the diff.
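For anyone who wants to run that no-color experiment, here is a minimal sketch of rendering an inline diff with combining characters only. It is not Graphtage's implementation. The two code points are the ones that appear in the sample output in this thread, and `mark` and `inline_diff` are hypothetical helper names. The alignment comes from Python's `difflib`, a plain sequence matcher.

```python
import difflib

STRIKE = "\u0336"      # COMBINING LONG STROKE OVERLAY, marks removed characters
PLUS_BELOW = "\u031f"  # COMBINING PLUS SIGN BELOW, marks added characters


def mark(text, combining):
    """Attach one combining mark to every character of text."""
    return "".join(ch + combining for ch in text)


def inline_diff(old, new):
    """Render old -> new as one string: additions first, then removals."""
    out = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(old[i1:i2])
        else:
            # 'replace', 'insert' and 'delete' all reduce to these two slices;
            # one of them is empty for a pure insert or a pure delete.
            out.append(mark(new[j1:j2], PLUS_BELOW))
            out.append(mark(old[i1:i2], STRIKE))
    return "".join(out)


if __name__ == "__main__":
    print(inline_diff("10.244.1.17", "10.244.1.186"))
```

Swapping `PLUS_BELOW` for another combining mark is a one-line change, so it is easy to try alternatives for characters like '.' and ','.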
There has to be a better method.
https://gist.github.com/jrockway/73982949b3d2ce9b443528042c4...
My program runs in less than 10 milliseconds (/usr/bin/time reports 0.00 seconds), and graphtage takes 5 minutes and 17 seconds. I'm 317,000x faster! (Not including the time to write the program; if you do that, then it's about even assuming graphtage took 0 seconds to write.)
Graphtage prints the entire file in JQ colors, with diffs inside fields colored red and green, which I love:
...
"hostIP": "10.136.13̟9̟2̶1̶.139̟1̶",
"podIP": "10.244.1.18̟6̟7̶",
...
(My terminal can't display the dots under the numbers and the strikethrough, but it looks great on HN! I really love it.)
My program produces relatively boring line-by-line diffs:
- "hostIP": string("10.136.121.131"),
+ "hostIP": string("10.136.139.139"),
"phase": string("Running"),
- "podIP": string("10.244.1.17"),
+ "podIP": string("10.244.1.186"),
Honestly, I get what I want out of mine, and wait 317,000x less time, so... I probably won't be using this on a daily basis. But I will be stealing those dots and Unicode strikethroughs.
Use cases might be a little bit different, but please allow me to share my solution.
The problem with diffing JSON and YAML is that these formats aren't line-based and hashes don't need to be ordered. But there is gron to turn JSON into a greppable line-based format [1]. Then you can sort. The sorted output can now be diffed, and you can color the diff output with delta or a similar tool [2].
diff -u <(kubectl get pod pod1 -o json | gron | sort) <(kubectl get pod pod2 -o json | gron | sort) | delta --light --word-diff-regex="\W+"
This output provides a lot of context for me to see and understand the differences.
[1] gron https://github.com/tomnomnom/gron
[2] delta https://github.com/dandavison/delta
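If gron and delta are not at hand, the same flatten, sort, diff idea can be sketched in plain Python. This is only a rough analogue under stated assumptions. The `path = value` line format is a simplification and not gron's actual output, and `flatten` and `sorted_line_diff` are hypothetical names.

```python
import difflib
import json


def flatten(value, path="json"):
    """Yield one 'path = value' line per leaf, in the spirit of gron (simplified)."""
    if isinstance(value, dict):
        if not value:
            yield f"{path} = {{}}"
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        if not value:
            yield f"{path} = []"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        yield f"{path} = {json.dumps(value)}"


def sorted_line_diff(old_text, new_text):
    """Flatten both documents, sort the lines, and return a unified diff."""
    old_lines = sorted(flatten(json.loads(old_text)))
    new_lines = sorted(flatten(json.loads(new_text)))
    return list(difflib.unified_diff(old_lines, new_lines, "old", "new", lineterm=""))


if __name__ == "__main__":
    old = '{"podIP": "10.244.1.17", "phase": "Running"}'
    new = '{"phase": "Running", "podIP": "10.244.1.186"}'
    print("\n".join(sorted_line_diff(old, new)))
```

Because every line carries its full path, sorting makes key order irrelevant. Array elements are still compared by index, so a reordered list shows up as changes.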
According to their readme they don't just match on keys, but even try to detect changed keys for the same content, even when the two files have a different inner order of elements.
Your diff is probably equivalent to a pretty print and then running regular diff on it, i.e. not even sorting the file.
Having said that, and assuming your file wasn't extraordinarily large, a 5min runtime makes this tool kinda unusable.
Proper tree diffing is a really hard (I would say unsolved) problem. The "standard" algorithm is O(N^4)!
Diffing: 0% ... 0/93195 [00:00<?, ?it/s]
Tightening Fringe Diagonal 48 of 792: ...
To understand why the sequences problem is quadratic, consider a sequence A of length m being diffed with a sequence B of length n. We want to express our diff in the minimum number of operations, where an operation is removing, adding, or editing an element in the sequence. Construct a graph as follows: the nodes will be the points on an m×n lattice corresponding to points in the two sequences. An edge going right means “delete this item from sequence A,” and has a cost (e.g. 1). An edge going down means “add this item from sequence B” and has a similar cost. An edge going diagonally down and right means to edit the item in A into the item in B, and its cost depends on how different they are. The problem is to find the shortest path from the top left to the bottom right.
If you could compute the entire graph for free and then apply something like Dijkstra’s algorithm, you would be worst-case quadratic (if all the diagonal costs were 2 or more, you would need to touch every node).
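A minimal sketch of that lattice search, under stated assumptions. The lattice is a DAG, so this uses a row-by-row dynamic program instead of Dijkstra. It finds the same shortest-path cost and makes the m×n work explicit. The name `edit_distance` and the default diagonal cost of 2 for unequal items are choices for illustration, not anything Graphtage is known to use.

```python
def edit_distance(a, b, edit_cost=lambda x, y: 0 if x == y else 2):
    """Cheapest path from the top left to the bottom right of the edit lattice.

    Deleting an item of a costs 1, adding an item of b costs 1, and the
    diagonal edge costs edit_cost(item_of_a, item_of_b).
    """
    m, n = len(a), len(b)
    # previous[j] holds the cheapest cost to reach lattice node (i - 1, j).
    previous = list(range(n + 1))  # row 0: only additions are possible
    for i in range(1, m + 1):
        current = [i] + [0] * n    # column 0: only deletions are possible
        for j in range(1, n + 1):
            current[j] = min(
                previous[j] + 1,                                  # delete a[i-1]
                current[j - 1] + 1,                               # add b[j-1]
                previous[j - 1] + edit_cost(a[i - 1], b[j - 1]),  # edit / keep
            )
        previous = current
    return previous[n]


if __name__ == "__main__":
    print(edit_distance("10.244.1.17", "10.244.1.186"))
```

The two nested loops visit every lattice node exactly once, which is where the quadratic cost comes from.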
There are a few ways you could try to improve this:
1. Look for easy opportunities to optimise. E.g. you could have a patience-style strategy of cutting off any common prefix or suffix. This won’t help in the worst case.
2. Limit to a fixed-width diagonal. This might mean worse diffs but means the graph search problem becomes more linear. I suspect something is going on with the diagonal based on the description.
3. Somehow develop some good heuristics and use a better search algorithm like A*. This might not help in the worst case.
4. Something else.
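Options 1 and 2 above can be sketched as follows. This is an illustration under stated assumptions, with unit add/delete costs and a fixed mismatch cost. The names `trim_common` and `banded_edit_distance` are hypothetical, and nothing here is claimed to match what Graphtage does with its diagonals.

```python
def trim_common(a, b):
    """Strip the shared prefix and suffix.

    Returns (prefix_len, a_middle, b_middle, suffix_len).
    """
    limit = min(len(a), len(b))
    start = 0
    while start < limit and a[start] == b[start]:
        start += 1
    end = 0
    while end < limit - start and a[len(a) - 1 - end] == b[len(b) - 1 - end]:
        end += 1
    return start, a[start:len(a) - end], b[start:len(b) - end], end


def banded_edit_distance(a, b, band, mismatch_cost=2):
    """Shortest-path cost using only lattice nodes with |i - j| <= band.

    Returns None if the bottom-right node lies outside the band. The result
    can be worse than the true optimum when the band is too narrow.
    """
    m, n = len(a), len(b)
    if abs(m - n) > band:
        return None
    inf = float("inf")
    # Full-width rows are allocated for clarity; only cells inside the band
    # are ever computed. A tighter version would store 2 * band + 1 cells.
    previous = [j if j <= band else inf for j in range(n + 1)]
    for i in range(1, m + 1):
        current = [inf] * (n + 1)
        for j in range(max(0, i - band), min(n, i + band) + 1):
            if j == 0:
                current[j] = i  # first column: only deletions
                continue
            edit = 0 if a[i - 1] == b[j - 1] else mismatch_cost
            current[j] = min(
                previous[j] + 1,         # delete a[i-1]
                current[j - 1] + 1,      # add b[j-1]
                previous[j - 1] + edit,  # edit / keep
            )
        previous = current
    return previous[n]


if __name__ == "__main__":
    print(trim_common("10.136.121.131", "10.136.139.139"))
    print(banded_edit_distance("abcdefgh", "habcdefg", band=0))
    print(banded_edit_distance("abcdefgh", "habcdefg", band=1))
```

The last two calls show the trade-off in option 2: with a band of 0 only the diagonal is available and every item counts as edited, while a band of 1 is already wide enough to find the cheap "add one, delete one" path.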
Diffing: 0%| | 153/93195 [30:02<151:31:17, 5.86s/it]
Tightening Fringe Diagonal 82 of 792: 48%|...| 7453/15514 [00:30<00:35, 226.08it/s]
https://trailofbits.github.io/graphtage/latest/howitworks.ht...
So not a tree algo, but an adaptation of a list-diff algo? Or is this just a note on how the tree-diff compares sequences?
It's difficult to search for HTML diff libraries these days because all the hits are vdom like things, instead of diffing HTML text for development / testing.
https://github.com/Teamwork/visual-dom-diff
Which is quite good and fast. It encodes HTML tags as Unicode chars, calls diff-match-patch, and uses the diff to build a final visual output.
A few weeks back I started my semantic JSON compare too: https://paldys.github.io/semantic-json/
Your data stays in the browser. The list compare is pretty naive at the moment, and it doesn't allow key changes as Graphtage promises.