1) In your C++ example, we have "namespace 'v8'", which has been removed and reinserted. That made me scratch my head a bit. Has it been refactored enough that this is kind of a rewrite? If so, there might be a third or fourth color here for "hey, this part didn't /change/ per se, but everything under it did".
2) Your Python example: right at the top we have a new class, PairIterator, with "class" rightly green. However I'm fighting against years of reading textual diffs that suggest the whole block is really the new thing. Could the block highlight the class as-is (where it changed) and subtly color the rest of the tree (what else changed with it)?
3) I get your whole in-the-future-we-store-ASTs argument, but that's certainly not the case today. Today we store text, and I don't see that changing. Could there also be a diff, perhaps even another mode, that deals with formatting rather than structure? I.e., find the ranges that contribute nothing to the AST and then diff those textually.
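A rough sketch of what such a formatting mode could look like, in Python. It is coarser than what point 3 asks for: instead of locating the exact ranges that contribute nothing to the AST, it only checks whether two versions parse to the same AST and, if so, falls back to a plain textual diff. The function name formatting_only_diff is made up for this example.

```python
import ast
import difflib


def formatting_only_diff(old: str, new: str):
    """Return a textual diff if old and new parse to the same AST, else None."""
    # ast.dump() omits line/column attributes by default, so whitespace and
    # comment changes do not affect the comparison.
    if ast.dump(ast.parse(old)) != ast.dump(ast.parse(new)):
        return None  # structure changed; leave that to a structural diff
    return list(
        difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    )


if __name__ == "__main__":
    for line in formatting_only_diff("x = 1+2\n", "x = 1 + 2  # sum\n"):
        print(line)
```

Anything this returns is by construction a change the structural diff would not show.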
I like the ideas and the paradigm shifting -- and the real answer is likely somewhere in between, because, at the end of the day, programmers are still editing things in text, even if they are manipulating greater structures.
As a diff tool, I'd introduce this to my workflow in a heartbeat if I weren't working so hard to interpret what the diff means.
Agreed. I think it's likely code will be stored as text, but parsed into ASTs easily. Consider that Go's standard library has a full language parser built right in, so getting an AST from a .go source file takes about 3 lines of code.
Because you can get the code back from AST easily (while maintaining spacing), it really doesn't matter what you save. You can use tools to edit the text form or AST form without any duplication.
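To make the round trip concrete, here is a minimal sketch using Python's standard ast module rather than Go (the language swap is a choice made for this example). One caveat: Python's ast drops comments and the original layout, so keeping spacing intact needs a richer tree than this one. The sketch only shows text to AST and back.

```python
import ast


def round_trip(source: str) -> str:
    """Parse source text into an AST, then regenerate source text from it."""
    tree = ast.parse(source)   # text -> AST
    return ast.unparse(tree)   # AST -> text (needs Python 3.9+)


if __name__ == "__main__":
    original = "def area(w, h):\n    return w*h   # width times height\n"
    regenerated = round_trip(original)
    print(regenerated)
    # The regenerated text parses to the same tree as the original.
    print(ast.dump(ast.parse(regenerated)) == ast.dump(ast.parse(original)))
```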
When you bind something (reference a variable), you just point at that variable's node, rather than mentioning it by string. This removes a huge class of artificial problems brought on by plaintext source code (symbol name clashes, namespaces, overeager imports, var name typos, shadowing, and other programmer-compiler miscommunications), but introduces some editor UI concerns (e.g. indicating shadowing).
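A toy illustration of binding by node instead of by name, in Python. The Var and Ref classes are hypothetical, not taken from any real editor: a reference holds a pointer to the declaration node, and the name is only a display label.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Var:
    """A declaration node. Identity is the object itself, not the label."""
    name: str


@dataclass(eq=False)
class Ref:
    """A use of a variable: points at the declaration node, not at a string."""
    target: Var

    def render(self) -> str:
        return self.target.name


if __name__ == "__main__":
    outer = Var("x")
    inner = Var("x")              # same label, different variable: no clash
    use_outer, use_inner = Ref(outer), Ref(inner)
    outer.name = "count"          # rename at the declaration only
    print(use_outer.render())     # follows the rename
    print(use_inner.render())     # untouched
```

Shadowing is exactly the case the editor would have to surface: both declarations render as "x" until one is renamed, even though the references never get confused.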
There are a whole host of other advantages of saving as ASTs directly. One of them is granular, semantic diffs like in the OP. I'm convinced it's the future... there are a lot of UI problems to solve in a solid, practical editor for it though.
The theory goes that students who share solutions will probably change the variable names, do some reformatting, etc., which would fool a text diff. But the actual structure of the AST will be the same or similar.
So if you find close matches, you inspect them more closely.
Some quick Googling reveals that plagiarism detection using tree comparisons is a common idea.
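A minimal sketch of that first pass, in Python. It is not any particular tool's algorithm: it flattens each AST into a sequence of node type names (so identifiers, constants and formatting vanish) and scores two submissions with difflib. The helper names node_types and similarity are made up here, and a real detector would use a proper tree comparison.

```python
import ast
import difflib


def node_types(source: str):
    """List the AST node type names, ignoring identifiers and formatting."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]; 1.0 means the node-type sequences are identical."""
    return difflib.SequenceMatcher(None, node_types(a), node_types(b)).ratio()


if __name__ == "__main__":
    original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
    renamed = "def add_all(values):\n    acc = 0\n    for v in values: acc += v\n    return acc\n"
    different = "def total(xs):\n    return sum(xs)\n"
    print(similarity(original, renamed))    # renaming and reformatting change nothing
    print(similarity(original, different))  # a different structure scores lower
```

Pairs that score close to 1.0 are the ones worth inspecting by hand.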
Basically it was a fun project that got me a little into Lex, the kind of odd stuff I do on Saturday afternoons.
The one interesting addition my project had was merging. The neat trick was that we reused the same tree diff algorithm to find conflicts :P. With a bit of work, we would have some very neat features, including the ability to resolve certain conflicts which physically overlap.
[1]: http://jelv.is/cow
The hurdle is that hashes aren't user-friendly in a text-based code editor. We need an editor that lets us view and work at the right level of abstraction.
Just create an executable named git-WHATEVER and put it somewhere in $PATH (or $GIT_EXEC_PATH).
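For example, a sketch of such a subcommand in Python. The name git-hello is made up for illustration. Save it under that filename, mark it executable, and put it on $PATH; `git hello a b` then runs it with a and b as arguments.

```python
#!/usr/bin/env python3
# Hypothetical custom subcommand: save as "git-hello", chmod +x, place on $PATH.
# Git runs "git-hello" when you type "git hello" and passes the remaining
# command-line arguments through unchanged.
import sys


def main(argv):
    """Build the message this toy subcommand prints."""
    return "git-hello called with: " + " ".join(argv)


if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

A structural diff tool could be hooked in the same way, with the script calling the differ instead of printing its arguments.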
The coder who wrote it (@wmacura) now works at Tumblr.
[0] http://git-scm.com/book/en/Git-Internals-Git-Objects#Tree-Ob...
Related to this, I wish there was a way to navigate a programme based on a tree, not just line and character navigation.
EDIT (fixing formatting, and …): I knew something else was nagging at me. The idea of a structural grep makes me think of Rob Pike's structural regular expressions (http://doc.cat-v.org/bell_labs/structural_regexps). The Sam editor (http://sam.cat-v.org) is based on this, and I thought I remembered a more modern version called something like 'e'; but, of course, such a name is impossible to Google.
At the end of void ArmDebugger::Stop(Instruction* instr) (rhs of http://www.cs.indiana.edu/~yw21/demos/simulator-mips-simulat... ), we have an alignment of ' 2 * Instruction::kInstrSize' to some random code at the end.