I rarely use functional programming but I certainly see its appeal for certain things.
I think the concept of immutability confuses people. It really clicked for me when I stopped thinking of it in terms of things not being able to change and started instead to think of it in terms of each version of things having different names, somewhat like commits in version control.
Functional programming makes explicit, not only which variables you are accessing, but which version of it.
It may seem like you are copying variables every time you want to modify them but really you are just giving different mutations, different names. This doesn't mean things are actually copied in memory. The compiler doesn't need to keep every versions. If it sees that you are not going to reference a particular mutation, it might just physically overwrite it with the next mutation. In the background "var a=i, var b=a+j", might compile as something like "var b = i; b+=j";
I think it confuses people because it’s framed oddly. Immutability isn’t about being unable to mutate state, it’s about no longer using containers (registers) as variables, such that the equal operator actually means “equals” as opposed to “store”.
In most programming languages, the equal operator works as a “store” operation, which stores a value in a named container/register. In “immutable by default” languages like Haskell, the equal operator actually means “equals”, as in “is synonymous with”.
The essence of immutability is referencing values directly, through synonyms, as opposed to storing them in named registers for later retrieval. When it’s done this way, immutability no longer makes sense: is the number 3 mutable? Can the number 3 be mutated into 4, or are they just two distinct numbers?
Believe it or not, many compilers internally represent mutable variables as a sequence of immutable variables: https://en.wikipedia.org/wiki/Static_single_assignment_form
Edit: I should clarify that I mean "immutable" in the context of primitive values like integers.
Basically, if you use persistent data structures and a unified application state, you can keep a list of all previous application states and you can browse it or ship it for debugging, it's not that expensive.
[0] in debug mode, you get a list of all events having occurred in your application and can instantly move back to that state, and you can import/export that state history: http://elm-lang.org/blog/the-perfect-bug-report
Git lets you do version control via full snapshots as opposed to just tracking diffs (even though it does actually do this too behind the scene).
You can think of a full snapshot as saving a copy of your project structure every time you do a commit. The key trick is that git doesn't actually create new copies of the content for each commit but simply maintains a tree structure whose nodes are pointers (via hashing) to the content they represent.
The complication from git is not in understanding the core concept but knowing how best to apply them. There are all sorts of crazy workflows you could implement by manipulating git pointers and their associated patches. As with anything that is flexible, difficulty comes in knowing how to constraint yourself when using it.
More precisely, it doesn't create new copies of content that you didn't change. For example, if you have 100 files in your repo and you change one of them and then commit, git creates a new copy of the content of the file you changed--a new blob storing the new file content--and a new tree object that references the new blob instead of the old one, plus the other 99 blobs that store the contents of files you didn't change; the new commit object then references the new tree object (plus the message and metadata). But git never stores diffs between old and new content; it just creates a new blob every time the content of a file changes.
Git pack files compress objects by storing them as diff files going backwards. That is, it stores the most recent state in full, then uses patches to go backwards. Because you're more likely to need a recent version in full than an older one.
One more step is to point that futures / promises, and even lists, are monads that a e.g. JS programmer uses every day, too. It reminds me of the old literary character who did not know that he's been speaking prose all his life.
That's completely orthogonal to whether the version history is immutable with each branch being like singly linked list where we "cons" new things onto the front.
> the version history is immutable with each branch being like a singly linked list where we “cons” new things onto the front
Just contributes to the general confusion around git where people decide it’s too complicated to learn.
(Apologies if my sarcasm detector is just broken today)
http://tom.preston-werner.com/2009/05/19/the-git-parable.htm...
if git killed svn, datomic kills postgres
Hash trees are used in the IPFS, Btrfs and ZFS file systems, BitTorrent protocol, Dat protocol, Apache Wave protocol, Git and Mercurial distributed revision control systems, the Tahoe-LAFS backup system, the Bitcoin and Ethereum peer-to-peer networks, the Certificate Transparency framework, and a number of NoSQL systems like Apache Cassandra, Riak and Dynamo.
A Git history is a DAG[0] (each commit can have multiple parents) and beyond that a polytree (it can have multiple roots); while a blockchain is an arborescence[2] (there's a single root — the "genesis block"; and each block can only have a single parent).
Further, beyond the technicalities blockchains are generally very linear (the side-chains tend to be pretty short, forks aside) while Git repositories can be extremely broad (have lots of concurrent branches).
[0] https://en.wikipedia.org/wiki/Directed_acyclic_graph
[1] https://en.wikipedia.org/wiki/Polytree
[2] https://en.wikipedia.org/wiki/Arborescence_(graph_theory)
The answer depends entirely on definition. What properties of bitcoin are essential to a blockchan vs which properties are simply how bitcoin happens to use a blockchain?
If blockchain just means the Merkle tree, then yes.
If it means Merkle tree + a computational consensus system for adding nodes, then no.
Git also uses a Merkle-DAG, but it is not a blockchain.
The Best, when it comes to data structures is a Directed, Acyclic Graph. For instance your typical linux filesystem is a DAG. But there's one problem with DAGs: When they reach a certain complexity human brains are not fit enough to parse them anymore. (programs still can though)
So in many circumstances at least a human programmer needs to take a look at the state of your program and make assumptions about its correctness, which is called debugging. And that's why in Good programs we often use Good data structures instead of The Best.
Good data structures are key->value stores (which you may know as "hash tables" or "dictionaries"), trees, and trees in a simplified special form: lists, each of them being somewhat able to represent the other two, if one can accept a performance hit and/or increased complexity in source code. Dictionaries, trees, lists. That's it. And you do that in every programming language that is at least a little bit interested in being Good.
So there's nothing special or functional about git's data structures, it's just normal Good programming, and a few programmers who are so good at programming that they don't even need to mention it anymore, they breath good programms.
Then of course to the normal bread-earning coder good programs are a rare sight. But the reason is not that they are really rare, the reason is that successful business doesn't really require Good programs to succeed. Mediocre programs are good enough to earn their rent, and most of us spend most of our coding hours to earn our rent.
All that being said, if you don't just want to make money, go and spend some time studying git internals. It will teach you a lot more than most of your teachers/professors taught you combined. Sadly the source code is written by Linux gurus, who like to encrypt their source code with a very special key that only people from their tribe can understand. But the Git Book is actually good enough that you can study quite a lot of the internals from that book. I also suggest writing your own git in your favorite programming language once, to really understand it.
Why is it that whenever functional programming comes up, "real programmers" come out of the woodwork to bloviate, only to expose just how little they actually know about the topic?
"Purely functional" in this case is a jargon term that relates to Okasaki's 1996 PhD thesis, available online (https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf). It's a classification for data structures much in the way "regular" or "context-free" or "turing-complete" serve as classifications for grammars.
For a data structure to be "purely functional" (and no, article author does not mean "good" here) simply means that it's implementable in a pure functional language without mutation. Examples of non-functional data structures would be ones where mutation is intrinsic to how its algorithms work: traditional linear probing hash tables, union-find for graphs, etc.
By this technical criterion, the git object graph is clearly a purely functional data structure, sorry.
Beyond that, you seem to have shown misunderstanding of what that means (despite your claims of "I understand but I disagree"), as the other examples you give DAGs, dictionaries, trees, lists, and this comment thread, do not have the interesting properties of purely functional data structures that make them worth discussing in the context of Git. Namely, the immutability of existing entries.
The thesis of the article isn't that purely functional data structures are Good, or that functional programming is Good. Those are quality judgments that are not present in the article. The thesis is: Explaining Git in terms of the purely functional data structure it uses might be helpful to learning Git.
I believe mutability is a trade off. You can make objects more immutable by being less efficient in terms of storage size, search time, source code simplicity. In exchange for these negative attributes you get source code you can proof more easily. It's good when that attribute is needed, but not most of the time.
Git is only immutable on that meta level where we talk about the object key-value store. The implementation uses mutable files though, and even steps away from that immutable key-value store when the size gets too big and needs to store more efficiently.
And on the user level you can amend, edit, squash and delete commits easily, which is actually a strength of git not a weakness. People love it for being more mutable than SVN.
I think the author is using the term "purely functional data structure" to mean a data structure that lends itself to an implementation in a purely functional language [1,2].
[1] https://en.wikipedia.org/wiki/Purely_functional_data_structu...
[2] https://www.amazon.com/Purely-Functional-Structures-Chris-Ok...
A round wheel is more useful for a car than a triangular wheel. That doesn't mean it's a "car wheel". It's just as good on a horse wagon or a bike.
[1] man git-rebase
What I've gathered so far from reading the article and the comments is that some people who are in the know about a very specific paper agree that Git is a purely functional data structure. And that others look at the ways you can use Git and point a finger and say, "Look! It can be mutated! Therefore it cannot be functional!" And the response to that is, "Don't be so technical about how you define functional. Or immutable. You know it when you see it."
Is this some kind of Obi-wan Kenobi from a certain point of view stuff? Why is this so difficult to get a handle on?
If a thing says immutable on the tin, and it's mutable, how is that purely functional? I know, read the paper. I know. But still, it's a legit question.
It seems to me that a data structure so amazing as being purely functional shouldn't be so easy to misunderstand as what we're seeing here. And it's clearly being misunderstood. And not only by me.
This stuff isn't obscure just because you don't happen to know about it already. Chris Okasaki's publications have been cited over 1000 times, mostly for his PhD work -- those papers, especially on functional data structures and amortization analysis in lazy languages, are considered foundational for a whole research area in Computer Science.
> Is this some kind of Obi-wan Kenobi from a certain point of view stuff? Why is this so difficult to get a handle on?
Did you learn calculus overnight, or expect to understand a technical conversation about differential equations and Cauchy sequences without taking two years of classes or reading a couple of really thick books first? Why should this be any different?
> If a thing says immutable on the tin, and it's mutable, how is that purely functional? .... It seems to me that a data structure so amazing as being purely functional shouldn't be so easy to misunderstand as what we're seeing here. And it's clearly being misunderstood. And not only by me.
Sigh. Explaining it properly would involve a tour though the untyped lambda calculus, simply-typed lambda calculus and the Curry-Howard isomorphism, a discussion on denotational vs. operational semantics (Hoare logic, functional interpreters, type-preserving compilation, small-step vs large-step operational semantics, etc).
TL;DR: "purely functional" is a description of the program's meaning (in a technical sense), not its implementation.
Got it. Thanks for the attitude.
I think I can offer something to the discussion here, as I'm straddling the 2 camps - having a fairly intricate understanding of git (I've held training seminars at my company about it), and basic familiarity with functional (technically "persistent") data structures thanks to a long-standing interest in Clojure.
What the functional camp is talking about, when they refer to git as a "purely functional data structure", is the core of git's implementation - commits are immutable snapshots of the repository arranged in a directed acyclic graph (or a tree, if there are no merge commits), and branches are "just" movable pointers to some commit. In this view, everything other than this core model of history as a graph is extra details layered on top.
The other camp (let's call them the "git camp") looks at git in its entirety, including branches, tags, the index, the working tree, stashes, and so on, and they can't help but come to the conclusion you mentioned - "Branches are mutable references! The index is mutable! The working tree is mutable! How can git possibly be called a functional data structure?"
While this 2nd perspective is understandable and technically correct, I think it misses the point the other camp is trying to make: that the immutable commit is the "fundamental unit of git" so to speak, that we interact with everyday, and that most of git's history-manipulating commands (git commit, git commit --amend, git rebase, git merge, git cherry-pick) can be described in terms of purely-functional operations on the commit graph. Once you understand this, many complex git scenarios become easier to understand (rebase, and filter-branch, for example), so this view does have practical value. Then branches, tags, the index, stashes, etc can indeed be understood separately.
In fact, this perspective of git as a data structure is so useful, that I rarely use "git log" as it is, preferring these additional flags to view the entire graph instead:
git log --graph --all --decorate --oneline
I hope that helps, cheers!