1. PaC-trees: Supporting Parallel and Compressed Purely-Functional Collections - 2022: This paper introduces PaC-trees, a purely functional data structure that supports parallel and compressed collections. PaC-trees are designed to be efficient in terms of space and time complexity while maintaining the benefits of functional data structures.
2. Proving tree algorithms for succinct data structures - 2019: This paper discusses the development of tree algorithms for succinct data structures, which are compact representations of data that support efficient query and update operations.
There is some work in other fields too:
1. CyBy2: a strongly typed, purely functional framework for chemical data management - 2019-12-30: This paper presents CyBy2, a purely functional framework for managing chemical data. The framework is designed to be strongly typed and enforce referential transparency through the type system.
2. chemf: A purely functional chemistry toolkit - 2012-12-20: This paper introduces chemf, a purely functional chemistry toolkit that provides a set of algorithms and data structures for working with chemical data in a functional programming context.
I've been working a lot with Trees in Clojure, and have been hitting serious limitations of my understanding.
I also found this YouTube video from a Clojure conference that reviews some different strategies for tree traversal in Clojure: https://youtu.be/YgvJqWiyMRY
I thought that learning a Functional Lisp would make it really easy to traverse trees, since that is what the language is doing to actually execute its code.
Turns out all the stuff I want to do is simply hard.
Functional data structures are really awesome though, it just seems to take a bit of up front investment.
I really enjoyed Friedman's book `Scheme and the Art of Programming` because it filled in some pieces missing from "The Little Schemer" (and "Seasoned Schemer"). Building stuff like `kons`, `map` and all that `letrec` stuff.
But the big difference between Scheme and Clojure is that in Scheme, while it's "functional by concept," you get to escape with `(set! ...)` whenever you want to have a more traditional ds/algo (like say doing the n queens problem).
In Clojure you can kind of do that by either escaping to Java, using protocols (with atoms), or using transients, but I often feel like there's a "more functional way to do stuff that hasn't been taught to me."
I've opened up either the Okasaki dissertation or book or both, but I've always had trouble reading it, and then sticking with it. And some stuff like looking at Rosetta code and saying "to reverse a linked list in a lisp is easy... because it's a cons cell" seems like cheating. Almost like showing up to an interview, saying your "linked list" is implemented in an array structure and then calling `reverse()` on it.
Will watch that talk from 2014, must not have seen it before.
I guess, conceptually, day-to-day things in Clojure do feel pretty natural, even easier, and I think I have a decent understanding of it. But then when I look at leetcode type problems, or something more involved, it takes a lot of mental effort to translate to it. Especially things like `big O` get thrown away in my mental model. I get it, persistent data structures and all that, but there's still a mystery there.
I think there is real value in starting with the pure functional versions, then swapping out when needed. One problem that seems fairly common in large codebases is using an object as a hash key. But as the code grows, that object sort of gets passed around and eventually somebody updates it without rehashing. That's a tough bug to find. They are for me anyway.
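To make that bug concrete, here's a rough Scala sketch of the failure mode (the Point class and its fields are made up for illustration):

    import scala.collection.mutable

    // Hypothetical key type whose equals/hashCode depend on mutable fields.
    final class Point(var x: Int, var y: Int) {
      override def hashCode: Int = (x, y).hashCode
      override def equals(other: Any): Boolean = other match {
        case p: Point => p.x == x && p.y == y
        case _        => false
      }
    }

    val m = mutable.HashMap.empty[Point, String]
    val key = new Point(1, 2)
    m(key) = "home"

    key.x = 99               // key mutated after insertion; its stored hash is now stale
    m.get(key)               // likely None: the lookup probes the wrong bucket
    m.get(new Point(1, 2))   // also likely None: the entry is effectively lost

An immutable key type (e.g. a case class with vals) rules this out by construction.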
This is one of those rare cases where you can actually make it faster later without trashing the overall design.
I'd encourage starting with the pure functional version first, every time. I'd go further and say, leave some opportunity to flip back to the pure functional version in dev builds for isolating heisenbugs.
Blah blah, grain of salt, free advice, your mileage may vary. Games have different requirements than webpages, everything is contextual.
This is one rare case where having the capability of swapping back and forth is always worth it. Just use it, and think hard before giving it up; that's a really good default.
You might not be running GHC's RTS on a resource-constrained target platform but you can generate the C code that could run on that platform from Haskell, leveraging the guarantees you get with using Haskell's advanced type system.
Update: the point is that GHC can optimize code, and some of those optimizations can avoid pointer chasing! But like most optimizing compilers, sometimes GHC can miss them, and you have to do a bit of work to tell GHC where it can avoid chasing unnecessary pointers (or where it should/should-not inline, give it some help to know where it can do fusion, etc.).
Tracking diffs and versions of data over time: version control systems and RDBMSs just don't cut it.
Bitemporal databases are interesting to me as well.
One of the key points of immutable datastructures is that they're easy to reason about given that the invariants are upheld. Designing an efficient memory representation that upholds those invariants is an entirely separate problem domain. Not every pointer dereference is slow, and not every tree is represented with pointers.
Okasaki fleshes out a couple of examples and explains his thought processes and heuristics for developing such data structures quite well; but they're very much still notes on the early-stage exploration of the concept.
From a historical perspective it's still a fascinating read though and definitely recommend it if you want to read up on the origins of immutable data structures.
It's not weird; this is standard in academic computer science. It would be weird to do otherwise. In a theoretical dissertation/paper like this you can't just randomly bring up compiled assembly; it's completely and utterly off topic, no more on topic than bringing up whether the code was run by an interpreter, or a JVM, or transpiled to Haskell, or run on a GPU, etc.
"Computer science is no more about computers than astronomy is about telescopes."
-- Edsger W. Dijkstra
However, I do believe that astronomers put references to the actual instruments they used in their publications.
There are definitely plenty of pure theory papers as well, and pure theory is still a contribution. However one would at least hope to see subsequent work that does focus on the practical aspect.
Yes, but that's not a concern of a computer scientist. Implementation and execution of the algorithm are up to the reader.
It's like complaining that engineers don't do enough novel computer science research; of course they don't! It's not their job.
One other thing I want to add, though, is that this book was published in 1996. The CPU landscape at the time was far more diverse, both in terms of ISA (which assembly?) and also in terms of capabilities. Even on the x86 line, plenty of people were still using 486s, which weren't even superscalar, and thus had pretty strikingly different performance characteristics than, say, a Pentium Pro (a "modern", OoO CPU). Tying your algorithmic analysis to a particular piece of hardware would be pretty useless; providing a satisfactory look at the performance on so many different (micro-)architectures could risk overwhelming the content.
Moreover, at the time, performance was, maybe, a little more straightforward, making your concerns perhaps not quite as important. Memory was slower than the CPU but not the way it is now. Caches were much smaller and less useful. Branch prediction was more primitive. CPUs had relatively weak reordering capabilities, if any at all, and simpler pipelines. There were few dedicated SIMD instructions. This was a world where linked lists still often made sense, because the hardware penalties you paid for them were smaller. In other words: in 1996 the on-paper performance wasn't at quite as big a risk of diverging quite so wildly from the real-world performance, so keeping things simple and focusing on the on-paper performance was less objectionable.
Mathematics is the science of modeling and abstraction and that abstraction exists in order to make the process of knowledge gathering easier. It works very well for the sciences, so I would say the method has been proven. Of course, if the models turn out to be completely inappropriate, that would be different, but so far, the simple machine models used (implicitly or explicitly) in the analysis of algorithms seem to be rather reasonable.
The alternative that you suggest, trying to benchmark concrete implementations of algorithms, has so many confounders that it would be very hard to execute properly. I'm sure people are doing this too, but the formal Big O proof has a different quality to it because it does not depend on those particulars.
Is there? If so, please do share a reference.
It is a good book for learning. It is a decent book for reference. When you want to really fly you will want to reach for more recent work.
In my case, yes.
Of course it is of contemporary relevance; functional programming (FP) is all the rage.
The tricky bits are questions like:
How does this jibe with existing JS functional constructs like Fantasy Land, for example?
How to "recruit" more folks to FP, or even a hybrid approach of objects interacting functionally
Game jams using more FP-like data structures? Or more HN submissions like that.
The harder things to evaluate are a lot of other topics, news-like but investigative and curious, or sites that are essentially selling a service (versus teaching the mechanism behind it).
For SaaS stuff, since HN is about startups, I have to let it slide. But the hacking piece is when one person accomplishes something with persistent decomposition of sequential problems, or does something clever using tools or ideas from a different context.
To Queue a Mockingbird
The Call-stack of the Wild
Tesselation of the d'Urbervilles
One Flew Over the Cuckoo Hash
The Data of the Woosters
Brideshead Re-visitor
The Catcher in the Trie
Les Miser-tables
The Nat-elim of Monte Cristo

https://en.wikipedia.org/wiki/Trie#History,_etymology,_and_p...
and the counterpart, Purely Fictional Languages:
The Catcher in the *Try*

My very direct experience recently has been that Scala + cats resulted in the same buggy, nonperformant software it was meant to prevent. I understand that bad programmers produce bad programs, regardless of language, but I feel pretty strongly that good tools prevent some amount of the typical "bad" that makes bad programs bad (ignoring the obviously maliciously bad examples). So I don't really understand, and would like to understand, if, how, and when pure FP (and I suppose FP in general) actually improves quality of code/life outside of toy examples.
When you debug, you just give the program the same input which was problematic and you get to reproduce the error.
Persistent data structures make it less wildly inefficient to do so.
reads a bit weird otherwise - sounds like it's discussing a particular purely functional data structure when it's actually a survey of many (pretty much the canonical survey of them, in fact).
I'm making a broad, high-level presentation about immutability in technology. At my company we have folks who have heard of it in the context of ransomware-resilient backups, others who have heard of it in the context of infrastructure as code, and very few who have heard of it in terms of data structures (distributed and non-distributed). My goal is to showcase the concept in various contexts so that people can better understand its role as a key design choice in technology.
Personally I have no experience working on software that utilizes these, so if others here do, I would appreciate your input on how these make your software more reliable.
The emphasis on software reliability and bugs-avoided is because the audience works under the company's risk-management division.
Programming with immutable values and the more declarative style they support does design out at least one infamous source of bugs: shared mutable state. That has become increasingly important in recent years, for example because modern chips support ever increasing numbers of concurrent threads in hardware and because a lot of the software we write is now part of distributed systems.
That programming style has other advantages in terms of readability and reasoning about code. If you can be confident that a particular name in the code always refers to the same value once it’s been defined, tracing the flow of data through a function or through an entire system often becomes much easier. Having more understandable code naturally tends to improve reliability, among other advantages.
If we adopt that programming style then we still need to represent collections of values and different relationships within our data, but we can’t always use “traditional” data structures to do that any more because many of them rely on mutability, which we have excluded. So we need other ways to represent structured data that are compatible with immutability and declarative style, and that is what purely functional data structures give us.
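A minimal Scala sketch of that last point (the values are made up): an immutable map is never modified, so "updating" it produces a new value, and every existing reference keeps meaning exactly what it meant before.

    val prices  = Map("apple" -> 100, "pear" -> 80)
    val updated = prices + ("apple" -> 120)  // a new map; unchanged entries are shared, not copied

    prices("apple")   // still 100: the old version is untouched
    updated("apple")  // 120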
Once you're running a distributed system, this kind of stuff comes into its own.
At large, in distributed software, this is a distraction, since most passing of messages from one distributed piece to another already involves a copy across mediums, such that the sent data is already immutable from the sender's perspective. (Granted, the preparation step can have you step on your own feet.)
Immutable structures can help significantly in reducing race conditions in parallel code, as it is impossible for one thread to modify part of a data structure while another thread is accessing it. Each thread sees a consistent 'version' of the data structure until the code explicitly replaces it with a new version.
Immutability can also help with memory optimizations: a node of one data structure can be safely re-used in another data structure. For example, if you need to retain older versions of a tree, you can share common nodes (this is how Git encodes version changes).
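As a rough Scala sketch of both points (the names are invented): readers always see one consistent version, because a writer never modifies the shared map in place; it swaps in a new version that shares most of its nodes with the old one.

    import java.util.concurrent.atomic.AtomicReference

    val current = new AtomicReference(Map.empty[String, Int])

    // Reads take a snapshot: later writes cannot change the map this reference points to.
    def read(key: String): Option[Int] =
      current.get.get(key)

    // Writes publish a new version; the old and new maps share most of their internal nodes.
    def write(key: String, value: Int): Unit =
      current.updateAndGet(m => m + (key -> value))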
At a high level, immutability forces you to be extremely deliberate about state changes in your application. This improves reasoning/understanding, reduces bugs, and eases debugging.
An example of immutability that you might be familiar with would be react props/state. You don’t modify your state. This makes reasoning about state much more simple.
The high-level explanation I recall reading years ago somewhere (Joel on Software, I think??) was that a) Functional programs have no side effects, leading to the notion that b) They can, without much effort, be adapted for parallel tasks.
The example here was MapReduce, which was the original guts of the Google Search algorithm.
E.g., things are fast because we can send your query to many, many computers to find an answer, and chances are good that the machine answering you is (relatively) close to you, which means low wait times to get your answer.
You wouldn't drop these into an ecosystem where the programmers are used to dealing with mutable structures.
Now, whether using a purely functional language correlates with writing fewer bugs is a separate discussion; I'm not sure what the modern consensus is.
Too good a phrase to pass up!
Modern consensus is done by constructing a replicated log - an append only data-structure shared across multiple workers. It's what you use Paxos for, and it's what Raft streamlines.
They seem magical! To be able to combine data from the past with current data, efficiently!
They are almost always trees (with the exception of skip lists), with all operations O(log(n)).
After creating my own programming language Enchilada that is based on immutable data structures, I started considering what I deemed "next level":
Uniquely represented confluently persistent data structures
Combined with a Merkle tree encoding of such uniquely represented data structures (they are almost always trees), you can efficiently and incrementally authenticate them. Think 'block chain' on steroids, with incremental cryptographic hashes. Or torrents, if you are into that kind of thing.
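A rough Scala sketch of the Merkle idea (this is not the Enchilada implementation, just an illustration): every node stores a hash derived from its children's hashes, so rebuilding the path touched by an update only recomputes O(log n) hashes, and structurally equal subtrees hash identically.

    import java.security.MessageDigest

    def sha256(bytes: Array[Byte]): Vector[Byte] =
      MessageDigest.getInstance("SHA-256").digest(bytes).toVector

    sealed trait Tree { def hash: Vector[Byte] }

    case object Leaf extends Tree {
      val hash: Vector[Byte] = sha256(Array.emptyByteArray)
    }

    final case class Node(left: Tree, value: String, right: Tree) extends Tree {
      // Computed once when the node is built; every tree sharing this node reuses it.
      val hash: Vector[Byte] =
        sha256(left.hash.toArray ++ value.getBytes("UTF-8") ++ right.hash.toArray)
    }

Comparing two trees, or authenticating one against a published root hash, then only needs the root hashes.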
You can also safely re-use sub-structures without performing a deep copy. For example, if you want to keep a sub-tree around for later you can do that in O(1) time because it's safe to keep a reference to it. If it is a mutable tree you don't know what's going to happen to it so you need to do a deep copy of the entire sub-tree you're holding on to. This can save a lot on memory allocation and copying depending on your use case.
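A minimal Scala sketch of that (the tree type is invented): with an immutable tree, "keeping a sub-tree around for later" is just holding a reference, because nothing can mutate it underneath you.

    sealed trait IntTree
    case object Empty extends IntTree
    final case class Branch(left: IntTree, value: Int, right: IntTree) extends IntTree

    val tree = Branch(Branch(Empty, 1, Empty), 2, Branch(Empty, 3, Empty))

    // O(1): just a reference to the existing left sub-tree, no deep copy required.
    val keepForLater: IntTree = tree match {
      case Branch(left, _, _) => left
      case Empty              => Empty
    }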
Traditionally, Deep Learning code is written in OO style all the time.
But after I learned JAX/Flax, a light turned on inside my head and I now write functional Deep Learning code as much as I can. All my side projects and new code are purely functional in JAX/Flax. PyTorch has a functional API known as functorch, and I have used it in one project.
Where lots and lots of data lives in 3-, 4-, and 5-dimensional tensors, and you need to run lots of transformations on them and then multiply a huge number of them thousands of times each second, functional code makes much more sense and gives immense sanity and peace of mind.
Those of you writing Deep Learning code, learn functional programming principles (immutable data, pure functions, no side effects, etc.) and apply them to DL via functorch or JAX.
Your life will never be the same.
Where to learn about it other than the documentation?
There are obviously other trade offs as well, like performance and memory usage.
The mutable one you update in place:

  var m = new HashMap<K, V>()
  m.put(k, v)

the Scala one you "update" by creating new maps:

  var m = Map.empty[K, V]
  m += (k -> v)

In practice it's mostly the same, except you can share the immutable one around without being scared someone will mutate it, but it takes more memory per instance than the mutable version and creates more GC churn, which may or may not be an issue.

E.g., in this case, to describe a data structure as "purely functional" makes zero sense to me intuitively at first. You need to go read the thesis and realise they're referring to data structures implemented as algebraic data types, which in the context of a purely functional language can themselves be thought of as functions, and can therefore be described as 'functional' ...
But unless you do that, the first thought is going to be "huh? can arrays be further split into imperative vs functional?" "Does he mean immutable?" "Can I use functional arrays in c?" "Are they faster/safer than normal arrays?".
By contrast, I think "Purely Functional Data-Structure Types" would have been a far more intuitive term ... but I suppose the author may have felt that clarifying further wasn't punchy enough, and could have made the thesis more wordy...
That's like saying that Norwegian is much less readable than Italian. It is in the eye of the beholder. They can both express the same concepts but which one is more readable depends on which one you already know.
Purely Functional Data Structures in Elm – course lecture notes (2015) - https://news.ycombinator.com/item?id=12145741 - July 2016 (15 comments)
What's new in purely functional data structures since Okasaki? (2010) - https://news.ycombinator.com/item?id=11056704 - Feb 2016 (42 comments)
Purely Functional Data Structures (1996) [pdf] - https://news.ycombinator.com/item?id=10486481 - Nov 2015 (13 comments)
Okasaki: Purely Functional Data Structures (1996) [pdf] - https://news.ycombinator.com/item?id=8327838 - Sept 2014 (1 comment)
What's new in purely functional data structures since Okasaki? (2010) - https://news.ycombinator.com/item?id=7081191 - Jan 2014 (17 comments)
Ten Years of Purely Functional Data Structures (2008) - https://news.ycombinator.com/item?id=5701396 - May 2013 (24 comments)
What's new in purely functional data structures since Okasaki? - https://news.ycombinator.com/item?id=1983461 - Dec 2010 (2 comments)
What's new in purely functional data structures since Okasaki - https://news.ycombinator.com/item?id=1713594 - Sept 2010 (1 comment)
"Purely Functional Data Structures" by Chris Okasaki [pdf] - https://news.ycombinator.com/item?id=1138979 - Feb 2010 (12 comments)
Teaching, Playing, and Programming: Ten Years of Purely Functional Data Structures - https://news.ycombinator.com/item?id=112270 - Feb 2008 (2 comments)
Chris Okasaki's PhD thesis on purely functional data structures (pdf) - https://news.ycombinator.com/item?id=8221 - April 2007 (1 comment)