1. PaC-trees: Supporting Parallel and Compressed Purely-Functional Collections - 2022: This paper introduces PaC-trees, a purely functional data structure that supports parallel and compressed collections. PaC-trees are designed to be efficient in terms of space and time complexity while maintaining the benefits of functional data structures.
2. Proving tree algorithms for succinct data structures - 2019: This paper discusses the development of tree algorithms for succinct data structures, which are compact representations of data that support efficient query and update operations.
There is some work in other fields too:
1. CyBy2: a strongly typed, purely functional framework for chemical data management - 2019-12-30: This paper presents CyBy2, a purely functional framework for managing chemical data. The framework is designed to be strongly typed and enforce referential transparency through the type system.
2. chemf: A purely functional chemistry toolkit - 2012-12-20: This paper introduces chemf, a purely functional chemistry toolkit that provides a set of algorithms and data structures for working with chemical data in a functional programming context.
I've been working a lot with Trees in Clojure, and have been hitting serious limitations of my understanding.
I also found this YouTube video from a Clojure conference that reviews some different strategies for tree traversal in Clojure: https://youtu.be/YgvJqWiyMRY
I thought that learning a Functional Lisp would make it really easy to traverse trees, since that is what the language is doing to actually execute its code.
Turns out all the stuff I want to do is simply hard.
Functional data structures are really awesome though, it just seems to take a bit of up front investment.
I really enjoyed Friedman's book `Scheme and the Art of Programming` because it filled in some pieces missing from "The Little Schemer" (and "Seasoned Schemer"). Building stuff like `kons`, `map` and all that `letrec` stuff.
But the big difference between Scheme and Clojure is that in Scheme, while it's "functional by concept," you get to escape with `(set! ...)` whenever you want to have a more traditional ds/algo (like say doing the n queens problem).
In Clojure you can kind of do that by either escaping to Java, using protocols (with atoms), or using transients, but I often feel like there's a "more functional way to do stuff that hasn't been taught to me."
I've opened up either the Okasaki dissertation or book or both, but I've always had trouble reading it, and then sticking with it. And some stuff like looking at Rosetta code and saying "to reverse a linked list in a lisp is easy... because it's a cons cell" seems like cheating. Almost like showing up to an interview, saying your "linked list" is implemented in an array structure and then calling `reverse()` on it.
Will watch that talk from 2014, must not have seen it before.
I guess, conceptually, day-to-day things in Clojure do feel pretty natural, even easier, and I think I have a decent understanding of it. But then when I look at leetcode type problems, or something more involved, it takes a lot of mental effort to translate to it. Especially things like `big O` get thrown away in my mental model. I get it, persistent data structures and all that, but there's still a mystery there.
I think there is real value in starting with the pure functional versions, then swapping out when needed. One problem that seems fairly common in large codebases is using an object as a hash key. But as the code grows, that object sort of gets passed around and eventually somebody updates it without rehashing. That's a tough bug to find. They are for me anyway.
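To make that bug concrete, here's a rough Scala sketch of the failure mode (the Point class and its fields are made up for illustration):

    import scala.collection.mutable

    // Hypothetical key type whose equals/hashCode depend on mutable fields.
    final class Point(var x: Int, var y: Int) {
      override def hashCode: Int = (x, y).hashCode
      override def equals(other: Any): Boolean = other match {
        case p: Point => p.x == x && p.y == y
        case _        => false
      }
    }

    val m = mutable.HashMap.empty[Point, String]
    val key = new Point(1, 2)
    m(key) = "home"

    key.x = 99               // key mutated after insertion; its stored hash is now stale
    m.get(key)               // likely None: the lookup probes the wrong bucket
    m.get(new Point(1, 2))   // also likely None: the entry is effectively lost

An immutable key type (e.g. a case class with vals) rules this out by construction.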
This is one of those rare cases where you can actually make it faster later without trashing the overall design.
I'd encourage starting with the pure functional version first, every time. I'd go further and say, leave some opportunity to flip back to the pure functional version in dev builds for isolating heisenbugs.
Blah blah, grain of salt, free advice, your mileage may vary. Games have different requirements than webpages, everything is contextual.
This is one rare case where having the capability of swapping back and forth is always worth it. Just use it, and think hard before giving it up; that's a really good default.
You might not be running GHC's RTS on a resource-constrained target platform but you can generate the C code that could run on that platform from Haskell, leveraging the guarantees you get with using Haskell's advanced type system.
Update: the point is that GHC can optimize code, and some of those optimizations can avoid pointer chasing! But like most optimizing compilers, sometimes GHC can miss them, and you have to do a bit of work to tell GHC where it can avoid chasing unnecessary pointers (or where it should/should-not inline, give it some help to know where it can do fusion, etc.).
Tracking diffs and versions of data over time: version control systems and RDBMSs just don't cut it.
Bitemporal databases are interesting to me as well.
One of the key points of immutable datastructures is that they're easy to reason about given that the invariants are upheld. Designing an efficient memory representation that upholds those invariants is an entirely separate problem domain. Not every pointer dereference is slow, and not every tree is represented with pointers.
Okasaki fleshes out a couple of examples and explains his thought processes and heuristics for developing such data structures quite well; but they're very much still notes on the early-stage exploration of the concept.
From a historical perspective it's still a fascinating read though and definitely recommend it if you want to read up on the origins of immutable data structures.
It's not weird; this is standard in academic computer science. It would be weird to do otherwise. In a theoretical dissertation/paper like this you can't just randomly bring up compiled assembly; it's completely and utterly off topic, no more on topic than bringing up whether the code was run by an interpreter, or a JVM, or transpiled to Haskell, or run on a GPU, etc.
"Computer science is no more about computers than astronomy is about telescopes."
-- Edsger W. Dijkstra
However, I do believe that astronomers put references to the actual instruments they used in their publications.
There are definitely plenty of pure theory papers as well, and pure theory is still a contribution. However one would at least hope to see subsequent work that does focus on the practical aspect.
Yes, but that's not a concern of a computer scientist. Implementation and execution of the algorithm are up to the reader.
It's like complaining that engineers don't do enough novel computer science research; of course they don't! It's not their job.
One other thing I want to add, though, is that this book was published in 1996. The CPU landscape at the time was far more diverse, both in terms of ISA (which assembly?) and also in terms of capabilities. Even on the x86 line, plenty of people were still using 486s, which weren't even superscalar, and thus had pretty strikingly different performance characteristics than, say, a Pentium Pro (a "modern", OoO CPU). Tying your algorithmic analysis to a particular piece of hardware would be pretty useless; providing a satisfactory look at the performance on so many different (micro-)architectures could risk overwhelming the content.
Moreover, at the time, performance was, maybe, a little more straightforward, making your concerns perhaps not quite as important. Memory was slower than the CPU but not the way it is now. Caches were much smaller and less useful. Branch prediction was more primitive. CPUs had relatively weak reordering capabilities, if any at all, and simpler pipelines. There were few dedicated SIMD instructions. This was a world where linked lists still often made sense, because the hardware penalties you paid for them were smaller. In other words: in 1996 the on-paper performance wasn't at quite as big a risk of diverging quite so wildly from the real-world performance, so keeping things simple and focusing on the on-paper performance was less objectionable.
Mathematics is the science of modeling and abstraction and that abstraction exists in order to make the process of knowledge gathering easier. It works very well for the sciences, so I would say the method has been proven. Of course, if the models turn out to be completely inappropriate, that would be different, but so far, the simple machine models used (implicitly or explicitly) in the analysis of algorithms seem to be rather reasonable.
The alternative that you suggest, trying to benchmark concrete implementations of algorithms, has so many confounders that it would be very hard to execute properly. I'm sure people are doing this too, but the formal Big O proof has a different quality to it because it does not depend on those particulars.
Is there? If so, please do share a reference.
It is a good book for learning. It is a decent book for reference. When you want to really fly you will want to reach for more recent work.
In my case, yes.
Of course it is of contemporary relevance; functional programming (FP) is all the rage.
The tricky bits are questions like:
How does this jibe with existing JS functional constructs like Fantasy Land, for example?
How to "recruit" more folks to FP, or even a hybrid approach of objects interacting functionally
Game jams using more FP-like data structures? Or more HN submissions like that.
The harder things to evaluate are a lot of other topics, news-like but investigative and curious, or sites that are essentially selling a service (versus teaching the mechanism behind it).
For SaaS stuff, since HN is about startups, I have to let it slide. But the hacking piece is when one person accomplishes something with persistent decomposition of sequential problems, or does something clever using tools or ideas from a different context.
To Queue a Mockingbird
The Call-stack of the Wild
Tesselation of the d'Urbervilles
One Flew Over the Cuckoo Hash
The Data of the Woosters
Brideshead Re-visitor
The Catcher in the Trie
Les Miser-tables
The Nat-elim of Monte Cristo

https://en.wikipedia.org/wiki/Trie#History,_etymology,_and_p...
and the counterpart, Purely Fictional Languages:
The Catcher in the *Try*

My very direct experience recently has been that Scala + cats resulted in the same buggy, nonperformant software it was meant to prevent. I understand that bad programmers produce bad programs, regardless of language, but I feel pretty strongly that good tools prevent some amount of the typical "bad" that makes bad programs bad (ignoring the obviously maliciously bad examples). So I don't really understand, and would like to understand, if, how, and when pure FP (and I suppose FP in general) actually improves quality of code/life outside of toy examples.
When you debug, you just give the program the same input which was problematic and you get to reproduce the error.
Persistent data structures make it less wildly inefficient to do so.
reads a bit weird otherwise - sounds like it's discussing a particular purely functional data structure when it's actually a survey of many (pretty much the canonical survey of them, in fact).
I'm making a broad, high-level presentation about immutability in technology. At my company we have folks who have heard of it in the context of ransomware-resilient backups, others who have heard of it in the context of infrastructure as code, and very few who have heard of it in terms of data structures (distributed and non-distributed). My goal is to showcase the concept in various contexts so that people can better understand its role as a key design choice in technology.
Personally I have no experience working on software that utilizes these, so if others here do, I would appreciate your input on how these make your software more reliable.
The emphasis on software reliability and bugs-avoided is because the audience works under the company's risk-management division.
Programming with immutable values and the more declarative style they support does design out at least one infamous source of bugs: shared mutable state. That has become increasingly important in recent years, for example because modern chips support ever increasing numbers of concurrent threads in hardware and because a lot of the software we write is now part of distributed systems.
That programming style has other advantages in terms of readability and reasoning about code. If you can be confident that a particular name in the code always refers to the same value once it’s been defined, tracing the flow of data through a function or through an entire system often becomes much easier. Having more understandable code naturally tends to improve reliability, among other advantages.
If we adopt that programming style then we still need to represent collections of values and different relationships within our data, but we can’t always use “traditional” data structures to do that any more because many of them rely on mutability, which we have excluded. So we need other ways to represent structured data that are compatible with immutability and declarative style, and that is what purely functional data structures give us.
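A minimal Scala sketch of that last point (the values are made up): an immutable map is never modified, so "updating" it produces a new value, and every existing reference keeps meaning exactly what it meant before.

    val prices  = Map("apple" -> 100, "pear" -> 80)
    val updated = prices + ("apple" -> 120)  // a new map; unchanged entries are shared, not copied

    prices("apple")   // still 100: the old version is untouched
    updated("apple")  // 120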
Once you're running a distributed system, this kind of stuff comes into its own.
At large, in distributed software, this is a distraction, since most passing of messages from one distributed piece to another already involves a copy across mediums, such that the sent data is already immutable from the sender's perspective. (Granted, the preparation step can have you step on your own feet.)
Immutable structures can help significantly in reducing race conditions in parallel code, as it is impossible for one thread to modify part of a data structure while another thread is accessing it. Each thread sees a consistent 'version' of the data structure until the code explicitly replaces it with a new version.
Immutability can also help with memory optimizations: a node of one data structure can be safely re-used in another data structure. For example, if you need to retain older versions of a tree, you can share common nodes (this is how Git encodes version changes).
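As a rough Scala sketch of both points (the names are invented): readers always see one consistent version, because a writer never modifies the shared map in place; it swaps in a new version that shares most of its nodes with the old one.

    import java.util.concurrent.atomic.AtomicReference

    val current = new AtomicReference(Map.empty[String, Int])

    // Reads take a snapshot: later writes cannot change the map this reference points to.
    def read(key: String): Option[Int] =
      current.get.get(key)

    // Writes publish a new version; the old and new maps share most of their internal nodes.
    def write(key: String, value: Int): Unit =
      current.updateAndGet(m => m + (key -> value))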
At a high level, immutability forces you to be extremely deliberate about state changes in your application. This improves reasoning/understanding, reduces bugs, and eases debugging.
An example of immutability that you might be familiar with would be react props/state. You don’t modify your state. This makes reasoning about state much more simple.
The high-level explanation I recall reading years ago somewhere (Joel on Software, I think??) was that a) Functional programs have no side effects, leading to the notion that b) They can, without much effort, be adapted for parallel tasks.
The example here was MapReduce, which was the original guts of the Google Search algorithm.
E.g., things are fast because we can send your query to many, many computers to find an answer, and chances are good that the machine answering you is (relatively) close to you, which means low wait times to get your answer.
You wouldn't drop these into an ecosystem where the programmers are used to dealing with mutable structures.
Now, whether using a purely functional language correlates with writing fewer bugs is a separate discussion; I'm not sure what the modern consensus is.
Too good a phrase to pass up!
Modern consensus is done by constructing a replicated log - an append only data-structure shared across multiple workers. It's what you use Paxos for, and it's what Raft streamlines.
They seem magical! To be able to combine data from the past with current data, efficiently!
They are almost always trees (with the exception of skip lists), with all operations O(log(n)).
After creating my own programming language Enchilada that is based on immutable data structures, I started considering what I deemed "next level":
Uniquely represented confluently persistent data structures
Combined with a Merkle tree encoding of such uniquely represented data structures (they are almost always trees), you can efficiently and incrementally authenticate them. Think 'block chain' on steroids, with incremental cryptographic hashes. Or torrents, if you are into that kind of thing.
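A rough Scala sketch of the Merkle idea (this is not the Enchilada implementation, just an illustration): every node stores a hash derived from its children's hashes, so rebuilding the path touched by an update only recomputes O(log n) hashes, and structurally equal subtrees hash identically.

    import java.security.MessageDigest

    def sha256(bytes: Array[Byte]): Vector[Byte] =
      MessageDigest.getInstance("SHA-256").digest(bytes).toVector

    sealed trait Tree { def hash: Vector[Byte] }

    case object Leaf extends Tree {
      val hash: Vector[Byte] = sha256(Array.emptyByteArray)
    }

    final case class Node(left: Tree, value: String, right: Tree) extends Tree {
      // Computed once when the node is built; every tree sharing this node reuses it.
      val hash: Vector[Byte] =
        sha256(left.hash.toArray ++ value.getBytes("UTF-8") ++ right.hash.toArray)
    }

Comparing two trees, or authenticating one against a published root hash, then only needs the root hashes.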
You can also safely re-use sub-structures without performing a deep copy. For example, if you want to keep a sub-tree around for later you can do that in O(1) time because it's safe to keep a reference to it. If it is a mutable tree you don't know what's going to happen to it so you need to do a deep copy of the entire sub-tree you're holding on to. This can save a lot on memory allocation and copying depending on your use case.
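A minimal Scala sketch of that (the tree type is invented): with an immutable tree, "keeping a sub-tree around for later" is just holding a reference, because nothing can mutate it underneath you.

    sealed trait IntTree
    case object Empty extends IntTree
    final case class Branch(left: IntTree, value: Int, right: IntTree) extends IntTree

    val tree = Branch(Branch(Empty, 1, Empty), 2, Branch(Empty, 3, Empty))

    // O(1): just a reference to the existing left sub-tree, no deep copy required.
    val keepForLater: IntTree = tree match {
      case Branch(left, _, _) => left
      case Empty              => Empty
    }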
Traditionally, Deep Learning code is written in OO style all the time.
But after I learned JAX/Flax, a light turned on inside my head and I now write functional Deep Learning code as much as I can. All my side projects and new code are purely functional in JAX/Flax. PyTorch has a functional API known as functorch, and I have used it in one project.
Where lots and lots of data lives in 3-, 4-, and 5-dimensional tensors, and you need to run lots of transformations on them and then multiply a huge number of them thousands of times each second, functional code makes much more sense and gives immense sanity and peace of mind.
Those of you writing Deep Learning code, learn functional programming principles (immutable data, pure functions, no side effects, etc.) and apply them to DL via functorch or JAX.
Your life will never be the same.
Where to learn about it other than the documentation?
There are obviously other trade offs as well, like performance and memory usage.
The mutable one you update in place:

  var m = new HashMap<K, V>()
  m.put(k, v)

the Scala one you "update" by creating new maps:

  var m = Map.empty[K, V]
  m += (k -> v)

In practice it's mostly the same, except you can share the immutable one around without being scared someone will mutate it, but it takes more memory per instance than the mutable version and creates more GC churn, which may or may not be an issue.

E.g., in this case, to describe a data structure as "purely functional" makes zero sense to me intuitively at first. You need to go read the thesis and realise they're referring to data structures implemented as algebraic data types, which in the context of a purely functional language can themselves be thought of as functions, and can therefore be described as 'functional' ...
But unless you do that, the first thought is going to be "huh? can arrays be further split into imperative vs functional?" "Does he mean immutable?" "Can I use functional arrays in c?" "Are they faster/safer than normal arrays?".
By contrast, I think "Purely Functional Data-Structure Types" would have been a far more intuitive term ... but I suppose the author may have felt that clarifying further wasn't punchy enough, and could have made the thesis more wordy...
That's like saying that Norwegian is much less readable than Italian. It is in the eye of the beholder. They can both express the same concepts but which one is more readable depends on which one you already know.
Purely Functional Data Structures in Elm – course lecture notes (2015) - https://news.ycombinator.com/item?id=12145741 - July 2016 (15 comments)
What's new in purely functional data structures since Okasaki? (2010) - https://news.ycombinator.com/item?id=11056704 - Feb 2016 (42 comments)
Purely Functional Data Structures (1996) [pdf] - https://news.ycombinator.com/item?id=10486481 - Nov 2015 (13 comments)
Okasaki: Purely Functional Data Structures (1996) [pdf] - https://news.ycombinator.com/item?id=8327838 - Sept 2014 (1 comment)
What's new in purely functional data structures since Okasaki? (2010) - https://news.ycombinator.com/item?id=7081191 - Jan 2014 (17 comments)
Ten Years of Purely Functional Data Structures (2008) - https://news.ycombinator.com/item?id=5701396 - May 2013 (24 comments)
What's new in purely functional data structures since Okasaki? - https://news.ycombinator.com/item?id=1983461 - Dec 2010 (2 comments)
What's new in purely functional data structures since Okasaki - https://news.ycombinator.com/item?id=1713594 - Sept 2010 (1 comment)
"Purely Functional Data Structures" by Chris Okasaki [pdf] - https://news.ycombinator.com/item?id=1138979 - Feb 2010 (12 comments)
Teaching, Playing, and Programming: Ten Years of Purely Functional Data Structures - https://news.ycombinator.com/item?id=112270 - Feb 2008 (2 comments)
Chris Okasaki's PhD thesis on purely functional data structures (pdf) - https://news.ycombinator.com/item?id=8221 - April 2007 (1 comment)