Literate programming is much more than just commenting code (opens in new tab)

(justinmeiners.github.io)

173 pointsincrescent4y ago107 comments

107 comments

My favorite literate program still has to be the book "Physically Based Rendering". An optimized, feature rich ray tracer in the form of a textbook.

That said, I wouldn't personally want to try and collaborate on such a program with more than one other person. It would make for a great single-contributer OSS library though. Rubber duck debugging built right into the prose.

taeric4y ago

https://smile.amazon.com/gp/product/1541259335 is also a great book. As is Stanford GraphBase.

My personal bet is that it is probably easier to collaborate on something like this than you would think. The imposed structure of programs, in general, already makes a lot of collaboration tough.

eunoia4y ago

Great book! It’s available online, for free at https://www.pbrt.org/.

You can also find older, physical editions on EBay for $10-$15.

svat4y ago

I would go further: literate programming is not just "much more than" commenting code, because you can do LP without commenting much. The main thing in LP is the idea/orientation of writing as if you're writing something for a human reader. This does often lead to more comments, but even something like "here's the code" followed by lots of code can be LP, if you deem it sufficient for your intended audience. (Earlier comment of mine about target audience and not over-commenting: https://news.ycombinator.com/item?id=29871047)

This works well for people who are writers by nature (like Knuth who's always making edits and improvements to his books https://news.ycombinator.com/item?id=30149221). One problem though (and there are several) is that because this is so personal, nearly everyone who seriously tries LP ends up writing their own LP tool (including the author of this post!).

WorldMaker4y ago

I'm somewhat hopeful the growing ubiquity of especially Jupyter notebooks leads to better, more universal tools for literate programming. Notebooks have always been a form of literate programming. Jupyter and its underlying formats are now ubiquitous enough with a lot of strong IDE support (across a variety of IDEs) that I'm hopeful a better convergence as a "general literate programming platform" from the notebook side may just be a matter of time. (Other than that a lot of strong LP proponents so far seem to mostly be oblivious to the happenings in Notebook spaces and vice versa, despite there being so much cross-over.)

svat4y ago

Yes I agree (and share the hope). In fact, earlier today I was thinking about Peter Norvig's "pytudes" (https://github.com/norvig/pytudes) as good examples of literate programming — and they are notebooks. Also, last weekend I picked up some code I had written a few months ago, threw it all away and started writing it in a notebook (Colab) precisely for this "literate programming" reasons.

There's also "nbdev" (https://github.com/fastai/nbdev) which seems like it should be the best of both worlds, but I couldn't quite get it to work.

antirez4y ago

I think likewise. When I had to write the radix tree implementation for Redis I faced two problems:

- I needed a stable implemention as soon as possible, I had a performance issued that needed to be solved by range queries.

- The radix tree was full of corner cases.

So I resorted to literate programming, which is in general very near to my usual programming style. You can find it in the rax.c file inside the Redis source code, as you can see as the algorithm is enunciated, the corresponding code is inplenented.

Other than that I wrote a very extensive fuzzer for the implementation. Result: after the initial development I don't think it was never targeted by serious bugs, and now the implementation is very easy to modify if needed.

lioeters4y ago

For those curious, the extensively commented source code can be seen here:

https://github.com/redis/redis/blob/unstable/src/rax.c

BeetleB4y ago

The problems one will run into with literate programming:

1. Lack of tooling.

2. Refactoring becomes nontrivial

3. How one would write a program in literate style will vary widely from person to person. If you write your code in literate style, it may be easy for you to follow it years later and modify it, but it likely will not be the case for a coworker. If they have to modify the code, the cognitive load will not be too different from that of just dealing with well written code.

Disclaimer: I've written two nontrivial programs literate style that I continue to rely on and occasionally modify years after writing them. It works as advertised.

klibertp4y ago

For point 3. - it's exactly the same with code that's not literate. Writing code is ultimately about expressing ideas using a language, which is really much closer to writing a novel than to drawing a plan for a bridge. As such, if you want to make your code understandable to other, you have to learn to write well. Just like with novels, there's no problem with having a personal style, or a specific flavor that comes from how you use the language, how you structure your sentences and paragraphs, how you guide a reader through the story.

In other words, the style varying between people is not a problem - bad writing is. And, unfortunately, in my experience very few programmers are capable of consciously producing good writing. The fact that most of the docs out there are barely-legible trash is a proof of this.

I'm sure that reading literate code from Charles Stross would be a blast. It would be exciting, sometimes surprising, but still clear, easy to navigate, structured in a way allowing for extension within a well thought-out framework. Unfortunately, when people without his talent try to use LP, they produce things on par with that unfinished fantasy novel you started writing in 8th grade.

Programming requires a bit of talent, but you can get by with lots of hard work. Literate programming is much harder than that and requires a lot of talent to be beneficial to the codebase. Without that, your LP code will be Fifty Shades of Twilight, and honestly, we don't need more of things like that.

BeetleB4y ago

It's a nice perspective, but the fact that more people have read Fifty Shades/Twilight than the sum total of all who have read any of Charles Stross's works undercuts your point.

And while you add to point 3, it wasn't my main point.

Take any two exceptionally good writers who have very different styles. If one of them produces literate code, the other may be able to understand it very well, but it is unlikely that he can modify it, along with the prose, and maintain the quality of the literate document.

It's not just about bad writers, but incompatibly good ones.

klibertp4y ago

I believe you're wrong - writers can, and do, steal styles and ideas from each other all the time. Not to mention, the editors are basically doing this:

> can modify it, along with the prose, and maintain the quality of the literate document.

for a living. As long as you're accomplished enough of a writer, you'll be able to analyze the works of others and copy them easily. It's coming up with your own style that's a problem.

To summarize: exceptionally good writers will be able to modify and expand each other's work without much effort. Or at least that's what I believe, based on some personal experiences with writing and writers. It should be quite similar with literate programming, too.

1 more reply

WorldMaker4y ago

I feel like problem 1 is on the cusp of solutions given the amount of money poured into tooling for "notebooks" like Jupyter. Notebooks are a form of literate programming. Projects written in Jupyter Notebooks are getting larger and scaling harder. I think a convergence should eventually happen that larger scale literate programming tasks can benefit immensely from the tooling investments in notebooks like Jupyter.

throwaquestion54y ago

Problems 1 and 3 I could imagine. I would need to learn how to be a better writer to share a literate program.

As someone experienced in the topic, What's the biggest hurdle when trying to refactor the code?

shakna4y ago

Refactoring becomes the dual problem space of both programming and editing.

It's simply more work - but that "more work" is vitally important, tedious, and resistant to any kind of automated help.

User234y ago

With Emacs org-babel I just do the refactor using the normal tools for whatever language on the tangled files and then detangle it back to the literate document. There's no problem here.

1 more reply

taeric4y ago

I'd also say that a lot of the reason for refactoring is just... different. Literate programs are typically made to be fairly self contained works. If you have some general purpose code that is of use in the entire codebase, you can make that its own section and cover it when it is needed. Otherwise, you likely won't reach for the same coding strategies that are common outside of literate code.

I'm trying to find a way to describe this a bit better than the above. I think the easiest way to think about it, is that in most software projects you have a separate document that is the general architecture of the software. It is rare that you will need or want to refactor the architecture, so you try to keep that somewhat faithful to what the code is doing. In literate software, that high level architecture view is part of how you organize the code.

ilammy4y ago

> What's the biggest hurdle when trying to refactor the code?

I'd guess it's updating cross-references in prose and rewriting chapters of documentation which no longer make sense after your refactoring.

yumiris4y ago

Literate programming has been particularly useful for my "dotfile" configurations, such as .emacs, .vimrc, .zshrc and even the .gitconfig file.

I use one .org file to declare all of my configurations, and tangle them together into the aforementioned files. This keeps things pretty portable, and makes up for the unintuitive readability of many dotfiles.

It can also work for rudimentary shell scripts and other single-file goodies; however, scaling it to proper multi-file programs proves to be difficult, especially when multiple developers are involved.

syntaxfree4y ago

This is a cool idea. Also so if you switch tools like WMs you know what you used to have even if it takes some work to reconstruct what that was. But have such a tangle of glued together and custom written tiling WM rice that I can never switch to anything every again.

sritchie4y ago

Literate programming is going to feel far more powerful when we expand the definition to include:

- Smalltalk-ish things like writing suites of custom viewers for various types, - demos and examples in-line inside of a library - multiple stories about the same piece of code, but all with the ability to IMPORT the story as a library

I've been writing sicmutils[0] as a "literate library"; see the automatic differentiation implementation as an example[1].

A talk I gave yesterday at ELS[2] demos a much more powerful host that uses Nextjournal's Clerk[3] to power physics animations, TeX rendering etc, but all derived from a piece of Clojure source that you can pull in as a library, ignoring all of these presentation effects.

Code should perform itself, and it would be great if when people thought "LP" they imagined the full range of media through which that performance could happen.

[0] sicmutils: https://github.com/sicmutils/sicmutils

[1] autodiff namespace: https://github.com/sicmutils/sicmutils/blob/main/src/sicmuti...

[2] Talk code: https://github.com/sritchie/programming-2022

[3] Clerk: https://github.com/nextjournal/clerk

WorldMaker4y ago

Don't forget to include "notebooks" in the expanded view of literate programming. The amount of code being written in Jupyter notebooks alone today in practice dwarves much of literate programming in preceding years.

QuikAccount4y ago

I like literate programming in theory but the most common response I see to it is that writing self documenting code is better because as you are working on a code base with many people, it is unlikely they will keep your prose up to date as the code is changed.

mannykannot4y ago

> self documenting code is better because as you are working on a code base with many people, it is unlikely they will keep your prose up to date as the code is changed.

There is no reason to believe they are any more likely to keep code self-documenting (or to succeed even if they try) - it is not as if it will not compile or run unless it is.

I see literate programming to be an attempt to put some rigor into the otherwise terminally vague concept of self-documenting code (conceptually, it is way beyond the platitudes in 'clean code', even though it came first.) It is, however, doomed to failure in practice because it always takes less information (and less skill) to merely specify what a program will do than it does to not only specify what it will do but also explain and justify that as a correct and efficient solution to a problem that matters.

Neither 'literate' nor 'self-documenting' code are objective concepts.

Koshkin4y ago

Self-documenting code is fine - until someone starts wondering why code does what it does, or if someone wants to generate documentation. (No, the lazy style, "OpenFile - opens a file", does not cut it.)

QuikAccount4y ago

Python resolves this with docstrings but in general, assuming it is not self explanatory while a function exist, is it really necessary to go the whole nine with literate programming instead of just adding a few comments to explain why this exist. Self-documenting code with explanations when necessary is how most codebases already are. At least most good ones.

falcolas4y ago

I think the one potential mitigating factor is that new features can be entirely new "chapters". Thanks to the tangling, a feature that needs to be added in 10 different places in the code can be written completely separately from the rest of the code.

Additionally, bugs can be fixed in-situ, refactoring can occur at will, and neither would require the prose around them to change, since code being talked about (despite moving or undergoing small changes) still fulfills the original, documented, purpose.

ravel-bar-foo4y ago

It seems like adding new code as "Chapters," unless pursued with a bit of self-discipline, may result in spahgetti which is worse than a non-literate style.

Imagine a multi-person project where every little feature gets its own file, and now the programmer has to find the source of the bug between interacting blocks of in code fragments split across multiple files, ehich are combined together by tooling.... oh wait, I think that describes just about any sufficiently large C or C++ project.

admax88qqq4y ago

> unless pursued with a bit of self-discipline

Any programming unless pursued with a bit of self-discipline may result in spaghetti code.

1 more reply

taeric4y ago

The problem with self documenting code is that it doesn't help justify all of the parts into the whole. This is particularly troublesome in code where a refactor effectively isolated entire sections of the code, but the person that did the refactor didn't realize it, and now you have code that exists only for the sake of existing tests.

t0suj44y ago

I interpret self documenting as writing the what in the code and writing comments about the why. While minimizing the places where you need to explain your code.

My role of thumb is that if it's not obvious why that particular line is there and removing it would break functionality, add a comment.

taeric4y ago

I typically see "self documenting code" refer to code where function and variable names indicate the code without added comments.

Think:

    function square(x) ...

Versus

    function f(a) ...

Often includes all functions as named things, with little to no lambda usage, since names are seen as for the programmer, not for the computer.

1 more reply

doliveira4y ago

How are newcomers handled in those "self-documenting codebases"?

lf-non4y ago

I am not a big fan of the complex literate programming style involving code-generation which this article talks about.

But I recently discovered that Google's zx [1] scripting utility supports executing scripts in markdown documents and I combined it with httpie [2] and usql [3] for a bit of quick and dirty automation testing and api verification code and it worked out pretty well.

I imagine for most people nowadays jupyter or vscode notebooks are the closest it comes to practical literate programming.

[1] https://github.com/google/zx#markdown-scripts

[2] https://github.com/httpie/httpie

[3] https://github.com/xo/usql

andrewshadura4y ago

ifupdown, the Debian tool to manage network interfaces, used to be written in literate C using noweb. When I took it over from the original author, I struggled to understand how it worked. I had to print out the weaved version of it, and read it making notes on the paper. I eventually managed to make sense of it, but making any change was very difficult, so I ended up converting it to plain C, adding some comments from the original literate source and reindenting.

foxdeploy4y ago

This talked about writing code for humans then immediately jumped into some arcane mathematic scrawl like the stuff when Sephiroth casts supernova

travisgriggs4y ago

Exactly my experience. I’m a big fan of self documenting code (intention revealing selectors, useful variable names, etc). But I also recognize (and include) the value of low noise comments that guide through the higher abstractions of the code; a sort of yen-yang complementary approach. So I was with the author through the first paragraph and then saw something that looked like Aspect Oriented Programming from the late 90’s complete with the weave and intermediate compilers.

vim-guru4y ago

I've written a fair share of literate code.

It works well for personal stuff where you would like to leave some bits of information for yourself (typically, configuration files).

It works well for small libraries where good documentation is important.

It works well for visualisation-work, where you may combine multiple languages and data-formats without writing API's for each.

In larger scale apps though and with collaboration; you run into problems with tooling on multiple levels. I am working on tackling scale, but collaboration is tricky. Mostly because you need structure to collaborate and then you will likely end up with an outline that's pretty close to a directory-tree and then you've lost one of the good bits of literate code in my opinion.

atweiden4y ago

I’d like to see a literate programming version of GitHub where the community standardizes around an eminently-readable Markdown-like syntax. srcweave [1] looks like a great start.

[1]: https://github.com/justinmeiners/srcweave

mbo4y ago

You may be in luck! I think a "GitHub like community" of literate programmers could be found at Observable.

See https://observablehq.com/@observablehq/a-taste-of-observable... as a quick overview.

To plug my own work, I have written https://observablehq.com/@mjbo/genre-map-explorer-for-spotif... in a literate style, and many of the Observable community are similar adherents to literate programming.

izzygonzalez4y ago

I didn't know I was doing literate programming when I started using Observable. It's changed the way I code in a huge way. The code isn't tucked away, so it's like repos in VS Code are hidden caverns, and Observable collections are canyons in the light of the Sun. They can remain private, of course, but the barriers to sharing are negligible compared to the common Git-centered workflows.

Codesandbox and Google Colab come close but they still feel like tanks. I can code something up on my phone with Observable while waiting in line at the DMV...

fjfaase4y ago

I have started working on a program that can parse Markdown files with fragments of C code and weave those fragments into a C program that can be compiled. For an example input, see https://github.com/FransFaase/RawParser#documentation

rektide4y ago

Building a Habitable Computing Environment[1] was a recent blush i had with a "literate" computing project, this time less about programming specifically & about system setup/config.

I confess I'd rather forgotten what literate specifically meant beyond code comments describing the flow, but i did find it to be a remarkably comprehensive & understandable document, a prime example of how we might teach & understand computing. Even if it did leave me puzzling out what a number of the many many many scripts were for!

Certainly the overall project of computing needs a lot of help, ways to explain itself. Ive seen tons and tons and tons of "dotfiles" projects, but none have gotten anywhere near to as comprehensible as this literate programming project, from what I've seen.

[1] https://tess.oconnor.cx/config/hobercfg.html https://news.ycombinator.com/item?id=30748033 (19 points, 1d ago, 0 comments)

mci4y ago

IMHO, attempts to show literate programming on screen are doomed to meet with mediocre success. DEK invented literate programming with printed books in mind. I dare say that the only successful literate programs are books printed on paper.

First, in a printed book, it is easier to find a previous page and compare a fragment on it with the current fragment. Second, a printed book has no links tempting you with the words "CLICK ME" to disrupt the flow so you can read it from cover to cover with fewer distractions. Third, anecdotally, I can see flaws much easier on a printout than on screen, both in programs and in texts.

kwhitefoot4y ago

> in a printed book, it is easier to find a previous page and compare a fragment on it with the current fragment.

This is why I like plain text for everything (or Emacs Org Mode) because then I can have multiple frames showing different parts of the same buffer in Emacs.

medstrom4y ago

You can also create several browser tabs visiting the same page. Of course, no one thinks to do that.

davegauer4y ago

Make that several browser windows. And that's a good point. I sometimes end up inefficiently scrolling up and down a document to reference different parts of it. It almost never occurs to me to open another browser window with the same page for side-by-side reference.

floodyberry-4y ago

I always feel like I'm missing something about literate programming because every example is "godawful". Transitioning between text and context-less code fragments littered with markup is incredibly jarring, and instead of something "written for humans", you now have something that is neither text _or_ code that you have to piece together and hold in your head as you go.

nonrandomstring4y ago

This is great stuff. It's how all code and data research should be presented, where the document is the program and you can reproduce it as easily as you can read it. After years of using Pure Data (a visual dataflow) whose unofficial motto was "The diagram is the program" I got this philosophy stuck deep in my brain. Today I use Org-Mode (In Emacs) for tangling (with something called Babel) that can run source code from many languages as part of an active document.

michaelrpeskin4y ago

I remember my first day of advanced computer graphics (25 years ago, so advanced then wasn’t advanced now) the whole class was shocked when the professor did his dramatic start of class saying “you should never document your code!”…dramatic pause…”you should code your document”.

I still think of that today when I’m writing complex algorithms. I write everything first in prose. Then translate that to a more list like structure. And then I fill in the code around that.

Works really well when I have to come back months later and figure out what I was thinking.

syntaxfree4y ago

What you describe sounds like top down development (“in the small”). When you try to use it for bottom-up reasoning, you get Jupyter notebooks.

dwohnitmok4y ago

Are there any large (> 5 people teams) projects written with literate programming?

Also are there any IDE plugins or error stack trace/debuggers for literate programming?

I haven't really paid attention to literate programming in a long long time and I'm curious if the field has advanced.

(Also I don't understand this: "A typical literate file produces many source files." Why? Why would you care about having multiple source files? Isn't the literate file the source at this point?)

guitarbill4y ago

I worked on a larger project where some of the code was "literate" programming. It was an absolute pain to modify anything. Debugging, not so much, since tangle produces raw source code. This you can work with. The problem is working with the original files.

Syntax highlighting? Good luck! But possibly you could work around this, e.g. via custom highlighting syntax. Same with any auto-complete, contextual IDE help, etc. Refactoring was painful.

Also, the text absolutely destroys being able to scan and reason about the control flow quickly. Especially bad when a dev decides something needs "a lot of documentation" and writes a small novel.

Needless to say, it was truly awful.

flukus4y ago

IMHO the two biggest problems I find with existing tools are that they assume the documentation is the source of truth and that the tangling will leave artifacts in code. Both of these make them poorly suited to the sort of projects that most of us work on, I'm not Knuth writing a dead tree tome but I would like some better ways to add documentation to existing projects and have to integrate with others.

I wrote a PoC of a tangling tool that worked through "virtual" files and had a syntax aware handler, there's enough information that it could possible work with language servers too, but sadly I haven't had time to take things further: https://gitlab.com/lusher/tanglemd

ParetoOptimal4y ago

> they assume the documentation is the source of truth and that the tangling will leave artifacts in code.

Leo avoids this by keeping a "shadow copy" of the artifacts or annotations and doing a diff on detangling between all 3 versions.

k__4y ago

"Also are there any IDE plugins or error stack trace/debuggers for literate programming?"

Good question!

I always think it would be nice to just write Markdown sprinkled with code, but without IDE/editor support, it's dead in the water :(

Yaina4y ago

I mean rust supports markdown comments and can compile them into documentation.[1] That's pretty good in my book in terms of documentation.

[1]https://doc.rust-lang.org/rust-by-example/meta/doc.html

k__4y ago

I wouldn't call it good. But better than nothing, I guess.

1 more reply

goosedragons4y ago

R-studio supports this for R although you can sprinkle in other languages although they are 2nd class citizens unsurprisingly. Emacs has org-mode with babel that combined with poly-mode gets you every feature you'd expect in the major mode of the language you're writing in. Emacs also supports R-markdown/Sweave quite well.

k__4y ago

Ah, yes I played around with R-studio last year.

Pretty nice experience.

ParetoOptimal4y ago

We need a polished debugger using robust detangling and source maps for literate programming.

goosedragons4y ago

You might still want multiple tangled files to follow a familiar app structure, or for languages that need to be compiled or if you don't want users to have to add the extra step of tangling the source before they build your program.

hzhou3214y ago

What are the key differences between a human audience and a machine interpreter? It is not the language or prose. It is the structure and order. For machines, details comes first. You declare all the actors and types with every non-forgiving annotations first. You may tuck the details into a header, but it still needs be ordered according to compilers and structured in the way that machines gets the details first. On the other hand, for human, it is top-down context oriented. The details are important, but not after we establish the right context.

So for literate programming, if you just think it is how you write the code (e.g. self-documenting or not), or you think it is the amount of commenting (e.g. doc string or not), if you are not first and constantly thinking about how to structure your code and establish context, you are not getting literate programming.

Now, once you understand your ends, the means (tangle or weave), will come along. It is easy to invent one if you don't have one. On the other hand, getting your coworkers to agree and work together, that's hard. It is easy to get machines to work together and it is easy for human to cope.

syntaxfree4y ago

Context can be expresses with thin, somewhat redundant interfaces that group code around how they map to desiderata. Eg the meat in “matrix pseudoinverse” and “linear regression” is the same, but you can expose domain-meaningful abstractions. Regular OOP does “tangle and weave” without the messy text transformation mechanics.

hzhou3214y ago

I guess the word "context" has been misused in computer science. An interface is a computer context. In particular, an interface is full of non-negotiating details. A human context is about why (we need it), what (we want), and how (to achieve the goal at high level). Most details at high level is understood as the ways things can go wrong. This understanding provides context when we later dive into the details.

An interface with all its details is an end product. With literate programming, that shouldn't show up as a single piece up-front. It should be developed with layers, each layer with its context (why, what, how), each layer with design and implementation.

Certainly we still want a view of complete interface with all its details in one place. This is the same as the compiler still wants the entire code in its expected structure and order. That's the job of tangle.

copperx4y ago

It would be great if IDEs supported literate programming; the tangle/weave commands, simple as they are, create many possible points for navigation. An IDE would be ideal to go back and forth from the prose to the code.

something984y ago

I found this years ago: http://leoeditor.com/

iluvblender4y ago

Leo editor is my literate programming editor of choice.

BeetleB4y ago

It's the one editor I've found that has something that people have struggled to replicate in Emacs.

1 more reply

taeric4y ago

You will be shocked to know that emacs and org-mode can do exactly this. You can tangle source, and go from the tangled source back to the section that generated that source.

If you are wanting to just do cweb, then the debugging symbols already let you step through the source line by line without having to look at the tangled source.

krageon4y ago

> You will be shocked to know that emacs and org-mode can do exactly this. You can tangle source, and go from the tangled source back to the section that generated that source.

If you actually use noweb and desire autocompletion or type reminders (or really anything an IDE does), then functionally it cannot. Literate programming (and noweb) is great for configs, but as set up it simply doesn't work right for real programming.

taeric4y ago

Orgmode can get you these things. Since the tangled source is full source, you will just have to index the tangled code and detangle after edits.

So, yes. The outline code will be less prone to this. But this is no different than the architecture document being ignored by the ide.

ParetoOptimal4y ago

> If you actually use noweb and desire autocompletion or type reminders (or really anything an IDE does), then functionally it cannot.

dabbrev completes noweb-ref names and you can automate this with yasnippet.

ggm4y ago

I have always felt a literate program is probably for many of us, a future deliverable on the hack we've implemented up front.

Very very few people can start from the abstraction and get TO a literate outcome without a lot of false steps along the way.

Or, as an alternative, the LOC of a literate program has to include the 100x cost of exploring how to carve it out of the block of mud we start from, including making our own tools.

JasonFruit4y ago

The cool thing is that if it's valuable, you can leave the iteration artifacts in the literate document, not as commented-out blocks like we often see, but as a part of the explanation of how we arrived at the final version of our solution. Not all code that's in the document has to end up in the compiled/executed files.

ggm4y ago

As appendix, maybe. A program which rehearsed the mistakes before arriving at the conclusion could be a good read, in linear order but probably not what you expect. I tend to think this is more like a rolls-royce, you want to be able to look in the boot at the old water pump but only to clarify how closely it resembles the current one, if the current one breaks: if the current water pump works, thats what you want to see first if you open the bonnet.

"Reader, she married him" as the first words of the book, not the last basically.

majewsky4y ago

I had an instance of this where I built up a rather contrived data structure over time and ended up putting what is basically a blog post into the repo to explain the history of that data structure: https://github.com/majewsky/rust-jmdict/blob/main/CONTRIBUTI...

tomjen34y ago

>Very very few people can start from the abstraction and get TO a literate outcome without a lot of false steps along the way.

But don't writers face the same issue with their text? Am I the only one who writes more code than what ends up in a PR? Isn't that exactly what the Git history is for?

ggm4y ago

Sure. So how many published works do we ever see the git logs for? Significant literary works, one or two per author, pre digital era. Galley proofs, or Marcel Prousts chaotic revisions which G Scott Moncrieff fought with doing his translation, and then Terence Gilmartin. Dickens, some of them. Most writing? Never seen.

derangedHorse4y ago

Maybe I'm missing something, but I didn't find his way of programming to be all that more useful than just having well written code. The small code snippets are labeled and shown where they are referenced but this seems to mimic the functionality of functions which, when using an IDE like Visual Studio, can have where it's referenced identified through tooling.

zozbot2344y ago

The original literate programming was designed for early programming languages like Pascal, where support for forward references would've gotten in the way of efficient parsing and compiling. As you point out, it's less useful nowadays and it's been largely superseded by "lightweight" literate systems where formatted and structured documentation is programmatically extracted from the code rather than the converse.

ParetoOptimal4y ago

I think having vastly different presentation order and different cloned views over the same chunks of code are the biggest potential wins in literate programming.

silcoon4y ago

Another solution has been implemented in Marginalia. Notes near complete source code.

source code: https://github.com/gdeer81/marginalia example: http://gdeer81.github.io/marginalia/

kkfx4y ago

IMVHO literate programming is "describing an algorithm" like writing a book, witch is absolutely good, but demand much more time than directly writing code. That means: or we change actual "quick" development model to a new/old "slow" one, perhaps additive and coherent like classic systems (SmallTalk and LispM systems, the OS as a single application easy to change at runtime, anything available as a function/method anywhere) to keep the overall development speed useful enough or there is no room for literate programming.

Now, seen actual overall software quality (far less hacky than the past, but also unable to innovate, bloated, with gazillions of deps) we need to change back to days of the real innovation BUT that means we need to completely erase actual economical model centered on giants, witch can be "a little bit" difficult since they are giants and they do not like the idea to be thrown out of the window...

fanf24y ago

It seems to me that literate programming was partly invented to escape from the rigid structure of Pascal programs; if Knuth had been using a language that allows use before declaration, literate programming would be just comments. Like literate Haskell.

eterevsky4y ago

I tried reading some literate code, and I have troubles understanding it, compared to well structured normal code with moderate amount of comments.

Sometimes when you are writing an article it may make sense to write LP-like snippets of code like

    int my_function() {
      // Initialize variables
      return 0;
    }

but you don't really need to invent the whole "literate programming" concept to do this and you don't need to write all of your code like that.

shp0ngle4y ago

All bigger programs I have seen that used literate programming were unreadable and I always wished they used something else when reading the source code.

Maybe I saw bad examples though.

nesarkvechnep4y ago

I think Literate Programming is fantastic when used to teach computational thinking.

On a side note, it seems a lot of the other commenters miss one of the best "features" of LP - minimizing repetition. Chunks of code can be reused and so patterns can become clearly evident. Also, chunks can be defined "out-of-order".

mtm4y ago

One of my favorite examples of a literate-style program is "cl-6502, A Readable CPU Emulator" by Brit Butler

0des4y ago

OK, I'll bite.

I stopped using 1 letter variables, abbreviations, and non descriptive function names. If a function or block can't be read and followed like a story, and without comments, it probably can be simplified.

krick4y ago

Ok, that will be unpopular.

"Literate programming" is a non-invention by somebody (Knuth), who is very much revered by many programmers (many of whom never even actually read him), but who was — let's admit it — just terrible at writing readable code. I'm very much not a fan of the "Clean Code" by Martin, but he had a very nice example of refactoring some of Knuth's code to show you what I mean (although, it's kind of evident that writing clearly wasn't in Knuth's DNA just by reading his famous books). Today, this is an attempt to solve a problem, which you created yourself by avoiding using tools that already exist to solve that problem. Then you invent all sorts of tooling and mental tricks to make solving this problem your way more comfortable. But if you would just use these already existing tools, there would be no need in making up a new name for what you do. It wouldn't be some "literate programming", it would be just programming, the sane way.

First off, what tools I'm talking about: well, that's everything PL developers invented over the decades, and it obviously depends on which PL you are going to use. If this is some pseudo-assembly language like what Knuth uses in his TAOCP, then, well, there aren't many such tools, so creating your own template-preprocessor (which is, in a sense, making a new PL with additional features on top of your pseudo-assembly) perhaps would be an okay-ish idea. But if you use something that people actually use for programming, then you surely have functions, some kind of advanced data structures, perhaps classes and inheritance, perhaps some templating features as well (like… traits?).

Going back to the example at hand (the code author "simplifies"): all that "simplifying" consists of a top-down description of what he's going to do. Really, the code he ends up with (in "transpiled" form) isn't that much harder to read and understand than his "LP" version of it. Inline some comments to explain what he explains in the "LP" version, and you and up with the same thing, but much more concise (so, faster to read — and easier to edit!). If it was a bit more complicated: you do the same thing that he did with his "templating", but simply by doing what programmers actually do in such cases — extract complicated fragments of a function into smaller functions, and give them proper names. Maybe add some comments — yes, they are a part of your PL for a reason.

Moreover, the most complicated thing in his example isn't how the algorithm is written down, but the very algorithm itself. It is ok as long as you never actually run this code, but if you actually use it in some useful program, where it can cause problems, a programmer coming across this thing would need to stop to wrap his head around what this is doing, if it's actually all subsets and how fast the call stack may grow (as it so often turns out when you use recursion to write down "an elegant" solution). I mean, I'm only suggesting, but wouldn't this be a little bit more straight-forward?

    function subsets(elements) {
        results = []

        // All subsets of a set of 5 elements are basically binary numbers 
        // from 00000 to 11111, which is from 0 to 2⁵-1
        for (i in range(0, 2^(len(elements)) - 1) {
            results.add(get_subset_by_binary_number(elements, i)))
        }
        return results
    }

    // Blah-blah
    // Given [1, 2, 3, 4] and a number with binary representation 0101
    // will return [2, 4]
    function get_subset_by_binary_number(set, number) { ... }

This isn't my main point, though. My main point is, that people write code for a reason. There can be number of reasons, but usually it fits into a range from "doing some enerprisy-boilerplaity stuff I'll need to redo over again next week" and "writing a book, which has code, because it's about programming, and code describes programming better than english". In the first case it probably won't need a lot of "LP-kind of explanations", and where it needs to go over "why the fuck did I do it like that" a bit more extensively, you'll just link Jira issue in a comment. In the second case it might look a bit moe like LP, but it's just called "writing a book".

In all of the cases in between you'll add some amount of comments, always trying to minimize overall amount of stuff other people will have to read (and, well, you to write), which is expressing as much as you can with words you cannot avoid to write (i.e., code that actually does things, explaining them both to humans and to a computer) and minimizing what you can avoid (i.e. English). (Closer to "a book" on this spectrum it will also include some Jupyter Notebooks.)

svat4y ago

This is a surprising comment, and especially this part:

> it's kind of evident that writing clearly wasn't in Knuth's DNA just by reading his famous books

— my experience, from reading (parts of) several of Knuth's books and papers, is the very opposite: Knuth is one of the finest writers, and writing is clearly in Knuth's DNA — even among the many hats he has worn (mathematician, programmer, computer scientist, teacher), at heart of everything is writing. (His Selected Papers on Fun and Games includes some stuff he wrote in high school and college; even those show his spirit.)

IMO, every page of his is a delight to read. I think the issue for those who find it otherwise may be that he writes in a very personal way (his personality shines through), and for those who are looking for something bland or generic, this can be a surprise.

Then again, this may be one of the chief problems with literate programming in general (why it works so well with one author, and doesn't seem to have had much success with a large team): writing is very personal, and for many-person codebases something "generic" may in fact work better.

DeathArrow4y ago

>Code should be written for humans not machines.

Unfortunately, machines have a different way of understanding code than humans.

j / k navigate · click thread line to collapse

107 comments

falcolas4y ago

My favorite literate program still has to be the book "Physically Based Rendering". An optimized, feature rich ray tracer in the form of a textbook.

taeric4y ago

https://smile.amazon.com/gp/product/1541259335 is also a great book. As is Stanford GraphBase.

My personal bet is that it is probably easier to collaborate on something like this than you would think. The imposed structure of programs, in general, already makes a lot of collaboration tough.

eunoia4y ago

Great book! It’s available online, for free at https://www.pbrt.org/.

You can also find older, physical editions on EBay for $10-$15.

svat4y ago

WorldMaker4y ago

svat4y ago

There's also "nbdev" (https://github.com/fastai/nbdev) which seems like it should be the best of both worlds, but I couldn't quite get it to work.

antirez4y ago

I think likewise. When I had to write the radix tree implementation for Redis I faced two problems:

- I needed a stable implemention as soon as possible, I had a performance issued that needed to be solved by range queries.

- The radix tree was full of corner cases.

lioeters4y ago

For those curious, the extensively commented source code can be seen here:

https://github.com/redis/redis/blob/unstable/src/rax.c

BeetleB4y ago

The problems one will run into with literate programming:

1. Lack of tooling.

2. Refactoring becomes nontrivial

Disclaimer: I've written two nontrivial programs literate style that I continue to rely on and occasionally modify years after writing them. It works as advertised.

klibertp4y ago

BeetleB4y ago

It's a nice perspective, but the fact that more people have read Fifty Shades/Twilight than the sum total of all who have read any of Charles Stross's works undercuts your point.

And while you add to point 3, it wasn't my main point.

It's not just about bad writers, but incompatibly good ones.

klibertp4y ago

I believe you're wrong - writers can, and do, steal styles and ideas from each other all the time. Not to mention, the editors are basically doing this:

> can modify it, along with the prose, and maintain the quality of the literate document.

for a living. As long as you're accomplished enough of a writer, you'll be able to analyze the works of others and copy them easily. It's coming up with your own style that's a problem.

1 more reply

WorldMaker4y ago

throwaquestion54y ago

Problems 1 and 3 I could imagine. I would need to learn how to be a better writer to share a literate program.

As someone experienced in the topic, What's the biggest hurdle when trying to refactor the code?

shakna4y ago

Refactoring becomes the dual problem space of both programming and editing.

It's simply more work - but that "more work" is vitally important, tedious, and resistant to any kind of automated help.

User234y ago

With Emacs org-babel I just do the refactor using the normal tools for whatever language on the tangled files and then detangle it back to the literate document. There's no problem here.

1 more reply

taeric4y ago

ilammy4y ago

> What's the biggest hurdle when trying to refactor the code?

I'd guess it's updating cross-references in prose and rewriting chapters of documentation which no longer make sense after your refactoring.

yumiris4y ago

Literate programming has been particularly useful for my "dotfile" configurations, such as .emacs, .vimrc, .zshrc and even the .gitconfig file.

syntaxfree4y ago

sritchie4y ago

Literate programming is going to feel far more powerful when we expand the definition to include:

I've been writing sicmutils[0] as a "literate library"; see the automatic differentiation implementation as an example[1].

Code should perform itself, and it would be great if when people thought "LP" they imagined the full range of media through which that performance could happen.

[0] sicmutils: https://github.com/sicmutils/sicmutils

[1] autodiff namespace: https://github.com/sicmutils/sicmutils/blob/main/src/sicmuti...

[2] Talk code: https://github.com/sritchie/programming-2022

[3] Clerk: https://github.com/nextjournal/clerk

WorldMaker4y ago

QuikAccount4y ago

mannykannot4y ago

> self documenting code is better because as you are working on a code base with many people, it is unlikely they will keep your prose up to date as the code is changed.

There is no reason to believe they are any more likely to keep code self-documenting (or to succeed even if they try) - it is not as if it will not compile or run unless it is.

Neither 'literate' nor 'self-documenting' code are objective concepts.

Koshkin4y ago

QuikAccount4y ago

falcolas4y ago

ravel-bar-foo4y ago

It seems like adding new code as "Chapters," unless pursued with a bit of self-discipline, may result in spahgetti which is worse than a non-literate style.

admax88qqq4y ago

> unless pursued with a bit of self-discipline

Any programming unless pursued with a bit of self-discipline may result in spaghetti code.

1 more reply

taeric4y ago

t0suj44y ago

I interpret self documenting as writing the what in the code and writing comments about the why. While minimizing the places where you need to explain your code.

My role of thumb is that if it's not obvious why that particular line is there and removing it would break functionality, add a comment.

taeric4y ago

I typically see "self documenting code" refer to code where function and variable names indicate the code without added comments.

Think:

    function square(x) ...

Versus

    function f(a) ...

Often includes all functions as named things, with little to no lambda usage, since names are seen as for the programmer, not for the computer.

1 more reply

doliveira4y ago

How are newcomers handled in those "self-documenting codebases"?

lf-non4y ago

I am not a big fan of the complex literate programming style involving code-generation which this article talks about.

I imagine for most people nowadays jupyter or vscode notebooks are the closest it comes to practical literate programming.

[1] https://github.com/google/zx#markdown-scripts

[2] https://github.com/httpie/httpie

[3] https://github.com/xo/usql

andrewshadura4y ago

foxdeploy4y ago

This talked about writing code for humans then immediately jumped into some arcane mathematic scrawl like the stuff when Sephiroth casts supernova

travisgriggs4y ago

vim-guru4y ago

I've written a fair share of literate code.

It works well for personal stuff where you would like to leave some bits of information for yourself (typically, configuration files).

It works well for small libraries where good documentation is important.

It works well for visualisation-work, where you may combine multiple languages and data-formats without writing API's for each.

atweiden4y ago

I’d like to see a literate programming version of GitHub where the community standardizes around an eminently-readable Markdown-like syntax. srcweave [1] looks like a great start.

[1]: https://github.com/justinmeiners/srcweave

mbo4y ago

You may be in luck! I think a "GitHub like community" of literate programmers could be found at Observable.

See https://observablehq.com/@observablehq/a-taste-of-observable... as a quick overview.

izzygonzalez4y ago

Codesandbox and Google Colab come close but they still feel like tanks. I can code something up on my phone with Observable while waiting in line at the DMV...

fjfaase4y ago

rektide4y ago

Building a Habitable Computing Environment[1] was a recent blush i had with a "literate" computing project, this time less about programming specifically & about system setup/config.

[1] https://tess.oconnor.cx/config/hobercfg.html https://news.ycombinator.com/item?id=30748033 (19 points, 1d ago, 0 comments)

mci4y ago

kwhitefoot4y ago

> in a printed book, it is easier to find a previous page and compare a fragment on it with the current fragment.

This is why I like plain text for everything (or Emacs Org Mode) because then I can have multiple frames showing different parts of the same buffer in Emacs.

medstrom4y ago

You can also create several browser tabs visiting the same page. Of course, no one thinks to do that.

davegauer4y ago

floodyberry-4y ago

nonrandomstring4y ago

michaelrpeskin4y ago

I still think of that today when I’m writing complex algorithms. I write everything first in prose. Then translate that to a more list like structure. And then I fill in the code around that.

Works really well when I have to come back months later and figure out what I was thinking.

syntaxfree4y ago

What you describe sounds like top down development (“in the small”). When you try to use it for bottom-up reasoning, you get Jupyter notebooks.

dwohnitmok4y ago

Are there any large (> 5 people teams) projects written with literate programming?

Also are there any IDE plugins or error stack trace/debuggers for literate programming?

I haven't really paid attention to literate programming in a long long time and I'm curious if the field has advanced.

(Also I don't understand this: "A typical literate file produces many source files." Why? Why would you care about having multiple source files? Isn't the literate file the source at this point?)

guitarbill4y ago

Syntax highlighting? Good luck! But possibly you could work around this, e.g. via custom highlighting syntax. Same with any auto-complete, contextual IDE help, etc. Refactoring was painful.

Also, the text absolutely destroys being able to scan and reason about the control flow quickly. Especially bad when a dev decides something needs "a lot of documentation" and writes a small novel.

Needless to say, it was truly awful.

flukus4y ago

ParetoOptimal4y ago

> they assume the documentation is the source of truth and that the tangling will leave artifacts in code.

Leo avoids this by keeping a "shadow copy" of the artifacts or annotations and doing a diff on detangling between all 3 versions.

k__4y ago

"Also are there any IDE plugins or error stack trace/debuggers for literate programming?"

Good question!

I always think it would be nice to just write Markdown sprinkled with code, but without IDE/editor support, it's dead in the water :(

Yaina4y ago

I mean rust supports markdown comments and can compile them into documentation.[1] That's pretty good in my book in terms of documentation.

[1]https://doc.rust-lang.org/rust-by-example/meta/doc.html

k__4y ago

I wouldn't call it good. But better than nothing, I guess.

1 more reply

goosedragons4y ago

k__4y ago

Ah, yes I played around with R-studio last year.

Pretty nice experience.

ParetoOptimal4y ago

We need a polished debugger using robust detangling and source maps for literate programming.

goosedragons4y ago

hzhou3214y ago

syntaxfree4y ago

hzhou3214y ago

copperx4y ago

something984y ago

I found this years ago: http://leoeditor.com/

iluvblender4y ago

Leo editor is my literate programming editor of choice.

BeetleB4y ago

It's the one editor I've found that has something that people have struggled to replicate in Emacs.

1 more reply

taeric4y ago

You will be shocked to know that emacs and org-mode can do exactly this. You can tangle source, and go from the tangled source back to the section that generated that source.

If you are wanting to just do cweb, then the debugging symbols already let you step through the source line by line without having to look at the tangled source.

krageon4y ago

> You will be shocked to know that emacs and org-mode can do exactly this. You can tangle source, and go from the tangled source back to the section that generated that source.

taeric4y ago

Orgmode can get you these things. Since the tangled source is full source, you will just have to index the tangled code and detangle after edits.

So, yes. The outline code will be less prone to this. But this is no different than the architecture document being ignored by the ide.

ParetoOptimal4y ago

> If you actually use noweb and desire autocompletion or type reminders (or really anything an IDE does), then functionally it cannot.

dabbrev completes noweb-ref names and you can automate this with yasnippet.

ggm4y ago

I have always felt a literate program is probably for many of us, a future deliverable on the hack we've implemented up front.

Very very few people can start from the abstraction and get TO a literate outcome without a lot of false steps along the way.

Or, as an alternative, the LOC of a literate program has to include the 100x cost of exploring how to carve it out of the block of mud we start from, including making our own tools.

JasonFruit4y ago

ggm4y ago

"Reader, she married him" as the first words of the book, not the last basically.

majewsky4y ago

tomjen34y ago

>Very very few people can start from the abstraction and get TO a literate outcome without a lot of false steps along the way.

But don't writers face the same issue with their text? Am I the only one who writes more code than what ends up in a PR? Isn't that exactly what the Git history is for?

ggm4y ago

derangedHorse4y ago

zozbot2344y ago

ParetoOptimal4y ago

I think having vastly different presentation order and different cloned views over the same chunks of code are the biggest potential wins in literate programming.

silcoon4y ago

Another solution has been implemented in Marginalia. Notes near complete source code.

source code: https://github.com/gdeer81/marginalia example: http://gdeer81.github.io/marginalia/

kkfx4y ago

fanf24y ago

eterevsky4y ago

I tried reading some literate code, and I have troubles understanding it, compared to well structured normal code with moderate amount of comments.

Sometimes when you are writing an article it may make sense to write LP-like snippets of code like

    int my_function() {
      // Initialize variables
      return 0;
    }

but you don't really need to invent the whole "literate programming" concept to do this and you don't need to write all of your code like that.

shp0ngle4y ago

All bigger programs I have seen that used literate programming were unreadable and I always wished they used something else when reading the source code.

Maybe I saw bad examples though.

nesarkvechnep4y ago

I think Literate Programming is fantastic when used to teach computational thinking.

mtm4y ago

One of my favorite examples of a literate-style program is "cl-6502, A Readable CPU Emulator" by Brit Butler

0des4y ago

OK, I'll bite.

krick4y ago

Ok, that will be unpopular.

    function subsets(elements) {
        results = []

        // All subsets of a set of 5 elements are basically binary numbers 
        // from 00000 to 11111, which is from 0 to 2⁵-1
        for (i in range(0, 2^(len(elements)) - 1) {
            results.add(get_subset_by_binary_number(elements, i)))
        }
        return results
    }

    // Blah-blah
    // Given [1, 2, 3, 4] and a number with binary representation 0101
    // will return [2, 4]
    function get_subset_by_binary_number(set, number) { ... }

svat4y ago

This is a surprising comment, and especially this part:

> it's kind of evident that writing clearly wasn't in Knuth's DNA just by reading his famous books

DeathArrow4y ago

>Code should be written for humans not machines.

Unfortunately, machines have a different way of understanding code than humans.

j / k navigate · click thread line to collapse