The Beauty of Unix Pipelines (opens in new tab)

(prithu.xyz)

649 points0x4FFC8F5y ago374 comments

374 comments

Pipes are wonderful! In my opinion you can’t extol them by themselves. One has to bask in a fuller set of features that are so much greater than the sum of their parts, to feel the warmth of Unix:

(1) everything is text

(2) everything (ish) is a file

(3) including pipes and fds

(4) every piece of software is accessible as a file, invoked at the command line

(5) ...with local arguments

(6) ...and persistent globals in the environment

A lot of understanding comes once you know what execve does, though such knowledge is of course not necessary. It just helps.

Unix is seriously uncool with young people at the moment. I intend to turn that around and articles like this offer good material.

jcranmer5y ago

> (1) everything is text

And lists are space-separated. Unless you want them to be newline-separated, or NUL-separated, which is controlled by an option that may or may not be present for the command you're invoking, and is spelled completely differently for each program. Or maybe you just quote spaces somehow, and good luck figuring out who is responsible for inserting quotes and who is responsible for removing them.

gorgoiler5y ago

To criticize sh semantics without acknowledging that C was always there when you needed something serious is a bit short sighted.

There are two uses of the Unix “api”:

[A] Long lived tools for other people to use.

[B] Short lived tools one throws together oneself.

The fact that most things work most of the time is why the shell works so well for B, and why it is indeed a poor choice for the sort of stable tools designed for others to use, in A.

The ubiquity of the C APIs of course solved [A] use cases in the past, when it was unconscionable to operate a system without cc(1). It’s part of why they get first class treatment in the Unix man pages, as old fashioned as that seems nowadays.

5 more replies

hnlmorg5y ago

That's a POSIX shell thing rather than a Unix pipeline thing. Some non-POSIX shells don't have this problem while still passing data long Unix pipes

Source: I wrote a shell and it solves a great many of the space / quoting problems with POSIX shells.

1 more reply

gricardo995y ago

>good luck figuring out ...

luck, or simply execute the steps you want to check by typing them on the command line. I find the pipes approach incredibly powerful and simple because you can compose and check each step, and assemble. That's really the point, and the power, of a simple and extensible approach.

6gvONxR4sf7o5y ago

God help you if your paths have spaces in them.

4 more replies

jlg235y ago

Or you just pipe through the program of your choice for which you know the syntax:

  echo "foo bar fnord" | cut --delimiter " " --output-delimiter ":" -f1-

  # foo:bar:fnord

1 more reply

fomine35y ago

The fact that filename can contain anything except NULL/slash is really pain. I often write a shell script that treats LF as separator but I know it's not good.

2 more replies

ComputerGuru5y ago

I think this was a failure on behalf of early terminal emulator developers. End-of-field and end-of-record should have been supported by the tty (either as visual symbols or via some sort of physical distancing, etc even if portrayed just as a space and a new line respectively) so that the semantic definition could have been preserved/distinguished.

1 more reply

chooseaname5y ago

All of which are so much easier to deal with than some binary format you have no spec on.

XML? SMH.

michaelcampbell5y ago

Ah yes, perfect is the enemy of the good.

pwdisswordfish25y ago

Works for me.

derefr5y ago

People struggle so much with this, but I don't see what the point is at all. The fundamental problem that playing with delimiters solves, is passing arbitrary strings through as single tokens.

Well, that's easy. Don't escape or quote the strings; encode them. Turn them into opaque tokens, and then do Unix things to the opaque tokens, before finally decoding them back to being strings.

There's a reason od(1) is in coreutils. It's a Unix fundamental when working with arbitrary data. Hex and base64 are your friends (and Unix tools are happy to deal with both.)

1 more reply

laumars5y ago

> everything is text

Everything is a byte stream. Usually that means text but sometimes it doesn't. Which means you can do fun stuff like:

- copy file systems over a network: https://docs.oracle.com/cd/E18752_01/html/819-5461/gbchx.htm...

- stream a file into gzip

- backup or restore an SD card using `cat`

jrumbut5y ago

Also dd, which may be the disk destroyer but is also a great tool for binary file miracles.

See one here: https://unix.stackexchange.com/questions/6852/best-way-to-re...

2 more replies

dvirsky5y ago

> Unix is seriously uncool with young people at the moment.

Those damn kids with their loud rockn'roll music and their Windows machines. Back in my day we had he vocal stylings of Dean Martin and the verbal stylings of Linus Torvalds let me tell ya.

Seriously though, I'm actually seeing younger engineers really taking the time to learn how to do shell magic, using vim, etc. It's like the generation of programmers who started up until the late 90s used those by default, people who like me started 15-20 years ago grew up on IDEs and GUIs, but I've seen a lot of 20-30 something devs who are really into the Unix way of doing things. The really cool kids are in fact doing it.

abnry5y ago

I guess I am one of the young kids who think the unix command line is wicked cool. It makes the user experience on my laptop feel so much more powerful.

1 more reply

xnyan5y ago

I grew up all-GUI windows kid. I actually had a revulsion to the shell, mostly because it seemed unfriendly and scary. In my early 20s I tried to run Plex on a Windows 7 box and it was miserable. I forced myself to lean by switching to a headless arch linux box.

Giving up the idea that CLI = pain (i.e figuring out how to to navigate a file system, ssh keys, etc) for sure was a learning curve, but now I can't imagine using computers without it.

1 more reply

bityard5y ago

> (1) everything is text

Not at all, you can pipe around all the binary you want. Until GNU tar added the 'z' option, the way to extract all files in a tarball was:

`gunzip -c < foo.tar.gz | tar x`

However, "text files" in Unix do have a very specific definition, if you want them to work with standard Unix text manipulation utilities like awk, sed, and diff:

All lines of text end are terminated by a line feed, even the last one.

I can't tell you how many "text files" I run across these days that don't have a trailing newline. It's as bad as mixing tabs and spaces, maybe worse.

1 more reply

ssivark5y ago

Serialized bytestreams do compose better than graphical applications. But that is setting a very low bar. For example, allowing passing around dicts/maps/json (and possibly other data structures) would already be a massive improvement. You know what might be even better — passing around objects you could interact with by passing messages (gasp! cue, Alan Kay).

While posix and streams are nice (if you squint, files look like a poor man’s version of objects imposed on top of bytestreams), it’s about time we moved on to better things, and not lose sight of the bigger picture.

fao_5y ago

Yeah, but that makes every single file an object. Then you need plugins to deal with plain text objects, plugins to deal with every single type of object and their versions.

Objects aren't human readable and if they get corrupted the object's format is very difficult to recover, you need an intimate knowledge of both the specific format used (of which there might be thousands of variations). Plaintext, however, if that gets corrupted it's human-readable. The data might not be more compact (Although TSV files tend to come out equal), but it's more robust against those kinds of changes.

Have you ever examined a protobuf file without knowing what the protobuf structure is? I have, as part of various reverse-engineering I did. It's a nightmare, and without documentation (That's frequently out of date or nonexistent) it's almost impossible to figure out what the data actually is, and you never know if you've got it right. Even having part of the recovered data, I can't figure out what the rest of it is.

2 more replies

flukus5y ago

> For example, allowing passing around dicts/maps/json (and possibly other data structures) would already be a massive improvement.

It is allowed, you can pass around data in whatever encoding you desire. Not many do though because text is so useful for humans.

> You know what might be even better — passing around objects you could interact with by passing messages (gasp! cue, Alan Kay)

That's a daemon/service and basic unix utils make it easy to create, a shell script reading from a FIFO file fits this definition of object, the message passing is writing to the file and the object is passed around with the filename. Unix is basically a fulfillment of Alan Kay's idea of OO.

boomlinde5y ago

> For example, allowing passing around dicts/maps/json (and possibly other data structures) would already be a massive improvement.

Well, it is allowed. You can of course serialize some data into a structured format and pass it over a byte stream in POSIX to this end, and I think this is appropriate in some cases in terms of user convenience. Then you can use tools like xpath or jq to select subfields or mangle.

> (if you squint, files look like a poor man’s version of objects imposed on top of bytestreams)

If all you have is a hammer...

cobalt5y ago

try powershell, its on linux too

1 more reply

jiggawatts5y ago

> Pipes are wonderful!

It's wonderful only if compared to worse things, pretending that PowerShell is not a thing, and that Python doesn't exist.

UNIX pipes are a stringly-typed legacy that we've inherited from the 1970s. The technical constraints of its past have been internalised by its proponents, and lauded as benefits.

To put it most succinctly, the "byte stream" nature of UNIX pipes means that any command that does anything high-level such as processing "structured" data must have a parsing step on its input, and then a serialization step after its internal processing.

The myth of UNIX pipes is that it works like this:

    process1 | process2 | process3 | ...

The reality is that physically, it actually works like this:

   (process1,serialize) | (parse,process2,serialize) | ...

Where each one of those "parse" and "serialise" steps is unique and special, inflexible, and poorly documented. This is forced by the use of byte streams to connect each step. It cannot be circumvented! It's an inherent limitation of UNIX style pipes.

This is not a limitation of PowerShell, which is object oriented, and passes strongly typed, structured objects between processing steps. It makes it much more elegant, flexible, and in effect "more UNIX than UNIX".

If you want to see this in action, check out my little "challenge" that I posed in a similar thread on YC recently:

https://news.ycombinator.com/item?id=23257776

The solution provided by "JoshuaDavid" really impressed me, because I was under the impression that that simple task is actually borderline impossible with UNIX pipes and GNU tools:

https://news.ycombinator.com/item?id=23267901

Compare that to the much simpler equivalent in PowerShell:

https://news.ycombinator.com/item?id=23270291

Especially take note half of that script is sample data!

> Unix is seriously uncool with young people at the moment. I intend to turn that around and articles like this offer good material.

UNIX is entirely too cool with young people. They have conflated the legacy UNIX design of the 1970s with their preferred open source software, languages, and platforms.

There are better things out there, and UNIX isn't the be all and end all of system management.

matheusmoreira5y ago

> Where each one of those "parse" and "serialise" steps is unique and special, inflexible, and poorly documented.

This can also be a security concern. According to research, a great number of defects with security implications occur at the input handling layers:

http://langsec.org

2 more replies

scbrg5y ago

   The reality is that physically, it actually works like this:

      (process1,serialize) | (parse,process2,serialize) | ...

But as soon as you involve the network or persistent storage, you need to do all that anyway. And one of the beauties is that the tools are agnostic to where the data comes from, or goes.

2 more replies

gorgoiler5y ago

Typed streams sound great.

Most data being line oriented (with white space field separators) has historically worked well enough, but I get your point that the wheels will eventually fall off.

It’s important to remember that the shell hasn’t always been about high concept programming tasks.

  $ grep TODO xmas-gifts
  grandma TODO
  auntie Sian TODO
  jeanie MASTODON T shirt

The bug (bugs!) in the above example doesn’t really matter in the context of the task at hand.

imglorp5y ago

More Lego rules:

(7) and common signalling facility

(8) also nice if some files have magic properties like /dev/random or /proc or /dev/null

(9) every program starts with 3 streams, stdin/stdout for work and stderr for out of band errors

enedil5y ago

(9) every program starts with 3 streams, stdin/stdout for work and stderr for out of band errors

This is false - if a process closes stdin/out/err, its children won't have these files open.

1 more reply

hhas015y ago

I defy anyone to build their error-handling logic off the back of stderr…

1 more reply

matheusmoreira5y ago

1. All data is bytes

2. Everything is a file descriptor

3. File descriptors are things that support standard I/O system calls such as read and write

Koshkin5y ago

You are describing Plan 9.

ryandrake5y ago

> (2) everything (ish) is a file

I'm not all that knowledgable about Unix history, but one thing that has always puzzled me was that for whatever reason network connections (generally) aren't files. While I can do:

  cat < /dev/ttyS0

to read from a serial device, I've always wondered why I can't do something like:

  bind /tmp/mysocket 12.34.56.78 80
  cat < /tmp/mysocket

It is weird that so many things in Unix are somehow twisted into files (/dev/fb0??), but network connections, to my knowledge, never managed to go that way. I know we have netcat but it's not the same.

spijdar5y ago

The original authors of "UNIX" considered their creation to be fatally flawed, and created a successor OS called "Plan 9", partially to address this specific deficit, among others like the perceived tacked-on nature of GUIs on *nix.

That's a whole rabbit hole to go down, but Plan 9 networking is much more like your hypothetical example [0]. Additionally, things like the framebuffer and devices are also exposed as pure files, whereas on Linux and most unix-like systems such devices are just endpoints for making ioctl() calls.

[0] http://doc.cat-v.org/plan_9/4th_edition/papers/net/

DSMan1952765y ago

I think you're just seeing the abstraction fall apart a bit - for weirder devices, I would say they are "accessible" via a file, but you likely can't interact with it without special tools/syscalls/ioctls. For example, cat'ing your serial connection won't always work, occasionally you'll need to configure the serial device or tty settings first and things can get pretty messy. That's why programs like `minicom` exist.

For networking, I don't think there's a particular reason it doesn't exist, but it's worth noting sockets are a little special and require a bit extra care (Which is why they have special syscalls like `shutdown()` and `send()` and `recv()`). If you did pass a TCP socket to `cat` or other programs (Which you can do! just a bit of fork-exec magic), you'd discover it doesn't always work quite right. And while I don't know the history on why such a feature doesn't exist, with the fact that tools like socat or curl do the job pretty well I don't think it's seen as that necessary.

1 more reply

jandrese5y ago

Berkeley sockets were kind of bolted on as an afterthought. If the AT&T guys developed them they would probably look a lot more like that.

1 more reply

philsnow5y ago

elsewhere in this thread somebody mentions socat, but you can do it entirely within bash. from https://www.linuxjournal.com/content/more-using-bashs-built-... :

    exec 3<>/dev/tcp/www.google.com/80
    echo -e "GET / HTTP/1.1\r\nhost: http://www.google.com\r\nConnection: close\r\n\r\n" >&3
    cat <&3

https://news.ycombinator.com/item?id=23422423 has a good point that "everything is a file" is maybe less useful than "everything is a file descriptor". the shell is a tool for setting up pipelines of processes and linking their file descriptors together.

1 more reply

moonchild5y ago

In the c api, you can read/write to sockets. Wouldn't be difficult to implement something like your example with a named pipe.

Additionally, there is /dev/tcp in bash.

yjftsjthsd-h5y ago

Sounds like you want netcat?

1 more reply

devchix5y ago

>(1) everything is text

>(2) everything (ish) is a file

I'm having a moment. I have to get logs out of AWS, the thing is frustrating to no end. There's a stream, it goes somewhere, it can be written to S3, topics and queues, but it's not like I can jack it to a huge file or many little files and run text processing filters on it. Am I stupid, behind the times, tied to a dead model, just don't get it? There's no "landing" anywhere that I can examine things directly. I miss the accessibility of basic units which I can examine and do things to with simple tools.

macintux5y ago

I worked around that by writing a Python script to let me pick the groups of interest, download all of the data for a certain time period, and collapse them into a CSV file ordered by time stamp.

Yes, it’s very annoying I have to do all that, but I commend Bezos for his foresight in demanding everything be driven by APIs.

quickthrower25y ago

Can you give me any clue as to what execve does? I looked at the man page but none the wiser. Sounds like magic from what I read there. I'm from a Windows backgroud and not used to pipes.

kccqzy5y ago

It replaces the executable in the current process with a different executable. It's kind of like spawning a new process with an executable, except it's the same PID, and file descriptors without CLOEXEC remain open in their original state.

2 more replies

JdeBP5y ago

Well you have a Windows background, so you know BASIC, right? And you know that BASIC has a CHAIN statement, right? (-:

Yes, I know. You've probably never touched either one. But the point is that this is simply chain loading. It's a concept not limited to Unix. execve() does it at the level of processes, where one program can chain to another one, both running in a single process. But it is most definitely not magic, nor something entirely alien to the world of Windows.

* https://en.wikibooks.org/wiki/QBasic/Appendix#CHAIN

If you are not used to pipes on Windows, you haven't pushed Windows nearly hard enough. The complaint about Microsoft DOS was that it didn't have pipes. But Windows NT has had proper pipes all along since the early 1990s, as OS/2 did before it since 1987. Microsoft's command interpreter is capable of using them, and all of the received wisdom that people knew about pipes and Microsoft's command interpreters on DOS rather famously (it being much discussed at the time) went away with OS/2 and Windows NT.

And there are umpteen ways of improving on that, from JP Software's Take Command to various flavours of Unix-alike tools. And you should see some of the things that people do with FOR /F .

Unix was the first with the garden hosepipe metaphor, but it has been some 50 years since then. It hasn't been limited to Unix for over 30 of them. Your operating system has them, and it is very much worthwhile investigating them.

1 more reply

gorgoiler5y ago

Its how you launch (execute) a new program.

Such programs typically don’t have any interaction so the v and the e parts refer to the two ways you can control what the program does.

v: a vector (list) of input parameters (aka arguments)

e: an environment of key-value pairs

The executed program can interpret these two in any way it wishes to, though there are many conventions that are followed (and bucked by the contrarians!) like using “-“ to prefix flags/options.

This is in addition to any data sent to the program’s standard input — hence the original discussion about using pipes to send the output of one command to the input of another.

hhas015y ago

“(1) everything is text”

LOL. What fantasy land are you living in?

Pipes are great. Untyped, untagged pipes are not. They’re a frigging disaster in every way imaginable.

“Unix is seriously uncool with young people at the moment.”

Unix is half a century old. It’s old enough to be your Dad. Hell, it’s old enough to be your Grandad! It’s had 50 years to better itself and it hasn’t done squat.

Linux? Please. That’s just a crazy cat lady living in Unix’s trash house.

Come back when you’ve a modern successor to Plan 9 that is easy, secure, and doesn’t blow huge UI/UX chunks everywhere.

Shared4045y ago

>Unix is seriously uncool with young people...

Not all of us. I much prefer the Unix way of doing things, it just makes more sense then just trying to become another Windows.

pjmlp5y ago

Everything is a file until one needs to do UNIX IPC, networking or high performance graphics rendering.

lsofzz5y ago

> Unix is seriously uncool with young people at the moment. I intend to turn that around and articles like this offer good material.

I offer you my well wishes on that.

ianai5y ago

How’s it uncool? Just not “ mobile”?

gorgoiler5y ago

Exactly.

“Linux, ew!”

A big part of teaching computer science to children is breaking this obsession with approaching a computer from the top down — the old ICT ways, and the love of apps — and learning that it is a machine under your own control the understanding of which is entirely tractable from the bottom up.

Unlike the natural sciences, computer science (like math) is entirely man made, to its advantage. No microscopes, test tubes, rock hammers, or magnets required to investigate its phenomena. Just a keyboard.

3 more replies

ma2rten5y ago

I was also skeptical of that statement. I did a google trends search and indeed there seems to be a slow decline in command-line related searches:

https://trends.google.com/trends/explore?date=all&q=bash,awk...

remexre5y ago

My complaints with unix (as someone running linux on every device, starting to dip my toe into freebsd on a vps); apologies for lack of editing:

> everything is text

I'd often like to send something structured between processes without needing both sides to have to roll their own de/serialization of the domain types; in practice I end up using sockets + some thrown-together HTTP+JSON or TCP+JSON thing instead of pipes for any data that's not CSV-friendly

> everything (ish) is a file

> including pipes and fds

To my mind, this is much less elegant when most of these things don't support seeking, and fnctls have to exist.

It'd be nicer if there were something to declare interfaces like Varlink [0, 1], and a shell that allowed composing pipelines out of them nicely.

> every piece of software is accessible as a file, invoked at the command line

> ...with local arguments

> ...and persistent global in the environment

Sure, mostly fine; serialization still a wart for arguments + env vars, but one I run into much less

> and common signalling facility

As in Unix signals? tbh those seem ugly too; sigwinch, sighup, etc ought to be connected to stdin in some way; it'd be nice if there were a more general way to send arbitrary data to processes as an event

> also nice if some files have magic properties like /dev/random or /proc or /dev/null

userspace programs can't really extend these though, unless they expose FUSE filesystems, which sounds terrible and nobody does

also, this results in things like [2]...

> every program starts with 3 streams, stdin/stdout for work and stderr for out of band errors

iow, things get trickier once I need more than one input one one output. :)

I also prefer something like syslog over stderr, but again this is an argument for more structured things.

[0]: https://varlink.org/

[1]: ideally with sum type support though, and maybe full dependent types like https://dhall-lang.org/

[2]: https://www.phoronix.com/scan.php?page=news_item&px=UEFI-rm-...

2 more replies

xkucf035y ago

Unix is seriously cool and I consider myself still young :-) But it could be better. See https://relational-pipes.globalcode.info/

cowmix5y ago

I use pipelines as much as the next guy but every time I see post praise how awesome they are, I'm reminded of the Unix Hater's Handbook. Their take on pipelines is pretty spot on too.

http://web.mit.edu/~simsong/www/ugh.pdf

cuddlybacon5y ago

I mostly like what they wrote about pipes. I think the example of bloating they talked about in ls at the start of the shell programming section is a good example: if pipelines are so great, why have so many unix utilities felt the need to bloat?

I think it a result of there being just a bit too much friction in building a pipeline. A good portion tends to be massaging text formats. The standard unix commands for doing that tend to have infamously bad readability.

Fish Shell seems to be making this better by making a string which has a syntax that makes it clear what it is doing: http://fishshell.com/docs/current/cmds/string.html I use fish shell, and I can usually read and often write text manipulations with the string command without needing to consult the docs.

Nushell seems to take a different approach: add structure to command output. By doing that, it seems that a bunch of stuff that is super finicky in the more traditional shells ends up being simple and easy commands with one clear job in nushell. I have never tried it, but it does seem to be movement in the correct direction.

code-faster5y ago

It's less that pipelines are friction, they're really not.

It's more that people like building features and people don't like saying no to features.

The original unix guys had a rare culture that was happy to knock off unnecessary features.

rnestler5y ago

> Nushell seems to take a different approach: add structure to command output. By doing that, it seems that a bunch of stuff that is super finicky in the more traditional shells ends up being simple and easy commands with one clear job in nushell. I have never tried it, but it does seem to be movement in the correct direction.

I tried nushell a few times and the commands really compose better due to the structured approach. How would one sort the output of ls by size in bash without letting ls do the sorting? In nushell it is as simple as "ls | sort-by size".

ken5y ago

> The Macintosh model, on the other hand, is the exact opposite. The system doesn’t deal with character streams. Data files are extremely high level, usually assuming that they are specific to an application. When was the last time you piped the output of one program to another on a Mac?

And yet, at least it's possible to have more than one Macintosh application use the same data. Half the world has migrated to web apps, which are far worse. As a user, it's virtually impossible to connect two web apps at all, or access your data in any way except what the designers decided you should be able to do. Data doesn't get any more "specific to an application" than with web apps.

junke5y ago

Command-line tools where you glue byte streams are, in spirit, very much like web scraping. Sure, tools can be good or bad, but by design you are going to have ad-hoc input/output formats, and for some tools this is interleaved with presentation/visual stuff (colors, alignment, headers, whitespaces, etc.).

In a way web apps are then more alike standard unix stuff, where you parse whatever output you get, hoping that it has enough usable structure to do an acceptable job.

The most reusable web apps are those that offer an API, with JSON/XML data formats where you can easily automate your work, and connect them together.

1 more reply

the_af5y ago

I think of the Unix Hater's Handbook as a kind of loving roast to Unix, that hackers of the time understood to be humorous (you know, how people complain about the tools they use every day, much like people would later complain endlessly about Windows) and which was widely misunderstood later to be a real scathing attack.

It hasn't aged very well, either. "Even today, the X server turns fast computers into dumb terminals" hasn't been true for at least a couple of decades...

Quekid55y ago

> It hasn't aged very well, either. "Even today, the X server turns fast computers into dumb terminals" hasn't been true for at least a couple of decades...

You're not wrong, but that's only because people wrote extensions for direct access to the graphics hardware... which obviously don't work remotely, and so aren't really in the spirit of X. It's great that that was possible, but OTOH it probably delayed the invention of something like Wayland for a decade+.

It's been ages since I've used X for a remote application, but I sometimes wonder how many of them actually really still work to any reasonable degree. I remember some of them working when I last tried it, albeit with abysmal performance compared to Remote desktop, for example.

2 more replies

msla5y ago

No, I think it's mostly just as angry and bitter as it sounds, coming from people who backed the wrong horse (ITS, Lisp Machines, the various things Xerox was utterly determined to kill off... ) and got extremely pissy about Unix being the last OS concept standing outside of, like, VMS or IBM mainframe crap, neither of which they'd see as improvements.

1 more reply

zamalek5y ago

I am firmly in the "ugh" camp. I strongly suspect that the fawning that occurs over pipelines is because of sentimentality more than practicality. Extracting information from text using a regular expression is fragile. Writing fragile code brings me absolutely no joy as a programmer - unless I am trying to flex my regex skills.

If you really look at how pipelines are typically used is: lines are analogous to objects and [whatever the executable happens to use] delimiters are analogous to fields. Piping bare objects ("plain old shell objects") makes far more sense, involves far less typing and is far less fragile.

shakna5y ago

If you're having to extract your data using a regex, then the data probably isn't well-formed enough for a shell pipeline. It's doable, but a bad idea.

Regex should not be the first hammer you reach for, because it's a scalpel.

I recently wanted cpu cores + 1. That could be a single regex. But this is more maintainable, and readable:

    echo '1 + '"$(grep 'cpu cores' /proc/cpuinfo | tail -n1 | awk -F ':' '{print $2}')" | bc

There's room for a regex there. Grep could have done it, and then I wouldn't need the others... But I wouldn't be able to come back in twelve months and instantly be able to tell you what it is doing.

6 more replies

julianeon5y ago

I love UNIX but I think on this point I'm with the haters.

Practically speaking, as an individual, I don't have the needs of the DMV. For my personal use I'm combing through pip-squeaky data file downloads and CSV's.

So even though using Python or some other language causes a gigantic performance hit, it's just the difference between 0.002s and 0.02 seconds: a unit of time/inefficiency so small I can't ever perceive it. So I might also well use a language to do my processing, because at my level it's easier to understand and practically the same speed.

fenwick675y ago

From the intro:

> As for me? I switched to the Mac.

It's amusing that Apple would go on to switch to a unix-based OS in '01.

pjmlp5y ago

Apple released their first UNIX based OS in 1988.

Exercise for the reader to find out its name.

marcosdumay5y ago

Hum... Those other abstractions on that section are great and everything, but it really sucks to write them interactively.

Pipelines also lack some concept similar to exceptions, but it would also suck to handle those interactively.

cryptonector5y ago

  set -o pipefail

1 more reply

Silamoth5y ago

Thanks for sharing! I'll have to give that a read sometime. It looks quite interesting.

ColanR5y ago

I read the introduction. Sounds like Norman would have liked to hear about Plan 9.

dialamac5y ago

I have a hard time believing he would have been unfamiliar with Plan 9. It wasn’t exactly obscure in the research community at the time. See the USENIX proceedings in the late 80s and 90s.

This is mere speculation, but I doubt he would have appreciated Plan 9.

1 more reply

atombender5y ago

Pipes are a great idea, but are severely hampered by the many edge cases around escaping, quoting, and, my pet peeve, error handling. By default, in modern shells, this will actually succeed with no error:

  $ alias fail=exit 1
  $ find / | fail | wc -l; echo $?
  0
  0

You can turn on the "pipefail" option to remedy this:

  $ set -o pipefail
  $ find / | fail | wc -l; echo $?
  0
  1

Most scripts don't, because the option makes everything much stricter, and requires more error handling.

Of course, a lot of scripts also forget to enable the similarly strict "errexit" (-e) and "nounset" options (-u), which are also important in modern scripting.

There's another error that hardly anyone bothers to handle correctly:

  x=$(find / | fail | wc -l)

This sets x to "" because the command failed. The only way to test if this succeeded is to check $?, or use an if statement around it:

  if ! x=$(find / | fail | wc -l); then
    echo "Fail!" >&2
    exit 1
  fi

I don't think I've seen a script ever bother do this.

Of course, if you also want the error message from the command. If you want that, you have to start using name pipes or temporary files, with the attendant cleanup. Shell scripting is suddenly much more complicated, and the resulting scripts become much less fun to write.

And that's why shell scripts are so brittle.

codemac5y ago

Just use a better shell. rc handles this wonderfully, $? is actually called $status, and it's an array, depending on the number of pipes.

fomine35y ago

set -e makes another pain for command that nonzero isn't mean failed (ex. diff). It changes semantics for whole script.

atombender5y ago

If by pain you mean you have to handle errors, yes, that's what you have to do. It's no different from checking the return code of functions in C.

2 more replies

geophile5y ago

I love pipelines. I don't know the elaborate sublanguages of find, awk, and others, to exploit them adequately. I also love Python, and would rather use Python than those sublanguages.

I'm developing a shell based on these ideas: https://github.com/geophile/marcel.

ehsankia5y ago

Piping is great if you memorize the (often very different) syntax of every individual tool and memorize their flags, but in reality unless it's a task you're doing weekly, you'll have to go digging through MAN pages and documentation every time. It's just not intuitive. Still to date if I don't use `tar` for a few months, I need to lookup the hodge podge of letters needed to make it work.

Whenever possible, I just dump the data in Python and work from there. Yes some tasks will require a little more work, but it's work I'm very comfortable with since I write Python daily.

Your project looks like, but honestly iPython already lets me run shell commands like `ls` and pipe the results into real python. That's mostly what I do these days. I just use iPython as my shell.

tandav5y ago

+1, but I use jupyter instead of IPython

khimaros5y ago

The lispers/schemers in the audience may be interested in Rash https://docs.racket-lang.org/rash/index.html which lets you combine an sh-like language with any other Racket syntax.

cat1995y ago

also what I think is the 'original' in this domain, scsh

jraph5y ago

Your project looks really cool.

I am pretty sure I've seen a Python-based interactive shell a few years ago but I can't remember the name. Have you heard of it?

x1798DE5y ago

I imagine you are thinking of xonsh? https://xon.sh/

geophile5y ago

Do you mean IPython? My understanding is that IPython more of a REPL for Python, less of a shell.

ihumanable5y ago

Perhaps you are thinking of xonsh https://xon.sh/

Edit: x1798DE beat me to it :D

1 more reply

ketanmaheshwari5y ago

Unix pipelines are cool and I am all for it. In recent times however, I see that sometimes they are taken too far without realizing that each stage in the pipeline is a process and a debugging overhead in case something goes wrong.

A case in point is this pipeline that I came across in the wild:

In this case, perhaps awk would have absorbed 3 to 4 stages.

dfinninger5y ago

Oh man. That's when knowing more about the tools you are using comes in handy. Kubectl has native JSONPath support [0].

Or at the very least, use structured output with "-o json" and jq [1], like they mention in the article.

I have always found that trying to parse JSON with native shell tools has been difficult and error-prone.

[0] https://kubernetes.io/docs/reference/kubectl/jsonpath/ [1] https://stedolan.github.io/jq/

salmo5y ago

I think people can go through a few stages of their shell-foo.

The first involves a lot of single commands and temporary files.

The second uses pipes, but only tacks on commands with no refactoring.

The third would recognize that all the grep and cut should just be awk, that you can redirect the cumulative output of a control statement, that subprocesses and coroutines are your friend. We should all aspire to this.

The fourth stage is rare. Some people start using weird file descriptors to chuck data around pipes, extensive process substitution, etc. This is the Perl of shell and should be avoided. The enlightened return back to terse readable scripts, but with greater wisdom.

Also, I always hear of people hating awk and sed and preferring, say, python. Understanding awk and especially sed will make your python better. There really is a significant niche where the classic tools are a better fit than a full programming language.

young_unixer5y ago

My most complex use of the shell is defining aliases in .bashrc and have never felt the need to go further. Do you recommend learning all of that for someone like me?

If so, what resource do you recommend?

4 more replies

viraptor5y ago

> Understanding awk and especially sed will make your python better.

Why do you think so?

m0xte5y ago

Every time I see that sort of stuff I end up | perl, still after 20 odd years. Sometimes i fine perl is not installed and I die a little.

enriquto5y ago

In what unix system perl is not installed? It is in the default install on current versions of macos (where it comprises the majority of interpreted programs), openbsd, freebsd, ubuntu and fedora linux.

2 more replies

lordleft5y ago

A little bit of grep / awk goes very, very far.

jpxw5y ago

grep, awk, sed, cut, sort, uniq and join are the Swiss Army knife of working with tabulated data on the command line.

3 more replies

pwdisswordfish25y ago

The -E on the grep makes no sense. I do not see any ERE. Looks like everything here could be done with a single invocation of sed. Can anyone share some sample output of kubectl?

ecnahc5155y ago

This is a pretty bad example since kubectl can output JSON and using something like `jq` would have made it even simpler.

majewsky5y ago

Better yet, kubectl can take a jsonpath expression to select from the JSON. For example, to just get a name:

  kubectl get something -o jsonpath='{.items[*].metadata.name}'

russellbeattie5y ago

I think there's an interesting inflection point between piping different utilities together to get something done, and just whipping up a script to do the same thing instead.

First I'll use the command line to, say, grab a file from a URL, parse, sort and format it. If I find myself doing the same commands a lot, I'll make a .sh file and pop the commands in there.

But then there's that next step, which is where Bash in particular falls down: Branching and loops or any real logic. I've tried it enough times to know it's not worth it. So at this point, I load up a text editor and write a NodeJS script which does the same thing (Used to be Perl, or Python). If I need more functionality than what's in the standard library, I'll make a folder and do an npm init -y and npm install a few packages for what I need.

This is not as elegant as pipes, but I have more fine grained control over the data, and the end result is a folder I can zip and send to someone else in case they want to use the same script.

There is a way to make a NodeJS script listen to STDIO and act like another Unix utility, but I never do that. Once I'm in a scripting environment, I might as well just put it all in there so it's in one place.

mongol5y ago

I have never come across a NodeJS script instead of a shell script. Can you suggest an example to look at that shows the advantages of this?

russellbeattie5y ago

Here's a JS script [1] I wrote a little while ago just for my own use that queries the CDC for the latest virus numbers, then calcs the average, formats the data and prints to the command line. You can pass in a number of days, otherwise it pulls the last 14 days.

$ node query-cdc.js 7

It's nothing special, but I wouldn't want to try to do this with command line utilities. (And yes, it was a bit uglier, but I cleaned it up of random crap I had thrown in before posting it.)

1. https://gist.github.com/russellbeattie/f9edf91115b43d6d7ca3c...

1 more reply

falafel5y ago

There are plenty of CLI tools built on Node: https://github.com/sindresorhus/awesome-nodejs#command-line-...

devit5y ago

Bash has perfectly functional if, while and for constructs.

nojito5y ago

Debugging Linux pipelines is not a fun experience.

This is one clear area where Powershell with its object model has got it right.

jandrese5y ago

I don't find it too bad most of the time. If the pipeline isn't working you can cut it off at any juncture and watch standard out to see whats going on.

You can even redirect some half processed data to a file so you don't have to re-run the first half of the pipe over and over while you work out what's going wrong in the tail end.

pwdisswordfish25y ago

Please provide an example. Thank you.

fomine35y ago

PowerShell supports breakpoint.

https://docs.microsoft.com/en-us/powershell/module/microsoft...

1 more reply

antipaul5y ago

A similar philosophy has made the "tidyverse" a much-loved extension of the statistical language R.

Compare the following 2 equivalent snippets. Which one seems more understandable?

    iris_data %>%
        names() %>%
        tolower() %>%
        gsub(".", "_", ., fixed=TRUE) %>%
        paste0("(", ., ")")

or:

    paste0("(", gsub(".", "_", tolower(names(iris_data)), fixed=TRUE), ")")

chubot5y ago

Yup, I'm working on Oil shell and I wrote an article about data frames, comparing tidyverse, Pandas, and raw Python:

What Is a Data Frame? (In Python, R, and SQL)

http://www.oilshell.org/blog/2018/11/30.html

The idea is essentially to combine tidyverse and shell -- i.e. put structured data in pipes. I use both shell and R for data cleaning.

My approach is different than say PowerShell because data still gets serialized; it's just more strictly defined and easily parseable. It's more like JSON than in-memory objects.

The left-to-right syntax is nicer and more composable IMO, and many functional languages are growing this feature (Scala, maybe Haskell?). Although I think Unix pipes serve a distinct use case.

CalmStorm5y ago

Unix pipelines are indeed beautiful, especially when you consider its similarity to Haskell's monadic I/O: http://okmij.org/ftp/Computation/monadic-shell.html

Unix pipelines actually helped me make sense of Haskell's monad.

AnimalMuppet5y ago

Could you be more specific? I don't get it.

msla5y ago

It helps if you avoid the syntactic sugar of do-notation:

    main = getArgs >>= processData >>= displayData

main is in the IO monad. The >>= function takes a value wrapped in a monad and a function which accepts a value and returns a value wrapped in the same type of monad, and returns the same monad-wrapped value the function did. It can be used as an infix operator because Haskell allows that if you specify precedence, in which case its left side is the first argument (the value-in-a-monad) and its right side is the second argument (the function).

The (imaginary) function getArgs takes no arguments and returns a list of command-line arguments wrapped in an IO monad. The first >>= takes that list and feeds it into the function on its right, processData, which accepts it and returns some other value in an IO monad, which the third >>= accepts and feeds into displayData, which accepts it and returns nothing (or, technically, IO (), the unit (empty) value wrapped in an IO monad) which is main's return value.

See? Everything is still function application, but the mental model becomes feeding data through a pipeline of functions, each of which operates on the data and passes it along.

1 more reply

CalmStorm5y ago

In Unix:

    a; b    # Execute command b after a. The result of a is not used by b.

In Haskell:

    a >> b  # Run function b after a. The result of a is not used by b.

In Unix:

    a | b    # Execute command b after a. The result of a is used by b.

In Haskell:

    a >>= b  # Run function b after a. The result of a is used by b.

1 more reply

thesz5y ago

Pipelines are function composition, not monads.

For pipelines to be monadic, their structure must be able to change depending on the result on the part of pipeline. The type of binding operator hints at it: (>>=) :: m a -> (a -> m b) -> m b.

The second argument receives result of first computation and computes the way overall result (m b) will be computed.

As for program composition, I would like to add gstreamer to the mix: it allows for DAG communication between programs.

nsajko5y ago

Surprised there is no mention of Doug McIlroy: https://wiki.c2.com/?DougMcIlroy

lexpar5y ago

Interesting comment in the header of this site.

https://github.com/prithugoswami/personal-website/blob/maste...

nsajko5y ago

It's a meme: https://www.reddit.com/r/copypasta/comments/5we0ny/if_youre_...

giantrobot5y ago

It's from the naughty strings list. It's meant to be a "meat exploit" instead of an exploit for the computer.

rzmnzm5y ago

Powershell is the ultimate expression of the Unix pipeline imo.

Passing objects through the pipeline and being able to access this data without awk/sed incantations is a blessing for me.

I think anyone who appreciates shell pipelines and python can grok the advantages of the approach taken by Powershell, in a large way it is directly built upon existing an Unix heritage.

I'm not so good at explaining why, but for anyone curious please have a look at the Monad manifesto by Jeffrey Snover

https://devblogs.microsoft.com/powershell/monad-manifesto-th...

You may not agree with the implementation, but the ideas being it, I think, are worth considering.

akavel5y ago

I think it will be on topic if I let myself take this occasion to once again plug in a short public service announcement of an open-source tool I built, that helps interactively build Unix/Linux pipelines, dubbed "The Ultimate Plumber":

https://github.com/akavel/up/

I've also recently seen it being described in shorter words as a "sticky REPL for shell". Hope you like it, and it makes your life easier!

edeion5y ago

I'm surprised this doesn't get up-voted. The tool automates very nicely my shell command writing process: pipe one step at a time and check the incremental results by using head to get a sample. Looks cool to me!

tekert5y ago

To my understanding, this is the same pattern where every "object" outputs the same data type for other "objects" to consume. This pattern can have a text or gui representation that is really powerful in its own nature if you think about it, its why automation agents with their events consumption/emition are so powerful, its why the web itself shifts towards this pattern (json as comunication of data, code as object), The thing is, this will always be a higher level of abstraction, i think that a gui of this pattern should exist as a default method in most operating systems, its would solve a lot learning problems like learning all the names and options of objects, i would be the perfect default gui tool, actually, sites like zapier, or tools like huginn already do this pattern, i always wondered why this pattern expands so slowly being so useful.

tarkin25y ago

I wish there were gui pipes. Pipes are wonderful but they’re neither interactive nor continuous.

Pipes that loop, outputting a declarative gui text format, and listen for events from the gui, would be marvellous.

I can’t think how to do that without sockets and a bash loop. And that seems to create the kind of complexity that pipes manage to avoid.

1 more reply

benjaminoakes5y ago

I find gron to be much more Unix-y than jq. It "explodes" JSON into single lines for use with grep, sed, etc and can recombine back into JSON as well.

https://github.com/TomNomNom/gron

majewsky5y ago

Since jq is sed for JSON, by the transitive property, you're saying that sed is not Unix-y. ;)

Seriously though, I use both, and IMO they serve different purposes. gron is incredibly useful for exploring unknown data formats, especially with any form of

  something | gron | grep something

Once you've figured out how the data format in question works, a jq script is usually more succinct and precise than a chain of gron/{grep,awk,sed,...}/ungron.

So in practice, gron for prompts and jq for scripts.

parliament325y ago

Related: Pipelines can be 235x faster than a Hadoop cluster https://adamdrake.com/command-line-tools-can-be-235x-faster-...

theshadowmonkey5y ago

Pipes are like one of the best experiences you’ll have whatever you were doing. I was debugging a remote server logging millions of Logs a day and was aggregating a little on the server. Then all it required was wget, jq, sed and awk. And I had a powerful log analyzer than splunk or any other similar solution on a developer Mac. Which you think is awesome when you’re paying a fortune to use Splunk. And for getting some insights quick, Unix pipes are a godsend.

sedatk5y ago

It's ironic that the article ends with Python code. You could have done everything in Python in the first place and it would have probably been much more readable.

enriquto5y ago

> You could have done everything in Python in the first place and it would have probably been much more readable.

Python scripts that call a few third party programs are notoriously unreadable, full of subprocess call getouptut decode(utf8) bullshit. Python is alright as a language, but it is a very bad team player: it only really works when everything is written in Python, if you want to use things written in other languages it becomes icky really fast. Python projects that use parts written in other languages inevitably gravitate to being 100% Python. Another way to put it, is that Python is a cancer.

unnouinceput5y ago

But slower

enriquto5y ago

If you eval this simple pipeline

     file -b `echo $PATH:|sed 's/:/\/* /g'`|cut -d\  -f-2|sort|uniq -c|sort -n

it prints a histogram of the types of all the programs on your path (e.g., whether they are shell, python, perl scripts or executable binaries). How can you ever write such a cute thing in e.g., python or, god forbid, java?

junke5y ago

- This fails on badly formed PATH, or malicious PATH

- When PATH contains entries that no longer exist / are not mounted, substitution with "sed" gives a wildcard that is not expanded, and eventually makes "file" report an error, which is not filtered out

- If PATH contains entries that have spaces, the expansion is incorrect

enriquto5y ago

More realistically, it also fails if you have directories on your path that are symlinks to other directories in the path. In that case their programs are doubly-counted.

Anyways, if your PATH is malicious then you have worse problems than this silly script :)

2 more replies

viraptor5y ago

You don't. If you use it once, meh. If you share it with anyone or preserve for the future, why would you want it to be cute?

It's just a few lines in python, probably takes just as long to write because you don't have to play with what needs to be escaped and what doesn't. You can actually tell what's the intent of each line and it doesn't fail on paths starting with minuses or including spaces. Outside of one time use or a code golf challenge, it's not cute.

https://pastebin.com/FnBCgHUm

enriquto5y ago

This is not about "code golfing", the "sort|uniq -c|sort" combo is in the hall of fame of great code lines. The PATH thing in my script was an unnecessary distractor, consider this:

    file -b /bin/* /usr/bin/* |cut -d' ' -f-2|sort|uniq -c|sort -n

There are no bizarre escapes nor anything. Besides, the "file" program is called only once. The python equivalent that you wrote may be better if you want to store it somewhere, but it takes a lot to type, and it serves a different purpose. The shell line is something that you write once and run it because it falls naturally from your fingers. Moreover, you can run it everywhere, as it works in bash, zsh, pdksh and any old shell. The python version requires an appropriate python version installed (3 not 2).

2 more replies

stevefan19995y ago

I think it is terrible, as if everything is a text you can't precisely describe some data structure, e.g. circular list.

praveen99205y ago

I recently wrote a golang code for fetching rss feed and displaying gist based on requirement.

Looking at this code, I am tempted to reimplement this using pipes but saner mind took over and said "don't fix something that is not broken"

I probably would be still do it and get some benchmark numbers to compare both.

hi415y ago

Thank you! I did not know you get to a "sql like group by" using uniq -c. That's so cool! I think I used to pipe it to awk and count using an array and then display but your method is far better than mine.

quicklime5y ago

I use "sort | uniq -c" all the time, but I find it annoying that it does it in worse-than-linear time. It gets super slow every time I accidentally pass it a large input.

At that point I usually fall back to "awk '{ xs[$0]++ } END { for (x in xs) print xs[x], x }'", but it's quite long and awkward to type, so I don't do it by default every time.

At some point I'll add an alias or a script to do this. One day.

edit: OK I finally did it; it's literally been on my to-do list for about 10 years now, so thanks :)

$ cat ~/bin/count

#!/usr/bin/awk -f

{ xs[$0]++ } END { for (x in xs) print xs[x], x }

Silamoth5y ago

That was a great article! Pipes can definitely be very powerful. I will say, though, that I often find myself reading pages of documentation in order to actually get anything with Unix and its many commands.

miclill5y ago

Great write up. One thing I would add is how pipes do buffering / apply backpressure. To my understanding this is the "magic" that makes pipes fast and failsafe(r).

not2b5y ago

I've been using pipes for decades to get my work done, but it was cool to learn about jq as I have much less experience with JSON. It's a very cool program. Thanks.

sigjuice5y ago

Minor nitpick: It is Unix "pipes" (not pipelines).

lsofzz5y ago

I think this reply needs more up votes! ;)

kazinator5y ago

> If you append /export to the url you get the list in a .csv format.

Text filtering approach narrowly rescued by website feature.

Phew, that was close!

stormdennis5y ago

The video from 1982 is brilliant great explanations from an era before certain knowledge was assumed as generally known

mehrdadn5y ago

What kills me about pipelines is when I pipe into xargs and then suddenly can't kill things properly with Ctrl+C. Often I have to jump through hoops to parse arguments into arrays and avoid xargs just for this. (This has to do with stdin being redirected. I don't recall if there's anything particular about xargs here, but that's where it usually comes up.)

jakubnarebski5y ago

The first example in the article is what `git shortlog -n` does; no need for the pipeline.

JadeNB5y ago

What does it mean to say that the video shows "Kernighan being a complete chad"?

Cerium5y ago

Based on some searching, it seems calling someone a "Chad" is evoking the cool and attractive image that the name presents.

"Chad is a man who automatically and naturally turns girls on due to his appearance and demeanor" https://www.reddit.com/r/ForeverAlone/comments/6bgbc2/whats_...

Aaronstotle5y ago

Essentially another way of saying that Kernighan exudes confidence in his demeanor and excellence in the subject matter

globular-toast5y ago

According to Urban Dictionary:

"chad 1. Residue of faecal matter situated between arse cheeks after incomplete wiping and can spread to balls."

I have no idea why the author decided to use that term.

There is a related term "Chad" (capital C) which invokes the image of a man who is attractive to women. Again, I have no idea why the author decided to use that term.

dodwer5y ago

It's an alt-right in-joke related to the incel subculture.

Aaronstotle5y ago

I wouldn't be so quick to say it's an alt-right joke. Plenty of my friends will describe something/someone as a Chad and it has nothing to do with incel/alt-right cultures. Are those kind of phrases thrown out in those circles? Yes, but that's more of the general meme/internet lingo as opposed to subscribing to an ideology.

3 more replies

0xsnowcrash5y ago

Aka The Ugliness of Powershell. Would be fun to see equivalents to these in Powershell.

nickthemagicman5y ago

I have problems understanding what commands I can pipe. Some work some don't.

ketanmaheshwari5y ago

The commands that read their input from standard input could be used as receiver in a pipeline. For instance: `A | B` is a short form of `A > tmp.txt; B < tmp.txt; rm tmp.txt`

There are some commands that need literal arguments which is different than standard input. For instance the echo command. `echo < list.txt` will not work assuming you want to print the items inside the list. `echo itemA itemB itemC` will work. This is where xargs comes into play -- it converts the standard input stream to literal arguments among other things.

nickthemagicman5y ago

How do you know which ones receive input from standard input or how can you find out efficiently?

1 more reply

mehrdadn5y ago

--help is a great one. When you want to search the help output you can either feed it into grep or not... depending on whether the program writes to stdout or stderr. You can force writing to stdout via adding 2>&1 at the end of your command. (It's possible that's what you're running into.)

cptnapalm5y ago

As everything gets tossed into bin, there's no easy way to find out what is and what isn't a tool or filter rather than an application. It makes discovery difficult.

unnouinceput5y ago

Cygwin and you can do exactly this on Windows too. For me Cygwin is a bless.

hbarka5y ago

abinitio.com was borne from these principles.

Dilu85y ago

Good article. Thanks for information

staycoolboy5y ago

The first time I saw pipes in action I was coming from Apple and PC DOS land. It blew my mind. The re-usability of so many /bin tools, and being able to stuff my own into that pattern was amazing.

If you like pipes and multimedia, checkout gstreamer, it has taken the custom pipeline example to real-time.

known5y ago

The capacity of pipe buffer is important for doing serious producer|consumer work;

billfor5y ago

I think systemd can do all of that.

niko02215y ago

Esta genial Laexplicacion. Se aprende mucho aqui Soy nuevo y este articulo es bastante genial.

fogetti5y ago

I am sorry for playing the devil's advocate. I also think that pipes are extremely useful and a very strong paradigm, and I use them daily in my work. Also it is not an accident that it is fundamental and integral part of powershell too.

But is this really HN top page worthy? I have seen this horse beaten to death for decades now. These kind of articles have been around since the very beginning of the internet.

Am I missing something newsworthy which makes this article different from the hundreds of thousands of similar articles?

hu35y ago

Perhaps it's useful to reinforce pipes' utility to those who are not aware.

As for being top most article, if I had to guess it's because people feel related to the tool's power and enjoy sharing anecdotes of it.

adamnemecek5y ago

Unix pipes are the a 1970's construct, the same way bell bottom pants are. It's a construct that doesn't take into account the problems and scale of today's computing. Unicode? Hope your pipes process it fine. Video buffers? High perf? Fuggetaboutit. Piping the output of ls to idk what? Nice, I'll put it on the fridge.

JdeBP5y ago

An informed criticism would be based upon the knowledge that Doug McIlroy came up with the idea in the 1960s. (-:

j / k navigate · click thread line to collapse

374 comments

gorgoiler5y ago

Pipes are wonderful! In my opinion you can’t extol them by themselves. One has to bask in a fuller set of features that are so much greater than the sum of their parts, to feel the warmth of Unix:

(1) everything is text

(2) everything (ish) is a file

(3) including pipes and fds

(4) every piece of software is accessible as a file, invoked at the command line

(5) ...with local arguments

(6) ...and persistent globals in the environment

A lot of understanding comes once you know what execve does, though such knowledge is of course not necessary. It just helps.

Unix is seriously uncool with young people at the moment. I intend to turn that around and articles like this offer good material.

jcranmer5y ago

> (1) everything is text

gorgoiler5y ago

To criticize sh semantics without acknowledging that C was always there when you needed something serious is a bit short sighted.

There are two uses of the Unix “api”:

[A] Long lived tools for other people to use.

[B] Short lived tools one throws together oneself.

The fact that most things work most of the time is why the shell works so well for B, and why it is indeed a poor choice for the sort of stable tools designed for others to use, in A.

5 more replies

hnlmorg5y ago

That's a POSIX shell thing rather than a Unix pipeline thing. Some non-POSIX shells don't have this problem while still passing data long Unix pipes

Source: I wrote a shell and it solves a great many of the space / quoting problems with POSIX shells.

1 more reply

gricardo995y ago

>good luck figuring out ...

6gvONxR4sf7o5y ago

God help you if your paths have spaces in them.

4 more replies

jlg235y ago

Or you just pipe through the program of your choice for which you know the syntax:

  echo "foo bar fnord" | cut --delimiter " " --output-delimiter ":" -f1-

  # foo:bar:fnord

1 more reply

fomine35y ago

The fact that filename can contain anything except NULL/slash is really pain. I often write a shell script that treats LF as separator but I know it's not good.

2 more replies

ComputerGuru5y ago

1 more reply

chooseaname5y ago

All of which are so much easier to deal with than some binary format you have no spec on.

XML? SMH.

michaelcampbell5y ago

Ah yes, perfect is the enemy of the good.

pwdisswordfish25y ago

Works for me.

derefr5y ago

People struggle so much with this, but I don't see what the point is at all. The fundamental problem that playing with delimiters solves, is passing arbitrary strings through as single tokens.

Well, that's easy. Don't escape or quote the strings; encode them. Turn them into opaque tokens, and then do Unix things to the opaque tokens, before finally decoding them back to being strings.

There's a reason od(1) is in coreutils. It's a Unix fundamental when working with arbitrary data. Hex and base64 are your friends (and Unix tools are happy to deal with both.)

1 more reply

laumars5y ago

> everything is text

Everything is a byte stream. Usually that means text but sometimes it doesn't. Which means you can do fun stuff like:

- copy file systems over a network: https://docs.oracle.com/cd/E18752_01/html/819-5461/gbchx.htm...

- stream a file into gzip

- backup or restore an SD card using `cat`

jrumbut5y ago

Also dd, which may be the disk destroyer but is also a great tool for binary file miracles.

See one here: https://unix.stackexchange.com/questions/6852/best-way-to-re...

2 more replies

dvirsky5y ago

> Unix is seriously uncool with young people at the moment.

Those damn kids with their loud rockn'roll music and their Windows machines. Back in my day we had he vocal stylings of Dean Martin and the verbal stylings of Linus Torvalds let me tell ya.

abnry5y ago

I guess I am one of the young kids who think the unix command line is wicked cool. It makes the user experience on my laptop feel so much more powerful.

1 more reply

xnyan5y ago

Giving up the idea that CLI = pain (i.e figuring out how to to navigate a file system, ssh keys, etc) for sure was a learning curve, but now I can't imagine using computers without it.

1 more reply

bityard5y ago

> (1) everything is text

Not at all, you can pipe around all the binary you want. Until GNU tar added the 'z' option, the way to extract all files in a tarball was:

`gunzip -c < foo.tar.gz | tar x`

However, "text files" in Unix do have a very specific definition, if you want them to work with standard Unix text manipulation utilities like awk, sed, and diff:

All lines of text end are terminated by a line feed, even the last one.

I can't tell you how many "text files" I run across these days that don't have a trailing newline. It's as bad as mixing tabs and spaces, maybe worse.

1 more reply

ssivark5y ago

fao_5y ago

Yeah, but that makes every single file an object. Then you need plugins to deal with plain text objects, plugins to deal with every single type of object and their versions.

2 more replies

flukus5y ago

> For example, allowing passing around dicts/maps/json (and possibly other data structures) would already be a massive improvement.

It is allowed, you can pass around data in whatever encoding you desire. Not many do though because text is so useful for humans.

> You know what might be even better — passing around objects you could interact with by passing messages (gasp! cue, Alan Kay)

boomlinde5y ago

> For example, allowing passing around dicts/maps/json (and possibly other data structures) would already be a massive improvement.

> (if you squint, files look like a poor man’s version of objects imposed on top of bytestreams)

If all you have is a hammer...

cobalt5y ago

try powershell, its on linux too

1 more reply

jiggawatts5y ago

> Pipes are wonderful!

It's wonderful only if compared to worse things, pretending that PowerShell is not a thing, and that Python doesn't exist.

UNIX pipes are a stringly-typed legacy that we've inherited from the 1970s. The technical constraints of its past have been internalised by its proponents, and lauded as benefits.

The myth of UNIX pipes is that it works like this:

    process1 | process2 | process3 | ...

The reality is that physically, it actually works like this:

   (process1,serialize) | (parse,process2,serialize) | ...

If you want to see this in action, check out my little "challenge" that I posed in a similar thread on YC recently:

https://news.ycombinator.com/item?id=23257776

The solution provided by "JoshuaDavid" really impressed me, because I was under the impression that that simple task is actually borderline impossible with UNIX pipes and GNU tools:

https://news.ycombinator.com/item?id=23267901

Compare that to the much simpler equivalent in PowerShell:

https://news.ycombinator.com/item?id=23270291

Especially take note half of that script is sample data!

> Unix is seriously uncool with young people at the moment. I intend to turn that around and articles like this offer good material.

UNIX is entirely too cool with young people. They have conflated the legacy UNIX design of the 1970s with their preferred open source software, languages, and platforms.

There are better things out there, and UNIX isn't the be all and end all of system management.

matheusmoreira5y ago

> Where each one of those "parse" and "serialise" steps is unique and special, inflexible, and poorly documented.

This can also be a security concern. According to research, a great number of defects with security implications occur at the input handling layers:

http://langsec.org

2 more replies

scbrg5y ago

   The reality is that physically, it actually works like this:

      (process1,serialize) | (parse,process2,serialize) | ...

But as soon as you involve the network or persistent storage, you need to do all that anyway. And one of the beauties is that the tools are agnostic to where the data comes from, or goes.

2 more replies

gorgoiler5y ago

Typed streams sound great.

Most data being line oriented (with white space field separators) has historically worked well enough, but I get your point that the wheels will eventually fall off.

It’s important to remember that the shell hasn’t always been about high concept programming tasks.

  $ grep TODO xmas-gifts
  grandma TODO
  auntie Sian TODO
  jeanie MASTODON T shirt

The bug (bugs!) in the above example doesn’t really matter in the context of the task at hand.

imglorp5y ago

More Lego rules:

(7) and common signalling facility

(8) also nice if some files have magic properties like /dev/random or /proc or /dev/null

(9) every program starts with 3 streams, stdin/stdout for work and stderr for out of band errors

enedil5y ago

(9) every program starts with 3 streams, stdin/stdout for work and stderr for out of band errors

This is false - if a process closes stdin/out/err, its children won't have these files open.

1 more reply

hhas015y ago

I defy anyone to build their error-handling logic off the back of stderr…

1 more reply

matheusmoreira5y ago

1. All data is bytes

2. Everything is a file descriptor

3. File descriptors are things that support standard I/O system calls such as read and write

Koshkin5y ago

You are describing Plan 9.

ryandrake5y ago

> (2) everything (ish) is a file

I'm not all that knowledgable about Unix history, but one thing that has always puzzled me was that for whatever reason network connections (generally) aren't files. While I can do:

  cat < /dev/ttyS0

to read from a serial device, I've always wondered why I can't do something like:

  bind /tmp/mysocket 12.34.56.78 80
  cat < /tmp/mysocket

spijdar5y ago

[0] http://doc.cat-v.org/plan_9/4th_edition/papers/net/

DSMan1952765y ago

1 more reply

jandrese5y ago

Berkeley sockets were kind of bolted on as an afterthought. If the AT&T guys developed them they would probably look a lot more like that.

1 more reply

philsnow5y ago

elsewhere in this thread somebody mentions socat, but you can do it entirely within bash. from https://www.linuxjournal.com/content/more-using-bashs-built-... :

    exec 3<>/dev/tcp/www.google.com/80
    echo -e "GET / HTTP/1.1\r\nhost: http://www.google.com\r\nConnection: close\r\n\r\n" >&3
    cat <&3

1 more reply

moonchild5y ago

In the c api, you can read/write to sockets. Wouldn't be difficult to implement something like your example with a named pipe.

Additionally, there is /dev/tcp in bash.

yjftsjthsd-h5y ago

Sounds like you want netcat?

1 more reply

devchix5y ago

>(1) everything is text

>(2) everything (ish) is a file

macintux5y ago

I worked around that by writing a Python script to let me pick the groups of interest, download all of the data for a certain time period, and collapse them into a CSV file ordered by time stamp.

Yes, it’s very annoying I have to do all that, but I commend Bezos for his foresight in demanding everything be driven by APIs.

quickthrower25y ago

Can you give me any clue as to what execve does? I looked at the man page but none the wiser. Sounds like magic from what I read there. I'm from a Windows backgroud and not used to pipes.

kccqzy5y ago

2 more replies

JdeBP5y ago

Well you have a Windows background, so you know BASIC, right? And you know that BASIC has a CHAIN statement, right? (-:

* https://en.wikibooks.org/wiki/QBasic/Appendix#CHAIN

And there are umpteen ways of improving on that, from JP Software's Take Command to various flavours of Unix-alike tools. And you should see some of the things that people do with FOR /F .

1 more reply

gorgoiler5y ago

Its how you launch (execute) a new program.

Such programs typically don’t have any interaction so the v and the e parts refer to the two ways you can control what the program does.

v: a vector (list) of input parameters (aka arguments)

e: an environment of key-value pairs

The executed program can interpret these two in any way it wishes to, though there are many conventions that are followed (and bucked by the contrarians!) like using “-“ to prefix flags/options.

This is in addition to any data sent to the program’s standard input — hence the original discussion about using pipes to send the output of one command to the input of another.

hhas015y ago

“(1) everything is text”

LOL. What fantasy land are you living in?

Pipes are great. Untyped, untagged pipes are not. They’re a frigging disaster in every way imaginable.

“Unix is seriously uncool with young people at the moment.”

Unix is half a century old. It’s old enough to be your Dad. Hell, it’s old enough to be your Grandad! It’s had 50 years to better itself and it hasn’t done squat.

Linux? Please. That’s just a crazy cat lady living in Unix’s trash house.

Come back when you’ve a modern successor to Plan 9 that is easy, secure, and doesn’t blow huge UI/UX chunks everywhere.

Shared4045y ago

>Unix is seriously uncool with young people...

Not all of us. I much prefer the Unix way of doing things, it just makes more sense then just trying to become another Windows.

pjmlp5y ago

Everything is a file until one needs to do UNIX IPC, networking or high performance graphics rendering.

lsofzz5y ago

> Unix is seriously uncool with young people at the moment. I intend to turn that around and articles like this offer good material.

I offer you my well wishes on that.

ianai5y ago

How’s it uncool? Just not “ mobile”?

gorgoiler5y ago

Exactly.

“Linux, ew!”

3 more replies

ma2rten5y ago

I was also skeptical of that statement. I did a google trends search and indeed there seems to be a slow decline in command-line related searches:

https://trends.google.com/trends/explore?date=all&q=bash,awk...

remexre5y ago

My complaints with unix (as someone running linux on every device, starting to dip my toe into freebsd on a vps); apologies for lack of editing:

> everything is text

> everything (ish) is a file

> including pipes and fds

To my mind, this is much less elegant when most of these things don't support seeking, and fnctls have to exist.

It'd be nicer if there were something to declare interfaces like Varlink [0, 1], and a shell that allowed composing pipelines out of them nicely.

> every piece of software is accessible as a file, invoked at the command line

> ...with local arguments

> ...and persistent global in the environment

Sure, mostly fine; serialization still a wart for arguments + env vars, but one I run into much less

> and common signalling facility

> also nice if some files have magic properties like /dev/random or /proc or /dev/null

userspace programs can't really extend these though, unless they expose FUSE filesystems, which sounds terrible and nobody does

also, this results in things like [2]...

> every program starts with 3 streams, stdin/stdout for work and stderr for out of band errors

iow, things get trickier once I need more than one input one one output. :)

I also prefer something like syslog over stderr, but again this is an argument for more structured things.

[0]: https://varlink.org/

[1]: ideally with sum type support though, and maybe full dependent types like https://dhall-lang.org/

[2]: https://www.phoronix.com/scan.php?page=news_item&px=UEFI-rm-...

2 more replies

xkucf035y ago

Unix is seriously cool and I consider myself still young :-) But it could be better. See https://relational-pipes.globalcode.info/

cowmix5y ago

I use pipelines as much as the next guy but every time I see post praise how awesome they are, I'm reminded of the Unix Hater's Handbook. Their take on pipelines is pretty spot on too.

http://web.mit.edu/~simsong/www/ugh.pdf

cuddlybacon5y ago

code-faster5y ago

It's less that pipelines are friction, they're really not.

It's more that people like building features and people don't like saying no to features.

The original unix guys had a rare culture that was happy to knock off unnecessary features.

rnestler5y ago

ken5y ago

junke5y ago

In a way web apps are then more alike standard unix stuff, where you parse whatever output you get, hoping that it has enough usable structure to do an acceptable job.

The most reusable web apps are those that offer an API, with JSON/XML data formats where you can easily automate your work, and connect them together.

1 more reply

the_af5y ago

It hasn't aged very well, either. "Even today, the X server turns fast computers into dumb terminals" hasn't been true for at least a couple of decades...

Quekid55y ago

> It hasn't aged very well, either. "Even today, the X server turns fast computers into dumb terminals" hasn't been true for at least a couple of decades...

2 more replies

msla5y ago

1 more reply

zamalek5y ago

shakna5y ago

If you're having to extract your data using a regex, then the data probably isn't well-formed enough for a shell pipeline. It's doable, but a bad idea.

Regex should not be the first hammer you reach for, because it's a scalpel.

I recently wanted cpu cores + 1. That could be a single regex. But this is more maintainable, and readable:

    echo '1 + '"$(grep 'cpu cores' /proc/cpuinfo | tail -n1 | awk -F ':' '{print $2}')" | bc

6 more replies

julianeon5y ago

I love UNIX but I think on this point I'm with the haters.

Practically speaking, as an individual, I don't have the needs of the DMV. For my personal use I'm combing through pip-squeaky data file downloads and CSV's.

fenwick675y ago

From the intro:

> As for me? I switched to the Mac.

It's amusing that Apple would go on to switch to a unix-based OS in '01.

pjmlp5y ago

Apple released their first UNIX based OS in 1988.

Exercise for the reader to find out its name.

marcosdumay5y ago

Hum... Those other abstractions on that section are great and everything, but it really sucks to write them interactively.

Pipelines also lack some concept similar to exceptions, but it would also suck to handle those interactively.

cryptonector5y ago

  set -o pipefail

1 more reply

Silamoth5y ago

Thanks for sharing! I'll have to give that a read sometime. It looks quite interesting.

ColanR5y ago

I read the introduction. Sounds like Norman would have liked to hear about Plan 9.

dialamac5y ago

I have a hard time believing he would have been unfamiliar with Plan 9. It wasn’t exactly obscure in the research community at the time. See the USENIX proceedings in the late 80s and 90s.

This is mere speculation, but I doubt he would have appreciated Plan 9.

1 more reply

atombender5y ago

  $ alias fail=exit 1
  $ find / | fail | wc -l; echo $?
  0
  0

You can turn on the "pipefail" option to remedy this:

  $ set -o pipefail
  $ find / | fail | wc -l; echo $?
  0
  1

Most scripts don't, because the option makes everything much stricter, and requires more error handling.

Of course, a lot of scripts also forget to enable the similarly strict "errexit" (-e) and "nounset" options (-u), which are also important in modern scripting.

There's another error that hardly anyone bothers to handle correctly:

  x=$(find / | fail | wc -l)

This sets x to "" because the command failed. The only way to test if this succeeded is to check $?, or use an if statement around it:

  if ! x=$(find / | fail | wc -l); then
    echo "Fail!" >&2
    exit 1
  fi

I don't think I've seen a script ever bother do this.

And that's why shell scripts are so brittle.

codemac5y ago

Just use a better shell. rc handles this wonderfully, $? is actually called $status, and it's an array, depending on the number of pipes.

fomine35y ago

set -e makes another pain for command that nonzero isn't mean failed (ex. diff). It changes semantics for whole script.

atombender5y ago

If by pain you mean you have to handle errors, yes, that's what you have to do. It's no different from checking the return code of functions in C.

2 more replies

geophile5y ago

I love pipelines. I don't know the elaborate sublanguages of find, awk, and others, to exploit them adequately. I also love Python, and would rather use Python than those sublanguages.

I'm developing a shell based on these ideas: https://github.com/geophile/marcel.

ehsankia5y ago

Whenever possible, I just dump the data in Python and work from there. Yes some tasks will require a little more work, but it's work I'm very comfortable with since I write Python daily.

Your project looks like, but honestly iPython already lets me run shell commands like `ls` and pipe the results into real python. That's mostly what I do these days. I just use iPython as my shell.

tandav5y ago

+1, but I use jupyter instead of IPython

khimaros5y ago

The lispers/schemers in the audience may be interested in Rash https://docs.racket-lang.org/rash/index.html which lets you combine an sh-like language with any other Racket syntax.

cat1995y ago

also what I think is the 'original' in this domain, scsh

jraph5y ago

Your project looks really cool.

I am pretty sure I've seen a Python-based interactive shell a few years ago but I can't remember the name. Have you heard of it?

x1798DE5y ago

I imagine you are thinking of xonsh? https://xon.sh/

geophile5y ago

Do you mean IPython? My understanding is that IPython more of a REPL for Python, less of a shell.

ihumanable5y ago

Perhaps you are thinking of xonsh https://xon.sh/

Edit: x1798DE beat me to it :D

1 more reply

ketanmaheshwari5y ago

A case in point is this pipeline that I came across in the wild:

In this case, perhaps awk would have absorbed 3 to 4 stages.

dfinninger5y ago

Oh man. That's when knowing more about the tools you are using comes in handy. Kubectl has native JSONPath support [0].

Or at the very least, use structured output with "-o json" and jq [1], like they mention in the article.

I have always found that trying to parse JSON with native shell tools has been difficult and error-prone.

[0] https://kubernetes.io/docs/reference/kubectl/jsonpath/ [1] https://stedolan.github.io/jq/

salmo5y ago

I think people can go through a few stages of their shell-foo.

The first involves a lot of single commands and temporary files.

The second uses pipes, but only tacks on commands with no refactoring.

young_unixer5y ago

My most complex use of the shell is defining aliases in .bashrc and have never felt the need to go further. Do you recommend learning all of that for someone like me?

If so, what resource do you recommend?

4 more replies

viraptor5y ago

> Understanding awk and especially sed will make your python better.

Why do you think so?

m0xte5y ago

Every time I see that sort of stuff I end up | perl, still after 20 odd years. Sometimes i fine perl is not installed and I die a little.

enriquto5y ago

2 more replies

lordleft5y ago

A little bit of grep / awk goes very, very far.

jpxw5y ago

grep, awk, sed, cut, sort, uniq and join are the Swiss Army knife of working with tabulated data on the command line.

3 more replies

pwdisswordfish25y ago

The -E on the grep makes no sense. I do not see any ERE. Looks like everything here could be done with a single invocation of sed. Can anyone share some sample output of kubectl?

ecnahc5155y ago

This is a pretty bad example since kubectl can output JSON and using something like `jq` would have made it even simpler.

majewsky5y ago

Better yet, kubectl can take a jsonpath expression to select from the JSON. For example, to just get a name:

  kubectl get something -o jsonpath='{.items[*].metadata.name}'

russellbeattie5y ago

I think there's an interesting inflection point between piping different utilities together to get something done, and just whipping up a script to do the same thing instead.

First I'll use the command line to, say, grab a file from a URL, parse, sort and format it. If I find myself doing the same commands a lot, I'll make a .sh file and pop the commands in there.

This is not as elegant as pipes, but I have more fine grained control over the data, and the end result is a folder I can zip and send to someone else in case they want to use the same script.

mongol5y ago

I have never come across a NodeJS script instead of a shell script. Can you suggest an example to look at that shows the advantages of this?

russellbeattie5y ago

$ node query-cdc.js 7

It's nothing special, but I wouldn't want to try to do this with command line utilities. (And yes, it was a bit uglier, but I cleaned it up of random crap I had thrown in before posting it.)

1. https://gist.github.com/russellbeattie/f9edf91115b43d6d7ca3c...

1 more reply

falafel5y ago

There are plenty of CLI tools built on Node: https://github.com/sindresorhus/awesome-nodejs#command-line-...

devit5y ago

Bash has perfectly functional if, while and for constructs.

nojito5y ago

Debugging Linux pipelines is not a fun experience.

This is one clear area where Powershell with its object model has got it right.

jandrese5y ago

I don't find it too bad most of the time. If the pipeline isn't working you can cut it off at any juncture and watch standard out to see whats going on.

You can even redirect some half processed data to a file so you don't have to re-run the first half of the pipe over and over while you work out what's going wrong in the tail end.

pwdisswordfish25y ago

Please provide an example. Thank you.

fomine35y ago

PowerShell supports breakpoint.

https://docs.microsoft.com/en-us/powershell/module/microsoft...

1 more reply

antipaul5y ago

A similar philosophy has made the "tidyverse" a much-loved extension of the statistical language R.

Compare the following 2 equivalent snippets. Which one seems more understandable?

    iris_data %>%
        names() %>%
        tolower() %>%
        gsub(".", "_", ., fixed=TRUE) %>%
        paste0("(", ., ")")

or:

    paste0("(", gsub(".", "_", tolower(names(iris_data)), fixed=TRUE), ")")

chubot5y ago

Yup, I'm working on Oil shell and I wrote an article about data frames, comparing tidyverse, Pandas, and raw Python:

What Is a Data Frame? (In Python, R, and SQL)

http://www.oilshell.org/blog/2018/11/30.html

The idea is essentially to combine tidyverse and shell -- i.e. put structured data in pipes. I use both shell and R for data cleaning.

My approach is different than say PowerShell because data still gets serialized; it's just more strictly defined and easily parseable. It's more like JSON than in-memory objects.

The left-to-right syntax is nicer and more composable IMO, and many functional languages are growing this feature (Scala, maybe Haskell?). Although I think Unix pipes serve a distinct use case.

CalmStorm5y ago

Unix pipelines are indeed beautiful, especially when you consider its similarity to Haskell's monadic I/O: http://okmij.org/ftp/Computation/monadic-shell.html

Unix pipelines actually helped me make sense of Haskell's monad.

AnimalMuppet5y ago

Could you be more specific? I don't get it.

msla5y ago

It helps if you avoid the syntactic sugar of do-notation:

    main = getArgs >>= processData >>= displayData

See? Everything is still function application, but the mental model becomes feeding data through a pipeline of functions, each of which operates on the data and passes it along.

1 more reply

CalmStorm5y ago

In Unix:

    a; b    # Execute command b after a. The result of a is not used by b.

In Haskell:

    a >> b  # Run function b after a. The result of a is not used by b.

In Unix:

    a | b    # Execute command b after a. The result of a is used by b.

In Haskell:

    a >>= b  # Run function b after a. The result of a is used by b.

1 more reply

thesz5y ago

Pipelines are function composition, not monads.

For pipelines to be monadic, their structure must be able to change depending on the result on the part of pipeline. The type of binding operator hints at it: (>>=) :: m a -> (a -> m b) -> m b.

The second argument receives result of first computation and computes the way overall result (m b) will be computed.

As for program composition, I would like to add gstreamer to the mix: it allows for DAG communication between programs.

nsajko5y ago

Surprised there is no mention of Doug McIlroy: https://wiki.c2.com/?DougMcIlroy

lexpar5y ago

Interesting comment in the header of this site.

https://github.com/prithugoswami/personal-website/blob/maste...

nsajko5y ago

It's a meme: https://www.reddit.com/r/copypasta/comments/5we0ny/if_youre_...

giantrobot5y ago

It's from the naughty strings list. It's meant to be a "meat exploit" instead of an exploit for the computer.

rzmnzm5y ago

Powershell is the ultimate expression of the Unix pipeline imo.

Passing objects through the pipeline and being able to access this data without awk/sed incantations is a blessing for me.

I think anyone who appreciates shell pipelines and python can grok the advantages of the approach taken by Powershell, in a large way it is directly built upon existing an Unix heritage.

I'm not so good at explaining why, but for anyone curious please have a look at the Monad manifesto by Jeffrey Snover

https://devblogs.microsoft.com/powershell/monad-manifesto-th...

You may not agree with the implementation, but the ideas being it, I think, are worth considering.

akavel5y ago

https://github.com/akavel/up/

I've also recently seen it being described in shorter words as a "sticky REPL for shell". Hope you like it, and it makes your life easier!

edeion5y ago

tekert5y ago

tarkin25y ago

I wish there were gui pipes. Pipes are wonderful but they’re neither interactive nor continuous.

Pipes that loop, outputting a declarative gui text format, and listen for events from the gui, would be marvellous.

I can’t think how to do that without sockets and a bash loop. And that seems to create the kind of complexity that pipes manage to avoid.

1 more reply

benjaminoakes5y ago

I find gron to be much more Unix-y than jq. It "explodes" JSON into single lines for use with grep, sed, etc and can recombine back into JSON as well.

https://github.com/TomNomNom/gron

majewsky5y ago

Since jq is sed for JSON, by the transitive property, you're saying that sed is not Unix-y. ;)

Seriously though, I use both, and IMO they serve different purposes. gron is incredibly useful for exploring unknown data formats, especially with any form of

  something | gron | grep something

Once you've figured out how the data format in question works, a jq script is usually more succinct and precise than a chain of gron/{grep,awk,sed,...}/ungron.

So in practice, gron for prompts and jq for scripts.

parliament325y ago

Related: Pipelines can be 235x faster than a Hadoop cluster https://adamdrake.com/command-line-tools-can-be-235x-faster-...

theshadowmonkey5y ago

sedatk5y ago

It's ironic that the article ends with Python code. You could have done everything in Python in the first place and it would have probably been much more readable.

enriquto5y ago

> You could have done everything in Python in the first place and it would have probably been much more readable.

unnouinceput5y ago

But slower

enriquto5y ago

If you eval this simple pipeline

     file -b `echo $PATH:|sed 's/:/\/* /g'`|cut -d\  -f-2|sort|uniq -c|sort -n

junke5y ago

- This fails on badly formed PATH, or malicious PATH

- If PATH contains entries that have spaces, the expansion is incorrect

enriquto5y ago

More realistically, it also fails if you have directories on your path that are symlinks to other directories in the path. In that case their programs are doubly-counted.

Anyways, if your PATH is malicious then you have worse problems than this silly script :)

2 more replies

viraptor5y ago

You don't. If you use it once, meh. If you share it with anyone or preserve for the future, why would you want it to be cute?

https://pastebin.com/FnBCgHUm

enriquto5y ago

This is not about "code golfing", the "sort|uniq -c|sort" combo is in the hall of fame of great code lines. The PATH thing in my script was an unnecessary distractor, consider this:

    file -b /bin/* /usr/bin/* |cut -d' ' -f-2|sort|uniq -c|sort -n

2 more replies

stevefan19995y ago

I think it is terrible, as if everything is a text you can't precisely describe some data structure, e.g. circular list.

praveen99205y ago

I recently wrote a golang code for fetching rss feed and displaying gist based on requirement.

Looking at this code, I am tempted to reimplement this using pipes but saner mind took over and said "don't fix something that is not broken"

I probably would be still do it and get some benchmark numbers to compare both.

hi415y ago

quicklime5y ago

I use "sort | uniq -c" all the time, but I find it annoying that it does it in worse-than-linear time. It gets super slow every time I accidentally pass it a large input.

At that point I usually fall back to "awk '{ xs[$0]++ } END { for (x in xs) print xs[x], x }'", but it's quite long and awkward to type, so I don't do it by default every time.

At some point I'll add an alias or a script to do this. One day.

edit: OK I finally did it; it's literally been on my to-do list for about 10 years now, so thanks :)

$ cat ~/bin/count

#!/usr/bin/awk -f

{ xs[$0]++ } END { for (x in xs) print xs[x], x }

Silamoth5y ago

miclill5y ago

Great write up. One thing I would add is how pipes do buffering / apply backpressure. To my understanding this is the "magic" that makes pipes fast and failsafe(r).

not2b5y ago

I've been using pipes for decades to get my work done, but it was cool to learn about jq as I have much less experience with JSON. It's a very cool program. Thanks.

sigjuice5y ago

Minor nitpick: It is Unix "pipes" (not pipelines).

lsofzz5y ago

I think this reply needs more up votes! ;)

kazinator5y ago

> If you append /export to the url you get the list in a .csv format.

Text filtering approach narrowly rescued by website feature.

Phew, that was close!

stormdennis5y ago

The video from 1982 is brilliant great explanations from an era before certain knowledge was assumed as generally known

mehrdadn5y ago

jakubnarebski5y ago

The first example in the article is what `git shortlog -n` does; no need for the pipeline.

JadeNB5y ago

What does it mean to say that the video shows "Kernighan being a complete chad"?

Cerium5y ago

Based on some searching, it seems calling someone a "Chad" is evoking the cool and attractive image that the name presents.

"Chad is a man who automatically and naturally turns girls on due to his appearance and demeanor" https://www.reddit.com/r/ForeverAlone/comments/6bgbc2/whats_...

Aaronstotle5y ago

Essentially another way of saying that Kernighan exudes confidence in his demeanor and excellence in the subject matter

globular-toast5y ago

According to Urban Dictionary:

"chad 1. Residue of faecal matter situated between arse cheeks after incomplete wiping and can spread to balls."

I have no idea why the author decided to use that term.

There is a related term "Chad" (capital C) which invokes the image of a man who is attractive to women. Again, I have no idea why the author decided to use that term.

dodwer5y ago

It's an alt-right in-joke related to the incel subculture.

Aaronstotle5y ago

3 more replies

0xsnowcrash5y ago

Aka The Ugliness of Powershell. Would be fun to see equivalents to these in Powershell.

nickthemagicman5y ago

I have problems understanding what commands I can pipe. Some work some don't.

ketanmaheshwari5y ago

The commands that read their input from standard input could be used as receiver in a pipeline. For instance: `A | B` is a short form of `A > tmp.txt; B < tmp.txt; rm tmp.txt`

nickthemagicman5y ago

How do you know which ones receive input from standard input or how can you find out efficiently?

1 more reply

mehrdadn5y ago

cptnapalm5y ago

As everything gets tossed into bin, there's no easy way to find out what is and what isn't a tool or filter rather than an application. It makes discovery difficult.

unnouinceput5y ago

Cygwin and you can do exactly this on Windows too. For me Cygwin is a bless.

hbarka5y ago

abinitio.com was borne from these principles.

Dilu85y ago

Good article. Thanks for information

staycoolboy5y ago

The first time I saw pipes in action I was coming from Apple and PC DOS land. It blew my mind. The re-usability of so many /bin tools, and being able to stuff my own into that pattern was amazing.

If you like pipes and multimedia, checkout gstreamer, it has taken the custom pipeline example to real-time.

known5y ago

The capacity of pipe buffer is important for doing serious producer|consumer work;

billfor5y ago

I think systemd can do all of that.

niko02215y ago

Esta genial Laexplicacion. Se aprende mucho aqui Soy nuevo y este articulo es bastante genial.

fogetti5y ago

But is this really HN top page worthy? I have seen this horse beaten to death for decades now. These kind of articles have been around since the very beginning of the internet.

Am I missing something newsworthy which makes this article different from the hundreds of thousands of similar articles?

hu35y ago

Perhaps it's useful to reinforce pipes' utility to those who are not aware.

As for being top most article, if I had to guess it's because people feel related to the tool's power and enjoy sharing anecdotes of it.

adamnemecek5y ago

JdeBP5y ago

An informed criticism would be based upon the knowledge that Doug McIlroy came up with the idea in the 1960s. (-:

j / k navigate · click thread line to collapse