PV (Pipe Viewer) – add a progress bar to most command-line programs - https://news.ycombinator.com/item?id=23826845 - July 2020 (2 comments)
A Unix Utility You Should Know About: Pipe Viewer - https://news.ycombinator.com/item?id=8761094 - Dec 2014 (1 comment)
Pipe Viewer - https://news.ycombinator.com/item?id=5942115 - June 2013 (1 comment)
Pipe Viewer - https://news.ycombinator.com/item?id=4020026 - May 2012 (26 comments)
A Unix Utility You Should Know About: Pipe Viewer - https://news.ycombinator.com/item?id=462244 - Feb 2009 (63 comments)
I.e., within pv, is it reading the input stream or writing the output stream that blocks most of the time?
It also helps me optimize my time. If something is going to finish in a few minutes, I probably won't context switch to another major task. However, if something is going to take a few hours then I'll probably switch to work on something different knowing approximately when I can go back and check on results.
pv -L 200K < bigfile.iso | ssh somehost 'cat > bigfile.iso'
Complete with a progress bar, speed, and ETA.
from the scp(1) man page:
-l limit
Limits the used bandwidth, specified in Kbit/s.
tar -cf - some/dir | ssh remote 'cd /place/to/go && tar -xvf -'
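To get a progress bar and ETA on that tar pipe, pv can be dropped in between the two tars; a minimal sketch, assuming GNU du's -sb flag is available (the `some/dir`, `remote`, and `/place/to/go` names are from the example above):

```shell
# Total size of the tree in bytes (GNU du); pv -s needs this for percentage/ETA
size=$(du -sb some/dir | awk '{print $1}')

# Same pipe as before, with pv inserted; -s tells it the expected byte count
tar -cf - some/dir | pv -s "$size" | ssh remote 'cd /place/to/go && tar -xf -'
```

Without -s, pv still shows throughput and bytes transferred, just no percentage or ETA.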
1. things where I'd be paranoid about pv(1) itself becoming the bottleneck in the pipeline — e.g. dd(1) of large disks where I've explicitly set a large block size and iflag=direct/oflag=direct, to optimize throughput.
2. things where the program has some useful cleverness I rely on that requires being fed by a named file argument, but behaves a lot less intelligently when being fed from stdin — e.g. feeding SQL files into psql(1).
3. things where the program, even while writing to stdout, also produces useful "sampled progress" informational messages on stderr, which I'd like to see; where pv(1) and this output logging would fight each other if both were running.
4. things where there's no clean place to insert pv(1) anyway — mostly, this comes up for any command that manages jobs itself in order to do things in parallel, e.g. any object-storage-client mass-copy, or any parallel-rsync script. (You'd think these programs would also report global progress, but they usually don't!)
I could see pv(1) being fixed to address case 3 (by e.g. drawing progress while streaming stderr-logged output below it, using a TUI); but the other cases seem to be fundamental limitations.
Personally, when I want to observe progress on some sort of operation that's creating files (rsync, tar/untar, etc), here's what I do instead: I run the command-line, and then, in a separate terminal connected to the machine the files are being written/unpacked onto, I run this:
# for files
watch -n 2 -- ls -lh $filepath
# for directories
watch -n 4 -- du -h -d 0 $dirpath
If I'm in a tmux(1) session, I usually run the file-copying command in one pane, and then create a small pane, just a few lines tall, below it to run the observation command.

Doing things this way doesn't give you a percentage progress, but I find that with most operations I already know what the target's goal size is going to be, so all I really need to know is the size-so-far. (And pv(1) can't tell you the target size in many cases anyway.)
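The observation pane can be scripted too; a sketch, assuming `/target/dir` stands in for wherever the files are being unpacked:

```shell
# Split the current tmux window top/bottom, giving the new pane 3 rows,
# and run the periodic du there (path is a placeholder)
tmux split-window -v -l 3 "watch -n 4 -- du -h -d 0 /target/dir"
```

The `-l 3` keeps the new pane small, matching the "few lines tall" layout described above.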
1) This gets it out of the pipeline. 2) The program gets its named file arguments. 3) pv's output is on a separate terminal. 4) Your job never needs to know.
Downside: it only sees the currently open files, so it doesn't work well for batch jobs. Still, it's handy to see which file it's on, and how fast the progress is.
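Concretely, this is pv's file-descriptor-watching mode (-d / --watchfd), run from a separate terminal against an already-running job; a sketch, where the PID and fd number are placeholders:

```shell
# Watch all file descriptors of the newest rsync process, from another terminal
pv -d "$(pgrep -n rsync)"

# Or watch one specific descriptor (here, fd 3 of PID 12345)
pv -d 12345:3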
Also, for rsync: "--info=progress2 --no-i-r" will show you the progress for a whole job.
You'd be surprised how cheap these du(1) runs can be when you're running the same du(1) command over and over. Think of it like running the same SQL query repeatedly — the first time, the DBMS takes its time doing IO to pull the relevant disk pages into the disk cache; but every time after that, the query runs entirely over "hot" data. Hot filesystem metadata pages, in this case. (Plus, for the file(s) just written by your command, the query is hot because those pages are still in memory from being recently dirty.)
I regularly unpack tarballs containing 10 million+ files; and periodic du(1) over these takes only a few milliseconds of wall-clock time to complete.
(The other bottleneck with du(1), for deep file hierarchies, is printing all the subdirectory sizes. Which is why the `-d 0` — to only print the total.)
You might be worried about something else thrashing the disk cache, but in my experience I've never needed to run an ETL-like job on a system that's also running some other completely orthogonal IO-heavy prod workload. Usually such jobs are for restoring data onto new systems, migrating data between systems, etc.; where if there is any prod workload running on the box, it's one that's touching all the same data you're touching, and so keeping disk-cache coherency.
My main use-case is netcat (nc).
As an aside, I prefer the BSD version, which I find superior (IPv6 support, SOCKS, etc.). "GNU Netcat" isn't even part of the GNU project, AFAIK. I also discovered Ncat, from the Nmap project, while writing this; I'll give it a try.
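A typical pv-with-netcat transfer looks like the following sketch; the hostname and port are placeholders, and -N is the OpenBSD netcat flag that shuts down the socket after EOF:

```shell
# On the receiver:
nc -l 1234 > bigfile.iso

# On the sender: pv reads a named file, so it knows the total size
# and can show a progress bar, speed, and ETA
pv bigfile.iso | nc -N receiver-host 1234
```

Because pv is given the file as an argument rather than on stdin, it can report percentage and ETA, not just throughput.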
But pv(1) is just blindly attempting to emit "\r[progress bar ASCII-art]" (plus a few regular lines) to stderr every second; and interleaving that into your PTY buffer along with actual lines of stderr output from your producer command will just result in mush — a barrage of new progress bars on new lines, overwriting any lines emitted directly before them.
Having two things both writing to stderr, where one's trying to do something TUI-ish, and the other is attempting to write regular text lines, is the problem statement of 3, not the solution to it.
A solution, AFAICT, would look more like: enabling pv(1) to (somehow) capture the stderr of the entire command-line, and manage it, along with drawing the progress bar. Probably by splitting pv(1) into two programs — one that goes inside the command-line, watches progress, and emits progress logs as specially-tagged little messages (think: the UUID-like heredoc tags used in MIME-email binary-embeds) without any ANSI escape codes; and another, which wraps your whole command line, parsing out the messages emitted by the inner pv(1) to render a progress bar on the top/bottom of the PTY buffer, while streaming the regular lines across the rest of the PTY buffer. (Probably all on the PTY secondary buffer, like less(1) or a text editor.)
Another, probably simpler, solution would be to have a flag that tells pv(1) to log progress "events" (as JSON or whatever) to a named-FIFO filepath it would create (and then delete when the pipeline is over) — or to a loopback-interface TCP port it would listen on — and otherwise be silent; and then to have another command you can run asynchronously to your command-line, to open that named FIFO / connect to that port, and consume the events from it, rendering them as a progress bar; which would also quit when the FIFO gets deleted / when the socket is closed by the remote. Then you could run that command, instead of watch(1), in another tmux(1) pane, or wherever you like.
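You can approximate this today with pv's existing -n flag, which emits a bare integer percentage on stderr once per second; a sketch, where `bigfile` and the FIFO path are placeholders:

```shell
# Producer side: numeric progress goes to a named FIFO instead of the terminal
mkfifo /tmp/pv-progress
pv -n bigfile 2>/tmp/pv-progress | gzip > bigfile.gz &

# Consumer side (another pane/terminal): render the numbers however you like
while read -r pct; do printf '\r%3d%%' "$pct"; done < /tmp/pv-progress
rm /tmp/pv-progress
```

One caveat: opening a FIFO for writing blocks until a reader attaches, so the consumer needs to be started promptly for the pipeline to proceed.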
tail -f /some/log | grep something | pv -lr > /dev/null
or
tcpdump expression | pv -lr > /dev/null

Another utility I found out about is "progress", available at least on Debian systems. It can monitor things like cp and mv without actually being inserted into the command.
Thanks!