Perl 5.8.0 is over 20 years old (https://dev.perl.org/perl5/news/2002/07/18/580ann/), while CentOS 3.9 was released in 2007! Somehow it manages to feel both not-that-old and ancient at the same time.
My personal anecdote with GNU Parallel was running into it while working in academia. It worked well and saved me some time, but I felt it was unreasonable for a tool to ask for a citation just to parallelise a script - by that logic, matplotlib, Jupyter and co would need one as well. On the other hand, I decided not to use it, because I also feel that authors can ask for whatever they want.
It still works, though you would have to archive/vendor the dependencies.
But I still think requiring a citation for GNU Parallel is unreasonable. There is a huge body of software that contributed to (at least my) research, of which GNU Parallel is probably the least important. Blowing up citation lists with all of it makes them borderline useless.
It turns citations into advertising space for software - it's bad enough being coerced into making them an advertisement for reviewers' papers.
I would have thought it's black magic with assembler optimisations for MIPS and special considerations for HP-UX...
This is such a lovely and interesting writeup, it's wonderful that people take their time to share so generously!
[1]: an 11k LOC Perl script; you can read along here: https://github.com/gitGNU/gnu_parallel/blob/master/src/paral...
A sample use case: you have a file with words in it, one per line, and you want to run a program on each word (device name, dollar amount, whatever). Sure, you could use a loop, but if the words and actions are independent, parallel is one way to spin up N copies of your program and pass each a single word from the file. As a more concrete example, it can get you around Python's GIL without having to use multiprocessing or threads.
Didn't realise that it busy waits, but I'm typically running it on a not very busy server with tens of cores.
A) You don't understand. Please read the "Citation notice" section in the article.
B) You understand but don't use GNU Parallel.
C) You understand and use GNU Parallel in a non-academic setting and find the hassle of supplying --no-notice to be onerous vs the effort to write/maintain your own tool.
D) You understand and use GNU Parallel in an academic setting and have cited Ole or plan to cite Ole.
From the article, nearly 10 years ago Ole added the citation behavior after discussing it with his users: https://lists.gnu.org/archive/html/parallel/2013-11/msg00006...
Ole's citations took off roughly coincident with this behavior being added: https://scholar.google.com/citations?hl=en&user=D7I0K34AAAAJ... (click "Cited By" and notice the bar chart).
But quite useless, as it'll print poorly, and it's overall a waste of resources to have that lovely beach scene in the background.
https://zenodo.org/record/1146014/files/GNU_Parallel_2018.pd...
The i7 on my laptop, with quite a few CPUs/threads and a few optimisations, got the job finished in 10 minutes.
(I later put the Hadoop use on my resume, not the GNU Parallel. That's the joke of modern job hunting. There is no interest in what you did, just buzzwords and leetcode. Luckily there are still a few places that value real work or I'd be too old to get a job. :) )
https://github.com/shenwei356/rush
I use it pretty extensively with ffmpeg, imagemagick and the like.
I'd been using mmstick/parallel for a while, but it moved to the RedoxOS repos and then stopped being updated, while some issues were still not ironed out.
And with the `--halt now,done=1` option (which I think is relatively recent?), if any of the parallel processes exits, parallel exits itself, the whole container shuts down, and external orchestration can start another one if needed.
Here's an example of installing it in a Debian/Ubuntu container during the image build, in a Dockerfile:
RUN apt-get update \
&& apt-get -yq --no-upgrade install \
supervisor \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists /var/cache/apt/*
Then it's possible to create a configuration file, for example /etc/supervisord.conf, to specify what should run and how:

[supervisord]
nodaemon=true
[program:php-fpm]
command=/usr/sbin/php-fpm8.0 -c /etc/php/8.0/fpm/php-fpm.conf --nodaemonize
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:nginx]
command=/usr/sbin/nginx
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
And finally it can be run inside of the container entrypoint, along the lines of this in docker-entrypoint.sh:

#!/bin/bash
echo "Software versions..."
nginx -V && supervisord --version
echo "Running Supervisor..."
supervisord --configuration=/etc/supervisord.conf
Here's more information about the configuration file format, in case anyone is curious: http://supervisord.org/configuration.html

It should be noted that this package will bring in some dependencies, though, which may or may not be okay, depending on how stringent you are about space usage and what's in your containers. Example for an Ubuntu container:
The following NEW packages will be installed:
libexpat1 libmpdec3 libpython3-stdlib libpython3.10-minimal libpython3.10-stdlib libreadline8 libsqlite3-0 media-types
python3 python3-minimal python3-pkg-resources python3.10 python3.10-minimal readline-common supervisor
0 upgraded, 15 newly installed, 0 to remove and 0 not upgraded.
Need to get 6905 kB of archives.
After this operation, 25.7 MB of additional disk space will be used.
(Just found the piece of software itself useful for this use case, figured I'd share my experiences.)

My problem is that it's not always immediately clear how software that would normally run as a systemd service can be launched in the foreground instead. It usually takes a bit of digging around.
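For what it's worth, many daemons do have a documented foreground switch; a few I've come across (verify against the docs for your versions):

```shell
# keeping common daemons in the foreground for supervisord/containers:
#   nginx    ->  nginx -g 'daemon off;'    (override the 'daemon' directive)
#   php-fpm  ->  php-fpm --nodaemonize     (as in the config above)
#   sshd     ->  sshd -D                   (do not detach or become a daemon)
```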
But if you're inside Docker, where something else already has the job of restarting things when they fall over, it feels a bit overcomplicated to have multiple ways of doing the restarting. Plus, I think there's a touch more visibility - it's all just command-line arguments to parallel:
parallel --will-cite --line-buffer --jobs 2 --halt now,done=1 ::: \
"some_proc some args" \
"another_proc some more args"

Which is a shame - 95% of my make usage is PHONY targets where I have a task, not a generated artifact. My current use case would have greatly benefited from the native parallelism and the ability to restart only failed files.
These are must-haves today.
- entr. It runs a command on file/directory changes.
- spt. Simple pomodoro technique. A good timer to help yourself to work and take rests.
- herbe. It works great as a notifier for spt. Add "play" from sox and write a script to both notify and play a sound in parallel.
- sox/ffmpeg/imagemagick. Audio, video and image production and conversion on the CLI. A must have.
- catdoc/antiword/odt2txt/wordgrinder+sc-im+gnuplot. Reading and editing Word/Excel/LibreOffice files in the terminal. Gnuplot can help alongside sc-im. This can be a beast over SSH. With Gnuplot compiled with sixel support (and XTerm) you can do magic.
- iomenu. cat bookmarks.txt | iomenu | xargs firefox. Pick one item from a list (one per line). I think it has fuzzy-finding matches.
I have several more. Simple battery meter (sbm), grabc to grab a color from the screen,
pointtools+catpoint to do "presentations" over a terminal, nncp-go+yggdrasil for
ad-hoc networking and secure encrypted backups between devices...

No need for massive distributed clusters when you have a simple Perl one-liner:
seq 0 10000 | parallel dd if=/dev/urandom of=/mnt/foo/input bs=10M count=10 seek={}0

dd if=/dev/urandom of=/mnt/foo/input bs=10M count=100000 in the amount of time that it took?