I write my fair share of shell scripts and I've hit practically every one of these snags in the past. However, for the majority of tasks I perform with bash, I genuinely don't care if I support spaces in filenames, or if I throw away a little efficiency with a few extra sub-shells, or if I can't test numbers vs strings or have a weird notion of booleans.
Your scripts are going to have bugs. The important question is: What happens when they fail?
Are your scripts idempotent? Are they audit-able? Interruptible? Do you have backups before performing destructive operations? How do you verify that they did the right job?
For example, if your shell scripts operate only on files under version control, you can simply run a diff before committing. Rather than spending a bunch of time tracking down a word-expansion bug, you can simply rename that one file that failed so its name no longer includes a space.
That said, I long ago reached a place where I realized that, while shell scripting is entertaining, I'd much rather write anything more than a handful of lines in a general purpose programming language. Perl, Python, Ruby, whatever - even PHP involves far less syntactic suffering and general impedance than Bash. It's not that I'm exceptionally worried about correctness in stuff that no one besides me is ever going to use, it's just that once you're past a certain very low threshold of complexity, the agony you spend for a piece of reusable code is so much less. Even just stitching together some standard utilities, there are plenty of times it'll take a tenth as long and a thousandth as much swearing to just write some Perl that uses backticks here and there or mangles STDIN as needed.
> Are your scripts idempotent? Are they
> audit-able? Interruptible? Do you have
> backups before performing destructive
> operations? How do you verify that they
> did the right job?
Every single one of these questions is easier to answer if you're using a less agonizing language than Bash and its relatives.

I disagree. While the set of things that are "hard" to do is probably larger in shell than in the alternatives, the specific questions posed by the grandparent are hard in any language. They all boil down to "how can I correctly do something which has side effects (on external state)?"
Statefulness itself is a pain, and shell is in some sense the ultimate language for simply and flexibly dealing with external state.
Simplicity: the filesystem is an extremely simple and powerful state representation. Show me a language that interacts with the filesystem more concisely than
tr '[:upper:]' '[:lower:]' < upper.txt > lower.txt
Flexibility: if shell can't do it, just use another program in another language that can, like `tr` in the above example. What other language enables polyglot programming like this? Literally any program in any language can become a part of a shell program.

> it's just that once you're past a certain very low threshold of complexity, the agony you spend for a piece of reusable code is so much less.
Here's where I admit I was playing devil's advocate to an extent, because I fully agree with you here. I write lots of shell scripts. I never write big shell scripts. Above some length they just get targeted for replacement in a "real" language, or at the very least, portions of them get rewritten so they can remain small.
Empirically, it also seems true that shell is harder for people to grasp, harder to read, and harder for people to get right. These are real costs that have to be figured in.
PS. Speaking of shell, brennen, we should be working on our weekend project. :)
Sometimes, though, you just need to get stuff done with the least amount of fuss, without worrying about the extreme edge cases that the majority of the webpage attached to this story talks about. Heck, I'd say the majority of bash work ends up like that.
And having learned the correct way, you'll instantly see it when a script you review is doing something in a way that will eventually bite someone.
There are no downsides to learning and doing things right. Of course it takes some extra time and effort at the start, as everything does.
Sure, there are extreme examples that can be really hard to handle portably and safely if you're doing something more complicated (embedded newlines in filenames come to mind). So in the end some corner cutting is often inevitable :-)
(I suspect PowerShell would be a good environment to take design cues from or even port, but I've never used it so I can't say for sure.)
Have you looked at scsh, the Scheme Configurable Shell? It's a nice clean language with a REPL (well, it's just scheme) that has syntax for all the usual things you'd want in a unix shell -- pipelines, redirections, environment, signals, etc.
As for a port, you're welcome to contribute on Pash (https://github.com/Pash-Project/Pash) – it's still woefully incomplete, sadly.
I hold my nose and write shell, even as I look over at, for instance, scsh, and think ... yeah.
I've been working on one, in my spare time for about a year. I can't tell you how exciting it is. It's still a ways from being ready for actual use, but I'll have a website up in a couple of weeks to showcase the approach, and will surely post to HN when it's up.
Anyway, I enjoy using the fish shell, and you may too.
People do occasionally bitch and moan about slow downloads from my home host, but nobody ever offered money for a pain-free alternative.
http://bash.cumulonim.biz/BashPitfalls.html is a mirror, see mlacitation's comment.
The "Unix should be hard" crew has gotten a lot quieter in the last ten years with the rise of Ubuntu and other relatively user-friendly distros, but I feel like there's still an underlying current of elitism there; people are proud of mastering these bizarre, arcane methods, and they're offended that someone else might be able to accomplish just as much without doing half as much work.
There are many problems with it, but the only one I've run into that keeps it from being more useful is that there are no multi-dimensional arrays built in. There are super hacky ways I have seen them implemented, but by default it's something I'm basically never able to turn to when scripting in bash, so I have to turn to other languages, even when the particular task I was working on would otherwise be mostly simpler in bash.
That said, there are associative arrays in bash these days.
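A minimal sketch of those associative arrays (they need bash 4.0 or later; the `sizes` variable here is just an illustration):

```shell
#!/usr/bin/env bash
# 'declare -A' is what makes the array associative (string keys).
declare -A sizes

sizes[small]=10
sizes[large]=100

# Look up by string key, then iterate over all keys.
echo "${sizes[small]}"
for key in "${!sizes[@]}"; do
  echo "$key=${sizes[$key]}"
done
```

Note that they are strictly one-dimensional; nesting them still requires the hacks mentioned above.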
for i in $(ls *.mp3); do
some command $i
done
So does that mean that "for" will do something per word of the output of $(), rather than per line of its output?

What to do if I want to do something for every line? What, for example, if I really want the output of ls, find (or any other command you can put in the $()) and loop through that line by line, even if some output has spaces?
Thanks.
# Loops over lines, in a subshell
prints_lines | while read -r line; do
some_command "$line"
done
# Loops over lines, in the current shell
while read -r line; do
some_command "$line"
done < <(prints_lines)
# If for some reason you really want a for loop
oldIFS=$IFS
IFS=$'\n' lines=($(prints_lines))
IFS=$oldIFS
for line in "${lines[@]}"; do
some_command "$line"
done

Correct. The argument to "for" is a list of words.
> What to do if I want to do something for every line?
Use a while loop.
find /some/dir/ -type f |
while read -r line; do
: # something with $line
done
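If you only want files matching a glob (and not a recursive `find`), another option is to let the shell expand the pattern itself; `nullglob` makes an unmatched pattern expand to nothing instead of staying literal. A sketch, using a scratch directory for illustration:

```shell
#!/usr/bin/env bash
# With nullglob, an unmatched *.mp3 expands to zero words, so the
# loop body simply never runs; without it, the loop would run once
# with the literal string '*.mp3'.
shopt -s nullglob

# Demo setup: one matching file (with a space), one non-match.
dir=$(mktemp -d)
touch "$dir/a b.mp3" "$dir/notes.txt"

matched=()
for f in "$dir"/*.mp3; do
  matched+=("$f")   # spaces in names are safe: no word splitting here
done
```

This sidesteps parsing the output of `ls` or `find` entirely, at the cost of not recursing into subdirectories.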
PS. You should almost always use `find` instead of `ls` in shell scripts. Given a pattern, `ls` will exit non-zero if nothing matches it, and you should be treating non-zero exits like you would exceptions in other languages.

We should absolutely ban special characters from names. Specifically, all whitespace, the colon, semicolon, forward slash, backward slash, question mark, star, ampersand, and whatever else I'm missing that will confuse the shell. Also, files cannot start with a dash.
However, people should be able to name files with these characters. So I propose that these characters in filenames be percent-encoded like they would be in a URL. Specifically, the algorithm should be
1. Take the file name and encode it as UTF-8. Enforce some sort of normalization.
2. Substitute each problematic byte with its equivalent percent-encoded form. This does not touch bytes at 0x80 and above - they are assumed non-problematic.
3. Write the file in the file system under that name.
4. When displaying files, run the algorithm in reverse.
In the general case files like "01 - Don't Eat the Yellow Snow.mp3" would simply become 01%20-%20Don't%20Eat%20the%20Yellow%20Snow.mp3 in the filesystem and cause absolutely no further problems. To make it completely backwards-compatible we should also add the following rule: If a filename includes a problematic byte or a percent-encoded byte higher than 0x80, then it is assumed to be raw and will not undergo percent decoding.
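That substitution step is easy enough to sketch in bash itself; `encode_name` and its byte set are my own illustration of the proposal, not an existing tool (note that `%` itself has to be encoded too, or the scheme isn't reversible):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the proposed encoding: percent-encode the
# "problematic" ASCII byte set; everything else (including bytes at
# 0x80 and above in a UTF-8 name) passes through untouched.
encode_name() {
  local in=$1 out='' i c
  for ((i = 0; i < ${#in}; i++)); do
    c=${in:i:1}
    case $c in
      ' '|$'\t'|$'\n'|':'|';'|'/'|'\'|'?'|'*'|'&'|'%')
        printf -v c '%%%02X' "'$c"   # "'x" yields the character's code
        ;;
    esac
    out+=$c
  done
  printf '%s\n' "$out"
}

encode_name "01 - Don't Eat the Yellow Snow.mp3"
# -> 01%20-%20Don't%20Eat%20the%20Yellow%20Snow.mp3
```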
Basically, I propose that every program which receives free text input for a file name percent-encode the filenames before writing them to the filesystem and decode them for display. Everything else remains unchanged.
Why this will not work:
Requiring programmers to keep track of two filenames instead of just one is rather a lot of work. File APIs will have to take both encoded and non-encoded forms and encode the non-encoded form, creating problems when people inadvertently use the wrong function with a name, either double-encoding it or not encoding it and leading to "this file does not exist" errors.
It will be possible to create two files with different names on disk which are nonetheless shown with the same name to the user.
Why it is ugly:
We're taping over a deficiency of an ancient language by inflicting pain on programmers.
Double-encoded filenames? MADNESS.
Why I like it:
I'll be able to have ?, * and : in filenames in windows.
My shell scripts will be much simpler.
What do you guys think?
You know what's crazy? Currently, in Unix, control characters are allowed in filenames. Like, \t and \n and \b and even \[. Those shouldn't be allowed, percent-escaped or not. Everything else you said is sensible.
That being said, enforcing such restrictions in upper layers brings pain as well, because suddenly you can have files that you cannot delete anymore (happens sometimes on Windows).
Non-percent-encoded control chars are strictly verboten. The VFS layer should contain a ban list of bytes (or codepoints) not allowed as part of a filename. It won't be a large list: just every nonprintable character from ASCII, every blank character (space, tab, newline, vertical tab, carriage return, etc.), and the characters /\;:?* - and that's it. This list should cover everything that might be problematic on Windows OR Linux OR macOS. For full compatibility, we must also add the %uxxxx and %Uxxxxxxxx percent escapes for arbitrary Unicode codepoints (I can sense that it might make sense to also escape all the Unicode spaces, combining characters and the like, to make file manipulation from the shell easier).
It sounds sort of sensible, but we're dealing with two layers of encoding here, which means three representations of the same name.
1. You have a string the user entered. That's just a generic name which can be anything.
2. You take that string and substitute "problematic" characters with their percent-encoded form. For example, every space becomes %20, a non-breaking space might become %ua0, or it might be left alone.
3. You now have a string of unicode codepoints, which are all "clean". This is encoded yet again to a sequence of bytes that are stored by the filesystem.
At least the second coding is done by the system, either by the standard file manipulation routines, or by the filesystem itself.
But it is the first one that seems infeasible. It has to be done at a layer above the standard "open" function and I can see developers being very confused on what and how to escape.
You know, maybe the answer might be not to have every other program do all this complicated dancing, but for the shell itself to escape filenames when it reads them. So when you say "cmd file%20with%20space", cmd is called with argument one set to "file with space". And when ls or find lists files, bad characters can be replaced with their percent-encoded forms. And xargs can unescape them.
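The xargs side of that idea is simple to sketch too; `decode_name` here is my own hypothetical helper, and it assumes every `%` starts a well-formed %XX escape with ASCII hex digits:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: undo the percent-encoding before handing the
# name to a command.
decode_name() {
  local in=$1 out='' i c
  for ((i = 0; i < ${#in}; i++)); do
    c=${in:i:1}
    if [ "$c" = '%' ]; then
      printf -v c "\\x${in:i+1:2}"   # turn the two hex digits into a byte
      ((i += 2))
    fi
    out+=$c
  done
  printf '%s\n' "$out"
}

decode_name 'file%20with%20space'
# -> file with space
```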
I'll need to think about it some more again.
for arg
instead of for arg in "$@"
is gold. That is going straight to the pool room.
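For anyone who hasn't seen it: with no `in` list, `for` iterates over the positional parameters, exactly as if you had written `in "$@"`. A quick sketch (`show_args` is just a demo name):

```shell
#!/usr/bin/env bash
show_args() {
  # 'for arg' with no 'in ...' is shorthand for 'for arg in "$@"'
  for arg; do
    printf '<%s>\n' "$arg"
  done
}

show_args "a b" c   # two arguments; the embedded space survives
# -> <a b>
#    <c>
```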