There are two uses of the Unix “API”:
[A] Long-lived tools for other people to use.
[B] Short-lived tools one throws together oneself.
The fact that most things work most of the time is why the shell works so well for [B], and why it is indeed a poor choice for the sort of stable tools designed for others to use, as in [A].
The ubiquity of the C APIs of course solved [A] use cases in the past, when it was unconscionable to operate a system without cc(1). It’s part of why they get first class treatment in the Unix man pages, as old fashioned as that seems nowadays.
And the only reason I might be pushed down that path is because the task I'm working on happens to involve filenames with spaces in them (without those spaces, the code would work fine!), because spaces are a reasonable thing to put in a filename unless you're on a Unix system.
Putting spaces in a filename is atrocious and should be disallowed by modern filesystems. It is like if you could put spaces inside variable names in Python. Ridiculous.
But when you realize that space isn't a valid separator between tokens, seeing things like "Class to Pull Info From Database::Extractor Tool" actually becomes much easier to read and the language becomes highly expressive, helped somewhat by the insane integration into the firm's systems.
I was on your side until I tried it, but it can actually be quite useful, esp. if everything is consistent.
Also, some languages do allow whitespace in variable names like R and SQL, so long as the variable names are quoted or escaped properly.
It's like people saying you don't need to escape SQL values because they come from constants. Yes, they do... today.
It's not just quoting either. It's setting the separator value and reverting it correctly. It's making sure you're still correct when you're in a function in an undefined state. It's a lot of overhead in larger projects.
For example, there is no way to store `a b "c d"` in a regular variable such that you can then call something similar to `ls $VAR` and get the equivalent of `ls a b "c d"`. You can either get the behavior of `ls a b c d` or `ls "a b c d"`, but if you need `ls a b "c d"` you must go for an array variable with new syntax. This isn't necessarily a big hurdle, but it indicates that the concepts are hard to grasp and possibly inconsistent.
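A minimal bash sketch of the point above: a plain string variable can't carry the argument boundaries, while an array can. (This assumes bash; POSIX sh has no arrays.)

```shell
#!/usr/bin/env bash
# A string variable loses the distinction between three arguments
# and one: word splitting just cuts on whitespace, and the inner
# quotes are passed along as literal characters, never parsed.
VAR='a b "c d"'
printf '<%s>\n' $VAR       # four args: <a> <b> <"c> <d">

# An array preserves each element as exactly one argument.
ARGS=(a b "c d")
printf '<%s>\n' "${ARGS[@]}"   # three args: <a> <b> <c d>
```

So `ls "${ARGS[@]}"` gives the `ls a b "c d"` behavior, but only via the extra array syntax, which is exactly the inconsistency being complained about.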
$ exec('ls', '-l', 'A B C')
Maybe that's unrealistic? I mean, if the shell was like that, it probably wouldn't have exec semantics and would be more like this, with direct function calls: $ ls(ls::LONG, 'A B C')
Maybe we would drop the parentheses though; they can be reasonably implied given the first token is an unspaced identifier: $ ls ls::LONG, 'A B C'
And really, given that unquoted identifiers don't have spaces, we don't really need the commas either. Could also use '-' instead of 'ls::' to indicate that an identifier is to be interpreted locally in the specific context of the function we are calling, rather than as a generic argument: $ ls -LONG 'A B C'
If arguments didn't have spaces, you could make the quotes optional too. QED
It's like there is a shortcut that some people use that you want to wall off because it doesn't look pretty to you.
In most other cases I’ve never really had a problem with “this is a place where spaces are ok” (e.g. notes, documents, photos) and “this is a place where they are not ok” — usually in parts of my filesystem where I’m developing code.
It’s fine to make simplifying assumptions if it’s your own code. Command history aside, most one liners we type at the shell are literally throwaways.
I think I was clear that they aren’t the only category of program one writes and that, traditionally on Unix systems, the counterpart to sh was C.
The need to handle spaces and quotes can take you from a 20 character pipeline to a 10 line script, or a C program. That is not a good model whichever way you look at it.
If you control the inputs and you need to support quotes, spaces, non-white space delimiters, etc... in shell script, then that’s on you.
If you don’t control the inputs, then shell scripts are generally a poor match. For example, if you need summary reports from a client, but they sometimes provide the table in xlsx or csv format, shell might not be a good idea.
Might be controversial, but I think you can tell who works with shell pipes the most by looking at who uses CSV vs tab-delimited text files. Tabs can still be a pain if you have spaces in data. But if you mix shell scripts with CSV, you’re just asking for trouble.
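A small illustration of why tab-delimited files play nicer with pipes (assuming standard `cut`; the filenames here are made up for the demo):

```shell
#!/usr/bin/env bash
# Tab-delimited: fields containing spaces come through cut intact.
printf 'Alice Smith\t42\n' > people.tsv
cut -f1 people.tsv            # prints: Alice Smith

# CSV needs quoting to protect embedded commas, and cut has no
# idea about quotes, so it splits inside the quoted field.
printf '"Smith, Alice",42\n' > people.csv
cut -d, -f1 people.csv        # prints: "Smith
```

Tools like `cut`, `sort -t` and `join -t` all understand a single delimiter byte, and none of them understand CSV quoting, which is the trouble being alluded to.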
You can define a field separator on the command line with the shell variable IFS, i.e. 'IFS=$(echo -en "\n\b");' for newlines, which takes care of the basic cases like spaces in file or directory names when doing a for loop. If I have other highly structured data that is heavily quoted or has some other sort of structure to it, then I either normalize it in some fashion or, as you suggest, write a Perl script.
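A bash sketch of that IFS trick, including the save-and-restore step an earlier comment warns about (the demo directory and filenames are invented for illustration):

```shell
#!/usr/bin/env bash
# Restrict IFS to newline so a for loop over command output does
# not split filenames on spaces; then revert it, so later code
# doesn't inherit the nonstandard separator.
mkdir -p demo
touch demo/"a file.txt" demo/plain.txt

OLDIFS=$IFS
IFS=$'\n'                      # split only on newlines
for f in $(ls demo); do
    printf 'got: %s\n' "$f"    # "a file.txt" stays one token
done
IFS=$OLDIFS                    # revert correctly
```

Forgetting the revert is exactly the "undefined state" overhead mentioned above: any function called afterward would see the modified IFS.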
I haven't found it too much of a burden, even when dealing with exceptionally large files.
Also Zsh solves 99% of my "pipelines are annoying" and "shell scripts are annoying" problems.
Even on systems where I can't set my default shell to Zsh, I either use it anyway inside Tmux, or I just use it for scripting. I suppose I use Zsh the way most people use Perl.
There is a world of stuff in between "I need relatively low-level memory management" and "I need a script to just glue some shit together".
For that we have Python and Perl and Ruby and Go, or even Rust.
(My point about C was in the historical context of Unix, which is relevant when talking about its design principles.)
Stable tools designed for oneself