It's just how bash works. If there's an entry in the session cache, it uses it. Since executable paths only get cached when you run a command successfully, this only happens when an executable gets moved from one directory in your PATH to another after you've run it once, which isn't that common.
Setting PATH or calling hash -r clears the session cache, or you can run set +h to disable it altogether.
It also happens when you have two executables in different directories and then you delete the one with the higher priority. Happens regularly for me after I uninstall a Linux Homebrew package.
Changing $PATH does wipe it at least.
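A minimal sketch of the stale-cache case described above, assuming bash with default options (hashing on, checkhash off, not in POSIX mode). The directory layout and the demo-tool name are made up for the demo.

```shell
# Hypothetical setup: two copies of "demo-tool" in two PATH directories.
tmp=$(mktemp -d)
mkdir "$tmp/a" "$tmp/b"
printf '#!/bin/sh\necho from-a\n' > "$tmp/a/demo-tool"
printf '#!/bin/sh\necho from-b\n' > "$tmp/b/demo-tool"
chmod +x "$tmp/a/demo-tool" "$tmp/b/demo-tool"

# Assigning PATH also clears whatever was cached before.
PATH="$tmp/a:$tmp/b:$PATH"

# First successful run: the shell remembers $tmp/a/demo-tool.
demo-tool > "$tmp/out1"
type demo-tool          # bash reports the command as hashed

# Remove the higher-priority copy; the cache entry is now stale.
rm "$tmp/a/demo-tool"
demo-tool 2>/dev/null
stale_status=$?         # non-zero in default bash: it still tries the cached path

# Forget all cached locations; the next lookup walks PATH again.
hash -r
demo-tool > "$tmp/out2"

cat "$tmp/out1" "$tmp/out2"
```

With shopt -s checkhash, bash checks that the cached file still exists before using it, so the stale step falls back to a normal PATH search instead of failing.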
Edit: wrong shell, zsh has rehash, bash does not.
I guess I’ve been using zsh longer than I thought, because I learned about rehash first, then made the switch to hash -r later. I started using zsh 14 years ago, and bash 20+ years ago, so my brain assumed “I learned about rehash first” must have been back when I was using bash. zsh is still “that new thing” in my head.
e.g.
hash java 2>/dev/null || printf "java command not available\n"
command -v java >/dev/null || echo 'no java command in path'
...and of course, if you're going to run the command anyway, and you know an invocation that does nothing and always exits with success, you can do that too. I like running "--version" or equivalent in CI systems, because it has the side effect of printing which versions were actually in use during the run.
java -version || { echo >&2 'no java command in path'; exit 1; }
git --version || { echo >&2 'no git command in path'; exit 1; }
gcc --version || { echo >&2 'no gcc command in path'; exit 1; }
The kernel has no idea what the current process' environment $PATH is, and doesn't even parse any process environment variables at all.
https://github.com/torvalds/linux/blob/master/Documentation/...
System libraries like glibc are not part of the kernel, they are just components that can be replaced.
I wrote an article about it:
https://www.matheusmoreira.com/articles/linux-system-calls
I even asked Greg Kroah-Hartman about it:
https://old.reddit.com/r/linux/comments/fx5e4v/im_greg_kroah...
> So we rely on different libc projects to provide this, and work with them when needed.
> This ends up being more flexible as there are different needs from a libc, and for us to "pick one" wouldn't always be fair.
> And yes, you can just use a "nolibc" type implementation if you like.
> I know I do that for new syscalls when working on them, there's nothing stopping anyone else from doing that as well.
You can trash the entire GNU system and rewrite it all in Rust or Lisp if you wanted. It doesn't have to be some POSIX-like thing either, it could be whatever you wanted it to be. It doesn't need to have things like PATH. You could write a static freestanding application and boot Linux directly into it.
Nobody does stuff like this because it's a lifetime of work. But it could be done.
It is basic knowledge that PATH is used by a command interpreter to locate the pathname of binaries. This is true for Windows' cmd.exe as well. I have never heard of a system where locating files for execution was performed by the kernel.
newfstatat(AT_FDCWD, ".", {st_mode=S_IFDIR|0700, st_size=4096, ...}, 0) = 0
newfstatat(AT_FDCWD, "/usr/local/sbin/cat", 0x7fffcec2f3b8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/cat", 0x7fffcec2f3b8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/sbin/cat", 0x7fffcec2f3b8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/cat", {st_mode=S_IFREG|0755, st_size=68536, ...}, 0) = 0
Also true for MS/PC-DOS... which also holds the distinction of having some rare "truly monolithic" API-compatible variants that put the kernel, drivers, and shell in a single binary, so that may satisfy your criteria.
> If the file argument contains a slash character, the file argument shall be used as the pathname for this file. Otherwise, the path prefix for this file is obtained by a search of the directories passed as the environment variable PATH [...]
[1]: https://pubs.opengroup.org/onlinepubs/009695399/functions/ex...
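For what it's worth, the search rule quoted above is small enough to sketch in a few lines of shell. This is only an illustration of the rule, not how any particular libc or shell implements it. find_in_path is a made-up name, and the sketch skips details such as a trailing empty PATH entry and the default path used when PATH is unset.

```shell
# find_in_path NAME: print the pathname the quoted rule would pick for NAME.
# Hypothetical helper; the body runs in a subshell so that the IFS and
# noglob changes do not leak into the caller.
find_in_path() (
    name=$1
    case $name in
        */*)
            # Contains a slash: used as the pathname as-is, no search.
            if [ -f "$name" ] && [ -x "$name" ]; then
                printf '%s\n' "$name"
                exit 0
            fi
            exit 1
            ;;
    esac

    IFS=:       # split PATH on colons
    set -f      # do not glob-expand directory names
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.      # an empty entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            exit 0
        fi
    done
    exit 1
)

find_in_path cat || echo 'cat not found in PATH'
```

The point is that nothing here needs the kernel: it is string splitting plus a few stat-style checks, which is exactly what the newfstatat trace shows.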
There's a difference between something being a certain way because it has to be that way in order to implement the semantics of the system (e.g. interrupt handlers being a privilege transition) and something being a certain way as a result of an arbitrary implementation choice.
OSes differ on these implementation choices all the time. For example,
* in Linux, the kernel is responsible for accepting a list of execve(2) argument-words and passing them to the exec-ed process with word boundaries intact. On Windows, the kernel passes a single string instead and programs chop that string up into argument words in userspace, in libc
* in Linux, the kernel provides a 32-bit system call API for 32-bit programs running on 64-bit kernels; on Windows, the kernel provides only a 64-bit system call API and it's a userspace program that does long-mode switching and system call argument translation
* on Windows, window handles (HWNDs, via user32.dll) and IPC message passing (ALPC, in ntoskrnl) are implemented in the kernel, whereas the corresponding concepts on most Linux systems are pure user-space constructs
And that's not even getting into weirder OSes! Someone familiar with operating systems in general can nevertheless be surprised at how a particular OS chooses to implement this or that feature.
Right. You can't be sure that someone didn't stick $PATH expansion into glibc, or something. Because someone did.
QNX gets program loading entirely out of the kernel. When QNX is booted, initial programs and .so files in the boot image are loaded into memory. That's how things get started. Disk drivers, etc. come in that way, assuming the system has a disk.
Calling "exec.." or ".. spawn" merely links to a .so file that knows how to open and read an executable image. Program loading is done entirely by userspace code. Tiny microkernel. The "exec.." functions do not use the PATH variable.[1]
However, "posix_spawn" does read the PATH environment variable, in both QNX [2] and Linux.[3] Linux, for historical reasons, tends not to use "spawn" as much, but those are the defined semantics for it. QNX normally uses "spawn", because it lacks the legacy that encouraged fork/exec type process startup. "posix_spawn" is apparently faster in modern Linux, especially when the parent process is large, but there's a lot of fork/exec legacy code out there.
"posix_spawn" comes from FreeBSD in 2009, but I think the QNX implementation precedes that, because QNX's architecture favors "spawn" over "exec.." It may go back to UCLA Locus.
Windows has different program startup semantics. Someone from Windows land can address that. macOS has a built-in search path if you don't have a PATH variable.[5]
[1] https://www.qnx.com/developers/docs/8.0/com.qnx.doc.neutrino...
[2] https://www.qnx.com/developers/docs/8.0/com.qnx.doc.neutrino...
[3] https://www.man7.org/linux/man-pages/man3/posix_spawn.3.html
[4] https://www.whexy.com/posts/fork
[5] https://developer.apple.com/library/archive/documentation/Sy...
Yes it does, but the more surprising thing is (coming from AmigaOS with its dos.library function ReadArgs()) that the shell does this. The shell is also responsible for argument expansion - madness!
On AmigaOS, when you type "delete foo#? force", the shell passes the entire command line to the delete command. The delete command calls ReadArgs() with a template (FILE/M/A, ALL/S, QUIET/S, FORCE/S), and the standard OS function parses it into lists of files, flags, keyword arguments, etc. The "file" passed is "foo#?", and the command uses MatchFirst()/MatchNext() to do file pattern matching.
Every command (that uses ReadArgs() and didn't plump for "standard C" parsing) has the same behaviour: running the command with "?" gives you the template, which tells you how to use it. Args are parsed consistently across all programs.
Then you get "standard C", which because K&R and main(), ignores this standard Amiga parsing function and just does naive splits. Across multiple Amiga C compilers, quoting rules are inconsistent. An Amiga C compiler has to produce an executable that it knows will be called with a full command line, so the executable itself has to break that into words before it can call main(), and it's up to each compiler writer how they're going to do that. Urgh.
In unix-land, it's up to the shell to parse the command line, and pass only the words... hence why the shell naturally does all the filename globbing, and why you have gotchas like when these two commands are sometimes the same and sometimes they're not:
find . -name foo*
find . -name 'foo*'
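A quick way to see the difference, sketched with a throwaway directory (the file names are made up). With exactly one match in the current directory, the shell rewrites the unquoted pattern before find ever sees it. With no match, bash's default is to pass the pattern through unchanged, which is when the two commands behave the same.

```shell
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/foo1" "$demo/sub/foo2"

# Unquoted: the shell expands foo* to foo1, so find runs with "-name foo1".
unquoted=$(cd "$demo" && find . -name foo* | sort)

# Quoted: find receives the pattern itself and matches at every depth.
quoted=$(cd "$demo" && find . -name 'foo*' | sort)

printf 'unquoted:\n%s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

With two or more matches in the current directory, the unquoted form expands to several words and find typically rejects the command line outright.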
Then we have Windows, which is like Amiga C programs: a program is passed a full command string and has its C runtime parse it for main() to consume. There's a vague expectation that it'll do quoting "like COMMAND", which itself has very odd quoting rules. At least most people use the same C compiler on Windows, so it's mostly MSVCRT's implementation and therefore mostly consistent.
It's hard to appreciate how the world looks before you learn a fact. You can't unsee things.
It's not about knowledge, but about assumptions. The title and conclusion hint that there are some obvious assumptions, but these are not detailed. Maybe the author assumed that because of the ubiquitous use of PATH across shells, it had to be managed centrally.
Now, there is a reason why the kernel actually does not have such knowledge, but it's not at all unreasonable to assume that the kernel has it.
https://wiki.archlinux.org/title/Domain_name_resolution
As such, it's a thing one has to explicitly look up to know, which the author did.
I like this approach of shunting off functionality that's important, necessary, and omnipresent across all OSes to userspace, rather than giving in to the temptation to put everything and the kitchen sink into the kernel. It seems to make for a more versatile and future-proof OS that's easy to work with in spite of uncertainty.
Sections are very detailed metadata that all sorts of things use for all sorts of purposes. Compilers use them. Debuggers use them. Static and dynamic linkers use them. Anyone can use them for any purpose whatsoever. You can easily add your own custom sections to any executable using tools like objcopy. It's completely arbitrary, held together by convention.
Segments, on the other hand, don't even have names. They are just a list of file extents required for the program to actually execute and their address space locations. The program header table is essentially a sorted list of arguments for the mmap system call.
This is the Linux kernel's ELF loader:
https://github.com/torvalds/linux/blob/master/fs/binfmt_elf....
It basically just mmaps in the PT_LOAD segments of the ELF file, copies stuff like arguments and environment and then starts a thread at the entry point specified in the ELF header.
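If you have binutils around, you can look at that list directly. A small sketch, assuming readelf is installed and /bin/sh is an ELF file on your system; each LOAD line carries the file offset, virtual address, sizes and permission flags the loader needs.

```shell
target=/bin/sh   # assumption: any ELF executable will do

# -l prints the program headers (segments), -W keeps each one on a single line.
load_segments=$(readelf -lW "$target" 2>/dev/null | awk '$1 == "LOAD"')

if [ -n "$load_segments" ]; then
    printf '%s\n' "$load_segments"
else
    echo "no PT_LOAD entries printed (readelf missing, or $target is not ELF)"
fi
```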
It's just that when loading dynamic ELFs it jumps into the dynamic linker instead of the actual program. It's as though every single program had a #!/lib/ld.so shebang line. The absolute path is even hardcoded into the executable itself.
readelf -a $(which cat) | grep -i interpreter
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
When an "interpreter" is requested, Linux will load it alongside the actual program and will run it instead of the actual program. This "ELF interpreter" then does an absurd amount of work by recursively loading and linking libraries, linking the actual executable and only then jumping into its entry point.
I'm not kidding about the "absurd amount of work" part. These linkers even have to topologically sort dependencies like a package manager so they can be initialized properly.
https://blogs.oracle.com/solaris/post/init-and-fini-processi...
I did get one thing out of this though. I had honestly wondered for the longest time why we need to call env to get the same functionality as PATH in a shebang.
Ironically, thanks to either an article I read here (or on the crustacean site) recently, I already knew that the shebang is something which is parsed by the kernel, but had not put two and two together at all.
Much like the author. It goes to show the benefits of exploring and thinking about seemingly "obvious" concepts.
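That shebang behaviour is easy to poke at. A rough sketch, assuming a Linux-like system where /usr/bin/env exists and sh is on your PATH; the script names are made up. The kernel takes the interpreter path literally, so a bare name is only looked up relative to the current directory, while env does the PATH search in userspace.

```shell
demo=$(mktemp -d)

# Bare interpreter name: no PATH search happens for this.
printf '#!sh\necho bare\n' > "$demo/bare"
# Absolute path to env, which then searches PATH for sh.
printf '#!/usr/bin/env sh\necho via-env\n' > "$demo/via-env"
chmod +x "$demo/bare" "$demo/via-env"

# Run from inside $demo, where no file called "sh" exists.
bare_out=$(cd "$demo" && ./bare 2>&1)
bare_status=$?
env_out=$(cd "$demo" && ./via-env)

printf 'bare:    status %s (%s)\n' "$bare_status" "$bare_out"
printf 'via-env: %s\n' "$env_out"
```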
config BINFMT_SCRIPT
    tristate "Kernel support for scripts starting with #!"
    default y
    help
      Say Y here if you want to execute interpreted scripts starting with
      #! followed by the path to an interpreter.
And I'm sure other kernels do other things too.
https://elixir.bootlin.com/linux/v6.14.4/source/fs/proc/base...
This is needed because the exece()/execve() [2] kernel system call is unaware of things like environment variables so it will not have any idea how or where to execute a program 'cat' unless it is given the full path to 'cat', so the shell has to look it up (again if the user doesn't pass the full path). It's the same on every POSIX system and the original UNIXes. It's been this way for at least 50 years. (edit 60 years, it's from Multics [1])
Kids today really need to learn the fundamentals of computer operating systems. Or do that boring old-person thing we did before StackOverflow, and read all the manual pages, which tell you all this [3] [4].
[1] https://en.wikipedia.org/wiki/PATH_(variable)
[2] https://man7.org/linux/man-pages/man2/execve.2.html
[3] https://www.man7.org/linux/man-pages/man1/dash.1.html
[4] https://www.man7.org/linux/man-pages/man1/intro.1.html
https://www.man7.org/linux/man-pages/man2/intro.2.html
https://www.man7.org/linux/man-pages/man7/man-pages.7.html
https://www.man7.org/linux/man-pages/man7/standards.7.html
So the PATH variable is fundamentally the same as other env variables like HOME or USER, but how PATH is interpreted changes from context to context?
In the Unix systems of the past, was it easier to hold a more complete understanding of the system and its components in your head?
I mean, no shit, Sherlock? The exec family of system calls requires a path to a file, not a filename with an implicit path from the environment, so of course the PATH is handled by the shell.
Now, the semantics of this parameter are that the kernel does not use it for path resolution when searching for the executable, but it could.