It seems like the only reliable way to fix this is to change these functions so that they exclusively acquire a mutex.
And remember that the exec* family of calls has a version with an envp argument, which is what should be used if a child process is to be started with a different environment — build a completely new structure, don't touch the existing one. Same for posix_spawn.
And, lastly, compatibility with ancient systems strikes again: the environment is also accessible through this:
extern char **environ;
Which is, of course, best described as bullshit.Note that Java, and the JVM, doesn't allow changing environment variables. It was the right choice, even if painful at times.
Environmental variables are not a replacement for your config. It’s not a place to store your variables.
Even if the env var API is fully concurrent, it is not convention to write code that expects an env var to change. There isn’t even a mechanism for it. You’d have to write something to poll for changes and that should feel wrong.
This holds for a lot of programs, but what if you're writing a shell?
"The getenv function returns a pointer to a string associated with the matched list member. The string pointed to shall not be modified by the program, but can be overwritten by a subsequent call to the getenv function."
In a single threaded virtual machine, you can immediately duplicate the string returned by getenv and stop using it, right there.
Under threads, getenv is not required to be safe.
I think that with some care, it may be; an environment implementation could guarantee that a non-mutating operation like getenv doesn't invalidate any previously returned strings.
I think POSIX does that. It allows getenv to reallocate the environ array, but not the strings themselves:
"Applications can change the entire environment in a single operation by assigning the environ variable to point to an array of character pointers to the new environment strings. After assigning a new value to environ, applications should not rely on the new environment strings remaining part of the environment, as a call to getenv(), secure_getenv(), [XSI] [Option Start] putenv(), [Option End] setenv(), unsetenv(), or any function that is dependent on an environment variable may, on noticing that environ has changed, copy the environment strings to a new array and assign environ to point to it."
environ is documented together with the exec family of functions; that's where this is found.
So whereas there are things not to like about environ, it can be the basis for thread safety of getenv in an application that doesn't mutate the environment.
Mutating argv is actually quite popular, or at least it used to be.
So setenv's existence makes getenv inherently unsafe unless you can ensure the entire application is at a safe point to use them.
Getenv() could keep several copies of the value around: one internal copy protected by a mutex, that it never returns, and one copy per thread that it stores in thread local storage. When you call getenv(), it locks the mutex, checks if the current thread's value exists, populates it from the internal copy if not, and returns it. It will also install a new setenv-specific signal handler on this thread and store info about this thread having a copy.
Setenv() will then take the same mutex as getenv(), check if the internal copy is different from the new value; if it is, it will modify the internal copy, modify the local thread's copy if that has one, and then signal each other thread in the process that has a copy in TLS. The setenv signal handler will modify the local copy that thread holds.
It's gonna be slow for a large multi-threaded program, but since setenv() used to corrupt memory for such programs, they probably don't care. And for single-threaded programs, or even for programs that don't access getenv()/setenv() on multiple threads, there should be no extra overhead other than the mutex and the bookkeeping.
The only issues that would remain are programs which send the pointer they get from getenv() to other threads without ensuring locking access, and programs which rely on modifying the pointer from getenv() directly as a way to set an env var, and expect this to be visible across threads. Those are just hopelessly broken and can't use the same API - but aren't more broken then they are today.
Of course, in addition to this complex work to make the old API (mostly) thread safe, it should also offer a new API that simply returns a copy every time, doesn't promise to show modifications to your copy when setenv() gets called (you need to call getenv() again), and puts the onus on you to free that copy explicitly.
stdenvlock(); // imaginary function added to ISO C or POSIX
char *home = getenv("HOME");
char *home_copy = strdup(home);
stdenvunlock(); // only here can we unlock
// home pointer is now indeterminate
Other solutions:1. Put the above sequence into a function, and don't expose the mutex. Thread-safe code must use:
char *home = dupenv("HOME"); // imaginary function; caller responsible for freeing.
2. Provide environment lookup into a buffer: getenvbuf("HOME", mybuf, sizeof mybuf); // returns some value that helps to resize the buffer
All functions that retain pointers out of the classic getenv remain unsafe.A mutex can be provided to those applications that want to manipulate the environ array directly, or use getenv and setenv, or any combinations of these.
The main problem is all the code out there using getenv.
If your program wants to use the environment as an out-of-band global var for cross thread communication, you can make your own mutex.
A mutex can ensure thread safety but risks deadlocks if not used carefully and will hurt performance...
The Rust stdlib is already using synchronization on the versions of these functions that are exposed from the Rust stdlib. That's why those functions were allowed to be marked as safe in the first place.
The problem is that people are calling C code from Rust (which already requires an unsafe annotation), and then that C code is doing silly thread-unsafe shenanigans for regrettable historical reasons.
It's beyond Rust's power to fix without cooperation from the underlying C code, which happens to be provided by the OS, which is just being compliant with Posix. Rust can only do so much when the platform itself is hell-bent on sabotaging you.
Nice to see that the author of the library has a sensible take. Unfortunately the ecosystem does not: https://github.com/seanmonstar/reqwest/blob/master/Cargo.tom...
This also shows up in web frameworks where Vue has the v-html directive and react has dangerouslySetInnerHTML. Vue definitely has it better.
https://doc.rust-lang.org/stable/std/env/fn.set_var.html
There is a patch for glibc which makes `getenv` safe in more cases where the environment is modified but C still allows direct access to the environ so it can't be completely safe in the face of modification https://github.com/bminor/glibc/commit/7a61e7f557a97ab597d6f...
keep[s] older versions around and adopt[s] an exponential resizing policy. This results in an amortized constant space leak per active environment variable, but there already is such a leak for the variable itself (and that is even length-dependent, and includes no-longer used values).
There have got to be pathalogical uses out there where this will cause unbounded memory growth in well-formed (according to the API) programs, no?
Interesting to see this _introduce_ a ‘bug’ (unbounded memory growth) for these programs that follow the API in order to ‘fix’ programs that don’t (by using the API in multiple threads). Pragmatism over dogma I guess. Leaves me feeling a bit sketched out though.
And the library's use of setenv is clearly a bug as setenv is documented to be not threadsafe in the C standard library. So that would take care of that problem.
The only way to coordinate locking would be to do so in libc itself.
The closest libc can get to MT safety is to never deallocate an environment string or an environ array. Solaris does this--if you continually add new variables with setenv it just leaks environ array memory, or if you continually overwrite a key it just leaks the old value. (IIRC, glibc is halfway there.) But even then it still requires the application to abstain from doing crazy stuff, like modifying the strings you get back from getenv. NetBSD tried adding safer interfaces, like getenv_r, but it's ultimately insufficient to meaningfully address the problem.
The right answer for safe, portable programs is to not mutate the environment once you go multi-threaded, or even better just treat process environment as immutable once you enter your main loop or otherwise finish with initial process setup. glibc could (and maybe should) fully adopt the Solaris solution (currently, IIRC, glibc leaks env strings but not environ arrays), but if applications are using the environment variable table as a global, shared, mutable key-value store, then leaking memory probably isn't what they want, either. Either way, the best solution is to stop treating it as mutable.
synchronization methods impose various complexity and performance penalties, and single threaded applications which don't need that would pay those penalties and get no benefit.
Unix was designed around a lightweight ethos that allowed simple combining of functions by the user on the command line. See "worse is better", but tl;dr that way of doing things proved better, and that's why you find yourself confronting what it doesn't do.
I'm also not convinced by Musl's maintainer that it can't be fixed within Musl considering glibc is making changes to make this a non-issue.
extern char **environ;
As long as environ is publicly accessible, there's no guarantee that setenv and getenv will be used at all, since they're not necessary.If you're willing to get rid of environ, it's pretty trivial to make setenv and getenv thread safe. If not, then it's impossible, although one could still argue that making setenv and getenv thread safe is at least an improvement, even if it's not a complete solution (aka don't let the perfect be the enemy of the good).
Exactly my point. Over time *environ would disappear, at least from the major software projects that everyone uses (assuming it's even in use in them in the first place).
[1] https://github.com/bminor/glibc/commit/7a61e7f557a97ab597d6f...
How do you figure? The problem isn't the implementation, it's the API. setenv(), unsetenv(), putenv(), and especially environ, are inherently unsafe in a multithreaded program. Even getenv_r() can't really save you, since another thread may be calling setenv() while the (old) value of an env var is being copied into the provided buffer. Sure, a getenv_r() fixes the case where you get something back from getenv(), and then another thread calls setenv() and makes that memory invalid, but there's no way to protect the other calls breaking the API.
There are ways to mitigate some of the issues, like having libc hold a mutex when inside getenv()/setenv()/putenv()/unsetenv(), but there's still no way for libc to guarantee that something returned by getenv() remains valid long enough for the calling code to use it (which, right, can be fixed by getenv_r(), which could also be protected by that mutex). But there's no good way to make direct access to environ safe. I suppose you could make environ a thread-local, but then different threads' views of the environment could become out of sync, permanently (and you could get different results between calling getenv_r() and examining environ directly).
Back-compat here is just really hard to do. Even adding a mutex to protect those functions could change the semantics enough to break existing programs. (Arguably they're already broken in that case, but still...)
> How do you figure?
From https://illumos.org/man/3C/putenv:
> The putenv() function can be safely called from multithreaded programs
Won't that depends on the libc implementation. For example, maybe setenv writes to another buffer, then swaps pointers atomically; wouldn't that work?
Most of the rest of the problem here seems to be the development environment. They're testing on a remote machine in an Amazon data center and using Docker. This rig fails to report that a process has crashed. Then they don't have enough debug symbol info inside their container to get a backtrace. If they'd gotten a clean backtrace reported on the first failure, this would have been obvious.
Why is anyone using "setenv" anyway?
There's no reason setenv should have been called here. The `openssl-probe` library could simply return the paths to the system cert files and callers could plug those directly into the OpenSSL config.
Oversights all around and hopefully this continues to improve.
It's worth noting here that you can also build your binaries and keep debug symbols separately.
You don't need to ship them with the binary (although it will make many scenarios a bit simpler if you do, since you'll always have the right ones available).
Some info that might help: https://www.tweag.io/blog/2023-11-23-debug-fission/ https://undo.io/resources/gdb-watchpoint/reduce-binary-size-...
You might want to look into debuginfod.
Because it’s there and it looks like a good idea until it takes one of your fingers.
The thing is, the OP people weren't doing that at all, it was some irresponsible library maintainers. If your code does that, you have to include something like the "surgeon general's warning" everywhere: "CAREFUL: USING THIS LIBRARY MAY CAUSE TERMINAL CRASHES".
I always thought this was kinda foolish: your configuration method is a flat-namespace basked of stringly-typed values. The perils of getenv()/setenv()/environ are also, I think, a great argument against using env vars for configuration.
Sure, there aren't always great, well-supported options out there. I prefer using a configuration file (you can have templated config and a system that fills in different values for e.g. dev/stage/prod), and I'll usually use YAML, despite its faults and gotchas. There are probably better configuration file formats, but IMO YAML is still significantly better than using env vars.
If there were a language feature that let me mark apps such that during any process env vars are not writable and are readable only once (together, in a batch, not once per var), I'd use it everywhere.
But yes, a flat namespace, with string values, shared as a free-for-all with who knows what libraries and modules you're loading… that's not a good idea even if it didn't have safety issues in setenv().
Being on the JVM which actually treats the environment as immutable that and which probably inspired a lot of the 12 factor app movement (with companies like Soundcloud being big Scala and Java users and pushing this), I've never experienced any issues with the environment changing on me or causing any threading issues. The environment is effectively immutable and there's nothing in my processes that sneakily circumvents that (via some native calls into libc). So, complete non issue on the JVM.
Even if somebody manages to modify the environment, the immutable copy stays the same. That copy gets created on JVM startup and is immutable. Anything using normal Java apis to interact with the environment will never see the modification. I'm sure people might have tried to work around that but it's not a wide spread practice. Because, again, why would you even want to do that?
The problem with configuration files is that their parsing is process specific. That's why Linux/Unix is such a mess. Every single tool seems to have its own conventions and mechanisms for configuration. There are no standards for this.
Other of course than the Docker ecosystem. You can do whatever you want inside the container but effectively your only interface to the outside world is either messily mounting some volume and doing whatever convoluted way of configuration your app requires; or just using environment variables. Most modern software is docker ready/friendly in the sense that you can fully control their behavior via the environment. It's perfectly adequate for most things that people run via docker these days. Which of course is pretty much anything.
And of course with Docker compose or kubernetes (which I'm not necessarily a fan of) you get yaml files defining lists of environment variables that define how your process starts. So you more or less get what you are asking for. I'm not a big YAML fan but it works well enough. Too much potential for syntax issues really ruining your day IMHO. But it's not like the alternatives are free of issues.
I personally use 12 factor app style, but once it's entered the app I validate the env variables and data and then store them. It's totally fine after that.
These kinds of detailed troubleshooting reports are the closest thing you can get to having to do it yourself. Thanks to the authors. It's easy to say "don't use X duh" until a dependency relies on it, and how were you supposed to know?
> We don’t have the necessary files outside of the container, and our containers are quite minimal and don’t allow us to easily install gdb.
Have people lost the ability to build and debug their code locally, without clouds and containers?
The reason is that cloud is where all the money is because cloud is DRM. Put software there and you can charge a subscription and nobody can evade it and you have perfect lock in forever. People usually can’t even get their data out. You can also do all kinds of realtime analytics conveniently to optimize your product.
Computing architecture is downstream of the business model. Mainframe died originally because there was no Internet and PCs were cheaper, but vendors also lost a lot of their lock in power. Now they have a way to bring a model that is much more profitable back. No more pesky freedom for users, who to be fair if given such freedom will often just refuse to pay, making quality software a non-viable business.
Tangent I know.
there are faults to the cloud but it solves real problems users have.
This is the iCloud model and it works. Imagine a more open version with competing storage providers.
This, however, would hand control back to the user, which would be bad for the software industry with its addiction to lock in and recurring revenue.
they should have handled crashes better - a problem they seem to recognize but not the issue here so not covered.
No, of course not, but it didn't crash on our machines!
I hate envvars. It’s “the Linux way”. I avoid them like the plague. A++ strong recommend.
libc is terrible. The world needs to move on.
Ideally all libraries which use environment variables should have APIs allowing you to override the env variables without calling setenv(), but that isn't always the case.
dlopen()
which passes on your environment. If you want to load libpam-keberos and pass DEBUG=verbose
you will need to setenv() your own environment.Throw away your CPU and RAM then.
And even if those abstractions can't be 100% effective, we'd go a long way to achieving the desirable results of getting rid of it, if we just develop the mindset of avoiding it if at all possible, excepting for very rare instances where it's needed as a last resort.
Go ahead and write lots of mutable global statics. But when your program crashes randomly and you need my help to debug and it is, once again, a global mutable then you have to perform a walk of shame.
the problem is not linux, not mutable global state or resources and not libc.
the problem is not getting time at work to do things properly. like spotting this in GDB before the issue hit, because your boss gave you time to tirelessly debug and reverse your code and anything it touches....
there is too much money in halfbaked code. sad but true.
If both Rust and C have independent standard libraries loaded into the same process, each would have an independent set of environment variables. So setting a variable from Rust wouldn't make it visible to the C code, which would break the article's usecase of configuring OpenSSL.
The only real solution is to have the operating system provide a thread-safe way of managing environment variables. Windows does so; but in Linux that's the job of libc, which refuses to provide thread-safety.
*BSD allows it too, or used as of 2022.
What is unusual about Linux is that it guarantees a syscall ABI, meaning that if you follow it, you can make a system call "portably" across "any" version of Linux.
Is that true? It's just a process global string -> string map, that can be pre-loaded with values before the process starts, with a copy of the current state being passed to any sub-process. This could be trivially implemented with batch processing/supervisory programs.
Yes to read them, if Rust wish to modify - modify your own, already copied structure. I'd do that in pretty much any language.
https://github.com/sunfishcode/eyra
Oh look:
> Why use Eyra? It fixes Rust's set_var unsoundness issue. The environment-variable implementation leaks memory internally (it is optional, but enabled by default), so setenv etc. are thread-safe.
Well, that's a lot of caveats. As I said, it would take years to complete. And it looks like it's well on its way but not near complete.
if (__environ == NULL || name[0] == '\0')
return NULL; import numpy
setproctitle() worked before numpy import but not after because it couldn't find the memory address of **environ.I'm hazy on the details but it led me to a somethingenv call (possibly getenv or setenv) in numpy initialization and it turned out that function changed the address of **environ and that was the reason for why setproctitle couldn't find it.
And:
> This function is safe to call in a single-threaded program.
> This function is also always safe to call on Windows, in single-threaded and multi-threaded programs.
> In multi-threaded programs on other operating systems, the only safe option is to not use set_var or remove_var at all.
It doesn't seem it would take much to do it efficiently, even retaining the poor getenv() pointer-returning API (which could point to a thread local buffer). The coordination between getenv and setenv could be very lightweight - spinlock vs mutex.
There's also no real backwards compatible way of fixing setenv(). getenv() returns a pointer that can be read at any time, and then there's the *environment parameter that can also be used to read env variables.
IMO the entire API should be deprecated for a thread safe one, but until someone comes with a standard setenv() alternative that's implemented by the libc runtimes, we'll be stuck with the shitty POSIX API, and every year we will read blog posts about get/setenv() crashing processes.
The setenv( ) function need not be thread-safe. A function that is not required to be thread-safe is not required to be reentrant.
https://www.open-std.org/jtc1/sc22/open/n4217.pdf.Page.. 1860 :')
Is "the standard says it doesn't NEED to be thread safe" the argument that the Linux libc maintainers are using for not enhancing it to be thread safe, or is it based on some technical or backwards compatibility issues in doing so ?
Leaking would be good enough for many use cases, but it would break long-running users of setenv (mainly those with libraries abusing env vars, as in TFA), and doesn't even solve how they interact with putenv and environ. This whole API is just cursed.
Libc could of course get better APIs, like GetEnvironmentVariable on Windows, but that won't fix all existing code.
A similar bug related to setlocale was found in 2007 and fixed in 2014. That bug did not take getenv/setenv into account. https://sourceware.org/bugzilla/show_bug.cgi?id=5443
> POSIX.1 does not require setenv() or unsetenv() to be reentrant.
A non-reentrant function cannot be thread safe.
In general (for POSIX, libc and many other libraries: if the docs do not explicitly say "this function is thread safe" they are not).
"It is ridiculous that this has been a known problem for so long. It has wasted thousands of hours of people's time, either debugging the problems, or debating what to do about it. We know how to fix the problem." https://www.evanjones.ca/setenv-is-not-thread-safe.html
30 years after these decisions were made, most sensible people do single threaded GUIs anyway (that is, all calls to the windowing API come from a single thread, and all redraws occur synchronously with respect to that thread; this does not block the use of threads functioning as workers on behalf of the GUI, but they are not allowed to make windowing API calls themselves).
Consequently, the overhead present in the win32 API is basically just dead-weight, there to make sure that "things are safe by default".
There's a design lesson here for everyone, though precisely what it is will likely still be argued about.
You could wrap setenv in a mutex, but that's not good enough. It can still be called from different processes, which means you'd need to do a more expensive and complex syncing system to make it safe.
That ballons out to other env related methods needing to honor the synchronization primitive in order for there to be a semblance of safety.
However, you still end up in a scenario where you can call
setenv
getenv
and that would be incorrect because between the set and the get, even with mutexes properly in place and coordinated amongst different applications, you have a race condition where your set can be overwritten by another application's set before your get can run. Now, instead of actually making these functions safe you've buried the fact that external processes (or your own threads) can mess with env state.The solution is to stop using env as some sort of global variable and instead treat it as a constant when the application starts. Using setenv should be mostly discouraged because of these issues.
Its a different story for languages/environments that are supposed to be safe by default and where you have language features that ensure safety (actors, optionals etc) but not for something like libc which has a standard it has to conform to and like 100 years of history.
Actually, a non-reentrant function can be thread-safe. A common example of such a function in libc being malloc().
So a non-reentrant function is a function that may not be invoked again between a previous invocation and returning from that invocation.
When a function may be invoked from different threads, then it is certain that sometimes it will be invoked by a thread before returning from a previous invocation from a different thread.
Therefore any function that may be invoked from different threads must be reentrant. Otherwise the behavior of the program is unpredictable. Reentrant functions may be required even in single-thread programs, when they may be invoked recursively, or they may be invoked by signal handlers.
An implementation of "malloc" may be reentrant or it may be non-reentrant.
Old "malloc" implementations were usually non-reentrant because they used global variables for managing the heap. Such "malloc" functions could not be used in multi-threaded programs.
Modern "malloc" implementations are reentrant, either by using only thread-local storage or by using shared global variables to which some method for concurrent access is implemented, e.g. with mutual exclusion.
You run into a problem if you keep using a string returned by getenv after calling another environment function: including possibly getenv itself!
However, it's easy to just strdup the result of getenv; that defends against the issue in a single-threaded program.
1. If a process crashes and dumps, be sure to look at the system log of the cause (e.g. SIGSEGV, OOM, invalid instruction, etc.)
2. Be certain you’re looking at the right core dumps — I believe UID 1000 just means posix UserID (which is unrelated to a PID), though I don’t use containers.
3. Stay focused on the right level of abstraction — memory model details are great to know, but irrelevant here.
4. Variables do not correlate 1:1 with registers, except in C calling conventions. The assumption about x20 and a local variable is incorrect, unfortunately.
5. getenv() and setenv() do not work as implied in the post. When a process starts via execve(), the OS/libc constructs a new snapshot of the environment, and cannot be modified by an ancestral process. It’s a snapshot in time, unless updated by the process itself. When a process fork()s, the child gets a new copy of the parent’s environment — updates do not propagate.
getenv() is thread safe and reentrant. You don’t use an environment to pass shared data — setenv() is generally used when constructing the environment for a child process before a fork(). See man environment.
6. FWIW, ‘char** env’ is a null-terminated array of pointers, so dumping memory from *env (or env[0]) is only valid until you hit the first NULL. The size of the array is not stored in the array.
I hope this helps! And apologies if this is redundant — I read so many comments; mostly variations of “the problem with getenv is x”, but gave up before reading all of the (currently) 168 comments.
Re: fork(), I just meant to be thorough in explaining the environment is copied, not shared by processes. Setenv() only affects the process from which it’s called.
The array size bit in the article: The value 0x220 looks suspiciously close to the size of the old environment in 64-bit words (0x220 / 8 = 68), and this value was written over the terminating NULL of the environment block…
HTH!
It does not help, because you do not appear to have understood the article (or even read it all that closely).
Some of these bullet points feel a lot like the kind of junk output one sees from the various (popular, but flawed) AI summary tools...
I could be wrong though —- could you be specific? I don’t want to misinform anyone…
3. There's nothing wrong with the level of abstraction here. If you have a crash that occurs on ARM but not on amd64, the differences in how those architectures operate is a very reasonable initial assumption.
4. The value in x20 is the same value in the local variable in question. Even though there may not be a general one-to-one mapping between variables and registers, at this particular instant in time that variable does correspond to this register.
5 is irrelevant, as the article isn't discussing forking. It's discussing the (somewhat questionable) practice of a program using getenv/setenv as mutable state.
For 6, the article doesn't say that env stores its own array length. It says that setenv called something like free() on the old env array, and free() overwrote env with the length of the memory allocation (which is a quite reasonable way for malloc to do book keeping).
I should see if the env_logger crate has a better solution.
But really, I don't understand why a sensitive security-related library would implicitly use an unsafe function like setenv().
This is a oversimplification. Windows has essentially the exact same API and it works just fine in multithreaded contexts.
The issue here is unix allows the underlying pointer to be accessed, bypassing any possible thread-safe APIs.
Ref: https://man7.org/linux/man-pages/man3/setenv.3.html
... it clearly says: "MT-Unsafe"
Also, there is a whole section about get/set env thread safety here (under "Other safety remarks -> env"):
Because I use structured concurrency, I can make it so every thread has its own environment stack. To add to a new environment, I duplicate it, add the new variable, and push the new enviroment on the stack.
Then I can use code blocks to delimit where that stack should be popped. [1]
This is all perfectly safe, no `unsafe` required, and can even extend to other things like the current working directory. [2]
IMO, Rust got this wrong 10 years ago when Leakpocalypse broke. [3]
[1]: https://git.yzena.com/Yzena/Yc/src/branch/master/tests/yao/e...
[2]: https://gavinhoward.com/2024/09/rewriting-rust-a-response/#g...
[3]: https://gavinhoward.com/2024/05/what-rust-got-wrong-on-forma...
If you have 1) C FFI interop in Yao, there's still a chance you might have two C libraries cause a crash without your code even being involved.