Better HN

0 points · diroussel · 1y ago

Indeed, environment variables should be used to configure child processes, not to configure the current process, for non-shell programs, IMHO.

Note that Java, and the JVM, doesn't allow changing environment variables. It was the right choice, even if painful at times.
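
A minimal Rust sketch of configuring only the child process. The `/bin/sh` command and the variable name `CHILD_ONLY_GREETING` are illustrative assumptions, not from the thread. `std::process::Command::env` sets a variable for the spawned child only, so the parent never has to call `setenv` on itself.

```rust
use std::process::{Command, Output};

/// Spawns `/bin/sh` with CHILD_ONLY_GREETING set for the child only and
/// captures what the child prints. The parent's environment is untouched.
fn run_child_with_greeting(greeting: &str) -> std::io::Result<Output> {
    Command::new("/bin/sh")
        .arg("-c")
        .arg("printf %s \"$CHILD_ONLY_GREETING\"")
        .env("CHILD_ONLY_GREETING", greeting) // applies to the child, not to us
        .output()
}

fn main() -> std::io::Result<()> {
    let out = run_child_with_greeting("hello")?;
    println!("child saw: {}", String::from_utf8_lossy(&out.stdout));
    // Still unset here: Command::env only affects the spawned child.
    println!("parent sees: {:?}", std::env::var("CHILD_ONLY_GREETING").ok());
    Ok(())
}
```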


hinkley · 1y ago

I think there's a narrow window, at least in some programming languages, when environment variables can be set at the start of a process. But since it's global shared state, it needs to be write (0,1) and read many: written at most once, then read many times. No libraries should set them. No frameworks should set them. Only application authors should, and it should be dead obvious to the entire team what the last responsible moment to write an environment variable is.

I am fairly certain that somewhere inside the polyhedron that satisfies those constraints is a large subset that could be statically analyzed and proven sound. But I'm less certain whether Rust could express it cleanly.

MaulingMonkey · 1y ago

Your process can be started in a paused state by a debugger, have new libraries and threads injected into it, and then resumed before a single instruction of your own binary has been executed... and debuggers are far from the only thing that will inject code into your processes. If you're willing to handwave that, pre-main constructors, etc. away, you can write something like this easily enough:

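    use std::sync::atomic::{AtomicBool, Ordering};
    use std::thread::{self, JoinHandle};

    // Guards against constructing more than one BeforeEnvFreeze.
    static CREATED: AtomicBool = AtomicBool::new(false);

    struct BeforeEnvFreeze(());
    struct AfterEnvFreeze(());

    impl BeforeEnvFreeze {
        pub fn new() -> Self {
            // Singleton check: panics if a token was already handed out.
            assert!(!CREATED.swap(true, Ordering::SeqCst), "BeforeEnvFreeze already created");
            Self(())
        }
        pub fn freeze(self) -> AfterEnvFreeze { AfterEnvFreeze(()) }
        pub fn set_env(&self, key: &str, value: &str) {
            // Only sound while no other threads exist; `unsafe` is
            // required for set_var in the 2024 edition.
            unsafe { std::env::set_var(key, value) }
        }
    }

    impl AfterEnvFreeze {
        pub fn spawn_thread<F, T>(&self, f: F) -> JoinHandle<T>
        where
            F: FnOnce() -> T + Send + 'static,
            T: Send + 'static,
        {
            thread::spawn(f)
        }
    }

    fn main() {
        let a = BeforeEnvFreeze::new();
        a.set_env("FOO", "1");
        a.set_env("BAR", "2");
        //b.spawn_thread(...); // not available

        let b = a.freeze(); // consumes `a`

        b.spawn_thread(|| println!("{:?}", std::env::var("FOO"))).join().unwrap();
        //a.set_env(...); // not available
    }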

Exercises left to the reader:

• Banning access to the relevant bits of Rust's stdlib, libc, etc. as a means of escaping this "safe" abstraction

• Conning your lead developer into accepting your handwave

• Setting up the appropriate VCS alerts so you have a chance to NAK "helpful" "utility" pull requests that undermine your "protections"

And of course, this all remains a hackaround for POSIX design flaws. Your engineering time might be better spent ensuring (or enforcing) that your libc is "fixed" via intentional memory leaks (see e.g. https://github.com/bminor/glibc/commit/7a61e7f557a97ab597d6f...), which may ≈fix more than just your Rust programs.

plagiarist · 1y ago

I agree that libraries certainly should not. But why would writing ever be the right choice, even for applications? Doesn't it make far more sense to use env to populate some better-typed global configuration object, filling any gaps with defaults, and then use that?

I'd go further and say env should always be read-only and libraries should never even read env vars.
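
One possible shape for the "better-typed global configuration object" suggested above, sketched in Rust under assumptions: the `Config` struct, the `APP_PORT`/`APP_LOG_LEVEL` names and their defaults are hypothetical. `std::sync::OnceLock` gives the write (0,1), read-many behavior hinkley describes: the environment is read on first access, and the result is immutable afterwards.

```rust
use std::env;
use std::sync::OnceLock;

/// Hypothetical typed settings; variable names and defaults are made up.
#[derive(Debug)]
struct Config {
    port: u16,
    log_level: String,
}

impl Config {
    /// Reads the environment, filling missing or unparsable values with defaults.
    fn from_env() -> Self {
        Config {
            port: env::var("APP_PORT")
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(8080),
            log_level: env::var("APP_LOG_LEVEL").unwrap_or_else(|_| "info".to_string()),
        }
    }
}

/// Initialized at most once, on first access; read-only afterwards.
static CONFIG: OnceLock<Config> = OnceLock::new();

fn config() -> &'static Config {
    CONFIG.get_or_init(Config::from_env)
}

fn main() {
    let cfg = config();
    println!("port={} log_level={}", cfg.port, cfg.log_level);
}
```

Libraries would then receive a `&Config` (or individual fields) as parameters instead of reading env vars themselves.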

danudey · 1y ago

> I think there's a narrow window, at least in some programming languages, when environment variables can be set at the start of a process.

I mean, based on this issue I would say the only safe time is "at the start of the program, before any new threads may have been created".

But again, as others have said, there's no good reason I'm aware of to set environment variables in your own process, and when you spawn a new process you can give it its own environment with any changes you want.
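
A Rust sketch of giving a child its own environment from scratch, assuming a Unix-like system with `/usr/bin/env` (which prints the environment it received). `env_clear` drops everything inherited, and then only the variables you choose are added; `LANG` and `MODE` are arbitrary examples.

```rust
use std::env;
use std::process::Command;

/// Runs `/usr/bin/env` in a child whose environment is built from scratch:
/// only LANG is passed through from the parent, plus one added variable.
fn child_env_listing() -> std::io::Result<String> {
    let out = Command::new("/usr/bin/env")
        .env_clear() // child starts with an empty environment
        .envs(env::vars_os().filter(|(k, _)| k == "LANG")) // pass LANG through, if set
        .env("MODE", "child") // add a child-only variable
        .output()?;
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

fn main() -> std::io::Result<()> {
    print!("{}", child_env_listing()?);
    Ok(())
}
```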

dietr1ch · 1y ago

Which programming languages?

When using C++ I wanted programs to have a function that was called before main() to set up things that got sealed afterwards, like parsing command-line arguments and environment variables, loading runtime libraries, and maybe looking at the local directory. But I'm not sure it would be a useful and meaningful distinction unless you restructure way too many things.

I remember that on the Fuchsia kernel, programs needed to drop capabilities at some point, but the shift needed might be a hard sell given that things already "work fine".

saagarjha · 1y ago

Everyone thinks they can be the first to do something, and that surely nothing will happen before them. Unfortunately, everyone save for one is mistaken. Sometimes that chosen one is not even consistent.


friendzis · 1y ago

`main` is the default entry point (strictly, it is called by the C runtime's startup code); with one simple argument to the linker you can change the entry-point symbol to whatever you wish.

You can add a `premain` function that calls `main` and set it as the entry point, or you can implement the pre-start logic in `main` and call the main loop later.

This is how any sane program is written anyway: set up the environment -> continue with business logic.
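
The linker entry-point trick isn't shown here; this hypothetical Rust sketch only illustrates the second option, doing the pre-start work at the top of `main` and then handing the results to the business logic. `Settings`, `setup`, `run` and `APP_VERBOSE` are made-up names.

```rust
use std::env;

/// Everything the business logic needs, gathered once during setup.
#[derive(Debug)]
struct Settings {
    args: Vec<String>,
    verbose: bool,
}

/// Setup phase: read argv and the environment exactly once.
fn setup() -> Settings {
    Settings {
        args: env::args().skip(1).collect(),
        verbose: env::var_os("APP_VERBOSE").is_some(),
    }
}

/// Business logic: receives its inputs explicitly and never touches env.
fn run(settings: &Settings) -> usize {
    if settings.verbose {
        eprintln!("processing {} argument(s)", settings.args.len());
    }
    settings.args.len()
}

fn main() {
    let settings = setup(); // set up the environment
    let n = run(&settings); // continue with business logic
    println!("{n}");
}
```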


fch42 · 1y ago

You needn't go "hacky" for this; constructors for global/static variables are called before main(). But then, the underlying linker support is usually "trivially exposed" (using the constructor attribute in gcc/clang, say).

This (obviously?) isn't "110%" perfect, as the order of the constructor calls for several such objects may not be well-defined, and were they to create threads (who am I to suggest being reasonable ...) you end up with chicken-and-egg situations again.


bluGill · 1y ago

What is wrong with main setting those things first and then starting your main program? That is what everyone else does.


robertlagrant · 1y ago

> When using C++ I wanted programs to have a function that was called before main() to set up things that got sealed afterwards, like parsing command-line arguments and environment variables, loading runtime libraries, and maybe looking at the local directory. But I'm not sure it would be a useful and meaningful distinction unless you restructure way too many things.

If you're only reading environment variables you have no problem, though. It's only if you try to change them that it causes issues.

For setting, "only set environment variables in the Bash script that starts your program" might be a good rule.

GoblinSlayer · 1y ago

gRPC reads some configuration from the environment; the environment has portability problems too, so it's useful to normalize it into a cross-platform shape.


hinkley · 1y ago

Poor choice of phrasing.

I ended up implying some extra support when all I meant was “one could”.

xxs · 1y ago

>Note that Java, and the JVM, doesn't allow changing environment variables. It was the right choice, even if painful at times.

Not sure why it would be considered painful. IMO, using setenv to modify your own variables is a mistake: setenv is, by its definition, not thread-safe. So unless you're running a single-threaded application, it'd never make sense to call it.

Java does support running child processes with a designated env space (ProcessBuilder.environment() returns a modifiable map, initialized as a copy of the current process's environment), so the inability to modify its own environment doesn't matter.

Personally I have never needed to change env variables. I consider them the same as command-line parameters.

Calzifer · 1y ago

Java doesn't even allow changing the working directory, also because of potential multithreading problems.

Another reason why Java isn't the greatest language to create CLI tools with.
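
For the common case of running a command somewhere else, a process can leave its own working directory alone and set one for the child instead (Java's `ProcessBuilder.directory(File)` plays the same role). A Rust sketch, assuming a Unix-like system with `/bin/sh`; `physical_cwd_of_child` is a hypothetical helper. `Command::current_dir` affects only the spawned process.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Runs `pwd -P` in a child whose working directory is `dir`.
/// The parent's own working directory is never changed.
fn physical_cwd_of_child(dir: &Path) -> std::io::Result<PathBuf> {
    let out = Command::new("/bin/sh")
        .arg("-c")
        .arg("pwd -P")
        .current_dir(dir) // applies to the child only
        .output()?;
    Ok(PathBuf::from(String::from_utf8_lossy(&out.stdout).trim_end()))
}

fn main() -> std::io::Result<()> {
    let before = std::env::current_dir()?;
    let child_dir = physical_cwd_of_child(&std::env::temp_dir())?;
    println!("child ran in {}", child_dir.display());
    assert_eq!(std::env::current_dir()?, before); // parent unchanged
    Ok(())
}
```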

skissane · 1y ago

> Java doesn't even allow changing the working directory, also because of potential multithreading problems.

Linux and macOS both support per-thread working directory, although sadly through incompatible APIs.

Also, AFAIK, the Linux API can't restore the link between the process CWD and thread CWD once broken – you can change your thread's CWD back to the process CWD, but that thread won't pick up any future changes to the process CWD. By contrast, macOS has an API call to restore that link.

throwaway2037 · 1y ago

It is interesting that they do not offer the ability to change env and working dir via a security policy or a command-line arg (--allow-setenv, etc.).

xxs · 1y ago

That would be so much wasted engineering effort. The actual solution is simple: read what you need from env, and pass it as parameters to the functions that need it. The values you have read can be changed... and if you really, really want to, start a child process with a modified env.

jamesfinlayson · 1y ago

Sure is painful (mostly when writing tests where the environment variables aren't abstracted in some way).

But I think it was actually possible to hack around it up until Java 17.
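
One way to keep such tests from needing to mutate the real environment is to pass the lookup in as a function. A Rust sketch; `timeout_secs`, `APP_TIMEOUT_SECS` and the default of 30 are hypothetical:

```rust
use std::collections::HashMap;
use std::env;

/// Resolves a setting through an injected lookup instead of calling
/// `env::var` directly, so tests can supply values without touching the
/// process environment. Falls back to 30 if missing or unparsable.
fn timeout_secs(lookup: impl Fn(&str) -> Option<String>) -> u64 {
    lookup("APP_TIMEOUT_SECS")
        .and_then(|v| v.parse().ok())
        .unwrap_or(30)
}

fn main() {
    // Production: read from the real environment.
    let real = timeout_secs(|k| env::var(k).ok());
    println!("timeout = {real}s");

    // Test-style: read from an in-memory map; the process env is untouched.
    let fake: HashMap<&str, &str> = HashMap::from([("APP_TIMEOUT_SECS", "5")]);
    let injected = timeout_secs(|k| fake.get(k).map(|v| v.to_string()));
    println!("injected timeout = {injected}s");
}
```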

xxs · 1y ago

If you really wish, you can change the bootstrap path and allow changing env() for whatever reason you want (likely via copy-on-write). If you don't wish to do that, feel free to spawn a child process with whatever env you desire, then redirect/join System.in/out/err (0/1/2).

Those are trivial things in around 100 lines of code and have been available since System.getenv() came back (it was deprecated and non-functional prior to Java 1.5, released in 2004).

jamesfinlayson · 1y ago

A lot of the Java I'm writing is in AWS Lambda so my options are a bit more limited.
