Why requiring unsafe when the std implementation could take care of the synchron...

masklinn · on Jan 22, 2025

Because the std implementation can not force synchronisation on the libc, so any call into a C library which uses getenv will break... which is exactly what happened in TFA: `openssl-probe` called env::set_var on the Rust side, and the Python interpreter called getenv(3) directly.

rerdavies · on Jan 22, 2025

But the standard implementation could copy the environment at startup, and only uses its copy.

And the library's use of setenv is clearly a bug as setenv is documented to be not threadsafe in the C standard library. So that would take care of that problem.

demurgos · on Jan 23, 2025

If you clone the environment at startup, then you get a situation where code in the same binary can see different values depending if it uses libc or Rust's std. It's also no longer the same environment as in the process metadata.

Using a copy by default may have worked if it was designed as such before Rust 1.0, but Rust took the decision to expose the real environment and changing this now would be more disruptive than marking mutations as unsafe.

miohtama · on Jan 22, 2025

Is it possible to skip libc completely or would this introduce too many portability concerns?

fch42 · on Jan 23, 2025

In short, no - because environment variables are userland state only, you can't interact with them using system calls, the kernel doesn't keep a "canonical" copy of them on behalf of the process. So the "environment" is part of libc, and "libc's way" of interacting with it at runtime "is the way".

From the syscall interface point of view ... you pass the initial env of a process when you exec(), and the kernel copies that to (userland) memory of the new process. The fact "default initialisation" can copy from the environment of the exec()'ing parent, or the fact that the kernel can "read" a process' env (see /proc/<PID>/environ) doesn't change this; the kernel needn't be "accommodating" all the possible and impossible ways how a user application may want to interact with that state there, if you mess-too-much with it, you get garbage. Sooo ... the portability wart is setenv(), because as far as the system is concerned... your "initial" env is passed to you when exec() is called, and any modification thereafter is your concern, your problem, but foremost, your choice. And choices come with taking responsibility for the ones you make.

duped · on Jan 22, 2025

In general, no, because of FFI. In special circumstances, yes, but this isn't really important because the libc implementation is trivial (on all platforms that matter, envp is a char** to strings formatted as KEY=VALUE, set_env(key, value) is equivalent to allocating a new KEY=VALUE string and finding the index of a key if it exists or appending to the array).

Under the hood the pointer is initialized by the loader, in a special place in executable memory. Most of the time, the loader gets the initial environment variable list by looking at argv* (try reading past the end of the null separator, you'll find the initial environment variables).

It would be possible for a language to hack it such that on load they initialize their own env var set without using libc and be able to safely set/get those env vars without going through libc, and to inherit them when spawning child processes by reading the special location instead of the standard location initialized by your platforms' loader/updated by libc. But how useful is a language with FFI that's fundamentally broken since callees can't set environment variables? (probably very useful, since software that relies on this is questionably designed in the first place)

If you wanted to make a bullet proof solution, you would specify the location of an envp mutex in the loaders' format and make it libc's (or any language runtime) problem to acquire that mutex.

* there are platforms where this isn't true

jcotton42 · on Jan 22, 2025

It's not just libc, it's any C or C++ library that calls getenv or setenv.

rerdavies · on Jan 22, 2025

Specifically, any C or C++ library that calls setenv (despite documentation that says that setenv is not threadsafe).

SAI_Peregrinus · on Jan 23, 2025

Or any multithreaded program that uses a C or C++ library that calls setenv somewhere internally, and failed to document that it does so and is thus unsuitable for use by multithreaded programs.

No library does that documentation, so you can't use libraries on POSIX systems if writing multithreaded code. Or you do and hope for the best. So everyone just hopes for the best.

rerdavies · on Jan 24, 2025

The documentation for setenv:

   Caveats: POSIX.1 does not require setenv() or unsetenv() to be reentrant.

   ...
   Interface: setenv(), unsetenv() 
   Attribute: Thread safety
   Value:  MT-Unsafe const:env

Libraries that are thread-safe DO provide that documentation. One assumes that libraries that don't provide that documentation are not thread=safe.

GnuTLS docs: The GnuTLS library is thread safe by design, meaning that objects of the library such as TLS sessions, can be safely divided across threads as long as a single thread accesses a single object.

ChrisSD · on Jan 22, 2025

It can only synchronize if everything using is Rust's functions. But that's not a given. People can use C libraries (especially libc) which won't be aware of Rust's locks. Or they could even use a high level runtime with its own locking but then they'll be distinct from Rust's locks.

The only way to coordinate locking would be to do so in libc itself.

wahern · on Jan 22, 2025

libc does do locking, but it's insufficient. The semantics of getenv/setenv/putenv just aren't safe for multi-threaded mutation, period, because the addresses are exposed. It's not really even a C language issue; were you to design a thread-safe env API, for C or Rust, it would look much different, likely relying on string copying even on reads rather than passing strings by reference (reference counted immutable strings would work, too, but is probably too heavy handed), and definitely not exposing the environ array.

The closest libc can get to MT safety is to never deallocate an environment string or an environ array. Solaris does this--if you continually add new variables with setenv it just leaks environ array memory, or if you continually overwrite a key it just leaks the old value. (IIRC, glibc is halfway there.) But even then it still requires the application to abstain from doing crazy stuff, like modifying the strings you get back from getenv. NetBSD tried adding safer interfaces, like getenv_r, but it's ultimately insufficient to meaningfully address the problem.

The right answer for safe, portable programs is to not mutate the environment once you go multi-threaded, or even better just treat process environment as immutable once you enter your main loop or otherwise finish with initial process setup. glibc could (and maybe should) fully adopt the Solaris solution (currently, IIRC, glibc leaks env strings but not environ arrays), but if applications are using the environment variable table as a global, shared, mutable key-value store, then leaking memory probably isn't what they want, either. Either way, the best solution is to stop treating it as mutable.

ChrisSD · on Jan 22, 2025

A safe API would look a lot like Windows' GetEnvironmentVariable and SetEnvironmentVariable

https://learn.microsoft.com/en-us/windows/win32/api/winbase/...

wahern · on Jan 22, 2025

Yep. GetEnvironmentStrings and FreeEnvironmentStrings are probably even more noteworthy as they seem to substitute for an exposed environ array, though they push more effort to the application.

demurgos · on Jan 22, 2025

It can't ensure synchronization because any code using libc could bypass the sync wrapper. In particular, Rust lets you link C libs which wouldn't use the Rust stdlib.

msully4321 · on Jan 22, 2025

Because it can still race with C code using the standard library. getenv calls are common in C libraries; the call to getenv in this post was inside of strerror.

fsckboy · on Jan 22, 2025

you've gotten a lot of answers which say the same thing, but which I don't think answer your question:

synchronization methods impose various complexity and performance penalties, and single threaded applications which don't need that would pay those penalties and get no benefit.

Unix was designed around a lightweight ethos that allowed simple combining of functions by the user on the command line. See "worse is better", but tl;dr that way of doing things proved better, and that's why you find yourself confronting what it doesn't do.

davidt84 · on Jan 22, 2025

The real problem is that getenv() and setenv() were created before threads were really a thing.

kazinator · on Jan 24, 2025

getenv can easily be misused in a single threaded program.

davidt84 · on Jan 24, 2025

But it is possible to safely use it in a single threaded program.

There's no way to use it safely in a multi threaded application that may use setenv (unless you add your own synchronisation, and ensure everything uses it, even third party libraries).

kazinator · on Jan 24, 2025

Actually I don't believe that's the case. The getenv function as described by ISO C cannot be safely used in a program that only uses getenv, if that program uses ISO C threads, and more than one threat calls getenv without synchronizing with the others.

I don't think POSIX fixes this: it doesn't specify that the environ array is protected against concurrent access.

If two threads call getenv right around the same time, one of them could invalidate the environ array just as the other one has started to traverse it.

If you want to be safe, copy the environment to a different data structure on program startup. Then have all your threads refer to that data structure.

davidt84 · on Jan 24, 2025

Hmm, I'm apparently correct for C++11, where calling getenv only is thread safe, but that's not guaranteed by earlier standards (or, as far as I can tell, by C or POSIX).

kazinator · on Jan 24, 2025

I'm surprised C++ would have anything to say about getenv; mostly it just includes the standard C library via normative reference.

davidt84 · on Jan 25, 2025

Nevertheless, https://en.cppreference.com/w/cpp/utility/program/getenv

kazinator · on Jan 25, 2025

Interesting. So in a multi-threaded C++ program that only calls the std:: getenv function, everything is fine.

If anything calls the C getenv function, like a third party library, things are maybe not fine.

sunshowers · on Jan 22, 2025

Well it was better in the short term but is worse in the long term. In particular, the error handling situation is generally atrocious, which is fine for interactive/sysadmin use but much worse for serious production use.