> You don't know which functions will touch the variable and when. How do you no...

elpatoisthebest · on March 27, 2020

I don't think that person means you literally can't know, just that it increases the difficulty of reasoning through the code.

I was debugging some code earlier today. Someone had added a global variable that is either modified or read in 4 or 5 different functions across our codebase. I had to literally draw out the paths a user could go down to figure out what the value of this global variable would be at the time I was trying to call one of those functions. It was not awesome.

I figured it out, so you're right. I do know which functions touch the variable and NOW I know when. But I still can't guarantee the value of the variable.

Needless to say, tomorrow will see a little refactoring.
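For readers wondering what that kind of refactoring might look like, here is a minimal sketch. The names (`current_plan`, `upgrade`, `price_before`, `upgraded`, `price_after`) are hypothetical and not from the codebase described above. The point is only that once the state is passed in and returned explicitly, everything a function depends on is visible at its call site, instead of depending on which paths happened to run earlier.

```python
# Before: a hypothetical module-level global that several functions read or write.
current_plan = "free"

def upgrade() -> None:
    global current_plan
    current_plan = "pro"  # a write that callers of price_before() cannot see

def price_before(base: int) -> int:
    # The answer depends on which code paths ran earlier and touched current_plan.
    return base if current_plan == "free" else base * 2

# After: the same logic with the state passed in and returned explicitly.
def upgraded(plan: str) -> str:
    # Returns the new plan instead of mutating shared state.
    return "pro" if plan == "free" else plan

def price_after(base: int, plan: str) -> int:
    # Everything the result depends on is visible in the signature.
    return base if plan == "free" else base * 2

print(price_before(10))  # 10
upgrade()
print(price_before(10))  # 20, but only because upgrade() happened to run first

plan = "free"
print(price_after(10, plan))            # 10
print(price_after(10, upgraded(plan)))  # 20
```

With the explicit version, answering "what is the value here?" means reading one call site rather than drawing out every path a user could take.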

gregmac · on March 27, 2020

I was dealing with a hard problem earlier this week, which I'm pretty sure was causing a thread to crash without logging anything while the program kept running. Unfortunately, it was only seen in production, and only once every few days.

The program does several stages of data processing in parallel batches, initially loading and eventually saving to a database. It's basically a "continuous" and complicated ETL.

There is effectively a set of global state variables to track the progress of each input item through the stages. The values in this global state depend on the data and on execution order, and can be modified from a dozen places in the code.

I narrowed it down to several potential crash points, which were basically things like: if the global state contains x and a db lookup in thread 2 times out, and thread 3 accesses the value before thread 2 starts the next batch, thread 3 could get a null reference. Another involved the decision to insert or update: in theory, the two global state values that effectively made this decision could never be set to a combination where it would do the wrong thing (getting either a foreign key or duplicate key error), but that combination is still possible to represent.
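The "impossible but representable" insert-or-update state can be illustrated with a small sketch. Assume, hypothetically, that two booleans (`exists`, `pending`) jointly decide the action; that gives four combinations, only three of which mean anything. Replacing them with a single enum (`RowState`, also hypothetical) leaves no value that encodes the bad combination. This is a general "make illegal states unrepresentable" technique, not a reconstruction of the code described above.

```python
from enum import Enum, auto

# Before: two hypothetical flags jointly decide insert vs. update.
# Four combinations are representable, but only three are meaningful.
def action_from_flags(exists: bool, pending: bool) -> str:
    if exists and pending:
        # "Impossible" in theory, yet nothing stops code from reaching it.
        raise RuntimeError("inconsistent state: row exists and insert pending")
    if exists:
        return "update"
    if pending:
        return "wait"  # an insert is already in flight
    return "insert"

# After: one value whose members are exactly the legal states.
class RowState(Enum):
    NEW = auto()             # not in the database yet: insert
    INSERT_PENDING = auto()  # insert in flight: do not insert again
    PERSISTED = auto()       # already stored: update

def action_from_state(state: RowState) -> str:
    if state is RowState.PERSISTED:
        return "update"
    if state is RowState.INSERT_PENDING:
        return "wait"
    return "insert"

print(action_from_state(RowState.NEW))        # insert
print(action_from_state(RowState.PERSISTED))  # update
```

Python will not stop a caller from passing the wrong type at runtime, but a `RowState` value itself can only be one of three members, so the "exists and pending" case no longer needs a defensive branch.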

If I were to run it in a debugger against the massive production data stream, I might eventually get lucky and see the data that triggers this. However, I could also sit for days and get nowhere, or the act of debugging and inspecting might be enough to prevent a race condition and not trigger the bug.

I still don't know for sure what's happening (though now there's instrumentation and better error handling in those spots, so hopefully I will), but the point here is that it's nearly impossible to reason about in a definitive way.

MaulingMonkey · on March 27, 2020

This works fine on small scales.

When dealing with millions of lines of code, I do not have the time to read the whole thing and internalize its entire state. Understanding the call graph can help, but diving through every abstract interface, callback, and layer of abstraction is a non-starter. Even if I had time to read the entire codebase line by line, I wouldn't be able to fit it all in my head, and I often have enough coworkers that changes are occurring faster than I can read and understand them all.

Even the codebases I work on are dwarfed by much larger ones.

AlexCoventry · on March 27, 2020

> How do you not know? You have the source code.

For instance, concurrent accesses and modifications could occur in any order.
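To make "in any order" concrete, here is a small deterministic sketch. Instead of relying on real thread timing, it enumerates every interleaving of two simulated, non-atomic increments (read the shared value, then write read + 1) and collects the possible final values. The second half shows the usual fix, a `threading.Lock` around the read-modify-write. The function and variable names are illustrative assumptions, not from the thread.

```python
import threading
from itertools import permutations

def possible_results(initial: int = 0) -> set[int]:
    """Enumerate every interleaving of two non-atomic increments.

    Each simulated thread reads the shared value, then writes read + 1.
    A schedule is valid as long as each thread reads before it writes.
    """
    steps = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]
    results = set()
    for order in permutations(steps):
        if any(order.index((t, "read")) > order.index((t, "write")) for t in "AB"):
            continue  # a thread cannot write before it has read
        shared, seen = initial, {}
        for thread, step in order:
            if step == "read":
                seen[thread] = shared
            else:
                shared = seen[thread] + 1
        results.add(shared)
    return results

print(possible_results())  # {1, 2}: one of the two updates can be lost

# With a lock, each read-modify-write runs as one unit, so no update is lost.
counter = 0
counter_lock = threading.Lock()

def add(n: int) -> None:
    global counter
    for _ in range(n):
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```

Knowing that both threads touch the value is not enough; the result depends on which of the valid schedules actually happened.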

Sammi · on March 27, 2020

You lose local reasoning, as was already said.

In theory you have the source code and you can know everything just by reading it all and debugging it all. In practice it becomes overwhelming.

Even intelligent people can only hold a little information in working memory at a time. Mere mortals have no chance. We need things to be bite-sized, local, and simple so we can fit them in our heads and reason about them.

Global variables force you to do global reasoning, which a human mind just doesn't have the capacity to do.

meheleventyone · on March 27, 2020

There are lots of ingenious ways to accidentally hide where a variable is used. Start passing some pointers around and storing them off under different names.
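Python has no raw pointers, but object references alias the same way. In this sketch (the names `SETTINGS`, `Client`, and `disable_retries` are hypothetical), a global dict is stored under a different attribute name, and a later write through that alias changes the global even though the global's name never appears at the write site. Passing a copy instead breaks the alias; note that `dict(...)` is only a shallow copy, so nested mutable values would still be shared.

```python
# A hypothetical module-level settings dict used as global state.
SETTINGS = {"retries": 3}

class Client:
    def __init__(self, opts: dict) -> None:
        # Stores whatever reference it is given under a new name.
        self.opts = opts

    def disable_retries(self) -> None:
        # Searching the code for "SETTINGS" will not turn up this write.
        self.opts["retries"] = 0

safe = Client(dict(SETTINGS))  # shallow copy: a separate dict object
safe.disable_retries()
print(SETTINGS["retries"])  # 3: the global is untouched

shared = Client(SETTINGS)  # alias: the same dict under another name
shared.disable_retries()
print(SETTINGS["retries"])  # 0: the global changed through the alias
```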

And of course, with a race condition in a multithreaded context, knowing where a variable is accessed is about 1% of the battle.