Demystify undefined behavior

In discussions about C or other standards (e.g. POSIX) you may often have come across the term undefined behavior, UB for short. Unfortunately this term is sometimes mystified and even used to scare people unnecessarily, claiming that arbitrary things, even outside the scope of your program, can happen if a program “encounters undefined behavior”. (And I probably contributed to this at some point.)

First, this is just crappy language. How can something “encounter” behavior?  It can’t. What is simply meant by such slang is that such a program has no defined behavior, or stated otherwise, the standard doesn’t define what to do if such a program reaches the state in question. The C standard defines undefined behavior as

behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements

That’s it.

Simple examples of undefined behavior in C are out-of-bounds access to arrays or overflow of signed integer types. More complicated cases arise when aliasing rules are violated or when accesses to the same data race between different threads.
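
Two minimal (and deliberately erroneous) sketches of the first two cases; these are illustrations only, not taken from any real code base:

    #include <limits.h>

    int main(void) {
        int a[4] = { 0 };
        a[4] = 1;          /* out-of-bounds write: index 4 is past the last element, UB */

        int x = INT_MAX;
        x = x + 1;         /* signed integer overflow: UB */
        return 0;
    }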

Now we often hear statements like “UB may format your hard drive” or “UB may steal all your money from your bank account”, implying somehow that a program that is completely unrelated to my disk administration or to my online banking could, by some mysterious force, have these evil effects. This is (almost) complete nonsense.

If UB in a simple program of yours formats your hard drive, you shouldn’t blame your program. No simple application run by an unprivileged user should have such devastating consequences; that would be completely inappropriate. If such things happen, it is your system that is at fault, so get yourself another one.

As an analogy from every-day life, take the idea of locking your house at night, which seems to be the rule in some societies. Sure, if you don’t do that, you make it easier for somebody to sneak into your house and shoot you.   But still, that person would be a murderer, to be punished for that by the law, no sane legal system would acquit such a person or see this as mitigating circumstances.

Now things are in fact different when there is a direct relation between the object of your negligence and the ill effect that it causes. If you leave your purse on the table in the pub when going to the bathroom, you should take a share of the responsibility if it is stolen. Or, to come back to the programming context, if you are programming a security-sensitive application (e.g. handling passwords or bank credentials) then you should be extremely careful and stay on well-defined grounds.

When programming in C there are different kinds of constructs or data for which the behavior of the program is undefined.

  • Behavior can be explicitly declared undefined or implicitly left undefined.
  • The error can be detectable or undetectable at compile time.
  • The error can be detectable or undetectable during execution.
  • Behavior of certain constructs can be undefined by a specific standard to allow extensions.

Unfortunately, people often use UB as an excuse to evade questions about certain behavior of their code (or compiler). This is just rude behavior in itself, and you usually shouldn’t accept it. An implementation should have reasons and policies for how it deals with UB; otherwise it is just bad, and you should switch to something more friendly, if you can.

There is no reason not to be friendly, and there are many to be.

That is, in our context, once (your!) code detects that it has a problem, it must handle it. Possible strategies are

  • abort the compilation or the running program
  • return an error code
  • report the problem
  • define and document the behavior in your own terms

For the first, you should always keep in mind that the program should be debuggable, so you really should use #error or static_assert for compile-time errors and assert and abort for run-time errors. Also, willingly aborting the program is not the same as having the program crash, see below.
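
A small sketch of these tools in use (the specific assumption checked by static_assert is made up for the example):

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    #if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L
    #error "this sketch assumes a C11 compiler"
    #endif

    /* compile-time check: the build stops right here if the assumption is wrong */
    static_assert(sizeof(int) >= 4, "this code assumes int has at least 32 bits");

    double inverse(double x) {
        assert(x != 0.0);             /* run-time check: a debuggable failure point */
        return 1.0 / x;
    }

    void fatal(char const *msg) {
        fprintf(stderr, "fatal: %s\n", msg);
        abort();                      /* deliberate, traceable termination, not a crash */
    }

    int main(void) {
        printf("%g\n", inverse(4.0));
        return 0;
    }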

Obviously, the second is only possible if the function in question has an established error-return convention and if you can expect users of the function to check for that convention. POSIX has many such cases where the documentation says that a function “may” return a certain error code. A C (or POSIX) library implementation that detects such a “may” case and doesn’t react accordingly is of bad quality.
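
As an illustration only, a hypothetical function following such a convention (the name and the choice of errno values are made up):

    #include <errno.h>
    #include <stddef.h>

    /* returns 0 on success, an errno-style code on detected misuse */
    int buffer_copy(char *dst, size_t dst_len, char const *src, size_t src_len) {
        if (!dst || !src)      return EINVAL;   /* report the problem instead of running into UB */
        if (src_len > dst_len) return ERANGE;
        for (size_t i = 0; i < src_len; ++i) dst[i] = src[i];
        return 0;
    }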

Reporting detected errors is an important alternative and some compiler implementors have chosen this as their default answer to problems. Perhaps you already have seen gcc’s famous “diagnostic” message

dereferencing pointer ‘bla’ does break strict-aliasing rules

This message supposes that you know what aliasing rules are, and that you also know why it came to that. Observe that it says “does”, so the compiler claims to have proven that the aliasing rules are violated, and that the behavior is thus undefined. What the message doesn’t tell you, and that is bad, is how it resolves the problem.
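
For illustration, a minimal sketch of the kind of code that can provoke such a diagnostic (typically with gcc when optimization and -Wstrict-aliasing are active; the exact wording and conditions vary between versions):

    #include <stdio.h>

    int main(void) {
        float f = 1.0f;
        int *bla = (int *)&f;     /* an int pointer now aliases a float object */
        printf("%d\n", *bla);     /* reading through it violates the aliasing rules: UB */
        return 0;
    }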

The last variant, defining your own stuff, should be handled with extreme care. Not only must what you define be sensible, you also make a commitment in doing so. You promise your clients that you will follow these new rules in the future and that they may suppose that you will take care of the underlying problem. In most cases, you should leave the definition of extensions to the big players, platform designers or other standards. E.g. POSIX defines a lot of cases that are UB for C as such.

As a last alternative, when

  • the error is undetectable, or
  • the detection of the error would be too expensive,

you should simply let the program crash. The best strategy I know of is to always initialize all variables and to always treat 0 as a special value. This may, under rare circumstances, trade a tiny bit of performance for security, because on some rare occasions your compiler might not be able to optimize away an unused initialization. If you do this, most errors that you will see are dereferences of null pointers.
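
A small sketch of that style, with everything initialized and 0/null as the one special value (the list type here is only an illustration):

    #include <stdlib.h>

    typedef struct node node;
    struct node { node *next; int value; };

    int main(void) {
        node *head = 0;                   /* always initialized: 0 means "empty list" */
        node *n = calloc(1, sizeof *n);   /* calloc returns zero-initialized memory */
        if (!n) return EXIT_FAILURE;      /* 0 is also the special "allocation failed" value */
        n->value = 42;
        n->next = head;
        head = n;
        free(head);
        return EXIT_SUCCESS;
    }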

Dereferencing a null pointer has UB. But modern architectures know how to handle this: they raise a “segmentation fault” error and terminate the program. This is the nicest failure path that you can offer to your clients, failing anytime, anywhere.

10 thoughts on “Demystify undefined behavior”

  1. Chris Lattner has a great series of posts discussing undefined behavior (starting with http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html ). The comments about formatted hard drives, emptied bank accounts, or nasal demons ( http://www.catb.org/jargon/html/N/nasal-demons.html ) are meant to point out that “undefined behavior” isn’t necessarily intuitive. At the very least, it’s silly to expect the scope of undefined behavior to be well defined.

    Examples Lattner gives include:

    * dereferencing a pointer, later checking whether the pointer was NULL and, if not, using the value (since dereferencing a NULL pointer is undefined behavior, checking after the fact does no good; a compiler is justified in removing the (late) NULL check; see the sketch after this list)

    * reading an uninitialized value (since reading an uninitialized value is undefined behavior, a compiler is justified in assuming the code in question is dead and optimizing away the whole branch)

    * signed integer overflow leading to an infinite loop (signed integer overflow is undefined, so the compiler is free to elide the checks necessary to detect it; and “elide” includes “remove checks the programmer added to detect overflow after it has happened”).
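
    A minimal sketch of the first pattern (hypothetical code, only to make the shape concrete):

        #include <stdio.h>

        void use(int *p) {
            int v = *p;            /* dereference happens first: UB if p is NULL */
            if (p == NULL)         /* the compiler may now assume p is non-NULL... */
                return;            /* ...and is justified in deleting this late check */
            printf("%d\n", v);
        }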

    1. Max, thanks for the links and the additional examples. As you say, this jargon about “nasal demons” was probably meant to make a point. I agree that errors that lead to UB can have effects that might be counter-intuitive at first. My problem with this is that nowadays this is used to put programming in C into some sort of “wizard” or “guru” sphere, and it is even sometimes used to scare instead of to educate. As Chris’ post shows, there are a handful of typical errors that lead to undefined behavior. The real problem is these errors, and not the UB. I would be happy if we could replace saying “look, your code has UB because your variable may overflow” by “look, here your variable overflows, this is an error“.

      1. I’ll certainly agree that undefined behavior has become something of a boogieman. And while I agree that we need to stop scaring people, I understand the reaction given that it’s possible to accurately say “undefined behavior on line 20 is causing the bug you see on line 5; and undefined behavior on line 32 allows you to see symptoms that the language says are impossible (e.g., a bool that is both true and false).”
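
        (For the curious, a tiny hypothetical sketch of that last symptom; the underlying error is that the variable is read uninitialized:)

            #include <stdbool.h>
            #include <stdio.h>

            int main(void) {
                bool b;                     /* never initialized: reading it is the bug */
                if (b)  printf("true\n");   /* with UB in play a compiler may let both */
                if (!b) printf("false\n");  /* of these branches print */
                return 0;
            }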

        1. Maybe, but why not just say “the error on line 20 …”? I don’t have the impression that even in your example the term UB buys you much.

          1. I apologize for forgetting to reply earlier. I think there’s a good reason to distinguish different kinds of errors: whether you’re trying to do the wrong thing, you make a mistake while trying to do the right thing, you write valid code with a very different understanding of what it means compared to the compiler (say, using a “break” to exit an “if” block, which isn’t valid C but which would compile if the “if” block is inside a loop), etc.

            Undefined behavior is an important class of bugs. You can call it what you want — “this code is outside the spec” — but bugs related to undefined behavior act differently from other classes of bugs, and require a different mindset to find than other kinds of bugs. I think there’s value in making that kind of distinction.

            1. I see what you want to say, but still I think that you formulate it the wrong way around. It is not “bugs related to UB” but the other way around, “UB related to bugs”. First there is the bug, error or however you want to call it. Then there is the behavior of a program that has that bug. And sure, we have to distinguish here whether this is detectable at compile time, leading to a diagnostic, or whether by the nature of the bug it may be undetectable then. For run-time errors, there are different cases, e.g. some that allow simple assertions (checking for null pointers) and others where checking would have to use information that might not be present at the place where the error occurs (e.g. out-of-bounds access). The term UB isn’t precise enough to capture all of this, and therefore not too helpful.

              1. That was a very interesting discussion. I agree with Jens that the important thing is the bug, and bringing UB into the discussion *should* not be needed – in an ideal world.

                Alas, in practice I think most of us start in the camp “it works and the compiler doesn’t complain, so THIS can not be a bug – the bug we are looking for must be somewhere else”; and the *only* way to convince people to fix an UB-related bug is to educate them about UB. Which of course brings us back to having to mention UB prominently. There’s a reason the “nasal demons” are now in the programming folklore: how many times was that expression needed to force understanding into someone’s complacency? For sure it was needed in my case.

                So I end up agreeing with Max. The fact that UB is so easy to cause, while mostly undetectable unless you go out of your way to bring it to light (with special tools, assembly inspection in luckily clear cases, etc), is what makes it so abuse-able as a boogieman.

                Now, Jens is right that abusing the abuse and converting the problem of UB into an excuse for scaring/wizardry/etc is of course another layer of the problem, which with a bit of luck will be dealt with if any of the initiatives for “boringCC” or “Friendly C” get some traction. But this highlights that the problem is in the language and/or the infrastructure. We have a barely-adequate language like C that evolves by barely fixing old potholes while introducing new problems (“restrict”? strict aliasing? compilers moving aggressive, UB-exploiting optimizations down the -O scale?). Maybe this was Jens’s focus after all, but if so I think it could be made clearer.

                So this is an environment which practically *requires* wizardry. Which is sad, and scary: our infrastructure is made by wizards. No, doubly sad and doubly scary: I hope they ARE wizards. If they aren’t, that’s a nightmare waiting to happen.

                Finally, note that trying to avoid initiation into wizardry only delays it. I try to minimize the UB mystification for less-experienced colleagues by adding enough compiler flags that the most UB-likely problems get avoided, or at least tamer. But even then, one has to be ready to explain the “nasal demons” the moment someone thinks that you’re removing “potential for optimization” by using -fno-strict-aliasing or -fwrapv. And that eventually happens: so one just can’t leave UB outside of the discussion for long.

                1. Thanks for your contribution, but as you may imagine, I don’t share all of your opinions. First, I’d like to emphasize that the C committee is well aware of the problems that go with UB, and the future version of C (C2x?) will try to close some of them. My favorites would be to have all variables 0-initialized, and to have more support for compile-time checking of out-of-bounds access. Then, I don’t see that compiler messages are already where I think they should be. Only recently I had something from gcc in the tone of “accessing bla is UB” instead of “access to element i of array A is out of bounds”. And no, C doesn’t require wizardry for most of the things people do with it. It just requires some self-discipline:

                  • correct semantic typing for all variables,
                  • systematic initialization, in particular for pointers,
                  • using 2D arrays instead of pointer-tables, and
                  • a ban on all casts.

                  This makes 99% of the problems vanish. All of this starts with teaching C correctly, by putting some brains into it.
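
                  A short sketch of what following those rules can look like (the names and sizes are invented for illustration):

                      #include <stddef.h>

                      enum { ROWS = 4, COLS = 8 };

                      double grid[ROWS][COLS] = { 0 };      /* systematic initialization */

                      double row_sum(size_t row) {           /* semantic typing: size_t for an index */
                          double s = 0.0;                    /* initialized before use */
                          for (size_t col = 0; col < COLS; ++col)
                              s += grid[row][col];           /* a real 2D array, no pointer table, no casts */
                          return s;
                      }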

                  1. I don’t know what will happen in C2X, but the evolution of the standards doesn’t give me much hope. Surely the committee already knew about the problems with UB in 1989, and still the list of UBs has (mostly) grown since then (as a quick but unscientific and semi-unfair proxy measurement).

                    As for easy fixes for UB, your rules wouldn’t have avoided my first brush against it. In the implementation of a simple, straightforward stack-based calculator, I wrote this:

                    sum = stack[level--] + stack[level--];
                    

                    Isn’t that a natural, simple implementation of the sum operator? To me, it’s blood-curdling that it already triggers UB. No pointers, no casts, no type foolery. It’s like a mine waiting for the newbie while s/he’s still in the easy part of the language.

                    I just tried compiling that. Clang warns by default, GCC only does so with -Wall. I don’t remember getting a warning back then. Still, it’s little relief that the tool warns me against the mine put there by the standard – but I should be thankful, since the standard of course goes on to say that the tool doesn’t even have to warn me.

                    Also, there’s the fact that the big users of C prefer to disable compiler features to avoid UB. Linux, PostgreSQL…? If kernel developers are bitten by UB, what chances do “normal developers” have? When the discussion reaches the point of “UB travels back in time” [1], what hope is there?

                    [1] http://shape-of-code.coding-guidelines.com/2012/07/12/undefined-behavior-can-travel-back-in-time/

                    1. For me it is not so “natural”, but I am certainly biased. Having the increment operator return a value, and even distinguishing prefix and postfix variants, is not my favorite idea of readable code. One of the things that should make it onto my list of things to avoid (certainly for beginners) is “coding with side effects”. To come back to the idea of my blog entry: all of these are perhaps valid points to speak against C and similar languages, and against the emphasis that they place on potential optimizations.

                      It is not a point for talking about UB in a general context. In your example, there is not UB for UB’s sake; there is an error to start with. The error is that you don’t have the right to change an object twice if there is no sequence point in between. This rule has a reason; one may agree with that reason or not. This error is detectable at compile time in most cases, and so it should be warned about, as most modern compilers do. This error may also occur undetectably, when the one object that is changed is accessed through different pointers.
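
                      For completeness, one well-defined way to express the same intent (a sketch, assuming level indexes the current top of the stack):

                          sum = stack[level] + stack[level - 1];  /* read both operands first */
                          level -= 2;                             /* then modify level exactly once */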

                      The points I want to make in this blog post are: (1) people shouldn’t talk about UB, they should talk about the error in the code; (2) people shouldn’t scare other people, but educate them about why something is “wrong”.
