chasing a phantom: checking the return of malloc

One of the first rules that you see or hear about the use of malloc (and derivatives) is that you have to check the return value, to see if it is 0 and thus to know whether the call failed. Although there are situations in which malloc may fail and in which this check makes sense, doing so mostly gives you false security. In most situations where the call might fail you have been in trouble for quite a while, and the user of the program (if any) will most probably have aborted the execution long before.

Don’t get me wrong, I am not saying that you should never check the return of malloc. I will just try to show you that there are many other things that have to be considered before that, and that in my opinion are much more important. Such checks are important on systems that have very restricted possibilities for memory allocation. Usually these are so-called freestanding environments: embedded devices, space rockets, the Linux kernel, or other very specialized stuff. Programming on commodity architectures (hosted environments) is quite different from that.

malloc can return 0 under two different circumstances:

  1. The object size that was to be allocated was specified to be zero
  2. The memory system is saturated and an object of that size could not be allocated

Zero size allocations

The first condition is not imposed by the standard. When malloc is called with an argument of 0, the standard also allows it to return a sort of unique address, which is not a null pointer and which you nevertheless don’t have the right to access. It just imposes that such an address must also be a valid argument for free. So

Checking the return of malloc against 0 doesn’t protect you from allocation of zero-sized objects.
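To see which of the two permitted behaviors a given C library has chosen, something like the following sketch can be used. The function name describe_malloc_zero is made up for this illustration, and the answer depends on the platform (and possibly on compiler optimizations), so no particular result should be relied upon.

```c
#include <stdlib.h>

/* Hypothetical helper: reports how this C library answers malloc(0).
   Both answers are allowed by the standard. */
const char *describe_malloc_zero(void) {
    void *p = malloc(0);
    /* Only the pointer value is inspected; p is never dereferenced. */
    const char *verdict = p
        ? "non-null pointer that must not be dereferenced"
        : "null pointer";
    free(p);  /* valid in both cases: free accepts a null pointer, too */
    return verdict;
}
```

Whatever string comes back, a test of the return value against 0 at the call site could not have told a zero-sized request apart from a successful or a failed allocation.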

Always check the argument(s) to malloc and Co. or make sure by other means that you will not pass 0 to malloc. The easiest way to avoid that is to avoid malloc in the first place. Use local (auto or register) variables whenever you can. If you are allocating an object of a primitive data type (int, double, void* …) with malloc (and conceptually this is not an array of length 1) you are most probably on the wrong track. Rethink your design.

The second easiest way to do that is to never use explicit arithmetic as an argument to malloc. Always use a sizeof expression. In the simplest case its operand is just the variable that you are dealing with

strongNode * eff = malloc(sizeof *eff);

Such a sizeof expression will never be zero. As an additional feature, the type of eff is only specified once. If you later change it to weakNode, say, the rest will still work.

Even for vector allocations you don’t need explicit arithmetic. Just do something like

double * geh = malloc(sizeof(double[en]));

Technically, this uses the size of a variable length array, but this shouldn’t bother you. This is just a readable variant of telling malloc to allocate the space for the array type that you have in mind.
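One caveat is that a variable length array type must have a length greater than zero, so the expression above only helps if en cannot be 0. A minimal sketch of a guard, assuming a compiler that supports variable length arrays (they are optional since C11), could look as follows. The name alloc_doubles is invented for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: room for n doubles, or a null pointer.
   Rejects n == 0 (not a valid array length) and lengths whose
   byte size would not fit into size_t. */
double *alloc_doubles(size_t n) {
    if (n == 0 || n > SIZE_MAX / sizeof(double)) {
        return NULL;
    }
    /* Same idiom as above: the array type states what is wanted. */
    return malloc(sizeof(double[n]));
}
```

Here a null pointer return covers the bad argument as well as a failed allocation, so the length is checked in exactly one place.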

In P99 there is a convenience macro P99_MALLOC that condenses the malloc and the sizeof into one.

strongNode * eff = P99_MALLOC(*eff);
double * geh = P99_MALLOC(double[en]);

Memory subsystem failure

The second error condition (memory exhausted) on the other hand is quite rare in hosted environments. Usually you are in trouble long before you hassle with this kind of error. On a hosted environment malloc doesn’t allocate physical memory (this or that memory bank of your computer) but allocates memory in a virtual address space. This virtual address space is an abstraction that is provided by the OS that lets your process live in a sandbox where it is relatively undisturbed by other processes, and where the system juggles data between different forms of storage facilities (cache memory of different levels, RAM, “disk” such as hard-disk or SSD).

This world of the virtual address space is a well maintained fiction that helps us to survive the daily challenge of programming. Just like Zaphod Beeblebrox was only able to survive the Total Perspective Vortex because he faced it in a specially constructed, parallel universe. But just as for Zaphod, there might be problems in the real universe that you will not trick by just ignoring them.

As said, the virtual address space is usually backed by different types of devices that have quite different properties, in particular quite different bandwidth (what throughput do I get) and latency (how fast do I get an answer). For practical purposes this means:

A system that is out of memory is unresponsive a loooooong time before malloc will fail.

On top of the difficulty of a hanging system comes an extra property that a modern system (notably Linux) might have: overcommitment. This is when virtual address space isn’t even backed by any physical device. The system happily assigns addresses to your process, which malloc then returns, and it only faults when you access a page for which there is no physical counterpart. But when programming “normally” you will always get hit by the unresponsiveness of the system first. Exhausting the 2^48 virtual addresses of a modern system and having malloc fail for that reason would be a sporting challenge.

In my experience, in 99% of the cases a program execution that runs out of memory is the result of a serious bug. Only the other 1% is due to a platform/problem/user combination reaching the limits. To avoid the first 99% you can do several things

  • Review your code and have it reviewed by someone else. In particular assure yourself that all allocations are freed somewhere. All.
  • Use an allocation checking tool like valgrind. Test your program under different scenarios and don’t stop until valgrind finds no leak at all.

Only then does it make sense to ask yourself the question, “what if I run out of memory?”. And now you may see whether checking the return of malloc at a particular place in your code, or perhaps a redesign of your data structures, is the right answer.

Coding Style

Another disadvantage of the “check the return of malloc” idiom is code readability. People tend to decorate their code with complicated captures of that error condition that span several lines, doing perror and stuff like that. As The Elders already said (citation taken a bit out of context)

… the supply of new-lines on your screen is not a renewable resource …

There might be a trade-off between explicit error checking and readability of your code. If error checking conditions dominate your code and distract from the essential control flow, you have a design problem. Wrap your error checking in a function or macro to avoid that.

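The easiest, to my taste, would be something like memset(malloc(n), 0, 1). This just writes a 0 into the first byte. On typical hosted platforms it crashes nicely if malloc returned a null pointer, be it because of an error or because n was 0. Formally, though, writing through a null pointer is undefined behavior, and an implementation that returns a non-null pointer for n equal to 0 will not necessarily crash.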
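If you prefer something more explicit than that one-liner, the check can live in a small wrapper so that call sites stay on one line. The following is only a sketch. The name xmalloc is a common convention and not part of the C library, and aborting is an assumption about what the program wants to do on failure.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns a usable pointer or terminates.
   A zero size is treated as a bug of the caller and never
   reaches malloc. */
void *xmalloc(size_t size) {
    void *p = size ? malloc(size) : NULL;
    if (!p) {
        fprintf(stderr, "xmalloc: cannot allocate %zu bytes\n", size);
        abort();
    }
    return p;
}
```

A call site then reads strongNode * eff = xmalloc(sizeof *eff); and the essential control flow is not interrupted by error handling.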

A good coding style is also to always initialize variables. In particular this is crucial for pointer variables:

  • Situations where the compiler will not be able to optimize away an initialization that is overwritten by an assignment are rare. Only optimize for that case if a profiler tells you that there is a performance bottleneck.
  • These situations are most often even easy to avoid in the first place. C99 allows you to declare a variable at its first use. Such a first use must be an assignment (otherwise you have undefined behavior), so why not just declare it there, where it pops into life, and initialize it directly.
  • Initialize struct variables with initializer expressions enclosed in {}. This guarantees you that forgotten pointer members are initialized to 0.
  • Write an “init” function for every struct type that is allocated through malloc. As an extra bonus you then get P99_NEW that lets you easily allocate and initialize an object in one go.
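The last two points can be sketched in plain C99 as follows. The type node and the functions node_init and node_new are invented for this illustration. The sketch does not use P99, it only mimics the allocate-and-initialize idea by hand.

```c
#include <stdlib.h>

/* Hypothetical node type, used only for illustration. */
typedef struct node node;
struct node {
    double value;
    node *next;
};

/* Init function: puts an already allocated node into a defined state.
   It accepts a null pointer, so it can be chained onto malloc. */
node *node_init(node *n, double value) {
    /* Members not named in the initializer, here next, become 0. */
    if (n) *n = (node){ .value = value };
    return n;
}

/* Allocate and initialize in one go. */
node *node_new(double value) {
    return node_init(malloc(sizeof(node)), value);
}
```

For a local variable the same initializer syntax applies directly, as in node head = { .value = 1.0 }; which leaves head.next as a null pointer.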

7 thoughts on “chasing a phantom: checking the return of malloc”

  1. I like that one:

    The wordpress automaton writes me

    Want more readers?

    You used the following categories and tags: C99.

    Add a couple more to make your post easier for others to discover. Some suggestions: space rockets, null pointer, zero size, memory system, and false security.

    How can a robot be so ignorant not to recognize that Beeblebrox is the most relevant keyword on that blog post.

  2. I’d like to agree with you, but I don’t think I can … fully. I come from a world where disk-based memory backing is a big no-no, for the reasons you mention:

    1. There is no disk on the device (embedded systems, retail products),
    2. The orders of magnitude reduction in memory responsiveness is prohibitive (real-time systems, computation engines), and/or
    3. The operating system or hardware doesn’t support it (you’ve got to hand-roll overlays, etc.)

    So, as you say, it may NOT be true that the system becomes unresponsive “long before the malloc fails”; it may pootle along fine until you try to allocate twelve bytes to store “Hello World”, and then BAM!

    My worry isn’t that people do or do not test the return value from malloc, but that they don’t do intelligent things when it DOES fail. Many systems just throw up an error message and abort.

    Take a games console example: if you fail to allocate memory this video frame, don’t abort the game, just defer the processing that requires more memory until the next frame. Keep retrying for a few seconds and THEN abort. If another subsystem is active and taking up a large chunk of physical memory but then successfully completes, you’ll get your memory later on.

    Alternatively, if an allocation fails, try to perform the processing using less memory. For example, if your super-fast sorting algorithm uses O(n*n) extra memory, fail-over to one that uses only O(n) if there isn’t enough memory. It’s better to get your results a little later, than not at all.

    Alas, this is all HARD WORK. And sometimes, on very small systems, the code bytes to handle the fail-overs take up enough space to make running out of memory much more likely.

    My concern is that without being rigorous about error checking and recovery under ALL circumstances, we don’t have those skills at hand when they’re REALLY needed. In my experience, “constrained system” programmers can write “applications” software, but the converse is generally NOT true.

    1. Ian,
      thanks for your remarks.

      Perhaps I have not emphasized enough on the distinction that all that I say here is for hosted environments, and I have to admit that I simply don’t have the experience of programming for freestanding ones.

      So yes, as you say, for freestanding environments this is much more complicated and subtle, and error checking there is mandatory. And I can easily imagine that in such environments the failure path is a piece of art. Inside an OS kernel, a space rocket, a pacemaker or anything as critical as that, aborting is not an option.

      > My concern is that without being rigorous about error checking and recovery
      > under ALL circumstances, we don’t have those skills at hand when they’re REALLY
      > needed.

      This one, I am a little bit less convinced. I think that everybody has to start somewhere, and I’d be already much more comfortable if people knew about the possibility and the syntax for the initialization of struct, that struct can be assigned, and that they tidy up their code such that it is readable. Telling people to check for situations that they don’t have the skills to handle correctly looks not as a priority to me.

      Jens

      1. “Telling people to check for situations that they don’t have the skills to handle correctly looks not as a priority to me.”

        Isn’t that a bit like saying to learner drivers “If you depress the pedal too much at the wrong time and in the wrong circumstances, you’ll skid. But don’t worry about that!” Surely, it’s prudent to teach them how to handle skids as early as possible, and teach them how to mitigate the risks.

        Less syntax, more semantics and defensive programming!

        1. To stay with your image, there are certain dangers that are common for all cars. Those should be the basics to know for all people who want a driver’s licence. But you can’t teach everybody for real how to drive on snow, for the simple reason that you might not have snow available. So you might tell those people some basics on how to use their brakes etc. under such circumstances, but most of them will not listen. And they are mostly right, they had better focus on the real dangers that they are likely to encounter.

          And, for driving a racing car on ice I prefer that people pass through an additional exam.

          > P.S. I’m playing devil’s advocate here!

          sure!

      2. Ooh, ooh! I’m liking this analogy! 😉

        I come from a country where 1cm of snow causes chaos on the roads (UK) usually because people who SHOULD know better get in their cars and almost immediately crash or get stuck. Just one abandoned car in the wrong place is a nightmare.

        How often have you started using a third-party software library only to discover that, if a problem that the library designer/programmer didn’t THINK was likely to occur DOES occur it logs an error and calls exit()? If you’re a consumer of such a library, even if you had a “Plan B” for when the problem (that turns out to be actually quite common) occurs, you can’t do anything about it. Your application just aborts. Alternatively, you have to duplicate all the error-checking that is (or should be) inside the library to intercept the potential problem BEFORE you call the library.

        Software is getting more complex, so re-use is more prevalent. It only takes one bad apple in the barrel…
