C11 defects: initialization of padding

C11 attempts to force compilers to initialize the padding of structures and unions under certain circumstances. Unfortunately the situation has now become confusing, since the standard still allows padding to be treated differently from other parts of a structure that are not initialized explicitly.

(Let’s concentrate on structures in the following; unions are analogous.)

C11 states in “6.7.9 Initialization”, para 10:

If an object that has static or thread storage duration is not initialized explicitly, then:
– ….
– if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

So at first sight this only holds for objects of static storage duration. But later (para 19) it adds:

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

I read this as saying that padding inside sub-structures that are not initialized explicitly is initialized to zero bits. Let us take an example:

struct T {
  char a;
  /* padding b */
  double c;
  /* padding d */
};
static struct T A[2];
static struct T B[2] = { 0 };

A structure such as struct T has padding “b” between a and c on most (all?) modern architectures, since doubles are usually aligned on a 4- or 8-byte boundary. It might also have padding “d” after c; let us assume that for the sake of the example. Now let us look at what the standard guarantees for the initialization of A and B.
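To see how much padding a given platform actually inserts, the sizes of “b” and “d” can be computed from offsetof and sizeof. This is only a sketch: the helper names padding_b, padding_d and check_layout are made up for illustration, and the numbers they return depend on the ABI (padding “d” may well be zero bytes on a given target).

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */

struct T {
  char a;
  /* padding b */
  double c;
  /* padding d */
};

/* Bytes of padding "b": from the end of a to the start of c. */
size_t padding_b(void) {
  return offsetof(struct T, c) - (offsetof(struct T, a) + sizeof(char));
}

/* Bytes of padding "d": from the end of c to the end of the structure. */
size_t padding_d(void) {
  return sizeof(struct T) - (offsetof(struct T, c) + sizeof(double));
}

/* Sanity check: named members plus padding account for the whole object. */
void check_layout(void) {
  assert(sizeof(char) + padding_b() + sizeof(double) + padding_d()
         == sizeof(struct T));
}
```

Calling padding_b() and padding_d() from a small test program and printing the results with %zu shows the layout that the rest of the discussion assumes.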

For A the standard is clear. All named members (A[0].a, A[0].c, A[1].a, A[1].c) are initialized to their default values, (char)0 and 0.0 in that case. And all bits of the padding “b” and “d” in A[0] and A[1] are set to zero.

For B this is different. Again, all named members (B[0].a, B[0].c, B[1].a, B[1].c) are initialized to their default values. No guarantee is given for the bits of the padding “b” and “d” in B[0]; they have an indeterminate value. But for B[1] the implicit rule applies, so it is initialized as if it were default initialized as a whole. So all bits of the padding “b” and “d” in B[1] are set to zero.
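The difference between A and B can be observed, though not proven, by reading the object representations through unsigned char, which is always permitted. The following is a minimal sketch; count_nonzero_bytes and the nonzero_in_* wrappers are hypothetical helpers, and the code assumes that (char)0 and 0.0 are represented with all bits zero (true for IEEE 754 doubles, but not required by the standard).

```c
#include <assert.h>
#include <stddef.h>

struct T {
  char a;
  /* padding b */
  double c;
  /* padding d */
};
static struct T A[2];
static struct T B[2] = { 0 };

/* Count the bytes of an object representation that are not zero. */
size_t count_nonzero_bytes(const void *obj, size_t size) {
  const unsigned char *p = obj;
  size_t n = 0;
  for (size_t i = 0; i < size; ++i)
    if (p[i] != 0) ++n;
  return n;
}

/* A: members and padding are all guaranteed to be zero. */
size_t nonzero_in_A(void) { return count_nonzero_bytes(A, sizeof A); }

/* B[0]: only the named members are guaranteed, so a nonzero result
   here would not be a compiler bug under the reading above. */
size_t nonzero_in_B0(void) { return count_nonzero_bytes(&B[0], sizeof B[0]); }

/* B[1]: covered by the "remainder of the aggregate" rule. */
size_t nonzero_in_B1(void) { return count_nonzero_bytes(&B[1], sizeof B[1]); }

/* The named members of B are zero in any case. */
void check_named_members(void) {
  assert(B[0].a == 0 && B[0].c == 0.0);
  assert(B[1].a == 0 && B[1].c == 0.0);
}
```

Most implementations will report zero for all three counts, simply because static objects tend to end up in zero-filled storage. The point of the argument is that only the result for A, and under the reading above the one for B[1], is backed by the text of the standard.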

In summary, some padding in a structure is guaranteed to be zero-bit initialized, some isn’t. I don’t think that such a confusing situation is intentional.

Older versions of the standard didn’t make any guarantees about the initialization of padding bits. So with most existing compilers you’d have to be even more careful, since they don’t implement C11 yet. But AFAIK, clang already does in this respect.

Also be aware that this only holds for initialization. Padding isn’t necessarily copied on assignment.
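A consequence is that comparing structures bytewise after an assignment is unreliable, whereas a memberwise comparison is not. The sketch below illustrates this; T_equal, T_same_bytes and equal_after_assignment are names invented for the example, not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct T {
  char a;
  /* padding b */
  double c;
  /* padding d */
};

/* Memberwise equality: looks only at the named members, never at padding. */
bool T_equal(const struct T *x, const struct T *y) {
  return x->a == y->a && x->c == y->c;
}

/* Bytewise equality: the result also depends on the padding bytes. */
bool T_same_bytes(const struct T *x, const struct T *y) {
  return memcmp(x, y, sizeof *x) == 0;
}

/* Build src with scribbled padding, assign it to a zeroed dst, and
   compare the two memberwise. */
bool equal_after_assignment(void) {
  struct T src, dst;
  memset(&src, 0xFF, sizeof src);  /* fill everything, padding included */
  src.a = 'x';
  src.c = 1.5;
  memset(&dst, 0, sizeof dst);
  dst = src;  /* copies the value; the padding of dst is unspecified */
  assert(dst.a == 'x' && dst.c == 1.5);
  return T_equal(&src, &dst);
}
```

Here T_equal(&src, &dst) holds after the assignment, while nothing can be said in general about T_same_bytes(&src, &dst): a compiler may copy the whole object, padding included, or only the members.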

Possible ways to fix this

Option 1: Force static storage to zero bits

I see two possible ways to fix this. The first would be to suppose that default initialization for static objects could be something stronger than an explicit initialization by zero. Add a new paragraph after para 8:

If an object has static or thread storage duration, any padding is initialized to zero bits.

Then, delete all references to padding in the following paragraph (para 9 in current numbering) and change current para 19 to:

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the named members of the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration. Padding inside members of objects that have neither static nor thread storage duration and that are default initialized in this way is in an indeterminate state.

Option 2: treat default initialization and initialization by zero the same

I would be much more in favor of not making a difference between these two types of initialization. AFAIK, such a distinction has never been the intent of the standards committee. This could be done by adding a new paragraph after para 8:

For an object that is implicitly or explicitly initialized by one of the rules that follow, any padding is initialized to zero bits.

7 thoughts on “C11 defects: initialization of padding”

  1. What problem are we trying to solve here? Developers comparing objects having structure type for equality using memcmp. The reason WG14 (or rather X3J11 back in the day) gave for not supporting aggregate types as the operands of the equality operators was the large amount of code likely generated for what appears to be a small amount of code. Yes, the code expansion is large when thought of in terms of == or !=, but it is small when thought of in terms of a long sequence of tests for each member; or the translator could generate a call to memcmp if it knew the values of the padding bits would not change the result.

    The solution is to allow the operands to the equality operators to have an aggregate type.

    1. Derek, there are different issues here that should be clearly separated. The first is simply a defect in the current formulation of the standard. For no data type should an implicit (default) initialization and an explicit initialization with { 0 } have different semantics. C11 introduced several possible readings that allow such a distinction, and this is one of them. I think these should be corrected wherever we find them. These would be corrigenda and not extensions.

      What you are discussing, and I agree with that, is a particular problem spot. I can imagine that there have been historical reasons why == and != have not been introduced for structure or union types. Most probably these reasons have lost their relevance. I also think that extending these operators should be possible without invalidating any existing code. Still, introducing that would be an extension to the existing standard and should be discussed in depth for a new round of standardization.

  2. There is no defect in the current wording; all implementations will give the same answer. If you write two slightly different programs and access the padding bits (which is undefined behavior unless done using unsigned char) you can get different answers. What is the purpose of adding extra requirements to the standard so that the answer is always the same? Let’s address the real problem we are trying to solve and not disappear down the initializing padding bits rabbit hole.

    1. As far as I know there is no complete implementation of C11. This is a feature that was added for C11, so how can you make such a statement? (Or do you mean that all implementations in the empty set agree on the interpretation of that new feature? Then you are certainly correct 🙂)

      And if there was no problem with this type of initialization before, why has this been added to C11? I would expect a more qualified answer on that.

      And the real solution to that kind of problem would simply be to initialize all unnamed members and padding by zero bits, if there is an initialization, implicit or explicit.

  3. I reached this page from Stack Overflow while studying memcmp(). I am trying to understand the discussion above as well as many other esoteric details of C. Please help me with a small point. From the discussion, “Again, all named members (B[0].a, B[0].c, B[1].a, B[1].c) are initialized to their default values.” Does B[0].a get a “default value” from the initializer “= {0}”? When I replace “{0}” with “{1}”, B[0].a gets the value one while B[0].c, B[1].a, B[1].c get the defaults (char)0 or 0.0. B seems to be a hybrid: one part is explicitly initialized while three other parts are not explicitly initialized.

    1. I am not sure that I completely understand your question, but as far as I can see any explicit initializer can be a hybrid. It initializes the members that have a corresponding value in the initializer, and then all other members with the default initializer. Since you can never be sure that no one adds a new member to a structure (perhaps in the far future), that default part can always be present. Also, initialization from an explicit 0 is always the same as a default initialization; 0 has a very special role.

      1. Thanks, I struggle with the standards document. I did not realize that initialization from an explicit 0 and default initialization were the same. Even though they are the same, B[0] is still left with no guarantee about its padding bits. The presence of “{0}” seems to preclude the guarantee for the padding bits of B[0]. I am guessing that “{{0}}” and “{{0,0.0}}” have the same effect as “{0}” on padding bits of B[0] (no guarantee) and B[1] (implicit rule applies).
