Keyword overloading: the static keyword

The keyword static has seen wider and wider use across the different versions of C and C++. Given the uses it has now, compared to the beginnings of C, it would have been better to use another token for it, something in the vein of variant or alternative_declaration… Here I try to list all the different usages of that keyword and how adding static to a declaration selects an alternative version of that declaration.

Linkage specification

In C and C++, unless specified otherwise, an object that is defined in file scope, i.e. outside of any function, has static storage duration and external linkage. That is, space for the object is reserved at compile time and its name is visible to the linker. Code in other compilation units (.o files) may refer to such an object by its name.

The linkage changes when a declaration is prefixed with static: the object becomes internal to the compilation unit and is not accessible from other units. Different objects in different units that have the same name and are declared static may coexist, even if they have different types and sizes.

C++03 deprecated this use of static in favor of unnamed (anonymous) namespaces. C++11 withdrew that deprecation, but the unnamed namespace remains the usual C++ idiom for the same purpose.
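To make the difference in linkage concrete, here is a minimal sketch of a single compilation unit. All names (demo_calls, demo_bump, demo_next_id) are made up for the illustration.

```c
/* A sketch of one compilation unit; all names are hypothetical. */

/* Internal linkage: private to this file, invisible to the linker. */
static unsigned demo_calls = 0;

/* A helper function with internal linkage works the same way. */
static unsigned demo_bump(void) {
  return ++demo_calls;
}

/* External linkage: other units may declare and call this function. */
unsigned demo_next_id(void) {
  return demo_bump();
}
```

Another unit could define its own static demo_calls or demo_bump, even with a different type, without any conflict at link time. Only demo_next_id is exported.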

Storage class specification

In C and C++, unless specified otherwise, a variable that is defined in function scope has automatic storage: at run time, when the function is called, memory for the variable is reserved on the execution stack. Each call of the function, in particular if it is recursive, has its own new version of the variable.

The storage class changes when a declaration is prefixed with static: the object becomes static. Space for the variable is reserved at compile time and all calls of the function see the same variable, read and write into the same storage location.
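A small sketch may help to see the two storage classes side by side. The function names are invented for this example.

```c
/* Hypothetical example: the static variable survives between calls. */
unsigned demo_count_static(void) {
  static unsigned calls = 0;  /* one object, initialized once */
  return ++calls;
}

/* The automatic variable is created anew for every call. */
unsigned demo_count_auto(void) {
  unsigned calls = 0;         /* a fresh object per call */
  return ++calls;
}
```

Successive calls of demo_count_static return 1, 2, 3 and so on, whereas demo_count_auto returns 1 every time.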

Class member declaration

In C and C++ each instance of a struct (and in C++ also of a class) has its own copy of each of its declared members.

struct toto { double a; };
struct toto A;
struct toto B;

Here A.a and B.a are guaranteed to be two different objects with different locations in storage.

In C++ this changes when the declaration of the member is prefixed with static:

struct toti { static double a; };
double toti::a;   // the one definition, placed in exactly one compilation unit
struct toti A;
struct toti B;

Here A.a and B.a are guaranteed to be the same object. Both refer to exactly the same storage location.

Arrays as function parameters

As one of the rough edges of the C language(s), two declarations that look exactly the same (but for the names), one in a prototype and one as a stack variable, declare two different types of variables.

void foo(double A[10]) {
   double B[10];    
}

Inside the scope of foo, A is a pointer to double and B is an array of ten elements of type double. Even their sizes as computed with sizeof are different.
C++ inherited this rule.
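The difference can be checked with a short sketch. It takes the address of A and B instead of applying sizeof to the parameter directly, because many compilers warn about the latter. The function names are hypothetical.

```c
#include <stddef.h>

/* The parameter A is adjusted to "double *A", so &A is a double**. */
size_t demo_param_size(double A[10]) {
  double **pa = &A;
  return sizeof *pa;          /* size of a pointer to double */
}

/* B is a real array, so &B is a pointer to an array of 10 doubles. */
size_t demo_local_size(void) {
  double B[10] = { 0 };
  double (*pb)[10] = &B;
  return sizeof *pb;          /* size of the whole array */
}
```

The first function returns sizeof(double*) and the second 10*sizeof(double), whatever array is passed in.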

C99 complicates this matter even further by using the keyword static to introduce yet another variant:

void foo(double A[static 10]) {
   double B[10];    
}

This doesn't change the rules on how A and B are seen from the inside, but it tells the caller how many array elements are expected. C99 specifies that it is the responsibility of the caller to provide at least as many elements as the array bound specifies.

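At the time of writing, gcc accepts this new syntax and simply ignores the information. Strictly speaking that is still conforming, since passing too few elements is undefined behavior for which no diagnostic is required, but it means that the compiler gives the caller no help with this feature.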
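As an illustration of the caller's obligation, here is a sketch of a function that relies on the bound. The name demo_sum10 is made up.

```c
/* The caller promises that A points to at least 10 doubles. */
double demo_sum10(const double A[static 10]) {
  double sum = 0.0;
  for (int i = 0; i < 10; ++i)
    sum += A[i];
  return sum;
}
```

Passing an array of 10 or more elements is fine. Passing a shorter array or a null pointer is undefined behavior, whether or not the compiler says anything about it.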

Avoiding name conflicts for libraries

C programs usually use quite a lot of libraries with a lot of predefined names. These libraries may be coming from different origins, they may be part of the C runtime or your operating system (I’ll assume that this is POSIX), from third parties or be written by yourself or your collaborators. Conflicting names are a nasty thing to track down and may require big changes if they are detected too late.

Reserved Identifiers

All systems define a whole bunch of names that go far beyond just the keywords of the language. See the links that are provided below if you are in doubt about an individual name. Nevertheless there are systematic patterns that you should avoid for portability to other (future?) systems that you don’t control.

External Symbols in File Scope

All identifiers in file scope that start with an underscore _ are reserved for the runtime. So don’t use an identifier that could conflict with that rule, in particular

  • No globally visible identifier should start with an underscore. This concerns functions, variables, typedefs and enum constants.
  • Avoid names in tag space (struct, union or enum) that start with an underscore. Although they are not in direct conflict with external symbols, errors caused by them might be difficult to find.
  • Don’t #define names starting with an underscore. They might overwrite system symbols.

Reserved future keywords and preprocessor macros

Names starting with an underscore followed by a capital letter or a second underscore are reserved for future use as keywords and macros. This is, e.g., what happened during the extension of C from C89 to C99, where the keyword _Bool was introduced. Since this type of name had been reserved beforehand, the standards body was free to choose this name for the new Boolean data type. The name bool that most people use is only defined, as a macro, in the standard header file stdbool.h. A conflict with that name (which had not been reserved previously) can only occur when this header is included.

Thus, inside functions or as members of a struct or union, names with a leading underscore are OK in general, as long as they don’t have a capital letter or a second underscore in second position.
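The rules above can be summarized in a sketch. The forms to avoid only appear in the comment; the names demo_point and demo_norm1 are invented for the example.

```c
/* To be avoided:
 *   int _counter;        leading underscore in file scope
 *   #define _LIMIT 10    underscore followed by a capital letter
 *   struct __node;       two leading underscores
 */

struct demo_point {
  double _x;   /* member: underscore plus lowercase letter is tolerated */
  double _y;
};

/* Sum of the absolute values of the two coordinates. */
double demo_norm1(struct demo_point p) {
  double _sum = 0.0;   /* block scope: tolerated as well */
  _sum += (p._x < 0.0) ? -p._x : p._x;
  _sum += (p._y < 0.0) ? -p._y : p._y;
  return _sum;
}
```

That such names are tolerated doesn't make them a good habit. Names without a leading underscore are easier to keep apart from the reserved ones.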

Reserved future types

POSIX reserves all other names that end in _t for future use as types. Don’t use them, they are ugly anyhow.

Choose a prefix

If you expect/estimate/dream that the library that you are designing will be used a lot by others, stick to a simple naming convention: choose a prefix, like demo_. This makes identifier conflicts easy to track and avoids potential conflicts with macro definitions that somebody else might use.

  struct demo_coucou {
     unsigned demo_age;
     double demo_weight;
  };

Observe that this also uses the prefix for the individual fields of the struct. If we just used age or weight, they might conflict with some code that uses these as names of #define macros.
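To see why the prefix on the fields matters, here is a sketch in which some unrelated header has already defined macros named age and weight. Those two macros and the accessor function are hypothetical, they only serve the illustration.

```c
/* Imagine that these come from some third party header. */
#define age 42
#define weight 75.0

/* The prefixed members are not touched by the macros. A plain
   "unsigned age;" would expand to "unsigned 42;" and not compile. */
struct demo_coucou {
  unsigned demo_age;
  double demo_weight;
};

unsigned demo_coucou_age(const struct demo_coucou *c) {
  return c->demo_age;
}
```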

Compatibility of C headers in C++

In C++ struct and union tags can be used as type identifiers iff they are not used in the identifier namespace. Using tag names and identifiers for different kinds of objects/concepts is a bad idea anyhow, so avoid that. I think the best way to do so is to typedef all struct and union types to the same name.

typedef struct demo_coucou demo_coucou;

For C, this allows you to do a forward declaration of struct demo_coucou and of demo_coucou in just one line. For C++ this is valid, too; it just makes their convention of the implicit typedef explicit. As an additional advantage it prevents you from accidentally declaring another identifier of that name yourself, which would greatly upset C++ when it sees your declaration.
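Put together, a header following this convention could look like the following sketch. The accessor demo_coucou_weight is an invented example function.

```c
/* One line declares both the tag and the type name. */
typedef struct demo_coucou demo_coucou;

/* The still incomplete type can already be used in prototypes ... */
double demo_coucou_weight(const demo_coucou *c);

/* ... and the definition of the struct may follow later. */
struct demo_coucou {
  unsigned demo_age;
  double demo_weight;
};

double demo_coucou_weight(const demo_coucou *c) {
  return c->demo_weight;
}
```

User code can then write demo_coucou without the struct keyword, in C just as in C++.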

Further reading

How to make sem_t interrupt safe

The POSIX semaphore wait calls

int sem_wait(sem_t *sem);
int sem_trywait(sem_t *sem);
int sem_timedwait(sem_t *sem, const struct timespec *abs_timeout);

can be interrupted at any time, e.g. by I/O that is delivered or when a child process terminates. In such a case errno is set to EINTR:

ERRORS
EINTR The call was interrupted by a signal handler; see signal(7).

NOTES
A signal handler always interrupts a blocked call to one of these functions, regardless of the use
of the sigaction(2) SA_RESTART flag.

(Side note, sem_post is always atomic in that sense and will never return EINTR.)

So even if we don’t use sem_t in a signal handler context, the whole semaphore approach becomes error prone if we don’t check for EINTR on return from the wait. In particular there are good chances that with small test cases during development everything runs fine, but that then during production byzantine errors occur once in a while that will be very hard to track down.

To test for EINTR systematically it is easy to write wrappers that just re-issue the corresponding wait whenever the return value indicates that the call was interrupted.

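#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Re-issue sem_wait as long as it is interrupted by a signal. */
static inline
int sem_wait_nointr(sem_t *sem) {
  while (sem_wait(sem))
    if (errno == EINTR) errno = 0;
    else return -1;
  return 0;
}

/* Same for sem_trywait; EAGAIN is still reported to the caller. */
static inline
int sem_trywait_nointr(sem_t *sem) {
  while (sem_trywait(sem))
    if (errno == EINTR) errno = 0;
    else return -1;
  return 0;
}

/* Same for sem_timedwait; ETIMEDOUT is still reported to the caller. */
static inline
int sem_timedwait_nointr(sem_t *sem, const struct timespec *abs_timeout) {
  while (sem_timedwait(sem, abs_timeout))
    if (errno == EINTR) errno = 0;
    else return -1;
  return 0;
}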

Since they use the inline keyword, the variants given here are only suitable for C99 and later, but it should be easy for you to adapt them for other contexts. The inline here has the advantage that the call can be inlined efficiently at the caller side, basically resulting in the call of the real POSIX function in question plus some conditional jumps.

BTW, the evaluation of errno here has some thread magic. The POSIX standard guarantees that it must behave as if it were a local variable for each thread, so we don’t have to worry about concurrent access to it.
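As a usage sketch for the timed variant: sem_timedwait takes an absolute deadline measured on CLOCK_REALTIME, so re-issuing the call after an interruption does not extend the total waiting time. The helper demo_wait_msec below is hypothetical, and the wrapper is repeated only to keep the sketch self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Same wrapper as above, repeated for self-containment. */
static inline
int sem_timedwait_nointr(sem_t *sem, const struct timespec *abs_timeout) {
  while (sem_timedwait(sem, abs_timeout))
    if (errno == EINTR) errno = 0;
    else return -1;
  return 0;
}

/* Hypothetical helper: wait at most msec milliseconds from now. */
int demo_wait_msec(sem_t *sem, long msec) {
  struct timespec deadline;
  if (clock_gettime(CLOCK_REALTIME, &deadline)) return -1;
  deadline.tv_sec  += msec / 1000;
  deadline.tv_nsec += (msec % 1000) * 1000000L;
  if (deadline.tv_nsec >= 1000000000L) {  /* keep tv_nsec below 1s */
    deadline.tv_sec  += 1;
    deadline.tv_nsec -= 1000000000L;
  }
  return sem_timedwait_nointr(sem, &deadline);
}
```

On a semaphore with value 1 the first call returns 0 immediately. A second call returns -1 with errno set to ETIMEDOUT once the deadline has passed, regardless of how many signals arrived in between.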