Macros versus inline functions

Functions (whether inline or not) and macros fulfill different purposes. The difference between them should not be treated as an ideological matter, as some seem to do, and, more importantly, the two can work nicely together.

Macros are text replacement that is done at compile time and they can do things like

#define P99_ISSIGNED(T) ((T)-1 < (T)0)

which gives you a compile-time expression that tells whether an integral type is signed. That is, macros are ideally used when the type of an expression is not known (at the point of definition) and you want to do something that depends on it. On the other hand, the pitfall with macros is that their arguments may be evaluated several times, which is bad if these arguments have side effects.
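Both points can be tried out with a small sketch. The macro UNSAFE_MAX and the helper functions next_value and count_evaluations are hypothetical names that I only introduce for this illustration; they are not part of P99.

```c
#include <assert.h>

/* From the text: true for signed integer types, false for unsigned ones. */
#define P99_ISSIGNED(T) ((T)-1 < (T)0)

/* Hypothetical macro, not from the text: a classic unsafe maximum. */
#define UNSAFE_MAX(A, B) ((A) > (B) ? (A) : (B))

/* Counts its own calls, to make repeated evaluation visible. */
static int calls = 0;
int next_value(void) {
  ++calls;
  return 10;
}

/* Returns how often next_value was called by a single macro use. */
int count_evaluations(void) {
  calls = 0;
  int m = UNSAFE_MAX(next_value(), 5);
  assert(m == 10);
  return calls;
}
```

Here the argument next_value() appears twice in the expansion of UNSAFE_MAX, once in the comparison and once in the selected branch, so the function is called twice for one use of the macro.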

Functions on the other hand are typed, which makes them more strict or, phrased negatively, less flexible. Consider the functions

inline
uintmax_t absU(uintmax_t a) {
  return a;
}
inline
uintmax_t absS(uintmax_t a) {
   return (-a < a) ? -a : a;
}

The first implements the trivial abs function for an unsigned integral type. The second implements it for a signed type. (Yes, it takes an unsigned as argument; this is on purpose.)

We may use these with any integral type. But the return type will always be of the largest width, and there is a certain difficulty in knowing how to choose between the two.

Now with the following macro

#define ABS(T, A) ((T)(P99_ISSIGNED(T) ? absS : absU)(A))

we have implemented a

  • family of functions
  • that works for any integral type
  • that evaluates its argument only once
  • for which any recent and decent compiler will create optimal code

Well, I admit that doing this with abs is a bit artificial, but I hope you get the picture.
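To see the pieces working together, here is a minimal self-contained sketch. It assumes static inline instead of plain inline for the two functions, only so that the example links as a single file, and abs_evaluations is a hypothetical helper that I add for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define P99_ISSIGNED(T) ((T)-1 < (T)0)

/* static inline is an assumption of this sketch, see the text above. */
static inline uintmax_t absU(uintmax_t a) {
  return a;
}
static inline uintmax_t absS(uintmax_t a) {
  /* a holds the value modulo 2^N; a negative input shows up as a large a */
  return (-a < a) ? -a : a;
}

#define ABS(T, A) ((T)(P99_ISSIGNED(T) ? absS : absU)(A))

/* Hypothetical helper: returns how far i has moved after one use of ABS.
   A result of 1 shows that the argument i++ was evaluated only once. */
int abs_evaluations(void) {
  int i = -5;
  int r = ABS(int, i++);
  assert(r == 5);
  return i - (-5);
}
```

The macro only selects one of the two functions according to the type T; the argument A itself appears exactly once in the expansion.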

Misnomers in C

C has many misnomers concerning keywords. Here I give a table of possible keywords and convenient macro names that might replace them. New keywords in future C standards usually start with an underscore and a capital letter. Macros in new header files may then be just convenient names. (For an example, take the introduction of _Bool and bool in C99.)

This is not completely serious, but it might give you an idea of the different concepts. It also should emphasize the fact that none of these keywords is ignored by a conforming C compiler. If you have any ideas of other keywords or other possible namings, please let me know.

keyword/macro   replacement    macro         comment
and_eq          &=             and_assign    is an assignment not a comparison
const           _Immutable     immutable
                _Invariant     invariant
char            _Byte          byte          in combination with signed and unsigned
inline          _Negligible    negligible
or_eq           |=             or_assign     is an assignment not a comparison
register        _Addressless   addressless
static          _Intern        intern        for use in file scope
                _Common        common        for use in function scope
                _Atleast       atleast       for use in parameter declaration. A similar proposal was rejected by the standards committee
typedef         _Typealias     typealias     Doesn’t define a type
union           _Overlay       overlay
unsigned        _Modulo        modulo        unsigned integers just implement computation modulo a power of 2
xor_eq          ^=             xor_assign    is an assignment not a comparison

Edit: I changed

  • once to common, because this reflects better that this is a variable that is common to all invocations of a function.
  • synonym to typealias, to stay closer to the original

A common misconception: the register keyword

C has had the register keyword probably since its beginnings, and I have recently read several times that it has no effect and is ignored by the compiler. This is a misconception of what this keyword does and what it is meant to do. Unfortunately, the misconception is due to the misnaming of the feature as register. It is one of the misnomers that we encounter in C, others being static, inline and const.

The only purpose that register has nowadays (and has had for a long time) is that it inhibits taking the address of a variable. It makes, e.g., the following code invalid:

register double a;
double *ap = &a;

The principal use of this is to serve as an optimization hint to the compiler:

I know, I wouldn’t need an address of that variable, do all you can to optimize the access to it

In that sense it is of the same importance as the restrict keyword.

This can be particularly useful for array variables. An array variable is easily confused with a pointer variable. Unless it is followed by [expr] or is the operand of sizeof, it evaluates to the address of its first element. If you declare the array register, all these uses are forbidden; the idea is that we only access individual elements or ask for the total size. (Strictly speaking, the standard defines even a[expr] through the conversion to a pointer, so for a register array only sizeof is guaranteed to work.) Such a register array may then be treated much more easily by the optimizer, as if it were just a set of variables. No aliasing (accessing the same variable through different pointers) may occur.

Another use for declaring a variable as register and const is to inhibit any non-local change of that variable, even through taking its address and then casting the pointer. Even if you think that you yourself would never do this, once you pass a pointer (even one with a const qualifier) to some other function, you can never be sure that this function is not malicious and does not change the variable under your feet. So in a setting with exposure to risks, you could declare all your variables register and then carefully inspect all the remaining places where you take addresses and pass pointers to stack variables to other functions.
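As a small sketch of this style, consider the following function. The function scaled_sum and its variables are hypothetical and only serve as an illustration.

```c
#include <assert.h>

/* Hypothetical example: sums n values, each multiplied by a fixed factor. */
int scaled_sum(int const *data, int n) {
  /* No pointer to factor can exist, so no other function can alter it. */
  register int const factor = 3;
  register int sum = 0;
  for (register int i = 0; i < n; ++i)
    sum += factor * data[i];
  /* int const *p = &factor;    would be rejected: address of a register variable */
  assert(factor == 3);
  return sum;
}
```

The only object in this function that is reachable through a pointer is the array that the caller passed in. All local variables are guaranteed to stay private to the function.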

As said, unfortunately this purpose is not easily deduced from the name `register`. Holding the variable in a register of the CPU is just one possible optimization that the compiler might find for such a variable; another would be to realize it as an immediate operand in the assembler code. Maybe one day a compiler could ensure that such a variable is held in cache and not spilled out to higher levels of the memory hierarchy. Or maybe in many cases it just can’t do anything for you, but perhaps it was worth a try?

Edit: register variables are also the technical base of a proposal for generic named constants that I recently elaborated.

Scope Bound Resource Management with for Scopes

Resource management can be tedious in C. E.g., to protect a critical block from simultaneous execution in a threaded environment you’d have to place a lock / unlock pair before and after that block:

pthread_mutex_t guard = PTHREAD_MUTEX_INITIALIZER;

pthread_mutex_lock(&guard);
// critical block comes here
pthread_mutex_unlock(&guard);

This is very error prone since you have to provide such calls every time you have such a block. If the block is longer than a few lines it is difficult to keep track of them, since the lock / unlock calls sit on the same level as the other code.

Within C99 (and equally in C++, BTW) it is possible to extend the language in a way that makes this more visible and guarantees that your lock / unlock calls match. Below, we give an example of a macro that helps us to write something like

P99_PROTECTED_BLOCK(pthread_mutex_lock(&guard), 
    pthread_mutex_unlock(&guard)) {
       // critical block comes here
}

If we want to make this a bit more comfortable for cases in which we still need to know the mutex variable, we may have something like:

GUARDED_BLOCK(guard) {
       // critical block comes here
}

The macro P99_PROTECTED_BLOCK can be defined as follows:

#define P99_PROTECTED_BLOCK(BEFORE, AFTER)                         \
for (int _one1_ = 1;                                               \
     /* be sure to execute BEFORE only at the first evaluation */  \
     (_one1_ ? ((void)(BEFORE), _one1_) : _one1_);                 \
     /* run AFTER exactly once */                                  \
     ((void)(AFTER), _one1_ = 0))                                  \
  /* Ensure that a `break' will still execute AFTER */             \
  for (; _one1_; _one1_ = 0)

As you may see, this uses two for statements. The first defines an auxiliary variable _one1_ that is used to ensure that the dependent code is executed exactly once. The arguments BEFORE and AFTER are then placed such that they will be executed before and after the dependent code, respectively.

The second for is just there to make sure that AFTER is executed even when the dependent code executes a break statement. A continue statement behaves the same way, since it also just ends the inner for. Note that both only leave the protected block and do not act on a loop that surrounds it. For other premature exits such as return, goto or exit there is unfortunately no such cure. When programming the dependent statement we have to be careful about these, but this problem is just the same as it was in the “plain” C version.

Generally there is no run time performance cost for using such a macro. Any decent compiler will detect that the dependent code is executed exactly once, and thus optimize out all the control that has to do with our variable _one1_.
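The control flow is easy to check with a sketch that uses simple counters in place of the lock and unlock calls. The counters entered, left and body_runs and the function run_block are hypothetical names for this illustration.

```c
#include <assert.h>

#define P99_PROTECTED_BLOCK(BEFORE, AFTER)                         \
for (int _one1_ = 1;                                               \
     (_one1_ ? ((void)(BEFORE), _one1_) : _one1_);                 \
     ((void)(AFTER), _one1_ = 0))                                  \
  for (; _one1_; _one1_ = 0)

/* Hypothetical bookkeeping that stands in for lock and unlock calls. */
static int entered = 0;
static int left = 0;
static int body_runs = 0;

/* Runs one protected block and leaves it early with break if requested. */
void run_block(int leave_early) {
  assert(entered == left);       /* blocks are balanced on entry */
  P99_PROTECTED_BLOCK(++entered, ++left) {
    ++body_runs;
    if (leave_early) break;
    ++body_runs;
  }
}
```

After each call of run_block the counters entered and left have advanced by exactly one, whether or not the block was left through the break.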

The GUARDED_BLOCK macro could now be realized as:

#define GUARDED_BLOCK(NAME)        \
P99_PROTECTED_BLOCK(               \
    pthread_mutex_lock(&(NAME)),   \
    pthread_mutex_unlock(&(NAME)))

Now, to have more specific control over the mutex variable we may use the following:

#define P99_GUARDED_BLOCK(T, NAME, INITIAL, BEFORE, AFTER)           \
for (int _one1_ = 1; _one1_; _one1_ = 0)                             \
  for (T NAME = (INITIAL);                                           \
       /* be sure to execute BEFORE only at the first evaluation */  \
       (_one1_ ? ((void)(BEFORE), _one1_) : _one1_);                 \
       /* run AFTER exactly once */                                  \
       ((void)(AFTER), _one1_ = 0))                                  \
    /* Ensure that a `break' will still execute AFTER */             \
    for (; _one1_; _one1_ = 0)

This is a bit more complex than the previous one because in addition it declares a local variable NAME of type T and initializes it.
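The same macro works for resources other than a mutex. As a sketch, assume a hypothetical macro WITH_BUFFER that binds a heap buffer to a block, and a hypothetical function copy_length that uses it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define P99_GUARDED_BLOCK(T, NAME, INITIAL, BEFORE, AFTER)           \
for (int _one1_ = 1; _one1_; _one1_ = 0)                             \
  for (T NAME = (INITIAL);                                           \
       (_one1_ ? ((void)(BEFORE), _one1_) : _one1_);                 \
       ((void)(AFTER), _one1_ = 0))                                  \
    for (; _one1_; _one1_ = 0)

/* Hypothetical macro: a heap buffer that lives for exactly one block. */
#define WITH_BUFFER(NAME, SIZE)                                      \
P99_GUARDED_BLOCK(char*, NAME, malloc(SIZE), (void)0, free(NAME))

/* Copies text through a temporary buffer and returns the copied length,
   or 0 if the allocation failed. */
size_t copy_length(char const *text) {
  assert(text);
  size_t len = 0;
  WITH_BUFFER(buf, strlen(text) + 1) {
    if (!buf) break;           /* AFTER still runs; free(NULL) does nothing */
    strcpy(buf, text);
    len = strlen(buf);
  }
  return len;
}
```

The variable buf is only visible inside the block, and free is called for it on the regular exit as well as on the break.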

Unfortunately, the use of static for the declaration of a for-scope variable is not allowed by the standard. To implement a simple macro for a critical section in programs that would not depend on any argument, we have to do a bit more than this.

Other block macros that can be implemented with such a technique:

  • pre- and postconditions
  • make sure that some dynamic initialization of a static variable is performed exactly once
  • code instrumentation
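For the first item of this list, a sketch could look as follows. The macro CHECKED_BLOCK and the function withdraw are hypothetical names that I only use here; the checks are plain assert calls.

```c
#include <assert.h>

#define P99_PROTECTED_BLOCK(BEFORE, AFTER)                         \
for (int _one1_ = 1;                                               \
     (_one1_ ? ((void)(BEFORE), _one1_) : _one1_);                 \
     ((void)(AFTER), _one1_ = 0))                                  \
  for (; _one1_; _one1_ = 0)

/* Hypothetical macro: check PRE on entry and POST on exit of a block. */
#define CHECKED_BLOCK(PRE, POST)                                   \
P99_PROTECTED_BLOCK(assert(PRE), assert(POST))

/* Removes amount from *balance if possible; returns the amount removed. */
int withdraw(int *balance, int amount) {
  int taken = 0;
  CHECKED_BLOCK(amount >= 0, *balance >= 0) {
    if (amount > *balance) break;   /* the postcondition is still checked */
    *balance -= amount;
    taken = amount;
  }
  return taken;
}
```

If NDEBUG is defined, both assert calls expand to nothing and the block macro reduces to the plain dependent code.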

P99 now has a lot of examples that use this feature.

va_arg functions and macros

Traditionally C has functions with a variable length argument list, so-called variadic functions. The handling of such arguments is done with the va_list data type from stdarg.h and the corresponding macros. I see two pitfalls with this type of approach that usually make it relatively difficult to use, even in cases where the arguments are supposed to be all the same type T.

  • There is no indication by these macros how long the list that is passed as argument is.
  • There is an implicit conversion of small integer arguments to signed or unsigned int according to the integer promotion rules. These types only have an implementation defined width.

The first pitfall usually requires that we apply one of the following techniques to handle the list:

  • Terminate the list at each call by a special value. This convention has the disadvantage that each caller has to follow this rule and that failing to do so might produce errors that are hard to track.
  • Provide a count of the arguments as an extra parameter that precedes the list. Whereas here also the calling side must do something for each call, at least the convention can be determined from the prototype of the function.
  • As a variation of this, give a format string that describes how the arguments are to be interpreted. The printf family of functions uses this approach.
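The first two conventions can be sketched as follows. The functions sum_counted and sum_terminated are hypothetical examples; both assume that all list arguments have type int.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical example: sums count arguments that must all be int. */
int sum_counted(size_t count, ...) {
  va_list ap;
  va_start(ap, count);
  int sum = 0;
  for (size_t i = 0; i < count; ++i)
    sum += va_arg(ap, int);
  va_end(ap);
  return sum;
}

/* Hypothetical example: sums int arguments up to a terminating 0. */
int sum_terminated(int first, ...) {
  assert(sizeof first == sizeof(int));  /* every list element is read as int */
  va_list ap;
  va_start(ap, first);
  int sum = 0;
  for (int v = first; v != 0; v = va_arg(ap, int))
    sum += v;
  va_end(ap);
  return sum;
}
```

In both cases the compiler cannot check the convention. A call such as sum_counted(3, 1, 2) or a call to sum_terminated that forgets the final 0 compiles without complaint and then reads arguments that were never passed.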

Since C99 we now have macros with variable length argument lists. These can be used to interface functions that obtain a length parameter and an array of type T and that then are much easier to use on the calling side. Suppose that we have a function varArrFunc and a macro varListMacro as follows (for an explanation of the implementation see below):

   #define P99_CALL_VA_ARG(NAME, TYPE, ...)  (NAME(P99_NARG(__VA_ARGS__), (TYPE[]){ __VA_ARGS__ }))

   void varArrFunc(size_t len, T* A);
   #define varListMacro(...)  P99_CALL_VA_ARG(varArrFunc, T,  __VA_ARGS__ )

Such a macro/function pair may then simply be called as varListMacro(78, 7, 9, 99) or varListMacro("a", "toto"), if for the first example we assume that T is compatible with int, or for the second that it is compatible with char*. As we can see, this avoids both pitfalls:

  • There is no need to have a calling side convention to handle the length of the argument list.
  • All argument conversion is to a type T that we specify clearly in the definition of varListMacro. If, e.g., we specify T to be uint64_t we will always know which value the function varArrFunc will see if we feed in (signed char)-1 as an argument.

How does this work? First we need a macro P99_NARG(...) that provides us with the number of arguments that it receives. We showed how to implement such a macro in an earlier post. Then in its second part the macro P99_CALL_VA_ARG uses a compound literal to pass an array of base type T with our arguments as initial values to the function varArrFunc.
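A self-contained sketch of the whole construction could look as follows. MY_NARG is a simplified stand-in for P99_NARG that I assume here and that only handles 1 to 8 arguments; MY_CALL_VA_ARG, sum_u64 and sum_list are hypothetical names as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for P99_NARG; handles 1 to 8 arguments only.
   The arguments shift the number list so that N lands on their count. */
#define MY_NARG(...) MY_NARG_(__VA_ARGS__, 8, 7, 6, 5, 4, 3, 2, 1, 0)
#define MY_NARG_(a1, a2, a3, a4, a5, a6, a7, a8, N, ...) N

/* Pass the count and a compound literal array holding the arguments. */
#define MY_CALL_VA_ARG(NAME, TYPE, ...)                              \
  (NAME(MY_NARG(__VA_ARGS__), (TYPE[]){ __VA_ARGS__ }))

/* Hypothetical array function: sums len values, modulo 2^64. */
uint64_t sum_u64(size_t len, uint64_t *A) {
  assert(A);
  uint64_t sum = 0;
  for (size_t i = 0; i < len; ++i)
    sum += A[i];
  return sum;
}

#define sum_list(...) MY_CALL_VA_ARG(sum_u64, uint64_t, __VA_ARGS__)
```

A call such as sum_list(78, 7, 9, 99) then expands to a call of sum_u64 with a length of 4 and an array of four uint64_t. Every argument is converted to uint64_t when the array is initialized, so (signed char)-1 arrives as the largest value of that type.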

Such an implementation is at least as efficient as would be an implementation of varArrFunc itself as a variadic function.

  • The length of the array is computed at compile time. It is known there, so the information should not get lost.
  • As with the variadic function approach, each individual argument is evaluated only once at run time.
  • Where the variadic function approach would implement the argument list on the stack of the callee, here the array is implemented on the stack of the caller. In any case it is on the stack. For any of the calling conventions that we mentioned above we would either need an extra terminating argument or an extra parameter, so our use of a length parameter to varArrFunc is as efficient as that.
  • As an extra bonus, the call to varArrFunc may even be inlined, if we specify it with inline. This then may lead to optimizations that generally are more difficult to achieve for the variadic function approach:
    • The handling of the array A of parameters inside varArrFunc will usually be done with a simple for-loop.
    • This loop then has known bounds for each call and the compiler may do loop unrolling.
    • Once unrolled, the compiler might even avoid the whole generation of the array and use the parameter expressions directly.