Jens Gustedt's Blog

July 15, 2013

a praise of size_t and other unsigned types

Filed under: C11, C99, integers, language — Jens Gustedt @ 16:17

Again I had a discussion with someone from a C++ background who claimed that one should use signed integer types where possible, and who also claimed that the unsignedness of size_t is merely a historical accident and would never be defined as such nowadays. I strongly disagree with that, so I decided to write this up, for once.

What I write here will only work with C, and can possibly be extended to C++ and other languages that implement unsigned integer types, e.g. good old Pascal had a cardinal type.


July 10, 2011

Avoid writing va_arg functions

Filed under: C99, integers, language — Jens Gustedt @ 23:03

As we have seen in another post, functions that receive a variable argument list whose arguments all have the same type are better replaced by a function and a variadic macro. P99 has easy means to transform an argument list x0, x1, ..., xN into two parameters: the first is just the length of the list, here N+1, and the second is an array that holds the values. Functions of this kind can usually be optimized much better in place by modern compilers.

So va_arg functions should first of all be restricted to the case that the function may receive incompatible types as its arguments (such as floating point and integers). If used for that, the default argument promotion rules may be really harmful.
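
The replacement of a same-type va_arg function by a function plus a variadic macro can be sketched as follows. This is not P99's own machinery; the names sum_array and SUM are made up for the illustration, and the macro assumes at least one argument. Note that an integer argument such as 1 is converted to double by the initializer, whereas a va_arg function that reads a double where an int was passed has undefined behavior.

```c
#include <stddef.h>

/* Hypothetical worker: receives the length of the list and the values. */
static double sum_array(size_t n, double const values[]) {
  double total = 0.0;
  for (size_t i = 0; i < n; ++i)
    total += values[i];
  return total;
}

/* Hypothetical wrapper (not taken from P99): the arguments initialize a
   compound literal, and sizeof computes the length without evaluating them. */
#define SUM(...)                                                  \
  sum_array(sizeof((double[]){ __VA_ARGS__ }) / sizeof(double),   \
            (double[]){ __VA_ARGS__ })
```

A call such as SUM(1, 2, 3.5) then passes a length of 3 and an array of three double values.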

December 27, 2010

constant expressions

Filed under: C99, integers, language — Jens Gustedt @ 09:25

In C, constant expressions come in two different flavors:

  1. integer constant expressions
  2. initializer constant expressions.

The naming of (1.) is (again!) sort of unfortunate, because as we will see below there are constant expressions of integer type that are not integer constant expressions in the sense of the C standard. Better think of (1.) as compile time constant integer expressions.

November 7, 2010

Don’t use NULL

Filed under: C99, integers, syntax — Jens Gustedt @ 23:36

I always thought that using NULL whenever I wanted to assign a null pointer value to a pointer was a good thing, but today I learned the contrary.

October 23, 2010

A generic swap implementation

Filed under: C99, integers, preprocessor — Jens Gustedt @ 11:13

Swapping the contents of two variables is an elementary task that is often met in daily programming. There are two generic strategies to do that for general types.

October 18, 2010

Detecting integer overflow II

Filed under: C99, integers — Jens Gustedt @ 22:21

In an earlier post we came up with a general solution to check for potential under- or overflow in an integer addition. On most modern architectures this can be done more efficiently, even when assuming that there are no special instructions that capture overflow bits or such.

October 16, 2010

Detecting integer overflow I

Filed under: C99, integers — Jens Gustedt @ 10:15

A recent discussion on stackoverflow has shown that detecting integer overflow without provoking undefined behavior needs some reflection, and that the quick answers are not necessarily the best ones.
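
To give an idea of what such a check may look like, here is a minimal sketch of a commonly used test for int addition. It is not necessarily the solution developed in these posts, and the name add_would_overflow is made up. The point is that the test itself only uses operations that cannot overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would leave the range of int.
   The sum itself is never computed. */
static bool add_would_overflow(int a, int b) {
  if (b > 0)
    return a > INT_MAX - b;   /* INT_MAX - b is in range because b > 0 */
  else
    return a < INT_MIN - b;   /* INT_MIN - b is in range because b <= 0 */
}
```

Only if this returns false is it safe to evaluate a + b.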

September 21, 2010

Anatomy of integer types in C

Filed under: C99, integers — Jens Gustedt @ 04:06

Integer types in C may present subtle traps (sic!) that many people are not aware of when doing seemingly simple things like ~0 or

(1 << (sizeof(int)*CHAR_BIT - 1))

Most times, on almost all processors, these will produce the desired effect, but sometimes such code will fail, crash, or spit out a lot of warnings. I will try to analyze this a bit, to show what may go wrong here and how you can get around a lot of possible problems.
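
As a hedged sketch of one way around these two traps: ~0 has type int, so its value depends on the sign representation, and shifting a 1 into the sign bit of an int is undefined behavior. Staying with unsigned avoids both, and using UINT_MAX instead of sizeof avoids any assumption about padding bits. The function names are made up for the illustration.

```c
#include <limits.h>

/* All value bits set, the same value as ~0U. */
static unsigned all_ones(void) {
  return UINT_MAX;
}

/* Only the highest value bit set, computed without knowing the width. */
static unsigned high_bit(void) {
  return UINT_MAX ^ (UINT_MAX >> 1);
}
```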

June 24, 2010

Integer type confusion

Filed under: C99, integers, POSIX — Jens Gustedt @ 12:59

Many people still use int as the integer type to be used in C. Don’t do that. This is misleading, non portable (yes!) and error prone.

  • Valid use of int is for the return code of a system function or errno.
  • Valid use of char is for the character in a string.
  • Valid use of char* is byte arithmetic on pointers.

That’s it really.

Traditionally, integer types in C are a mess, already syntactically. You can have combinations such as long long unsigned int etc. that make the code really hard to read. For the following we will use the 10 different real primitive types that C99 provides, and for 6 of them we use typedef to have an abbreviation:

typedef signed char schar;
typedef unsigned char uchar;
typedef unsigned short ushort;
typedef unsigned long ulong;
typedef signed long long sllong;
typedef unsigned long long ullong;

We will refer to a signed int just as signed and to an unsigned int as unsigned, and we will use short and long for the other two. All these 10 are different types for the compiler, but they do not necessarily have distinct widths, or just the width you suppose them to have. No, int is not always 32 bit wide; no, long is not always 64 bit wide. Be careful.

There is another integer type in C99 that doesn’t get too much attention, _Bool (or bool if you include “stdbool.h”). This blog entry here doesn’t go much into the details of Boolean logic, but you should use bool whenever it is appropriate for a variable that has the semantics of a truth value.

Pitfall Number One: array indexing

Arrays in C are indexed from 0 up to n-1, where n is the number of elements in the array, you all know that. So indexing an array with a negative number is something odd, that should only happen under very controlled circumstances.

Never use a signed value to index an array or to store its bounds

unless you are certain of what you are doing. But just as a reflex, don’t use int, signed or something similar.

Use the unsigned integer type of your system that has the correct width to address any size of object that is possible on your system. This type has a name: size_t. Use it, it is as simple as that.

Use size_t to index an array or to store its bounds

That way your code becomes portable, since this is well defined on every modern system, and you are much less vulnerable to overflow problems.
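
A small sketch of what this looks like in practice; the function names are made up. The one thing to watch with an unsigned index is a downward loop: a condition like i >= 0 is always true, so the test has to come before the decrement.

```c
#include <stddef.h>

/* Sum of an array, walking upwards with a size_t index. */
static double sum_up(size_t n, double const a[]) {
  double total = 0.0;
  for (size_t i = 0; i < n; ++i)
    total += a[i];
  return total;
}

/* Index of the last element equal to x, or n if there is none.
   The loop tests i > 0 first and decrements afterwards, so i never wraps
   while it is used as an index. */
static size_t find_last(size_t n, int const a[], int x) {
  for (size_t i = n; i-- > 0;)
    if (a[i] == x)
      return i;
  return n;
}
```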

Pitfall Number Two: type width

First, the width of a type (even if it is unsigned) is not necessarily in direct relation with its size as returned by the sizeof operator: there may be unused bits, so-called padding bits.

Then, don’t make false assumptions about the width of any of the 10 integer types. Such assumptions are the main source of 32 versus 64 bit portability problems. The only reasonable assumptions that you can make are for the size:

1 = sizeof(schar) <= sizeof(short) <= sizeof(signed) <= sizeof(long) <= sizeof(sllong)
1 = sizeof(uchar) <= sizeof(ushort) <= sizeof(unsigned) <= sizeof(ulong) <= sizeof(ullong)

There are some other properties about the minimum width of these types, but if you stick to the following you wouldn’t need them.

Whenever you use an integer as a bitfield, think twice about what you assume about the number of bits of the underlying type. The same holds, obviously, if you assume something about the maximum or minimum value of the type. The [u]intX_t family of typedefs that is guaranteed to exist for C99 is there for you. Here the X stands for the width that you want; typical values for X are 8, 16, 32, and 64. E.g. uint64_t is an unsigned integer type of exactly 64 bits. So you also see that there are 4 such commonly used widths but 5 different types. Usually there are thus at least two basic types with the same width.

Use [u]intX_t whenever you make an assumption about the width of a type

Edit: Well, the [u]intX_t types are only required by POSIX, C99 leaves them optional. C99 only requires the types [u]int_leastX_t and [u]int_fastX_t to exist. On most commodity architectures [u]intX_t will exist nowadays, but if you want to be sure that your code is also valid on embedded platforms and the like, use [u]int_fastX_t.
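
As an illustration, here is a sketch of a 32 bit rotation, an operation that really assumes a width. The first version relies on uint32_t having exactly 32 bits. The second one uses uint_least32_t, which C99 always provides but which may be wider, so it has to mask explicitly. Both function names are made up.

```c
#include <stdint.h>

/* Rotate left, relying on uint32_t being exactly 32 bits wide. */
static uint32_t rotl32(uint32_t x, unsigned r) {
  r %= 32;
  return r ? (x << r) | (x >> (32 - r)) : x;
}

/* Same operation on a type that has at least 32 bits:
   the excess bits, if any, are masked away by hand. */
static uint_least32_t rotl32_least(uint_least32_t x, unsigned r) {
  r %= 32;
  x &= UINT32_C(0xFFFFFFFF);
  return r ? ((x << r) | (x >> (32 - r))) & UINT32_C(0xFFFFFFFF) : x;
}
```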

If you need the maximum value of an unsigned integer type T, (T)-1 will always do the trick, no need to know about magic constants. If you need to know if a type is signed or unsigned, ((T)-1 < (T)0) will do. Since these are a bit ugly, I use the following macros

#define P99_0(T) ((T)0)
#define P99_M1(T) ((T)-1)
#define P99_ISSIGNED(T) (P99_M1(T) < P99_0(T))

These are compile time constant expressions, so you may use them even in type declarations or static initializations as you please.
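
A short usage sketch, with the three macros repeated so that the fragment stands on its own. The enumeration constants and the variable ulong_max are made-up names; they only show that the macros are accepted where the compiler insists on a constant.

```c
#include <limits.h>

#define P99_0(T) ((T)0)
#define P99_M1(T) ((T)-1)
#define P99_ISSIGNED(T) (P99_M1(T) < P99_0(T))

/* Enumeration constants must be integer constant expressions. */
enum {
  int_is_signed      = P99_ISSIGNED(int),
  unsigned_is_signed = P99_ISSIGNED(unsigned),
};

/* Static initializers must be constant expressions, too. */
static unsigned long const ulong_max = P99_M1(unsigned long);
```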

Pitfall Number Three: signedness

Don’t mix signed and unsigned types in comparisons. The promotion rules are complicated and do not only involve the signedness but also whether or not the type is long. In particular, the resulting real comparison may be a comparison of signed values or of unsigned values. Don’t rely on that.

When comparing values of different signedness, use a cast to promote one of them to the signedness of the other.

This makes your intentions explicit and easier to follow.
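
A sketch of the two intentions one may have when comparing an int with an unsigned; the function names are made up. The first function spells out what the compiler does silently for s < u when both operands have the rank of int, the second one compares the mathematical values.

```c
#include <stdbool.h>

/* Comparison as unsigned: a negative s becomes a huge value first. */
static bool less_as_unsigned(int s, unsigned u) {
  return (unsigned)s < u;
}

/* Comparison by value: a negative s is smaller than any unsigned value,
   and for s >= 0 the cast does not change the value. */
static bool less_by_value(int s, unsigned u) {
  return s < 0 || (unsigned)s < u;
}
```

With s equal to -1 and u equal to 1 the two functions disagree, which is exactly the kind of surprise the cast is meant to make visible.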

BTW, the correct signed type for size_t is ssize_t, not ptrdiff_t.

Pitfall Number Four: char arithmetic

Signedness of char is simply a mess, don’t rely on it.

When you need a `small’ integer type don’t use char but use [u]int8_t.

Use char as the base type for strings, and char* as a generic pointer type whenever you have to do byte arithmetic on pointers. All other uses of char should be banned.
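
A small sketch of how to keep the signedness of char out of the picture when the bytes of a string have to be looked at as numbers. The function names are made up, and the test for the high bit assumes 8 bit bytes.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Whether plain char is signed is up to the implementation. */
static bool char_is_signed(void) {
  return CHAR_MIN < 0;
}

/* Count the bytes of a string that have their high bit set. The string is
   read through unsigned char, so the comparison gives the same answer
   whether plain char is signed or not. */
static size_t count_high_bytes(char const *s) {
  size_t n = 0;
  for (unsigned char const *p = (unsigned char const *)s; *p; ++p)
    if (*p >= 0x80)
      ++n;
  return n;
}
```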

Pitfall Number Five: file offsets

File offsets (and also block sizes) are yet another kind of value that is independent of pointer or integer width. You may well have a 32 bit system (maximal addressable space 4GiB) with a 64 bit file system: think of a large disk partition or of a mount over NFS. The correct type to access file offsets is off_t, which is a signed type. Usually, there are predefined macros that force your system to use the 64 bit variant, if it is available.

Pitfall Number Six: pointer conversions and differences

Generally it is a bad idea to convert a pointer to an integer. It really is. Think of it, twice. If it is unavoidable, int is certainly the wrong type. The reason for that is that the most convenient integer type of the system may be of a certain width x and the width of pointers may be just another value y.

Don’t suppose that pointers and integers have the same width

Use the correct types to transform a pointer to an integer. These have predefined names, too: intptr_t and uintptr_t, if they exist. If they don’t exist on your system, well, really don’t do it. The implementors know why they don’t provide them. On one system these might be signed and unsigned, on another long and ulong.

When you do pointer arithmetic, use ptrdiff_t for differences between pointers. This is guaranteed to have the correct width. But… even then there might be extreme cases where you have an overflow. If you assume for a moment that pointers are 32 bit and you take the difference of one pointer on the stack (often numbers of the form 0xFFFF….) and another pointer deep down, that difference might need the highest bit to code the number. Thus, if ptrdiff_t is also 32 bit wide, the number might overflow. Be careful.
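
Both recommendations in a minimal sketch; the function names are made up, and the second function only compiles on platforms that provide uintptr_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of elements from first to last; both must point into
   (or one past the end of) the same array. */
static ptrdiff_t element_distance(int const *first, int const *last) {
  return last - first;
}

/* Round trip of a pointer through an integer, using the type that is
   meant for it. Converting back gives a pointer equal to the original. */
static void *round_trip(void *p) {
  uintptr_t bits = (uintptr_t)p;
  return (void *)bits;
}
```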

June 2, 2010

Right shift on signed types is not well defined

Filed under: C99, integers — Jens Gustedt @ 06:15

The shift operators (<< and >>) shift the bits in a word to the left or the right. From such an explanation it doesn’t follow directly what should happen with the bits at the word boundaries. There are several commonly used strategies:

  • logical:
    Bits that go beyond the word boundary are dropped and the new positions are filled with zeroes.
  • ones:
    Bits that go beyond the word boundary are dropped and the new positions are filled with ones.
  • arithmetic:
    1. Shift is `logical’ for positive values.
    2. For negative values right shift is `ones’ and
    3. left shift is `logical’ but always sets the highest order bit (sign bit) to 1.
  • circular: Bits that go beyond the word boundary are reinserted at the other end.

`Arithmetic’ shift has its name from the fact that it implements an integer multiplication or division by a power of two.

For unsigned integer types C prescribes that the shift operators are `logical’. So e.g. (~0U >> 1) results in a word of all ones but for the highest order bit, which is 0. The picture darkens when it comes to signed types. Here the compiler implementor may choose between a `logical’ and an `arithmetic’ shift. Basically this means that the use of the right shift operator on signed values is not portable unless very special care is taken. We can detect which shift is implemented by the simple expression ((~0 >> 1) < 0):

  • If the shift is `logical’ the highest order bit of the left side of the comparison is 0 so the result is positive.
  • If the shift is `arithmetic’ the highest order bit of the left side is 1 so the result is negative.
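
Wrapped into functions, the detection reads as follows. This is just a sketch with made-up names; the answer of the first function depends on your compiler, the second one is fixed by the standard.

```c
#include <limits.h>
#include <stdbool.h>

/* True if >> on a negative int fills with ones (`arithmetic'),
   false if it fills with zeroes (`logical'). */
static bool right_shift_is_arithmetic(void) {
  return (~0 >> 1) < 0;
}

/* For unsigned the shift is always `logical': all ones but the highest bit. */
static unsigned unsigned_half_max(void) {
  return ~0U >> 1;
}
```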

Observe in particular that in case of an arithmetic shift (~0 >> 1) == ~0. So this operator has two fixed points in that case, 0 and -1. If we want a portable shift we may choose the following operations

#define LOGSHIFTR(x,c) (((x) >> (c)) & ~(~0 << (sizeof(int)*CHAR_BIT - (c))))

This produces a mask with the correct number of 1’s in the low order bits and performs a bitwise and with the result of the compiler shift. Observe

  • This supposes that x is of type int, a type independent definition would be much more complicated.
  • c is evaluated twice so don’t use side effects here.
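
If a function is acceptable instead of a macro, both shifts can also be sketched without ever applying >> to a negative value. This is an alternative to the macro, not a replacement the post prescribes. The names are made up, and the sketch assumes two's complement, no padding bits, and a shift count c that is smaller than the width of int.

```c
#include <limits.h>

/* Logical right shift: do the shift on the unsigned counterpart.
   For c >= 1 the highest bit of the result is 0, so it fits into int. */
static int logical_shift_right(int x, unsigned c) {
  return c ? (int)((unsigned)x >> c) : x;
}

/* Arithmetic right shift: for negative x the complement ~x is not
   negative, so shifting it is well defined; complementing again fills
   the vacated high bits with ones. */
static int arithmetic_shift_right(int x, unsigned c) {
  return x < 0 ? ~(~x >> c) : x >> c;
}
```

Since these are functions, c is evaluated only once.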

Here is a C99 program to test your compiler.

#include <limits.h>
#include <stdio.h>

int logshiftr(int x, unsigned c);

int arishiftr(int x, unsigned c);

#define HIGHONES(c) ((signed)(~(unsigned)0 << (sizeof(signed)*CHAR_BIT - (c))))
#define HIGHZEROS(c) (~HIGHONES(c))

int logshiftr(int x, unsigned c) {
  return (x >> c) & HIGHZEROS(c);
}

int arishiftr(int x, unsigned c) {
  return logshiftr(x, c) ^ (x < 0 ? HIGHONES(c) : 0);
}

int main(int argc, char* argv[]) {
  (void)argv;
  int b = argc > 1 ? argc : 0;
  int val[11u] = { b, b + 1, b - 1, b + 2, b - 2, b + 3, b - 3, b + 4, b - 4, b + 5, b - 5};
  for (unsigned sh = 1; sh < 3; ++sh)
    for (unsigned i = 0; i < 11u; ++i)
      /* The original format string is missing from this copy; this one is a reconstruction. */
      printf("%d >> %u: compiler %d, logical %d, arithmetic %d\n",
             val[i], sh,
             (val[i] >> sh),
             logshiftr(val[i], sh),
             arishiftr(val[i], sh));
  return 0;
}