Many people still use
int as the integer type to be used in C. Don’t do that. This is misleading, non portable (yes!) and error prone.
- Valid use of
intis for the return code of a system function or
- Valid use of
charis for the character in a string.
- Valid use of
char*is byte arithmetic on pointers.
That’s it really.
Traditionally, integer types in C are a mess, already syntactically. You can have combinations such as
long long unsigned int etc that make the code really hard to read. For the following we will use the 10 different real primitive types that C99 provides and for 6 of them use
typedef to have an abbreviation for each of them
typedef signed char schar; typedef unsigned char uchar; typedef unsigned short ushort; typedef unsigned long ulong; typedef signed long long sllong; typedef unsigned long long ullong;
signed int we will just refer as
signed and to an
unsigned int as
unsigned and we will use
long for the other two. All these 10 refer to different types for the compiler but may not necessarily be of a distinct width, or of just the width you suppose them to be. No,
int is not always 32 bit wide, no,
long is not always 64 bit wide, be carefull.
There is another integer type in C99 that doesn’t get too much attention,
bool if you include “stdbool.h”). This blog entry here doesn’t go much into the details of Boolean logic, but you should use
bool whenever it is appropriate for a variable that has the semantics of a truth value.
Pitfall Number One: array indexing
Arrays in C are indexed from 0 up to n-1, where n is the number of elements in the array, you all know that. So indexing an array with a negative number is something odd, that should only happen under very controlled circumstances.
Never use a signed value to index an array or to store its bounds
unless you are certain of what you are doing. But just as a reflex, don’t use
signed or something similar.
Use the unsigned integer type of your system that has the correct width to address any size of object that is possible on your system. This type has a name:
size_t. Use it, it is as simple as that.
size_t to index an array or to store its bounds
By that your code becomes portable since this is well defined on every modern system and you are much less vulnerable to overflow problems.
Pitfall Number Two: type width
First, the width of a type (even if it is unsigned) is not forcibly in direct relation with its size as returned by the
sizeof operator: there may unused bits, so-called padding bits.
Then, don’t make false assumptions on the width of any of the 10 integer types. Such assumptions are the main source for 32 versus 64 bit portability problems. The only reasonable assumptions that you can make are for the size
1 = sizeof(schar) <= sizeof(short) <= sizeof(signed) <= sizeof(long) <= sizeof(sllong) 1 = sizeof(uchar) <= sizeof(ushort) <= sizeof(unsigned) <= sizeof(ulong) <= sizeof(ullong)
There are some other properties about the minimum width of these types, but if you stick to the following you wouldn’t need them.
Whenever you use an integer as bitfield think twice of what you assume about the number of bits of the underlying type. Same holds, obviously, if you assume something about the maximum or minimum value of the type. The
[u]intX_t family of
typedef that is guaranteed to exist for C99 is there for you. Here the X stands for the width that you want, typical values for X are 8, 16, 32, and 64. E.g
uint64_t is an unsigned integer type of exactly 64 bits. And so you see also there are 4 such commonly used width but 5 different types. So usually there are at least two basic types with the same width.
[u]intX_t whenever you make an assumption about the width of a type
Edit: Well, the
[u]intX_t types are only required by POSIX, C99 leaves them optional. C99 only requires the types
[u]int_fastX_t to exist. On most commodity architectures
[u]intX_t will exist nowadays, but if you want to be sure that your code also is valid on embedded platforms and alike, use
If you need the maximum value of an unsigned integer type
(T)-1 will always do the trick, no need to know about magic constants. If you need to know if a type is signed or unsigned,
((T)-1 < (T)0) will do. Since these are a bit ugly, I use the following macros
#define P99_0(T) ((T)0) #define P99_M1(T) ((T)-1) #define P99_ISSIGNED(T) (P99_M1(T) < P99_0(T))
Theses are compile time constant expressions so you may use them even in type-declarations or static initializations as it pleases.
Pitfall Number Three: signedness
Don’t mix signed and unsigned types in comparisons. The promotion rules are complicated and do not only involve the signedness but also whether or not the type is
long. In particular, the resulting real comparison may be a comparison of signed values or of unsigned values. Don’t rely on that.
When comparing values of different signedness, use a cast to promote one of them the signedness of the other.
This makes your intentions explicit and easier to follow.
BTW, the correct signed type for
Pitfall Number Four:
char is simply a mess, don’t rely on it.
When you need a `small’ integer type don’t use
char but use
char as the base type for strings, and
char* as a generic pointer type whenever you have to do byte arithmetic on pointers. All other uses of
char should be banned.
Pitfall Number Five: file offsets
File offsets (and also block sizes) are yet another value that are independent from pointer or integer width. You may well have a 32 bit system (maximal addressable space 4GiB) with a 64bit file system: think of a large disk partition or of a mount over NFS. The correct type to access file offsets is
off_t, which is a signed type. Usually, there are predefined macros that force your system to use the 64 bit variant, if it is available.
Pitfall Number Six: pointer conversions and differences
Generally it is a bad idea to convert a pointer to an integer. It really is. Think of it, twice. If it is unavoidable,
int is certainly the wrong type. The reason for that is that the most convenient integer type of the system may be of a certain width x and the width of pointers may be just another value y.
Don’t suppose that pointers and integers have the same width
Use the correct types to transform a pointer to an integer. These have predefined names, too,
uintptr_t, if they exist. If they don’t exist on your system, well really don’t do it. They know why they don’t provide it. On one system these might be
unsigned on another
When you do pointer arithmetic, use
ptrdiff_t for differences between pointers. This is guaranteed to have the correct width. But… even then there might be extreme cases where you have an overflow. If you assume for a moment that pointers are 32 bit and you take the difference of one pointer on the stack (often numbers of the form 0xFFFF….) and another pointer deep down that difference might need the highest bit to code the number and thus if
ptrdiff_t is also 32 bit wide, the number might overflow. Be carefull.