integers – Jens Gustedt's Blog

Early access to the C23 edition of Modern C

Manning’s early access program (MEAP) for the new edition is now open
at

https://www.manning.com/books/modern-c-third-edition

There is a special code mlgustedt2 to get 45% off of the official price. This is currently limited until Jan 31.

The previous edition already has been largely successful and is considered by some as one of the reference books on C. This new edition has been the occasion to overhaul the presentation in many places, but its main purpose is the update to the new C standard, C23. The goal is to publish this new edition of Modern C at the same time as the new C standard goes through the procedure of ISO publication and as new releases of major compilers will implement all the new features that it brings.

Among the most noticeable changes and additions that we handle are those for integers: there are new bit-precise types coined _BitInt(N), new C library headers <stdckdint.h> (for arithmetic with overflow check) and <stdbit.h> (for bit manipulation), possibilities for 128 bit types on modern architectures, and substantial improvements for enumeration types. Other new concepts in C23 include a nullptr constant and its underlying type, syntactic annotation with attributes, more tools for type generic programming such as type inference with auto and typeof, default initialization with {}, even for variable length arrays, and constexpr for named constants of any type. Furthermore, new material has been added, discussing compound expressions and lambdas, so-called “internationalization”, and a comprehensive approach for program failure.

During MEAP we will also add an appendix and continue work on a temporary include header for an easy transition to C23 on existing platforms, that will allow you to start off with C23 right away.

We also encourage you to post any questions or comments you have about the content in the liveBook Discussion forum. We appreciate knowing where we can make improvements and increase your understanding of the material.

Note that the “about this book” section in MEAP is a bit hidden under the front page. I think that reading this even before going into details is important, so I repeat some of this information here.

a praise of size_t and other unsigned types

Again I had a discussion with someone from a C++ background who claimed that one should use signed integer types where possible, and who also claimed that the unsignedness of size_t is merely a historical accident and would never be defined as such nowadays. I strongly disagree with that, so I decided to write this up, for once.

What I write here will only work with C, and can possibly be extended to C++ and other languages that implement unsigned integer types, e.g. good old Pascal had a cardinal type.

Continue reading “a praise of size_t and other unsigned types”

Avoid writing `va_arg` functions

As we have seen in another post, functions that receive a variable argument list whose arguments all have the same type are better replaced by a function and a variadic macro. P99 has easy means to transform an argument list x0, x1, ..., xN into two parameters: the first is just the length of the list, here N+1, and the second is an array that holds the values. These types of functions can usually be much better optimized in place by modern compilers.

So va_arg functions should first of all be restricted to the case that the function may receive incompatible types as its arguments (such as floating point and integers). If used for that, the default argument promotion rules may be really harmful.
Continue reading “Avoid writing va_arg functions”

constant expressions

In C, constant expressions come in two different flavors:

1. integer constant expressions
2. initializer constant expressions.

The naming of (1.) is (again!) sort of unfortunate, because as we will see below there are constant expressions of integer type that are not integer constant expressions in the sense of the C standard. Better think of (1.) as compile time constant integer expressions.
Continue reading “constant expressions”

Don’t use `NULL`

I always thought that using NULL whenever I wanted to assign a null pointer value to a pointer was a good thing, but today I learned the contrary.
Continue reading “Don’t use NULL”

A generic `swap` implementation

Swapping the contents of two variables is an elementary task that is often met in daily programming. There are two generic strategies to do that for general types.
Continue reading “A generic swap implementation”

Detecting integer overflow II

In an earlier post we came up with a general solution to check for potential under- or overflow in an integer addition. On most modern architectures this can be done more efficiently, even when assuming that there are no special instructions that capture overflow bits or such.
Continue reading “Detecting integer overflow II”

Detecting integer overflow I

A recent discussion on stackoverflow has shown that detecting integer overflow without provoking undefined behavior needs some reflection, and that the quick answers are not necessarily the best ones.
Continue reading “Detecting integer overflow I”

Anatomy of integer types in C

Integer types in C may present subtle traps (sic!) that many people are not aware of when doing seemingly simple things like ~0 or

(1 << (sizeof(int)*CHAR_BIT - 1))

Most times, on almost all processors, these will produce the desired effects, but sometimes such code will fail, crash, or spit out a lot of warnings. I will try to analyze this a bit, to show what may go wrong here, and how you can get around a lot of possible problems.
Continue reading “Anatomy of integer types in C”

Integer type confusion

Many people still use int as the integer type to be used in C. Don’t do that. This is misleading, non portable (yes!) and error prone.

Valid use of int is for the return code of a system function or errno.
Valid use of char is for the character in a string.
Valid use of char* is byte arithmetic on pointers.

That’s it really.

Traditionally, integer types in C are a mess, already syntactically. You can have combinations such as long long unsigned int etc. that make the code really hard to read. For the following we will use the 10 different real primitive types that C99 provides, and for 6 of them use typedef to have an abbreviation for each of them:

typedef signed char schar;
typedef unsigned char uchar;
typedef unsigned short ushort;
typedef unsigned long ulong;
typedef signed long long sllong;
typedef unsigned long long ullong;

We will refer to a signed int just as signed and to an unsigned int as unsigned, and we will use short and long for the other two. All these 10 refer to different types for the compiler but may not necessarily be of a distinct width, or of just the width you suppose them to be. No, int is not always 32 bit wide, no, long is not always 64 bit wide, be careful.
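To see that the compiler really treats these as different types even when they share a width, here is a minimal sketch that assumes a C11 compiler with _Generic. The macro TYPE_NAME and the function long_and_llong_are_distinct are made-up names for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The only size relation we rely on; equal sizes are perfectly legal. */
static_assert(sizeof(long) <= sizeof(long long), "long is wider than long long");

/* Hypothetical helper: map an expression to a name for its type.
   _Generic selects on the type itself, never on its width. */
#define TYPE_NAME(X) _Generic((X),  \
    signed: "signed",               \
    long: "long",                   \
    long long: "sllong",            \
    unsigned: "unsigned",           \
    unsigned long: "ulong",         \
    unsigned long long: "ullong",   \
    default: "other")

/* True on every conforming platform, even where both types have 64 bits. */
bool long_and_llong_are_distinct(void) {
  return strcmp(TYPE_NAME(0L), TYPE_NAME(0LL)) != 0;
}
```

On a typical 64 bit Linux system long and sllong have the same width, yet the two associations above do not clash, which they would if the types were the same.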

There is another integer type in C99 that doesn’t get too much attention, _Bool (or bool if you include “stdbool.h”). This blog entry here doesn’t go much into the details of Boolean logic, but you should use bool whenever it is appropriate for a variable that has the semantics of a truth value.

Pitfall Number One: array indexing

Arrays in C are indexed from 0 up to n-1, where n is the number of elements in the array; you all know that. So indexing an array with a negative number is something odd that should only happen under very controlled circumstances.

Never use a signed value to index an array or to store its bounds

unless you are certain of what you are doing. But just as a reflex, don’t use int, signed or something similar.

Use the unsigned integer type of your system that has the correct width to address any size of object that is possible on your system. This type has a name: size_t. Use it, it is as simple as that.

Use size_t to index an array or to store its bounds

That way your code becomes portable, since this is well defined on every modern system, and you are much less vulnerable to overflow problems.
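As a small illustration, the following sketch uses size_t both for the bound and for the index. The function names sum and find_last are invented for this example. The downward loop is the one place where an unsigned index needs a bit of care, so it is shown explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Sum all n elements; the bound and the index are both size_t. */
double sum(size_t n, double const a[]) {
  assert(a || !n);
  double total = 0.0;
  for (size_t i = 0; i < n; ++i) total += a[i];
  return total;
}

/* Return the index of the last element equal to x, or n if there is none.
   The test i-- > 0 compares before decrementing, so i never wraps around
   inside the loop body. */
size_t find_last(size_t n, double const a[], double x) {
  assert(a || !n);
  for (size_t i = n; i-- > 0;) {
    if (a[i] == x) return i;
  }
  return n;
}
```

Returning n for "not found" avoids the need for a negative sentinel such as -1.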

Pitfall Number Two: type width

First, the width of a type (even if it is unsigned) is not necessarily in direct relation with its size as returned by the sizeof operator: there may be unused bits, so-called padding bits.

Then, don’t make false assumptions about the width of any of the 10 integer types. Such assumptions are the main source for 32 versus 64 bit portability problems. The only reasonable assumptions that you can make are for the size:

1 = sizeof(schar) <= sizeof(short) <= sizeof(signed) <= sizeof(long) <= sizeof(sllong)
1 = sizeof(uchar) <= sizeof(ushort) <= sizeof(unsigned) <= sizeof(ulong) <= sizeof(ullong)

There are some other properties about the minimum width of these types, but if you stick to the following you won’t need them.

Whenever you use an integer as a bit field, think twice about what you assume about the number of bits of the underlying type. The same holds, obviously, if you assume something about the maximum or minimum value of the type. The [u]intX_t family of typedefs that is guaranteed to exist for C99 is there for you. Here the X stands for the width that you want; typical values for X are 8, 16, 32, and 64. E.g. uint64_t is an unsigned integer type of exactly 64 bits. So you also see that there are 4 such commonly used widths but 5 different types. So usually there are at least two basic types with the same width.

Use [u]intX_t whenever you make an assumption about the width of a type

Edit: Well, the [u]intX_t types are only required by POSIX, C99 leaves them optional. C99 only requires the types [u]int_leastX_t and [u]int_fastX_t to exist. On most commodity architectures [u]intX_t will exist nowadays, but if you want to be sure that your code is also valid on embedded platforms and the like, use [u]int_leastX_t or [u]int_fastX_t.
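A sketch of what this looks like in practice: a 32 bit rotation, once with the exact-width type and once with a type that is merely at least 32 bits wide. The names rotl32 and rotl32_least are made up for this illustration; the second variant masks explicitly because its type might be wider than 32 bits.

```c
#include <assert.h>
#include <stdint.h>

static_assert(UINT32_MAX == 0xFFFFFFFF, "uint32_t has exactly 32 value bits");

/* Rotate left within exactly 32 bits; relies on the optional type uint32_t. */
uint32_t rotl32(uint32_t x, unsigned s) {
  s %= 32;
  return s ? (uint32_t)((x << s) | (x >> (32 - s))) : x;
}

/* Same operation with a type that always exists. The mask keeps the result
   inside 32 bits even if uint_least32_t is wider. */
uint_least32_t rotl32_least(uint_least32_t x, unsigned s) {
  s %= 32;
  x &= UINT32_C(0xFFFFFFFF);
  return s ? ((x << s) | (x >> (32 - s))) & UINT32_C(0xFFFFFFFF) : x;
}
```

Neither function mentions int, long or any other basic type, so the assumption about the width is visible in the interface.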

If you need the maximum value of an unsigned integer type T, (T)-1 will always do the trick, no need to know about magic constants. If you need to know if a type is signed or unsigned, ((T)-1 < (T)0) will do. Since these are a bit ugly, I use the following macros

#define P99_0(T) ((T)0)
#define P99_M1(T) ((T)-1)
#define P99_ISSIGNED(T) (P99_M1(T) < P99_0(T))

These are compile time constant expressions, so you may use them even in type declarations or static initializations as you please.
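For illustration, here is how the three macros can appear in places that require constants. This is only a sketch that assumes C11 for static_assert; the names ulong_max and char_is_signed are invented for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define P99_0(T) ((T)0)
#define P99_M1(T) ((T)-1)
#define P99_ISSIGNED(T) (P99_M1(T) < P99_0(T))

/* Checked by the compiler, no code is generated. */
static_assert(!P99_ISSIGNED(size_t), "size_t must be unsigned");
static_assert(P99_ISSIGNED(ptrdiff_t), "ptrdiff_t must be signed");

/* Static initialization with the maximum value of an unsigned type. */
static unsigned long const ulong_max = P99_M1(unsigned long);

/* An enumeration constant must be an integer constant expression. */
enum { char_is_signed = P99_ISSIGNED(char) };
```

Some compilers warn that a comparison such as (size_t)-1 < (size_t)0 is always false; here that is exactly the information we are after.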

Pitfall Number Three: signedness

Don’t mix signed and unsigned types in comparisons. The promotion rules are complicated and do not only involve the signedness but also whether or not the type is long. In particular, the resulting real comparison may be a comparison of signed values or of unsigned values. Don’t rely on that.

When comparing values of different signedness, use a cast to promote one of them to the signedness of the other.

This makes your intentions explicit and easier to follow.
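A minimal sketch of the problem and of one way around it follows. The function names less_mixed and less_checked are hypothetical. The first function is deliberately written the wrong way, so expect a compiler warning about a comparison between signed and unsigned.

```c
#include <assert.h>
#include <stdbool.h>

/* Surprising: s is converted to unsigned before the comparison,
   so -1 turns into UINT_MAX and is not "less" than anything. */
bool less_mixed(signed s, unsigned u) {
  return s < u;
}

/* Explicit: a negative s is smaller than any unsigned value; otherwise
   the cast to unsigned preserves the value and the comparison is exact. */
bool less_checked(signed s, unsigned u) {
  return (s < 0) || ((unsigned)s < u);
}
```

With these definitions less_mixed(-1, 1u) is false whereas less_checked(-1, 1u) is true.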

BTW, the correct signed type for size_t is ssize_t, not ptrdiff_t.

Pitfall Number Four: `char` arithmetic

Signedness of char is simply a mess, don’t rely on it.

When you need a ‘small’ integer type don’t use char but use [u]int8_t.

Use char as the base type for strings, and char* as a generic pointer type whenever you have to do byte arithmetic on pointers. All other uses of char should be banned.
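As a sketch of how to stay independent of the signedness of char: the first function below looks at a char buffer through unsigned char, so every byte counts as a value between 0 and 255, and the second uses uint8_t for a small counter. Both function names are made up, and uint8_t is assumed to exist on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add up the bytes of a buffer as values 0..255, whatever the
   signedness of char is on this platform. */
unsigned long byte_sum(size_t n, char const buf[]) {
  assert(buf || !n);
  unsigned char const* p = (unsigned char const*)buf;
  unsigned long total = 0;
  for (size_t i = 0; i < n; ++i) total += p[i];
  return total;
}

/* A small counter with well defined wrap around from 255 to 0. */
uint8_t next_u8(uint8_t x) {
  return (uint8_t)(x + 1u);
}
```

Had byte_sum added buf[i] directly, a byte with the high bit set would count as a negative number on platforms where char is signed.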

Pitfall Number Five: file offsets

File offsets (and also block sizes) are yet another kind of value that is independent of pointer or integer width. You may well have a 32 bit system (maximal addressable space 4GiB) with a 64 bit file system: think of a large disk partition or of a mount over NFS. The correct type to access file offsets is off_t, which is a signed type. Usually, there are predefined macros that force your system to use the 64 bit variant, if it is available.
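The following sketch assumes a POSIX system. The macro _FILE_OFFSET_BITS is the one that glibc and several other C libraries use to select a 64 bit off_t; other platforms may name it differently or not need it at all. The helper stream_size is a made-up name for this illustration.

```c
/* Assumption: POSIX. Feature test macros must come before any include. */
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

static_assert((off_t)-1 < (off_t)0, "off_t is a signed type");

/* Return the size of a seekable stream in bytes, or -1 on failure.
   The current position is restored. fseeko and ftello are the POSIX
   variants of fseek and ftell that work with off_t instead of long. */
off_t stream_size(FILE* f) {
  off_t const pos = ftello(f);
  if (pos < 0 || fseeko(f, 0, SEEK_END)) return -1;
  off_t const size = ftello(f);
  if (fseeko(f, pos, SEEK_SET)) return -1;
  return size;
}
```

Nothing in this code assumes that off_t has the same width as long, size_t or a pointer.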

Pitfall Number Six: pointer conversions and differences

Generally it is a bad idea to convert a pointer to an integer. It really is. Think of it, twice. If it is unavoidable, int is certainly the wrong type. The reason for that is that the most convenient integer type of the system may be of a certain width x and the width of pointers may be just another value y.

Don’t suppose that pointers and integers have the same width

Use the correct types to transform a pointer to an integer. These have predefined names, too, intptr_t and uintptr_t, if they exist. If they don’t exist on your system, well, really don’t do it. They know why they don’t provide it. On one system these might be signed and unsigned, on another long and ulong.

When you do pointer arithmetic, use ptrdiff_t for differences between pointers. This is guaranteed to have the correct width. But… even then there might be extreme cases where you have an overflow. If you assume for a moment that pointers are 32 bit and you take the difference of one pointer on the stack (often numbers of the form 0xFFFF….) and another pointer deep down, that difference might need the highest bit to code the number, and thus if ptrdiff_t is also 32 bit wide, the number might overflow. Be careful.
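Both recommendations in a small sketch, with the invented function names distance and round_trip, and under the assumption that the optional type uintptr_t is available:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of elements between two pointers into the same array;
   the result is negative if last comes before first. */
ptrdiff_t distance(double const* first, double const* last) {
  assert(first && last);
  return last - first;
}

/* Convert a pointer to an integer and back. Only this round trip is
   guaranteed; the numeric value in between is platform specific. */
void* round_trip(void* p) {
  uintptr_t const bits = (uintptr_t)p;
  return (void*)bits;
}
```

Note that distance is only defined for pointers into the same array object (or one past its end); this sketch does not check that.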
