C11 has added a certain level of Unicode support to C, but I think for C2x it will be time to go a step further and put C in line with general usage of special characters as they are normalized by Unicode. In particular, it is time to get rid of restrictions in operator naming that stem from the limited availability of special characters 30 years ago, when all of this was invented.
March 9, 2017
January 26, 2017
Some time ago I advertised one of my new toys, Modular C, an extension of the C language that aims for modularity and, more generally, for easier use. Since then it has gained some new features and I have smoothed some rough edges, so you should definitely have another look 🙂
Among the new features are
- complete Unicode support (identifiers, operators)
- finite code unrolling (
- programmable expression contexts (e.g. for modulo arithmetic or string operations)
Also, there is now some concise documentation, at
November 25, 2016
You find the latest version of my book at the usual place at
This now contains all the material that I plan to put into it. New are “experience” parts about performance, function-like macros and _Generic, control flow, threads and atomics. There is in particular a section about memory consistency, probably the most difficult write-up for this book.
These higher level parts have not yet had the reading they hopefully deserve, so please give them a try. As before, constructive feedback is highly welcome. Thanks to everybody who has already sent me comments and corrections!
Constructive comments to this blog are welcome.
This is not the right place for
- feedback about details (missing commas, spelling errors). If you have such details, please collect them and send them to me directly.
- rants against the C standard. This is not the place to let off steam nor to compare to whatever other programming language you prefer. If you have criticism about the C standard, please contribute. The C standards committee is a nice batch of people, don’t be afraid. And if you have concrete proposals, it would also be much better to contact me (or any other member of the committee) directly and see how you may invest yourself in the ISO process.
Also, comments that come from real people are highly preferred.
September 7, 2016
With this post I will start to discuss a series of modifications of the C standard that I have proposed (or will propose) for C2x. As a starting point there is the observation that the passage from one C standard to the next has not been easy in the past, neither for implementers of C compilers nor for users, because it was not easy to pick up partial improvements. Basically, for gcc, e.g., you had to add -std=c11 or similar, and then test for the version number to see if a particular feature is implemented. This task was made even more complicated because some of the new features belong to the language and some to the library.
This was tedious and error prone, and I think we should avoid such difficulties in the future.
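To make the tedium concrete, here is a minimal sketch of the kind of feature testing that C11 asks for. The macro names HAVE_C11, HAVE_ATOMICS and HAVE_THREADS and the function feature_mask are hypothetical and only serve the illustration; __STDC_VERSION__, __STDC_NO_ATOMICS__ and __STDC_NO_THREADS__ are the test macros that C11 itself provides.

```c
/* __STDC_VERSION__ is 201112L for C11; pre-C95 compilers may not define it. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
# define HAVE_C11 1
#else
# define HAVE_C11 0
#endif

/* Optional in C11: _Atomic and <stdatomic.h>. */
#if HAVE_C11 && !defined(__STDC_NO_ATOMICS__)
# define HAVE_ATOMICS 1
#else
# define HAVE_ATOMICS 0
#endif

/* Optional in C11: <threads.h>, a pure library feature. */
#if HAVE_C11 && !defined(__STDC_NO_THREADS__)
# define HAVE_THREADS 1
#else
# define HAVE_THREADS 0
#endif

/* Hypothetical helper: packs the three answers into one bit mask
   (bit 0: C11, bit 1: atomics, bit 2: threads). */
int feature_mask(void) {
    return HAVE_C11 | (HAVE_ATOMICS << 1) | (HAVE_THREADS << 2);
}
```

Note that the answer for the library feature depends on which C library the compiler is paired with, not only on the compiler version, which is part of what makes such tests error prone.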
August 17, 2016
I have already posted about the evilness of casts some time ago, but recently I have seen that there is still much confusion out there about the C rules that pointer casts, also called type punning, break. So I will try to give the topic a more systematic treatment here.
In fact, I think we should distinguish two different sets of rules: the effective type rules that are explicitly written up in the C standard (6.5 p6 and p7), and the strict aliasing rules that are a consequence of these, but that concern only a very special use case of type punning. They represent only one of the multiple ways in which a violation of the effective type rules can damage the program state.
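As a small illustration of what the effective type rules forbid and what they allow, consider reading the bits of a float. This is only a sketch: the function name float_bits is hypothetical, and the expected bit patterns mentioned afterwards assume that float uses the IEEE 754 binary32 format.

```c
#include <stdint.h>
#include <string.h>

/* Not well defined: the object has effective type float but would be
   read through an lvalue of type uint32_t (C11 6.5 p7):

       uint32_t u = *(uint32_t *)&f;
*/

/* Well defined alternative: copy the object representation. */
uint32_t float_bits(float f) {
    _Static_assert(sizeof(uint32_t) == sizeof(float),
                   "this sketch assumes a 32 bit float");
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* bytes may always be copied */
    return u;
}
```

Under the binary32 assumption, float_bits(1.0f) yields 0x3F800000. Compilers typically turn such a fixed-size memcpy into a plain register move, so the defined version need not cost anything.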
July 29, 2016
In a document submitted to the standards committee last year I observed a set of unfortunate inconsistencies in C11’s specification of arithmetic operations for atomics. As you perhaps know, such operations can be specified with operators such as a++ (in the language part) or generic functions such as atomic_fetch_add(&a, 1) (in the library), but unfortunately the two parts had not been edited to fit well together in all places.
My new document Minimal Suggested Corrigendum for Arithmetic on Atomic Objects will hopefully make it as a proposed corrigendum into the next “bugfix” version of the standard, and hopefully this new version will see the light at the end of 2017. It will perhaps not be with the exact words as they are presented, but I think that the text already comes close to what the intent of the committee had been, and to what implementors actually do anyhow.
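For readers who have not used both forms, here is a minimal sketch of the two ways to increment an atomic object that are mentioned above. The names counter, bump_with_operator, bump_with_function and current_count are hypothetical; the sketch assumes an implementation that provides <stdatomic.h>.

```c
#include <stdatomic.h>

/* Static storage duration: zero initialization gives a valid state. */
static atomic_int counter;

/* Operator form (language part): yields the value before the increment. */
int bump_with_operator(void) {
    return counter++;
}

/* Generic function form (library part): also yields the previous value. */
int bump_with_function(void) {
    return atomic_fetch_add(&counter, 1);
}

int current_count(void) {
    return atomic_load(&counter);
}
```

Both functions perform one indivisible read-modify-write with sequential consistency; the inconsistencies discussed in the document concern the wording of the standard, not this common case.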
July 4, 2015
For decades, C has been one of the most widely used programming languages, and it is used successfully for large software projects that are ubiquitous in modern computing devices of all scales. For many programmers, software projects and commercial enterprises C has advantages (relative simplicity, faithfulness to modern architectures, backward and forward compatibility) that largely outweigh its shortcomings. Among these shortcomings is the lack of two important, closely related features: modularity and reusability. C fails to encapsulate different translation units (TU) properly: all symbols that are part of the interface of a software unit, such as functions, are shared between all TUs that are linked together into an executable.
May 11, 2015
It recently turned out that the C standard seems to be ambiguous on how to interpret the controlling expression of _Generic, the one that determines the choice. Compiler implementors have given different answers to this question; we will see below that there is code that is interpreted quite differently by different existing compilers. None of them is “wrong” at first view, so this tells us that we must be careful when we use _Generic. In this post I will try to explain the problem and to give you some workarounds for common cases.
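The core of the question is whether a controlling expression of type const int selects the int association or not. One commonly used way to sidestep it, shown here only as a sketch and not necessarily the workaround this post goes on to recommend, is to force the expression to be an rvalue so that qualifiers are dropped before the selection takes place. The macro TYPE_ID and the two functions are hypothetical names.

```c
/* The comma operator yields its right operand after lvalue conversion,
   so the controlling expression has the unqualified type of X.
   The expression is never evaluated, only its type is inspected. */
#define TYPE_ID(X) _Generic(((void)0, (X)), \
    int:     1,                             \
    double:  2,                             \
    default: 0)

int id_of_const_int(void) {
    const int x = 42;
    return TYPE_ID(x);        /* selects the int association */
}

int id_of_volatile_double(void) {
    volatile double d = 1.0;
    return TYPE_ID(d);        /* selects the double association */
}
```

With this form the result no longer depends on how a particular compiler treats qualifiers on the controlling expression, at the price of not being able to distinguish qualified from unqualified types.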
April 25, 2015
In discussions about C or other standards (e.g. POSIX) you may often have come across the term undefined behavior, UB for short. Unfortunately this term is sometimes mystified and even used to scare people unnecessarily, claiming that arbitrary things even outside the scope of your program can happen if a program “encounters undefined behavior”. (And I probably contributed to this at some point.)
First, this is just crappy language. How can something “encounter” behavior? It can’t. What is simply meant by such slang is that such a program has no defined behavior, or stated yet otherwise, the standard doesn’t define what to do if such a program reaches the state in question. The C standard defines undefined behavior as
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
Simple examples of undefined behavior in C are out-of-bounds access to arrays or overflow of signed integer types. More complicated cases arise when aliasing rules are violated or when accesses to data race between different threads.
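Signed overflow is a case where staying on defined grounds is cheap. The following sketch (the function name checked_add is hypothetical) tests the operands before the addition, so that the overflowing expression is never evaluated.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *sum and returns true if the result is representable;
   otherwise returns false and leaves *sum untouched. */
bool checked_add(int a, int b, int *sum) {
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;          /* a + b would overflow */
    }
    *sum = a + b;              /* cannot overflow here */
    return true;
}
```

Checking after the fact, as in `if (a + b < a)`, does not work: by then the behavior is already undefined and the compiler may remove the test.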
Now we often hear statements like “UB may format your hard drive” or “UB may steal all your money from your bank account”, implying somehow that a program that is completely unrelated to my disk administration or to my online banking could by some mysterious force have these evil effects. This is (almost) complete nonsense.
If UB in a simple program of yours formats your hard drive, you shouldn’t blame your program. No simple application run by an unprivileged user should have such devastating consequences; this would be completely inappropriate. If such things happen, it is your system that is at fault, so get yourself another one.
As an analogy from everyday life, take the idea of locking your house at night, which seems to be the rule in some societies. Sure, if you don’t do that, you make it easier for somebody to sneak into your house and shoot you. But still, that person would be a murderer, to be punished for that by the law. No sane legal system would acquit such a person or see this as a mitigating circumstance.
Now things are in fact different when there is a direct relation between the object of your negligence and the ill effect that it causes. If you leave your purse on the table in the pub when going to the bathroom, you should take a share of the responsibility if it is stolen. Or, to come back to the programming context, if you are programming a security-sensitive application (e.g. handling passwords or bank credentials) then you should be extremely careful and stay on well-defined grounds.
When programming in C there are different kinds of constructs or data for which the program behavior then is undefined.
- Behavior can be explicitly declared undefined or implicitly left undefined.
- The error can be detectable or undetectable at compile time.
- The error can be detectable or undetectable during execution.
- Behavior of certain constructs can be undefined by a specific standard to allow extensions.
Unfortunately, people often use UB as an excuse to brush off questions about certain behavior of their code (or compiler). This is just rude behavior by itself, and you usually shouldn’t accept this. An implementation should have reasons and policies for how it deals with UB, otherwise it is just bad, and you should switch to something more friendly, if you can.
There is no reason not to be friendly, and there are many to be.
That is, in our context, once (your!) code detects that it has a problem, it must handle it. Possible strategies are
- abort the compilation or the running program
- return an error code
- report the problem
- define and document the behavior in your own terms
For the first you should always have in mind that the program should be debuggable, so you really should use static_assert for compile time errors and abort for run time errors. Also, willingly aborting the program is not the same as having the program crash, see below.
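A minimal sketch of this first strategy could look as follows. The names TABLE_SIZE, table and table_at are hypothetical, and the power-of-two requirement is just an example of a property that can be checked at compile time.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdio.h>
#include <stdlib.h>   /* abort, size_t */

enum { TABLE_SIZE = 8 };

/* Compile time error: translation stops if the condition is false. */
static_assert((TABLE_SIZE & (TABLE_SIZE - 1)) == 0,
              "TABLE_SIZE must be a power of two");

static const int table[TABLE_SIZE] = { 1, 2, 4, 8, 16, 32, 64, 128 };

/* Run time error: terminate deliberately instead of reading out of bounds. */
int table_at(size_t i) {
    if (i >= TABLE_SIZE) {
        fputs("table_at: index out of range\n", stderr);
        abort();
    }
    return table[i];
}
```

In both cases the failure happens at a well identified place, so a build log or a debugger points directly at the offending line.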
Obviously, the second is only possible if the function in question has an established error return convention and if you can expect users of the function to check for that convention. POSIX has many such cases where the documentation says something like “may” return a certain error code. A C (or POSIX) library implementation that detects such a “may” case and doesn’t react accordingly is of bad quality.
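As a sketch of such a convention, the following hypothetical function buffer_fill returns 0 on success and an error number otherwise, in the style of many POSIX interfaces. It assumes a POSIX system, since EINVAL is defined by POSIX and not by ISO C.

```c
#include <errno.h>    /* EINVAL, a POSIX error number */
#include <stddef.h>   /* size_t, NULL */

/* Fills buf[0..len-1] with value.
   Returns 0 on success, EINVAL if buf is a null pointer. */
int buffer_fill(unsigned char *buf, size_t len, unsigned char value) {
    if (!buf) {
        return EINVAL;     /* detected, so it is reported */
    }
    for (size_t i = 0; i < len; ++i) {
        buf[i] = value;
    }
    return 0;
}
```

The convention is only worth something if callers actually check the return value.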
Reporting detected errors is an important alternative and some compiler implementors have chosen this as their default answer to problems. Perhaps you already have seen gcc’s famous “diagnostic” message
dereferencing pointer ‘bla’ does break strict-aliasing rules
This message supposes that you know what aliasing rules are, and that you also know why it came to that. Observe that it says “does”, so the compiler claims to have proven that the aliasing rules are violated, and that the behavior is thus undefined. What the message doesn’t tell you, and that is bad, is how it resolves the problem.
The last variant, defining your own stuff, should be handled with extreme care. Not only must what you define be sensible, you also make a commitment in doing so. You promise your clients that you will follow these new rules in the future and that they may suppose that you will take care of the underlying problem. In most cases, you should leave the definition of extensions to the big players, platform designers or other standards. E.g. POSIX defines a lot of cases that are UB for C as such.
As a last alternative, when
- the error is undetectable
- the detection of the error would be too expensive
you should simply let the program crash. The best strategy I know of is to always initialize all variables and always treat 0 as a special value. This may, under rare circumstances, trade a tiny bit of performance for security, because on some rare occasions your compiler might not be able to optimize away an unused initialization. If you do this, most errors that you will see are dereferences of null pointers.
Dereferencing a null pointer has UB. But modern architectures know how to handle this: they raise a “segmentation fault” error and terminate the program. This is the nicest failure path that you can offer to your clients, failing anytime, anywhere.
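The initialization strategy can be sketched as follows. The type node and the function list_length are hypothetical; the point is only that every object starts out zeroed and that code treats the 0 value as “nothing here”.

```c
#include <stddef.h>

typedef struct node node;
struct node {
    node *next;    /* 0 means "no successor" */
    int   value;
};

/* Counts the elements of a list; a null head is simply the empty list. */
size_t list_length(const node *head) {
    size_t n = 0;
    for (const node *p = head; p; p = p->next) {
        ++n;
    }
    return n;
}
```

A declaration such as `node a = { 0 };` sets all members to zero, so a.next is a null pointer. Code that forgets to link a node and then follows next fails on a null pointer, which typical platforms trap, instead of wandering through whatever an uninitialized pointer happens to hold.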
February 7, 2015
I am pleased to announce the feature completion of Level 2 of my book
It deals with most principal concepts and features of the C
programming language, such as control structures, data types,
operators and functions. Its knowledge should be sufficient for an
introductory course to Algorithms, with the notable particularity
that pointers aren’t fully introduced here yet.
As before, the current version of the book can be found at my homepage
and also as before, constructive feedback is highly welcome. Many
thanks to those that already gave such valuable feedback for previous