A Common C/C++ Core Specification rev 2

Version 2 of that specification just appeared on the C committee website:

A Common C/C++ Core Specification rev 2

For an introduction to the original version, see “A common C/C++ core specification”.

The most important additions in this revision are

  • three-way comparison (spaceship operator)
  • “initializer” construct for captures of lambdas
  • a tool for textual representation of all basic types and arrays, totext
  • more attributes: core::free, core::realloc, core::noleak,
    core::writethrough, core::concurrent
  • constexpr
  • if with variable definitions

and some more cleanups and harmonizations between C and C++.

A common C/C++ core specification

I just published a new document on the WG14 (C committee) site

N2494: A common C/C++ core specification

It is an attempt to push the fast-forward button on the annoyingly slow development process for C and its coordination with C++.

Continue reading “A common C/C++ core specification”

cross-language interfaces between C and C++

As you know, C and C++ are sister languages that have a lot in common but have drifted quite far apart over the years. In general, code written in one language cannot be compiled as the other; there are too many major and minor twists that make this impossible. Not only are there syntax differences between the two, some common syntax actually has diverging semantics. So generally it makes no sense to compile C code with a C++ compiler, and you should look with suspicion at any code or programmer that claims to do so.

Where C and C++ usually agree, though, is on the ABI, the application binary interface, so data structures and functions of one language can be used by the other to some extent. C and C++ have also kept a sufficiently wide intersection in their respective specifications of interfaces, such that one header file can be used from both.

In this post I try to collect the parts that are in that intersection, and I propose a coding style that should accommodate both worlds suitably well. But given my personal history, this will merely be the point of view of a C programmer who wants to provide interfaces for C++.

Continue reading “cross-language interfaces between C and C++”

right angle brackets: shifting semantics

As I showed in this post, using > as a right angle bracket was not a particularly good idea, but trying to patch this misdesign makes it even worse. After a bit of experimenting I found an expression that is in fact valid for both C++98 and C++11, but that has a different interpretation in each:

fon< fun< 1 >>::three >::two >::one

So if you have to maintain a large code base with templates that depend on integers that are perhaps produced automatically by some tools, be happy, you will not be out of work for a while: changing your compiler to C++11 might change the semantics of your code.

Continue reading “right angle brackets: shifting semantics”

A disimprovement observed from the outside: right angle brackets

It has been a long time since I last looked into C++, I have to admit. By coincidence I recently unearthed a hilarious example that I had once written that shows the difficulty of parsing some C++ code, for compilers as well as for us poor humans. It all starts with the >> operator that (supposedly until C++11) could cause problems as in the following:

toto< tutu< 3 >> A;

Here the >> is (was) interpreted as the ‘right shift’ operator, and thus this code would cause a compile-time error. C++11 changed this by allowing the right-shift token, in that position, to close the two template angle brackets. The argument is that shift operators in template arguments are rare (which is probably true), and so this sacrifices some valid uses of that operator for the sake of causing less brain damage to C++ newbies.

Continue reading “A disimprovement observed from the outside: right angle brackets”

struct tags in C++ are even weirder

I already discussed the fact that struct tags are not identifiers in C++, and in particular that a tag name can be used as the name of a type. Today I learned that the rules for that are even more complicated than I thought. In C an identifier that has been used as a tag (for struct, union or enum) can freely be used as another identifier (variable, typedef, label). In C++ this is only almost true: there is one sort of identifier it can’t be used for, namely a typedef, unless that typedef refers to the same type as the tag.

Continue reading “struct tags in C++ are even weirder”

surprising occurrence of identifiers in header files

I remember being stuck some time ago because a system header on the platform that I was using at the time defined the undocumented identifier barrier. IIRC this was even a macro, which made the bug really hard to track down: seemingly harmless code simply exploded.

Hopefully platform implementors nowadays are a bit more careful not to pollute the namespace, but avoiding naming conflicts is still not so easy. For example, inline functions are a useful tool when you want to expose small functions to all compilation units of a program. There is one pitfall, though, when it comes to naming conventions for their parameter names and local variables. If you get the name wrong, as in this simple example

inline double my_sin(double PHI) { return sin(PHI); }

other users of your code might encounter random problems if they define a macro PHI.
Continue reading “surprising occurrence of identifiers in header files”

struct tags are not identifiers in C++

It seems to be a common mistake to think that a declaration like struct toto { ... }; in C++ implies the definition of the identifier toto as a type. In reality the rule is much more subtle than that: it only implies some sort of implicit typedef struct toto toto;. Only if there is no other identifier of the same name in the corresponding scope does toto refer to the struct.

This comes into effect, for example, when you try to use the tools from “sys/stat.h” in C++. It defines a function stat and a struct stat that coexist in the same scope.

This kind of implicit definition is a pitfall when you think of code sharing between C and C++. In the following we will consider four code snippets that are slight variations of the same idea.

/* Compiles in C and C++, output will usually differ for both*/
#include <stdio.h>
static char T = 'a';
int main(int argc, char** argv) {
    struct T { char X[2]; };
    printf("size of T is %zu\n", sizeof(T));
}

Here the implicit typedef in C++ shows its full beauty: for C++ the sizeof operator refers to the type and not to the variable. Thus the output in C will be 1 (this is a char variable, not a character literal) and in C++ it will be at least 2.

/* Compiles in C and C++, output will be 1 for both*/
#include <stdio.h>
int main(int argc, char** argv) {
    static char T = 'a';
    struct T { char X[2]; };
    printf("size of T is %zu\n", sizeof(T));
}

In this example, the variable T in the function scope inhibits the lookup of T as a struct tag. So the sizeof operator will refer to the variable in both languages. Since sizeof(char) is always 1 in both languages, this is what will always be printed.

/* Compiles in C but not in C++ */
#include <stdio.h>
static char T = 'a';
int main(int argc, char** argv) {
    struct T { char X[2]; };
    printf("size of T is %zu\n", sizeof T);
}

Here, as in the first example, T will be interpreted differently by C and C++. Since sizeof without parentheses is only valid as a prefix operator in front of an expression and not in front of a type, this is invalid code in C++.

/* Compiles in C and C++, output will be 1 for both*/
#include <stdio.h>
int main(int argc, char** argv) {
    static char T = 'a';
    struct T { char X[2]; };
    printf("size of T is %zu\n", sizeof T);
}

This last example is equivalent to the second, except that omitting the parentheses in the sizeof expression ensures that T cannot be taken as a type here.

Things get even worse. If you define an object with the same name later in the code, the output changes:

/* Compiles in C and C++, output will usually differ for both*/
#include <stdio.h>
static char T = 'a';
int main(int argc, char** argv) {
  struct T { char X[2]; };
  printf("size of T is %zu\n", sizeof(T));
  static char T = 'a';
  printf("size of T is %zu\n", sizeof(T));
}

In C++ this prints two different values.

This answer on Stack Overflow may give you further insight into this question.

Keyword overloading: the static keyword

The keyword static has seen wider and wider use in the different versions of C and C++. Given the uses that it has now, compared to the beginnings of C, it would have been better to use another token for it, something in the vein of variant or alternative_declaration… Here I try to list all the different usages of that keyword and how the introduction of static in a declaration chooses an alternative version of such a declaration.

Linkage specification

In C and C++, unless specified otherwise, an object that is defined in file scope, i.e., outside of any function, has static storage and external linkage. That is, space for the object is reserved at compile time and it is visible to the linker. Code in other compilation units (.o files) may refer to such an object by its name.

The linkage changes when a declaration is prefixed with static: the object becomes internal to the compilation unit and is not accessible from other units. Different objects in different units with the same name that are declared static may co-exist even if they have different type and size.

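C++03 deprecated this use of static for objects in namespace scope and provides the concept of an anonymous (unnamed) namespace as a replacement; C++11 removed the deprecation again, so both forms are valid today.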

Storage class specification

In C and C++, unless specified otherwise, a variable that is defined in function scope has automatic storage: at run time, when the function is called, memory for the variable is reserved on the execution stack. Each call of the function, in particular if it is recursive, has its own new version of the variable.

The storage class changes when a declaration is prefixed with static: the object becomes static. Space for the variable is reserved at compile time and all calls of the function see the same variable, read and write into the same storage location.

Class member declaration

In C and C++ each instance of a struct (and in C++ also of a class) has its own copy of each of its declared members.

struct toto { double a; };
struct toto A;
struct toto B;

Here A.a and B.a are guaranteed to be two different objects with different locations in storage.

In C++ this changes when the declaration of the member is prefixed with static:

struct toti { static double a; };
struct toti A;
struct toti B;

Here A.a and B.a are guaranteed to be the same object. Both refer to exactly the same storage location.

Arrays as function parameters

As one of the rough edges of the C language(s), two declarations that look exactly the same (but for the names), one in the prototype and one as a stack variable, result in the declaration of two different types of variables.

void foo(double A[10]) {
   double B[10];    
}

Inside the scope of foo, A is a pointer to double and B is an array of ten elements of type double. Even their sizes computed with sizeof are different. C++ inherited this rule.

C99 complicates this matter even further by introducing the keyword static to introduce yet another variant

void foo(double A[static 10]) {
   double B[10];    
}

This doesn't change the rules on how A and B are seen from the inside, but provides information to the caller about how many array elements are expected. C99 specifies that it is the responsibility of the caller to pass at least as many elements as the array bound specifies. This syntax is C only; it is not valid C++.

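For the moment gcc accepts this new syntax and simply ignores the information. Strictly speaking this is still conforming, since passing too few elements is undefined behavior for which no diagnostic is required, but it means that the compiler makes no use of the feature.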

Obfuscation or inventing a new operator tends to operator -->

I found a really nice one in this discussion here. Basically the idea is that you may format operators a bit creatively to come up with this code, which is valid C99 and C++ (the compiler simply reads x --> 0 as x-- > 0):

for (unsigned x = 10; x --> 0;)
     printf("%u\n", x);

Ain’t that cute?

Now I was thinking that in C++ we could really obfuscate that much better by inventing some helper class. I came up with the following class Heron that ‘converges’ (written as aHeron --> eps) towards the square root of the initial value.

Enjoy.

#include <iostream>
#include <math.h>

using std::cout;
using std::endl;

struct Heron {
  double const a;
  double x;
  Heron(double _a) : a(_a), x(_a) { }
  operator double(void){ return x; }
};

class Heron_tmp;

inline
Heron_tmp operator--(Heron& h, int);

class Heron_tmp {
  friend class Heron;
  friend Heron_tmp operator--(Heron& h, int);
private:
  Heron* here;
  Heron_tmp(Heron& h) : here(&h) { }
public:
  inline int operator>(double err) const;
};

inline
Heron_tmp operator--(Heron& here, int) {
  return here;
}

inline
int Heron_tmp::operator>(double err) const {
  double& x = here->x;
  double const& a = here->a;
  x = (x + a/x) * 0.5;
  return fabs((x*x - a)/a) > err;
}

int main(void) {
  Heron aHeron(2.0);
  while (aHeron --> 1E-15)
    cout << (double)aHeron << endl;
}