Jens Gustedt's Blog

July 4, 2015

Modular C

Filed under: C11, Modular C — Jens Gustedt @ 20:41

For decades, C has been one of the most widely used programming languages, and it is used successfully for large software projects that are ubiquitous in modern computing devices of all scales. For many programmers, software projects and commercial enterprises, C has advantages (relative simplicity, faithfulness to modern architectures, backward and forward compatibility) that largely outweigh its shortcomings. Among these shortcomings is the lack of two important, closely related features: modularity and reusability. C fails to encapsulate different translation units (TUs) properly: all symbols that are part of the interface of a software unit, such as functions, are shared between all TUs that are linked together into an executable.


The common way to cope with that difficulty is to introduce naming conventions. Usually software units (modules) are assigned a name prefix that is used for all data types and functions that constitute the programmable interface (API). Often such naming conventions (and, more generally, coding styles) are perceived as a burden. They require a lot of self-discipline and experience, and C is missing features that would support or ease their application.

C needs a specific approach

Modular C is a new approach that picks up at that point and proposes to fill the gap. It adds one feature (composed identifiers) to the core language and operates with a handful of directives to glue different modules into one project, providing modularity and reusability.

Modular C is meant to be simple and to conserve most features of the core language and the habits of its community. It should not develop into a new language of its own, but remain a shallow addition on top of C. Hopefully it will be possible to add it as an optional feature to the standard.

It is all about naming

C’s handling of its own library API already shows all the weaknesses of the naming-convention approach:

  • It is intrusive: naming choices for the C library strongly influence possible choices for other projects.
  • It is inconsistent: the naming convention shows a history of random choices.
  • It is ever-growing: every version adds new constraints.
  • It is incomplete: struct field names or parameters of inline functions are not “officially” reserved.

A typical example when using composed identifiers, and some directives to import (and abbreviate) other modules, would look like this:

#pragma CMOD separator ➕
#pragma CMOD module head   = proj➕structs➕list
#pragma CMOD import elem   = proj➕structs➕element
#pragma CMOD import io     = C➕io
#pragma CMOD import printf = C➕io➕printf

/* The following declarations are exported */
#pragma CMOD declaration

/* Exports proj➕structs➕list */
struct head {
  elem* first;
  elem* last;
};

/* From here on, only exported if external linkage. */
#pragma CMOD definition

void say_hello(void){
  io➕puts("Hello world");
}
static unsigned count;
static void say_goodbye(void){
  printf("on exit we see %u", count);
}
head* init(head* h) {
  if (h) *h = (head){ 0 };
  return h;
}
/* Exports proj➕structs➕list➕top, no conflict with ctype.h */
double top(head* h) {
  return h->first->val;
}

As you can see, the main part of this is still “normal” C; only the interpretation and visibility of some identifiers change.
There is a much longer story to tell about the features that come almost for free when introducing this naming and import scheme. If you want to know more about Modular C, you should read the research report that I have written about it and then just try the reference implementation:

https://scm.gforge.inria.fr/anonscm/git/cmod/cmod.git

There are still a lot of bugs and missing features; please help me find them. Below is a list of the advantages that you might get in return.

Modular C features:

Encapsulation: We ensure encapsulation by introducing a unique composed module name for each TU that is used to prefix identifiers for export. This creates unique global identifiers that will never conflict with any identifier of another module, or with local identifiers of its own, such as function or macro parameters or block scope variables. Thus, by default, importing a module will neither pollute the name space of the importer nor interfere with parameter lists.

Declaration by definition: Generally, any identifier in a module will be defined (and thereby declared) exactly once. No additional declarations in header files or forward declarations for struct types are necessary.

Brevity: An abbreviation feature for module names systematically avoids the long prefixed identifiers that usual naming conventions require.

Completeness: The naming scheme using composed identifiers applies to all file scope identifiers: objects, functions, enumeration constants, types and macros.

Separation: Implementation and interface of a module are mostly specified through standard language features. The separation between the two is oriented along the C notion of external versus internal linkage. For an inline function, the module that defines it also provides its “instantiation”. Type declarations and macro definitions that are to be exported have to be placed in code sections that are identified with a declaration directive.

Code Reuse: We export functionality to other modules in two different ways:

  • by interface definition and declaration as described above, similar to what is usually provided through a C header file,
  • by sharing of code snippets, similar to X macros or C++’s templates. The latter makes it easy to create parameterized data structures or functions.

Acyclic dependency: Import of modules is not from source but uses a compiled object file. This enforces that the import relation between modules defines a directed acyclic graph. As a result, it can be updated automatically by tools such as POSIX make, and we are able to provide infrastructure for orderly startup and shutdown of modules according to the import dependency.

Exchangeability: The abbreviation feature allows easy interchange of software components that fulfill the same interface contract.

Optimization: Our approach allows for the full set of optimizations that the C compiler provides. It eases the use of inline functions and can be made compatible with link time optimization.

C library structure: We provide a more comprehensive and structured approach to the C library.

Extensibility: Our approach doesn’t interfere with other extensions of C, such as OpenMP or OpenCL.

Migration path: We propose a migration path for existing software to the module model. Legacy interfaces through traditional header files can still be used for external users of modules.

4 Comments

  1. What license will the code referenced above in scm be (eventually) released with? I find this work interesting but the “all rights reserved” without a license for some of it is a detractor. Perhaps I missed it? Thanks for publishing what you have.

    Comment by Joe — July 20, 2015 @ 04:22

  2. The idea is to have an open license for it, something that corresponds to the CC license of my book, of which all of this is a part. To be honest, I am a bit lost on which of all these different licenses to choose.

    Comment by Jens Gustedt — July 20, 2015 @ 05:55

    • Thanks for your reply. Simple, permissive, compatible and mainstream as (my own preferred) selectors leads you to only a handful of licenses that you can then throw a dart at–but you may have other factors to consider.

      Comment by Joe — July 20, 2015 @ 09:18

  3. http://www.gnu.org/licenses/license-list.en.html

    If you do choose a permissive license, please consider choosing one that’s compatible with the GNU GPL. 😉

    Comment by aj — July 27, 2015 @ 21:43

