"New" Features in C

Dan Saks at NDC TechTown 2019

A non-exhaustive overview of what has changed since 1989 when ANSI approved C89.

C Standards

1983 C Standards committee created.
1989 ANSI C89, the first USA Standard.
1990 ISO C90, the first international standard.
Identical to the ANSI version. “C90” is often used to refer to both C89 and C90.
1999 ISO C99 (ISO/IEC 9899:1999).
2011 ISO C11 (ISO/IEC 9899:2011)
2018 ISO C18 (ISO/IEC 9899:2018).
Said to be identical to C11 except wording.

The latest publicly available C standard from 2007 is ISO/IEC 9899:TC3, available from the C working group site. The latest ISO specification, ISO/IEC 9899:2018, however is behind ISO’s paywall for €181. I think it’s a disgrace for such a widely used programming language to have its official specification paywalled, and rather ironic given how much of the world’s open-source software is in C. The GNU project gave people a free compiler, GCC, 32 years ago. Perhaps it’s time to give people free C, too. Fortunately the C working group at one point distributed the working draft for ISO/IEC 9899:2017. Thanks to Archive.org, it’s still available. It just doesn’t seem to be “officially available”, nor is it the actual final standard.

I see similarities between paywalling standards and paywalling scientific literature. The latter practice is fortunately starting to change, thanks to civil disobedience and the technical help of Sci-Hub for providing people access to research they (even if partly) funded through tax. I’m imagining something like this ought to occur in the standards world, too. There’s little ethical justification for locking up standards that are based partly on publicly funded work, or worse yet, referred to in laws.

Identifying the Standard Version

C90 introduced the __STDC__ macro to identify a standard-compliant compiler. However, as __STDC__ was only defined to expand to 1, it couldn’t later be used to distinguish between C90 and C99. For that, __STDC_VERSION__ was added. It expands to an integer of the form yyyymmL representing the year and month of the standard: 199901L for C99, 201112L for C11 and 201710L for C18, for example.
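As a minimal sketch of how the macro can be put to use, the hypothetical helper below (c_standard_name is my own name, not something from the talk) maps __STDC_VERSION__ to a readable label at compile time. The 199409L value belongs to the 1995 amendment to C90, which is where the macro first appeared.

```c
/* Hypothetical helper: names the standard the compiler claims to implement. */
const char *c_standard_name(void) {
#if !defined(__STDC_VERSION__)
  return "C89/C90";          /* the macro did not exist yet */
#elif __STDC_VERSION__ >= 201710L
  return "C17/C18 or later";
#elif __STDC_VERSION__ >= 201112L
  return "C11";
#elif __STDC_VERSION__ >= 199901L
  return "C99";
#else
  return "C95";              /* 199409L: C90 plus Amendment 1 */
#endif
}
```

Which branch gets compiled depends on the compiler and its -std= style flag, so the result is fixed at build time rather than at run time.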

Identifiers

The C committee is said to consider not breaking existing code far more important than not breaking existing implementations. New standards, and the keywords they introduce, therefore must not conflict with existing code.

<stdio.h> declares a series of public identifiers:

EOF Macro
FILE Type
printf Function
stdout “object”

Behind the scenes, they’re implemented in terms of undocumented identifiers. As an example, FILE in glibc, the C library typically used alongside GCC v9, is implemented in terms of a structure named _IO_FILE:

struct _IO_FILE;
typedef struct _IO_FILE FILE;

Another source of undocumented identifiers is header guards, such as those used to implement the standard’s requirement that any standard header may be included (e.g. #include <stdio.h>) more than once. These are often implemented with a preprocessor macro defined at the top of the header.

#ifndef _STDIO_H
#define _STDIO_H 1
// ...
#endif

To ensure the structure behind FILE, _IO_FILE, or the header guard, _STDIO_H, doesn’t conflict with people’s code, the C standard has a concept of reserved identifiers.

  1. Keywords are forbidden everywhere.
    These would be, among a few others:

    • auto
    • break
    • case
    • char
    • const
    • continue
    • default
    • do
    • double
    • else
    • enum
    • extern
    • float
    • for
    • goto
    • if
    • inline
    • int
    • long
    • register
    • restrict
    • return
    • short
    • signed
    • sizeof
    • static
    • struct
    • switch
    • typedef
    • union
    • unsigned
    • void
    • volatile
    • while
  2. Identifiers beginning with two underscores are reserved everywhere, so user code must not declare names like the member below.

    struct foo { double __bar; };
    
  3. Identifiers beginning with a single underscore and a lowercase character are reserved only at file (global) scope.
    They are therefore permitted as structure members or local variables:

    struct foo { double _bar; };
    int main() { int _foo; }
    
  4. Identifiers beginning with an underscore and an uppercase character are again reserved everywhere, so this member name is off limits too.

    struct foo { double _Bar; };
    

Here’s how this played out in C99, which introduced the boolean type. C99 couldn’t break existing code that already defined its own bool, so it couldn’t simply introduce a bool keyword like C++ had. The committee could only use one of the reserved naming conventions above. They ended up adding _Bool as the type, while providing the familiar bool, true and false as macros in a separate header, <stdbool.h>:

#define bool _Bool
#define true 1
#define false 0

As defining them out of the box could interfere with existing code, <stdbool.h> needs to be included explicitly.
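To make the difference between _Bool and a plain integer typedef a bit more tangible, here is a small sketch. The function names are made up for illustration. It relies on the rule that converting any nonzero value to _Bool yields 1, whereas converting to int truncates.

```c
#include <stdbool.h>

/* Hypothetical helper: reads like C++, but bool is a macro for _Bool here. */
bool is_even(int n) { return n % 2 == 0; }

/* Conversion to _Bool tests for "not equal to zero", so 0.5 becomes 1. */
int half_as_bool(void) { return (bool)0.5; }

/* Conversion to int truncates toward zero, so 0.5 becomes 0. */
int half_as_int(void) { return (int)0.5; }
```

An old-style typedef int bool; would have behaved like half_as_int, which is one reason a real built-in type was worth adding.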

Dan explains that C++ needed a distinct boolean type from the start because of overloading. A mere type definition or type alias names an existing type rather than creating a new one, so the compiler couldn’t have told an overload taking bool apart from one taking the underlying integer type.

C99 Features

C99 added // comments.

Integer Types

C has implementation-defined sizes for integer types, which could be an annoyance to people that need to work with specific sizes. For a long time people were working around that with compiler and system specific type definitions, such as:

typedef char int8;
typedef unsigned long uint32;

C99 finally standardized these and defined them in <stdint.h>. They take the form of int8_t, int16_t, uint32_t and so on, covering the 8-, 16-, 32- and 64-bit variants. Note however that the standard doesn’t require an implementation to provide all of the exact-width types, only that those it provides follow the pattern above. As C was designed for a wide variety of architectures, some of them have no small word sizes. By definition a byte in C is the smallest addressable unit. Dan mentioned digital signal processors that could have 32-bit word sizes, on which no 8-bit or 16-bit type can exist.

Aside from exact-size integers, C99 also defined minimum-size integers: int_least16_t and uint_least16_t et alii. Be careful as arithmetic, when depending on two’s-complement or overflow aspects, may not work as expected because the compiler is free to choose larger integers.

There’s also a fast-integer concept with int_fast16_t and others, which you can use when you need N-bits to do arithmetic and prefer to get a size where that is the fastest. As with minimum-size integers, some architectures may choose larger sizes, like ARM, where 32-bit arithmetic is faster than 16-bit or 8-bit because smaller sizes require conversion penalties. Fast-integer could be useful for loop iterator variables.

Finally there are the greatest-width integer types, intmax_t and uintmax_t.
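The caveat about minimum-size integers is easier to see in code. In this sketch (all function names are my own), the exact-width version wraps at 16 bits by construction, while the least-width version has to mask explicitly because uint_least16_t may be wider than 16 bits. The last helper shows the PRIu32 macro from <inttypes.h>, the portable way to print these types.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Exact width: the conversion back to uint16_t wraps modulo 2^16. */
uint16_t next_exact16(uint16_t x) { return (uint16_t)(x + 1u); }

/* Least width: the type may be wider, so wrap explicitly with a mask. */
uint_least16_t next_least16(uint_least16_t x) {
  return (uint_least16_t)((x + 1u) & 0xFFFFu);
}

/* Formats a uint32_t without guessing whether it is unsigned int or long. */
int format_u32(char *buf, size_t size, uint32_t value) {
  return snprintf(buf, size, "%" PRIu32, value);
}
```

Note that next_exact16 only compiles on implementations that actually provide uint16_t, which is exactly the optionality described above.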

Somewhat related, integer types capable of holding pointers to objects (not to functions) were also introduced: intptr_t and uintptr_t. Where regular pointers in C only support addition and subtraction, these also support multiplication, division and the other integer operations. They are, however, optional, as there could be systems where pointers are not representable as integers. Ivan Godard’s Mill CPU would be an example. Ivan talked about the architecture of the Mill CPU at an LLVM meetup in 2015.
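A common use of uintptr_t is checking alignment, which needs the remainder operator that pointers lack. The sketch below uses hypothetical helper names and assumes the usual flat mapping where the integer value of a pointer is its address. The standard only guarantees the round trip through uintptr_t, not what the integer means.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the integer value of the pointer reflects its address. */
int is_aligned(const void *p, size_t alignment) {
  return (uintptr_t)p % alignment == 0;
}

/* Guaranteed by the standard: void* -> uintptr_t -> void* compares equal. */
void *round_trip(void *p) { return (void *)(uintptr_t)p; }
```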

Long Long Integers

In addition to the fixed-size integer types, C99 added long long. Adding the type alone wasn’t enough; the supporting library needed additions too, namely atoll, for “ASCII to long long”, and llabs for getting the absolute value of a long long. printf also needed the %lld conversion specifier.
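Here is a short sketch that exercises all three library additions together. The wrapper names parse_magnitude and format_ll are invented for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parses a decimal string and returns its absolute value as a long long. */
long long parse_magnitude(const char *text) { return llabs(atoll(text)); }

/* Writes v into buf using the %lld conversion; returns the length written. */
int format_ll(char *buf, size_t size, long long v) {
  return snprintf(buf, size, "%lld", v);
}
```

Like atoi, atoll reports no errors. strtoll, also added in C99, is the better choice when the input can’t be trusted.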

Complex Numbers

C99 also added support for both complex numbers (x + yi) with _Complex and imaginary numbers with _Imaginary. As before, nicer names are behind a header file:

#include <complex.h>

double complex foo = 13 + 37 * I;

C converts implicitly between real and complex types, either by assuming the imaginary part is zero or by discarding the imaginary part. There are also creal and cimag functions to get the real and imaginary parts.
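A brief sketch of those conversions and accessors, with function names of my own choosing. Multiplying (1 + 2i) by (3 - i) gives 5 + 5i, which makes the two parts easy to check by hand.

```c
#include <complex.h>

/* Ordinary arithmetic operators work on complex operands. */
double complex multiply(double complex a, double complex b) { return a * b; }

/* Implicit complex -> real conversion: the imaginary part is discarded. */
double drop_imaginary(double complex z) { return z; }

/* Explicit accessors for the two parts. */
double real_of(double complex z) { return creal(z); }
double imag_of(double complex z) { return cimag(z); }
```

Depending on the toolchain, functions from <complex.h> may require linking against the math library.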

In C99, complex number types are mandatory for hosted implementations, but the lone imaginary types (_Imaginary) are optional.

Function Name

C99 introduced __func__, which gives you the name of the enclosing function for debugging and function call tracing. It behaves as if a static, null-terminated character array had been declared at the top of the function:

void foo() {
  __func__; // => static char const __func__[] = "foo";
}
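For tracing, __func__ is typically wrapped in a macro so that it expands inside the function being traced. The following is only a sketch: TRACE, last_traced and open_valve are hypothetical names, and the macro also remembers the last name so the effect can be inspected.

```c
#include <stdio.h>
#include <string.h>

/* __func__ has static storage, so keeping a pointer to it is safe. */
static const char *last_traced = "";

/* Hypothetical tracing macro: logs and records the enclosing function. */
#define TRACE() (last_traced = __func__, fprintf(stderr, "-> %s\n", __func__))

void open_valve(void) { TRACE(); }

/* Returns nonzero if the most recently traced function had this name. */
int last_traced_is(const char *name) { return strcmp(last_traced, name) == 0; }
```

A helper function would not work in place of the macro, since __func__ would then name the helper instead of its caller.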

Declaration Ordering

C90 didn’t allow declarations after the first statement in a block, e.g. a function call. It required all declarations to appear at the top of the block, even if initialization (first assignment) happened a lot later. C99 lets you mix declarations and statements:

void foo() {
  int foo;
  bar();
  int baz;
}

This aligns with good programming practice on reducing the scope of used variables by permitting their declaration to happen closer to their place of use.

It was natural to also permit delayed declaration for loop variables, so it was no longer necessary to define i at the top of the function with a loop inside:

for (int i = 0; i < n; ++i) {
  // ....
}

C99 also scopes i in the above example to within the loop. A later loop can reuse the same name.
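A tiny sketch of that scoping rule, using a made-up function: both loops declare their own i, and neither is visible outside its loop.

```c
#include <stddef.h>

/* Walks the array forwards, then backwards, adding every element twice. */
long sum_twice(const int *values, size_t n) {
  long total = 0;
  for (size_t i = 0; i < n; ++i)
    total += values[i];
  /* The first i is out of scope here, so the name can be declared again. */
  for (size_t i = n; i > 0; --i)
    total += values[i - 1];
  return total;
}
```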

Inline Functions

C99 also introduced the inline keyword for functions, permitting people to migrate from error-prone macros to semantically true functions with none of the overhead of function calls.

inline int max(int a, int b) { return a > b ? a : b; }

Another positive of native inline functions is that you can take the address of the function if necessary.

Inline functions have to be defined, not only declared, in a header file for the compiler to be able to inline their contents. However, that raises a problem: the function could end up being defined in multiple object files, such as when you take its address, and the linker may complain that it’s seeing duplicates. Whereas C++ decided to have the linker ignore identical duplicates, C has you pick a place for the implementation to go if you need to refer to its non-inline variant. You do that by declaring, but not defining, the function without inline in exactly one source file of your choice.

#include "max.h"
int max(int, int);

You could prefix the declaration above with extern or extern inline, but not with inline.
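Collapsed into a single file for brevity, the arrangement could look like the sketch below. The comments mark which part would live in the header and which in the one chosen source file. apply is a hypothetical helper that takes the function’s address, which is the situation where the external definition is needed.

```c
/* Would live in max.h: an inline definition, visible to every includer. */
inline int max(int a, int b) { return a > b ? a : b; }

/* Would live in exactly one .c file: this declaration makes the compiler
   emit the external (non-inline) definition in that translation unit. */
extern inline int max(int a, int b);

/* Hypothetical caller that uses max through a function pointer. */
int apply(int (*op)(int, int), int a, int b) { return op(a, b); }
```

Without the extern declaration, a compiler that decides not to inline a call may leave an unresolved reference to max at link time.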

Dan mentions that GCC may support C++-like inline function semantics when compiling C, without requiring you to pick a place for the inline function. Code depending on such extensions is then no longer standard C.

Compound Literals

If you made a rational number type and want to pass a value of it to a function, C90 requires you to first declare a variable and initialize it, and only then can you use it:

struct rational { long numerator, denominator; };
struct rational half = {1, 2};
struct rational another_half = {2, 4};
is_equal(half, another_half);

C99 introduces compound literals, where you “cast” a brace-enclosed initializer list to your structure type and get an unnamed object:

is_equal(half, (struct rational) {2, 4});

This also works for arrays:

memcmp(some_array, (int[]) {1, 2, 3, 4}, 4 * sizeof(int));
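Put together as something that compiles, the two snippets might look as follows. The notes never show is_equal, so the cross-multiplying version here is my own assumption, as are the two wrapper functions.

```c
#include <string.h>

struct rational { long numerator, denominator; };

/* Assumed implementation: a/b == c/d exactly when a*d == c*b. */
int is_equal(struct rational a, struct rational b) {
  return a.numerator * b.denominator == b.numerator * a.denominator;
}

/* Both arguments are compound literals; no named variables are needed. */
int half_equals_two_quarters(void) {
  return is_equal((struct rational){1, 2}, (struct rational){2, 4});
}

/* Array compound literal; memcmp takes a size in bytes, not elements. */
int starts_with_1234(const int *values) {
  return memcmp(values, (int[]){1, 2, 3, 4}, 4 * sizeof(int)) == 0;
}
```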

Relatedly, C90 only permitted initializing the first member of the union:

union floaty { int i; float f; };
union floaty n = {42};
union floaty n = {13.37};

Even if you assigned a float value like in the last line, at best you got a conversion warning, but it still initialized the integer. C99 introduced designated initializers, which let you specify which member of the union you’re initializing:

union floaty n = {.i = 42};
union floaty n = {.f = 13.37};

This also works when initializing regular structure members:

struct date { int year; int month; int day; };
struct date today = {.year = 2015, .month = 6, .day = 18};

And for arrays, which lets you specify which of the elements are non-zero:

int numbers[10] = {[1] = 10, [5] = 20};
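Two details are worth a sketch: designators may appear in any order, and everything not mentioned is zero-initialized. The function names below are invented for illustration.

```c
union floaty { int i; float f; };
struct date { int year; int month; int day; };

/* Initializes the float member directly, with no detour through the int. */
float floaty_as_float(void) {
  union floaty n = {.f = 13.37f};
  return n.f;
}

/* Designators need not follow the declaration order of the members. */
struct date first_of_january(int year) {
  struct date d = {.day = 1, .month = 1, .year = year};
  return d;
}

/* Members without a designator (month and day here) are set to zero. */
struct date year_only(int year) {
  struct date d = {.year = year};
  return d;
}

/* Only elements 1 and 5 are non-zero, so the sum is 10 + 20. */
int sparse_sum(void) {
  int numbers[10] = {[1] = 10, [5] = 20};
  int total = 0;
  for (int i = 0; i < 10; ++i)
    total += numbers[i];
  return total;
}
```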

Variable-Length Arrays

In C90 there’s no way to define an array with a dynamic number of elements. That becomes a problem when you’ve got a function that takes the number of elements and needs to allocate it. The only option was manual memory allocation and later freeing. That was especially error-prone when multi-dimension arrays were used:

void iterate(size_t rows, size_t columns) {
  int* matrix = malloc(rows * columns * sizeof(int));
  // ...
  free(matrix);
}

You couldn’t use multiple brackets to refer to nested elements (matrix[1][2]), and had to do pointer arithmetic — matrix[columns * i + j].

C99 introduced variable-length arrays with non-constant dimensions, that are both cleaned up automatically and support regular array dereferencing:

void iterate(size_t rows, size_t columns) {
  int matrix[rows][columns];
  // ...
}

Variable-length arrays also expand the definitions of sizeof to become a run-time computation when the array dimensions are not known statically.
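The sketch below, with function names of my own, shows both properties: sizeof evaluated at run time, and ordinary two-dimensional indexing. It assumes a compiler that supports VLAs and dimensions that are greater than zero and small enough to fit on the stack.

```c
#include <stddef.h>

/* sizeof on a VLA is computed at run time from the current dimensions. */
size_t matrix_bytes(size_t rows, size_t columns) {
  int matrix[rows][columns];
  return sizeof matrix;
}

/* Fills the matrix with 0, 1, 2, ... and adds the elements back up. */
long fill_and_sum(size_t rows, size_t columns) {
  int matrix[rows][columns];
  long total = 0;
  for (size_t i = 0; i < rows; ++i)
    for (size_t j = 0; j < columns; ++j)
      matrix[i][j] = (int)(i * columns + j);
  for (size_t i = 0; i < rows; ++i)
    for (size_t j = 0; j < columns; ++j)
      total += matrix[i][j];
  return total;
}
```

Unlike malloc, a VLA gives you no way to detect that the allocation failed, so large or untrusted dimensions are better served by the heap.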

As compiler support for VLAs remained uneven, the standard committee reclassified them as optional in C11.

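C99 also added a related feature for structures, the flexible array member: the last member may be an array of unspecified length, permitting structures that end in an array of dynamic size: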

struct packet {
  char header[10];
  char data[];
};

Allocating space with malloc(sizeof(struct packet) + n * sizeof(char)) would then do the right thing: take into account the header and possible padding before the dynamically sized data array.
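As a sketch of that allocation pattern, packet_create below is a hypothetical constructor that copies n bytes of payload into the trailing array. The caller owns the result and releases it with a single free.

```c
#include <stdlib.h>
#include <string.h>

struct packet {
  char header[10];
  char data[];    /* flexible array member: contributes nothing to sizeof */
};

/* Hypothetical constructor: one allocation holds header and payload. */
struct packet *packet_create(const char *payload, size_t n) {
  struct packet *p = malloc(sizeof(struct packet) + n * sizeof(char));
  if (p == NULL)
    return NULL;
  memset(p->header, 0, sizeof p->header);
  memcpy(p->data, payload, n);
  return p;
}
```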

Restricted Pointers

The possibility of aliased pointers, that is two or more pointers pointing to the same data, precludes a number of optimizations, such as reordering memory accesses or the use of vector instructions on supported CPUs. C99 added a restrict qualifier for pointers, with which the programmer promises that no other pointer is used to access the same data.

C99’s memcpy was redefined with restrict qualifiers:

void* memcpy(void* restrict dest, const void* restrict source, size_t count);

memcpy therefore expects that the source and destination do not overlap. If they do turn out to overlap, behavior is undefined. memmove, on the other hand, is defined to work for overlapping ranges and does not declare its pointer arguments to be restricted:

void* memmove(void* dest, const void* source, size_t count);
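The same qualifier is available to your own functions. In this sketch (both functions are made up for illustration), add_arrays promises that its three arrays don’t overlap, which leaves the compiler free to vectorize the loop. shift_left_one deliberately works on overlapping ranges and therefore has to use memmove.

```c
#include <stddef.h>
#include <string.h>

/* The caller promises that out, a and b refer to non-overlapping arrays. */
void add_arrays(size_t n, float *restrict out,
                const float *restrict a, const float *restrict b) {
  for (size_t i = 0; i < n; ++i)
    out[i] = a[i] + b[i];
}

/* Source and destination overlap, so memcpy would be undefined here.
   Requires n >= 1; the last element keeps its old value. */
void shift_left_one(char *buf, size_t n) {
  memmove(buf, buf + 1, n - 1);
}
```

Passing overlapping arrays to add_arrays is undefined behavior that the compiler will not diagnose. The burden of keeping the promise is entirely on the caller.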