Wowza! It’s been a while since Part 2, and so much has happened. I’m glad to be back, and I have some “unfinished business” I need to wrap up. Namely: Finish this blog series.

I know that it has been a few months, but buckle up: We’ve got a lot to cover this time around.

This post won’t have an over-arching theme, but will finish up covering all of the new syntax relevant to understanding C++ Modules. This post will cover some odds and ends that we haven’t yet poked at in prior posts.

The Module [Unit] Purview

A C++ module unit has what is known as a purview. The purview of a module unit is all tokens beginning at the module-declaration and continues to the end of the translation unit:

// <- NOT within the purview of `Mine`

export module Mine;

// <- Within the purview of `Mine`

I anticipate many readers are now saying

“Hold on… Things can appear before the module-declaration?”

The answer is yes, things can appear before this point. I personally expect usage of this feature will be fairly esoteric and limited. We’ll look at that in a later section below.

For a named module, the purview of the module is the set of purviews for each of its constituent module units. For a module with a single module unit, the module purview and the module unit purview are equivalent.

A New Type of Linkage

C++ has had two primary types of “linkage” for a long time: internal and external. The purpose of linkage is to determine what constitutes when the same name refers to the same entity:

  • Internal linkage pertains to entities which are inaccessible outside of the translation unit in which they are declared. This includes namespace-scope entities which have been declared static, and any entity declared within an anonymous namespace, variables declared with const (but not inline or volatile), and members of anonymous unions. These entities may only be named within the translation unit in which they are declared, and their definition is not visible to other translation units.
  • External linkage pertains to entities declared extern, functions, class types and their members (including static members), and variables, and all templates thereof, that are not already internal by the rules above, and all enums. An entity of this type is “the same” between translation units. That is, taking the address of a external-linkage entity will reveal the same address in all translation units. (Note: This is where we can get tripped up by ODR, when “the same” entity between translation units has differing definitions between translation units).

In addition, all external linkage entities have a language linkage, which allows the “sameness” to be used between different programming languages. The language linkage is determined by a linkage-specification of extern followed by a string literal followed by either a single declaration or a block containing a sequence of declarations, all of which inherit the language linkage of the enclosing block. The most common you’ll see is extern "C", which says that the associated names are “the same” as the ones declared within a C translation unit. There is also an extern "C++" linkage specification, but it is the default and, thus, rarely used explicitly.

You may also note that inline functions get no special treatment: They have external linkage! Taking the address of an inline function will yield the same result between all translation units.

C++ Modules introduce the new aptly named module linkage. By the definition above, you may have a good guess at what it means. If internal means that entities are “the same” within a single translation unit, and external means that they are “the same” between all translation units, then module linkage means that entities are “the same” within all translation units belonging to the module in which they are declared. Entities earn module linkage if they are not internal, not exported, and are attached to a named module.

This sounds very esoteric, so let’s pin it down with some concrete examples:

export module foo;

import :counter;

import export :arith;

// Get the value of the counter;
export int get_value() {
    return ::counter;
}
export module foo:arith;

import :counter;

// Add to the counter
export void add(int i) {
    counter += i;
}

// Subtract from the counter
export void sub(int i) {
    counter -= i;
}
module foo:counter;

// A global variable
int counter = 0;

In the above example, add, sub, and get_value all have external linkage, and counter has module linkage. There is no way to directly name and reference counter from outside of the module, but every module unit within foo has the same definition of counter.

export module bar;

class bar {};

// The bar has a counter
bar counter;

We’ve added another counter, but this time to another module bar. Despite both entities having the fully qualified name ::counter, and being accessible across translation units, they aren’t accessible to translation units that aren’t a part of the module. This gives us a good tradeoff between having a global variable that is thrown into the wild for anyone to access, and having an internal variable/class that only your single TU can access. In a way, module-linkage provides us with encapsulated globals.

If you’re familiar with ABIs, your ears might be turning at this one. With present linker technology, the only way to support module linkage is with additional name-mangling on such entities.

There is a tentative belief that such mangling will also appear on entities declared within modules with external linkage (anything exported or extern), which would have a horrible implication that moving an exported entity between modules is an ABI break, even if it is still re-exported from the same modules.

This author believes such mangling to be unnecessary. The language does not prescribe in this area, but I’m hopeful that the Ecosystem TR might be able to slap this one down before it spreads.

This author hates the trend of ABI stability, so I’ll abstain from opining (even though I just did (kind of)).

IF NDR WARNING

The phrase “ill-formed, no diagnostic required,” strikes fear into the hearts of most developers. In most cases, an “ill-formed” program is immediately obvious, and the compiler/linker will halt and tell you to fix it. For example, missing a closing brace, or declaring the same name as two different things.

With “no diagnostic required,” the compiler/linker might halt and tell you to fix it, but it might silently generate a malformed program with a subtle and hard-to-diagnose issue. It may even appear to work for a while!

C++ Modules introduce an IF NDR situation almost identical to an ODR violation, but a naive understanding of modules tells you that it should work:

export module uk;

export class bin {
    void put_refuse();
};
export module murica;

export class trash_can {
    void put_refuse();
};

export class bin {
    void store_christmas_decorations();
};

If these two modules are combined into a single program, that program becomes ill-formed, no diagnostic required. There are two conflicting definitions of ::bin!

In some cases, this is trivial to diagnose for the compiler:

import uk;
import murica;  // Hol' up.

Even though a diagnostic isn’t required, a compiler in the above situation can clearly see the two definitions of ::bin simultaneously and issue a diagnostic. It isn’t always so easy:

export module eurasia;

import uk;
export module north_america;

import murica;
export module world;

import eurasia;
import north_america;

In this example, world imports both eurasia and north_america, which both have a non-exported import of conflicting definitions of ::bin. Compiling world will not necessarily see either ::bin, as neither module re-exports the import which introduced the declaration.

On the other hand, this is explicitly banned:

export module statistics;
import uk;

export class bin { // STOP!

};

export class histogram {
    vector<bin> bins;
};

In this case, the standard requires a diagnostic for this program, as it is trivial for an implementation to see that there is already an existing ::bin. (As for why statistics is importing uk? I dunno, maybe it’s for a census.)

The Global Module

With C++ Modules, every entity must be “attached” to a single module. This leads us to the questions: “What about code not aware of modules?” and “what module is main() attached to?”

The answer to both is the global module. This is a special “implicit” module that catches all code that isn’t declared within a module.

You may see the term “named module.” This simply refers to a module with a name. The only module without a name is the global module.

Declaring a main() function at global scope attached to a named module is not allowed!

To add entities to the global module, you can simply declare them in a non-module translation unit.

To use entities from the global module… what do you do? You can’t import the global module (it’s unnamed). We’ll need to use this all the time, especially for interacting with non-module-aware libraries. Read on…

A New Preprocessor Directive

Read the entirety of this section. Do not skim the top.

C++ Modules introduces an exciting new preprocessor directive! Bet you weren’t expecting that, eh? After all, Modules are supposed to negate some uses of the preprocessor, right?

Yes, actually. Many uses of the preprocessor’s #include directive can be replaced with features introduced by C++ Modules. Nevertheless, we’ll still need the aid of the preprocessor to import non-module-aware code into our module-aware code. Should be easy, right?:

export module my_game;

#include <SDL.h>

HALT, CRIMINAL SCUM! This is horribly broken. C++ Modules is almost entirely agnostic to the existence of the preprocessor. What you’ve just done in the above code is cause every single symbol in SDL.h to become attached to my_game (with module linkage). Obviously, your module does not own the contents of SDL.h.

How would we go about accessing the content of SDL.h? We use the new preprocessor directive, of course!

export module my_game;

import <SDL.h>;

This is known as a “header-unit” import (previously called a “legacy header-unit.” The word “legacy” was dropped), and yes: this is a preprocessor directive, but it is nothing at all like #include.

You can also use double-quotes in your header-unit import:

import "my_code.h";

to facilitate the differing <> and "" lookup rules of #include.

A header-unit is created by the compiler by performing lexical translation phases 1-7 (i.e. everything except codegen), and treating the resulting code as-if it were a module in which every exportable entity is implicitly exported. (The entities are attached to the global module.)

The act of importing a header-unit is as-if importing this synthesized module. All exportable symbols from the header are made available in the importing module.

None of the above actually necessitate any usage of the preprocessor, though. The reason this is a preprocessing directive is that macros from the header become visible in the importing translation unit after the import directive is passed.

Important notes about macros with header-unit imports:

  • Ambient preprocessor state from the import-ing translation unit is not considered by the header-unit. This means that any #define macros in the importer have no effect on the content of the header-unit. The result is that we have one-way isolation of macros. Presumably, global preprocessor definitions might have an effect (think -D command-line options). (If you’re disturbed by the weasel words I’m using here: Don’t worry: So am I.)
  • You may export import header-units just like regular modules, but the macros from the header-unit are not re-exported into transitive importers: Only the regular language symbols. Only the literal import of a header-unit will expose the macros

So the answer to “How do I use macros with modules?” is here: You just import them! (There’s one other way discussed later).

I’ve Lied

Well, half-intentionally I’ve made lies. I wrote the above section before I made a discovery in the current draft with important implications. If you want to make yourself sad, read the rest of this section, otherwise: Throw away most everything in the parent section and skip to the next heading.

Header-unit-imports have an important restriction: The import of a header-unit is only allowed for what the standard deems “importable-headers.” The definition of what makes a header “importable” is entirely implementation-defined.

This gives us a bit of a pickle. Is it safe to rewrite an arbitrary #include directive to use a header-import? The answer, of course, is ABSOLUTELY NOT.

These two snippets are not equivalent:

#define _UNICODE
#include <windows.h>
#define _UNICODE
import <windows.h>;

Remember: The preprocessor state before import has no effect on the contents of the header unit. Those who know windows.h know that it does some wild changes based on the presence of _UNICODE as a macro. Thus, we cannot safely perform this rewrite without repercussions.

In fact, this is the basis for what most consider “importable” to mean: Safe from being affected by ambient preprocessor state.

Pop-quiz, is this an importable header?

#include <asdfhucini24nyuasdfuybukvbysjfdahjbasdfnvurei.h>

You can’t possibly know, and the nonsensical name is intentional: This is how the compiler sees user code. It does not speak a human language. It cannot know the semantics of <asdfhucini24nyuasdfuybukvbysjfdahjbasdfnvurei.h>, thus the header cannot be arbitrarily considered an “importable” header.

The current trend is that importable headers must be explicitly enumerated in a resource that the compiler may consult to discover which headers are candidates for import and #include-rewriting. This means you probably cannot “just import <SDL.h>,” but that you must inform the compiler that such an import is safe. On the other hand, perhaps the presence of import <SDL.h> is enough to force the compiler to deem the header “importable,” and perhaps not. Only time will tell.

Do not worry: We are not out of hope. We have another option.

The Global Module Fragment

Congratulations on making it thus far! We’re really getting deep into the nitty-gritty now.

So, we know that we cannot import <windows.h> safely without some extra work, and we can’t #include <windows.h> within the purview of a module unit, but we still want to be able to use it within a module. We have one last tool to come to our aide, and it’s a slippery and weird-looking beast: The global module fragment. It looks like this:

module;
// stuff ... [1]
module foo;
// module purview... [2]

Remember that things can appear before the module-declaration? Well, that’s what you have here. The two magic tokens module followed by ; tell the compiler that it’s about to compile a module unit, but that we first have some non-module code we need to introduce.

The “stuff” in [1] is the global module fragment, and anything in the translation unit that is declared/defined here is attached to the global module, and not to the module of the containing module unit.

There is one gigantic, enormous, immovable caveat that must be taken into consideration: Prior to preprocessor execution (translation phase 4), only preprocessor directives may appear in the global fragment. That is: You cannot write anything between module; and the module-delcaration except for preprocessing directives. Anything that you need to declare/define in the fragment must be accessed via an #include directive

This is not allowed:

// bar.cpp
module;

extern void foo();  // ILLEGAL: Not a preprocessor directive!

export module bar;

void call_foo() {
    foo();
}

However, this is valid:

// foo.h
extern void foo();
module;

#include "foo.h"  // Okay: Declares `::foo()`

export module bar;

void call_foo() {
    foo(); // Okay: `foo` was declared in the global module fragment.
}

So, if we want to use <windows.h> in our module unit, we can put it in the global fragment:

module;

#include <windows.h>

export module bar;

// Do Win32 stuff...

“Discarded” Declarations

C++ modules defines a term decl-reachable, which declares a relationship between two declarations: A declaration D may or may not be decl-reachable from another declaration S (either of which may be definition). The rules of being decl-reachable are somewhat complex, and you can read about them here, but it is sufficient to note that most uses of a declaration D from within S cause D to be decl-reachable, and that being decl-reachable is fully transitive. That is:

  1. If A is decl-reachable from B, and
  2. B is decl-reachable from C, then
  3. A is decl-reachable from C.

An implemention will discard declarations that appear in the global module fragment if-and-only-if those declarations are not decl-reachable within the module purview.

What does this mean for you, the programmer? Most importantly: Discarded declarations are neither reachable nor visible outside of the module unit. This means that the semantic properties and visibility to any entity declared within the global module fragment is propagated to importers of the containing module unit if-and-only-if that module unit makes use of such a declaration in a way that creates decl-reachability from any of that module’s declarations.

There is one very important case where a declaration is not decl-reachable that you would otherwise expect to work: If the declared entity is used in a way that it is a candidate in an overload set of a dependent expression within a function or class template.

This has some potentially goofy quirks:

// foo.hpp
template <typename T>
void do_something(T val) {
    // ...
}
module;
#include "foo.hpp"
export module acme;

template <typename T>
export void frombulate(T item) {
    do_something(item);
}

In this case:

  1. do_something is used as a name in a dependent expression within frombulate.
  2. It cannot be resolved by the first phase of name-lookup.
  3. We cannot prove that the do_something from foo.hpp is used by frombulate.
  4. It is not decl-reachable from frombulate.
  5. It is not decl-reachable within acme, and therefore
  6. do_something from foo.hpp will be discarded!

This means:

import acme;

int main() {
    frombulate(42); // ERROR: No matching overload of `do_something`!
}

Something strange happens when we tweak our module, though:

module;
#include "foo.hpp"
export module acme;

template <typename T>
export void frombulate(T item) {
    do_something(item);
}

export void use_it() {
    frombulate(true);
}

Without making any change to foo.hpp nor main.cpp, our program now compiles and works! What’s going on??

The answer is that use_it, calls frombulate, which is resolved immediately as a specialization frombulate<bool>. This concrete specialization allows the compiler to now perform the second phase of name lookup for frombulate<bool> within the context of the module unit, and it discovers a usage of do_something(bool) within the body of frombulate<bool>, which is no longer a dependent expression. Thus, do_something is now decl-reachable from frombulate, and therefore decl-reachable from acme, and it is no longer allowed to be discarded.

This “spooky action at a distance” is admittedly very contrived, but there is a 99% chance that someone will unwittingly stumble upon this quirk within the first four minutes of the release of a full C++20 module compiler.

The Last New Syntax: The private Module Fragment

You may remember from the first post a note that a module partition may not be named private. Well, now we finally get to talk about why.

There is a special syntax that denotes an aspect of the module purview called the private module fragment. In total, a module unit, with all the wizbang features, looks something like this:

// [The global module fragment - optional]
module;
#include <stuff.hpp>

// [The module preamble]
export module foo:bar;

export import widgets;
import gadgets;

// [The module interface - optional]
export void do_stuff();

// [The private module fragment - optional]
module :private;

void do_stuff() {
    std::cout << "Howdy!\n";
}

// [The end]

We’ve always wanted the benefit of keeping the interface and implementation separate (That is: keeping compile times low and hiding implementation from our consumers by reducing the propagation of implementation details into downstream users who only need to see the interface), but we don’t like to pay the price of having multiple source files to define a single logical component. Is there a way we can have the benefits of faster incremental builds without having to juggle multiple source files?

Let's do a pro-gamer move

This is where the private module fragment comes in. The way it is specified in the standard document does not mention its purpose, but rather emphasizes dozens of restrictions on code and the way it interacts with module :private, and it is apparent what its purpose is: to provide a separation of interface and implementation in a way that we can provide them together in a single source file without exposing the implementation details in the interface.

In essence, the restrictions on module :private; are specifically meant to prohibit code appearing in the fragment from having any apparent effect on the interface of the containing module. Under no circumstances should modification of the content of module :private; necessitate a re-translation of the importers of the containing module. Another restriction: If a module unit contains a private module fragment, that module unit must be the primary module interface unit, and there should be no other module units in that module. That is: A module which uses the private fragment must be a single-file module.

The necessary restrictions on the private fragment are numerous and spread all across the specification, so I won’t enumerate them all here, but a good rule of thumb is that anything that can modify a module’s interface is not permitted within the private fragment.

What Makes the “Interface” of a Module?

If you think about the very fact the the private module fragment even exists, you may be asking yourself another question: Why is it necessary at all??

Consider this simple module:

export module greet;

import <iostream>;

export void do_greet(int n) {
    while (n--) {
        std::cout << "Hello, world!\n";
    }
}

and a simple program:

// main.cpp
import greet;

int main(int argc, char**) {
    do_greet(argc);
}

When we compile, link, and run this program, it will print Hello, world! to standard-output as many times as command line argument we give it plus one.

But what if we want to print to standard-error? Let’s change do_greet:

export module greet;

import <iostream>;

export void do_greet(int n) {
    while (n--) {
        std::cerr << "Hello, world!\n";
    }
}

Now ask yourself a question: Do we need to recompile our main.cpp? Of course not! Right?

… Right?

It depends on which compiler you ask, because it depends if the compiler will propagate the body of do_greet into module importers. I doubt it controversial to say that this is a very surprising behavior. After all: C++ modules are supposed to help us encapsulate!

I think it fair to say that we probably don’t want the body of do_greet to be propagated to downstream users, and there are even cases where it is impossible for do_greet to be propagated: If do_greet uses any entity with internal linkage, the body thereof cannot be exposed outside of the module. I feel confident that implementations will converge on not propagating the body of functions, and thus improving the performance of incremental compilation.

We will finally be able to define our functions at the declaration, and not worry about massive compile times! Hooray!

Except… we won’t.

inline Ruins the Fun

Update:

This section deals with how the inline keyword affects the interface of a module. It assumes that in-class definitions of member functions are implicitly inline, which was true until the adoption of P1779, which adds an explicit exception. As such, inline is no longer implied on the declaration of member function defined within the body of a class when that class is declared within a module unit.

If we redefine do_greet as such:

export module greet;

import <iostream>;

export inline void do_greet(int n) {  // <-- Added `inline`
    while (n--) {
        std::cerr << "Hello, world!\n";
    }
}

It should be no surprise that we now cannot change the body of do_greet without affecting our importers, as the body of the function is now part of its interface.

The inline keyword is used to mean “put the body of the function into callers.”

Except for when it’s not.

It’s been well-known for many years that inline is merely a suggestion to a compiler to embed the body of the function into callers, but under the as-if rule an optimizer can inline functions as it pleases.

The effect of inline on the inliner is hotly debated, but completely irrelevant to this post. What I want to talk about is a sad story regarding a side-effect of the inline keyword.

Because a function declared as inline might be inlined into its callers, the definition of the function must be visible within every translation unit in which it is used, but this goes against another rule: ODR. inline functions have external linkage, and this means that every translation unit in a program must agree on the definition, which includes the address of the function itself.

How can we consolidate the requirement that every translation unit contain the definition of an inline function, while also satisfying that there only be one definition in a program?

The answer is: Don’t even try. Just punch a hole in the rules that permits inline functions to a definition in multiple translation units, provided all definitions are token-wise identical.

In practice, this is implemented by annotating the generated symbol as “weak” on ELF or “selectany” on COFF with MSVC. When the linker merges translation units, it will assume that all definitions are equivalent, and throw away all except a single definition.

This is actually a pretty neat trick. In fact, it was so neat that we wanted to be able to do this with more entities:

class Foo {
    // ...
public:
    Foo() = default;
};

In here, we have a defaulted definition of the constructor for Foo, and this definition is included in every translation unit that sees the definition of Foo itself.

class Foo {
    string _name;
public:
    void say_something() const {
        std::cout << "My name is " << _name << '\n';
    }
};

And here, we have an in-class definition of the function Foo::say_something. Just like with the defaulted constructor, the definition thereof will be included within every translation unit that sees Foo.

But we didn’t declare them inline, and we still have to satisfy ODR. Hmm… How can we fix this?

Of course! Let’s just implicitly slap inline on there! Every in-class definition of a member function is implicitly inline! [Edit: As of P1779, inline is no longer implicit for in-class member function definitions within a module unit.]

In fact inline has taken on this role so much that we grant these semantics to variables using the inline keyword. In this case, the inline has nothing at all to do with inlining code, and is purely a directive to the linker that we want to allow multiple definitions of the variable.

How does this relate to modules, then?

Recall that the body of an inline function is part of the interface of that function. Since in-class definitions of member functions are implicitly inline, the in-class definitions of member functions are part of a module unit’s interface.

export module acme;

export class Widget {
    int _age = 0;

private:
    void _activate() {
        std::cout << "Bzzt!\n";
        ++_age;
    }

    void _explode();

public:
    void use_it() {
        if (_age > 500) {
            _explode();
        } else {
            _activate();
        }
    }
};

module :private;

void Widget::_explode() {
    // ...
}

Note:

The below two paragraphs are no longer correct. Refer to P1779.

In this module, the body of Widget::_explode is hidden from importers, while the body of Widget::_activate and Widget::use_it are visible to users. Any modifications to _activate or use_it necessitate recompilation of anyone that imports acme. Yes: Even though _activate is private and cannot even be referenced by users, it’s body is also part of the module interface, since it is declared as exported and (implicitly) inline.

Only _explode, whose definition is provided out-of-line and in the private fragment, is safe to modify without triggering a cascading re-compile. Placing the definition of the method in the private fragment is the closest we can get to keeping the declaration and definition together without causing cascading recompilation when the body is modified.

More inline Goofs

You may remember my frightening side-note about “linkage promotion.” You’ll be glad to hear that linkage promotion is a thing of the past, and it is now a hard-error to use an entity of internal linkage in a way that its definition must be exposed to importers.

Note:

This example is out-dated after the adoption of P1779. While it is true that internal-linkage entities must not be used from inline functions, in-class member function definitions are no longer implicitly inline.

Unfortunately, one such illegal-usage is usage within an inline function. Since in-class definitions are implicitly inline, this means that using an entity of internal linkage is illegal within the body of an in-class member function definition:

export module acme;

// Internal-linkage:
static const char* greeting = "Go go gadget: Compiler error!";

// Exported class
export class Gadget {
public:
    // implicit `export inline` function:
    void boom() {
        puts(greeting); // ERROR: Usage of internal-linkage entity
                        // within exported inline function!
    }
};

The fix:

export module acme;

// Internal-linkage:
static const char* greeting = "Go go gadget: Compiler error!";

// Exported class
export class Gadget {
public:
    // implicit `export` function (not `inline`):
    void boom();
};

// Out-of-line definition:
void Gadget::boom() {
    puts(greeting); // Okay
}

That’s All for Now

“For now?” I hear you ask. Yes, there’s more to talk about! Fortunately, we’ve covered all of the new syntax and the semantics of that syntax, but we haven’t addressed some of the effects modules play on the semantics of existing syntax, that is primarily: Name and overload resolution.

In the next post, I expect to finish talking about what modules are, but there will be a lengthy post dedicated to what modules are not.

Hopefully it won’t take so long to get to part four.

Stay tuned for more!