Fun with Concepts: Do You Even Lift, Bool?
Sorry to anyone expecting another novel-length blog post. This one will be “relatively short” (Read: Still very long).
Arthur O’Dwyer recently wrote a blog post about function overloading in C++. I recommend giving it a read, as it spells out some of the pitfalls involved with providing function overloads with significantly disparate behavior. (I’ll assume that the slug “overloading-considered-harmful” is simply a tongue-in-cheek nod to the infamous “considered harmful” meme.)
The example in Arthur’s post specifically involves a “refactoring gone wrong,” wherein a benign-looking refactoring of the code causes overload selection to subtly choose a different overload. Specifically, it looks something like this:
void foo(std::string); // [1]
void foo(bool);        // [2]

void meow() {
    foo(std::string("I am a string")); // Calls [1]
    foo("lol nope");                   // Calls [2] !!
}
The basic explanation is that overload resolution ranks overload candidates to determine the “best match.”
- The first call to foo passes a std::string that is explicitly constructed from a character array, so it is no surprise that we call the string overload of foo.
- The second call to foo passes a character array directly. There is no overload of foo that accepts a character array, so we need to choose which of the candidates is the “best match.”
  - The first overload of foo takes a std::string. The compiler checks whether we can convert the character array to std::string, and finds that we can, by decaying the array to a pointer and passing that pointer to the constructor of std::string.
  - The second overload of foo takes a bool. The compiler sees that there’s a conversion from any pointer type to bool. The character array will eagerly decay to a pointer, and we can convert that pointer to bool.
  - Both overloads are valid candidates, but which one is a better match?
    - In the face of these two possible conversions, one is program-defined (a “user-defined conversion” in the standard’s terms): the conversion from the character pointer to a std::string.
    - The other conversion is language-defined (a “standard conversion”): the conversion from a pointer to a bool.
    - When given a program-defined conversion and a language-defined conversion, the language-defined conversion is preferred.
    - Thus, the call foo("lol nope") will call foo(bool) as if we had written foo(true).
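To watch the ranking happen, here is a minimal sketch in which each overload returns a number identifying itself. The return values and the check_overloads helper are additions for illustration only and are not part of Arthur’s example.

```cpp
#include <cassert>
#include <string>

// Each overload returns its own tag so we can observe which one was chosen.
int foo(std::string) { return 1; }  // [1]
int foo(bool)        { return 2; }  // [2]

// Hypothetical helper that records the behavior described above.
void check_overloads() {
    // Exact match on std::string: overload [1].
    assert(foo(std::string("I am a string")) == 1);

    // const char[9] -> const char* -> bool is a standard conversion sequence,
    // which outranks the user-defined conversion to std::string: overload [2].
    assert(foo("lol nope") == 2);
}
```

Some compilers can warn about a string literal converting to bool, but that is a warning flag you have to know to ask for, not something the language guarantees.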
I’ll fall into the stereotype of the C++ programmer and blame this wat on C legacy, wherein many types happily convert between each other.
Typing
Like the foo written above, the word “typing” has also been overloaded:
- Static typing is opposed to dynamic typing. This means that declarations have types that are fixed. C and C++ are statically typed programming languages. Other examples include Java, C#, TypeScript, and Haskell.
- Dynamic typing is the opposite of static typing, and means that the types of a declaration can vary at runtime. Examples include Python, JavaScript, Ruby, and Erlang.
- Strong typing is opposed to weak typing. It refers to a type system that strictly enforces type compatibility.
- Weak typing is opposed to strong typing, and refers to a type system that allows some intermixing and implicit converting between types.
Not So Binary
For any type system, it is generally impossible to declare “Language X is strongly typed,” because most languages fall somewhere along a spectrum between weak and strong typing, and between static and dynamic typing. It is also important to recognize that the two axes are orthogonal: how strong a type system is says nothing about how static it is.
For example, Python is dynamically typed, but it is also very strongly typed. Python will not implicitly convert types in a “best guess” to make things work. It will instead yell at you:
v = 12
s = 'There are ' + v + ' monkeys jumping on the bed.'
results in an exception:
TypeError: must be str, not int
On the other hand, JavaScript is dynamically typed and weakly typed. Giving the above Python code to a JavaScript interpreter results in a valid value for s:
"There are 12 monkeys jumping on the bed."
JavaScript will happily see you attempting to add a string to a number, and will implicitly convert that number to a string in order to perform the concatenation. This can lead to some strange (and amusing) results.
C ?
So, does C have a weak type system, or a strong type system?
As noted, there is no “either/or” answer. The following example is a hard error:
struct foo {};
struct bar {};

void bark(struct foo p);

void meow() {
    struct foo f;
    struct bar b;
    bark(b); /* <-- Error: Incompatible type for argument one */
}
The compiler will not convert the bar instance into a foo instance, as such a mechanical conversion is nonsensical. However, if we change the parameter of bark to be a pointer:
struct foo {};
struct bar {};

void bark(struct foo* p);

void meow() {
    struct foo f;
    struct bar b;
    bark(&b); /* <-- Okay! */
}
The compiler will happily convert the bar* into a foo*. Strictly speaking, standard C treats this as a constraint violation that must be diagnosed, but compilers have traditionally issued only a warning and accepted the code.
C++ ?
While C++ inherits a lot from C, it decided to strengthen its type checking, and the prior C code will actually be rejected by a C++ compiler.
However, there are a few places where C’s implicit conversions remain, and (as we saw in the first example and in Arthur’s blog post) they can still come back to bite us.
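One way to see where C++ drew the line is to ask the compiler directly. The sketch below uses std::is_convertible_v, which reports whether an implicit conversion exists. The foo and bar types mirror the C example above. The particular conversions checked are my own picks for illustration.

```cpp
#include <type_traits>

struct foo {};
struct bar {};

// Unrelated pointer types no longer convert implicitly in C++.
static_assert(!std::is_convertible_v<bar*, foo*>);

// Several conversions inherited from C are still implicit, though.
static_assert(std::is_convertible_v<const char*, bool>);  // pointer -> bool
static_assert(std::is_convertible_v<bool, int>);          // bool -> int
static_assert(std::is_convertible_v<double, int>);        // silently drops the fraction
```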
Why Pointer to Bool?
Note that the first example involved a conversion of a pointer to a bool. This is inherited from C, and comes from this common idiom:
void foo(struct bar* ptr) {
    if (ptr) { /* Is 'ptr' non-NULL? */
        do_something(ptr);
    } else {
        do_other_thing();
    }
}
The “implicit bool” is such a pervasive idiom that it permeates most programming languages today. Even Python’s otherwise strong type system inherits this implicit-conversion behavior:
def foo(val: int) -> None:
    if val:  # Is 'val' non-zero?
        do_something(val)
    else:
        do_other_thing()

def bar(name: str) -> None:
    if name:  # Is 'name' non-empty?
        do_something(name)
    else:
        do_other_thing()
Controlling Type Conversions
C++ lets us define conversions between class types and any other type by way of two kinds of member function, (1) the converting constructor and (2) the conversion operator:
class widget {
public:
    // 1: Implicitly convert from 'gadget' to a 'widget'
    widget(gadget);
    // 2: Implicitly convert from 'widget' to a 'gadget'
    operator gadget();
};
With these two functions defined, we could say there is “weak typing” between widget and gadget. Weak typing is not inherently bad, but must be used with extreme care. If you have any doubts, do not write the implicit conversion functions.
While we have implicit conversion methods, C++ also offers us a way to provide explicit conversion methods:
class widget {
public:
    // 1: Explicitly convert from 'gadget' to a 'widget'
    explicit widget(gadget);
    // 2: Explicitly convert from 'widget' to a 'gadget'
    explicit operator gadget();
};
The explicit variants should always be the first choice.
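The difference between the two flavors can be checked at compile time. In this sketch, loose_widget and strict_widget are hypothetical names standing in for the two versions of widget above. std::is_convertible_v asks whether an implicit conversion exists, while std::is_constructible_v asks whether an explicit, direct initialization would work.

```cpp
#include <type_traits>

struct gadget {};

// Implicit conversions in both directions ("weak typing" between the two).
struct loose_widget {
    loose_widget(gadget) {}
    operator gadget() const { return {}; }
};

// The same conversions, but only available on explicit request.
struct strict_widget {
    explicit strict_widget(gadget) {}
    explicit operator gadget() const { return {}; }
};

// Implicit: copy-initialization and argument passing just work.
static_assert(std::is_convertible_v<gadget, loose_widget>);
static_assert(std::is_convertible_v<loose_widget, gadget>);

// Explicit: no silent conversion...
static_assert(!std::is_convertible_v<gadget, strict_widget>);
static_assert(!std::is_convertible_v<strict_widget, gadget>);

// ...but direct-initialization or static_cast is still available.
static_assert(std::is_constructible_v<strict_widget, gadget>);
static_assert(std::is_constructible_v<gadget, strict_widget>);
```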
NOTE: Be especially aware that any constructor callable with a single argument becomes an implicit converting constructor!
class user {
public:
    user(string name, optional<address> addr = nullopt);
};
In the above, even though the constructor accepts two arguments, it is callable with one argument, so it will implicitly convert a string into a user!
I have all too often seen developers unwittingly write a class constructor accepting a single argument, not knowing just how dangerous it is.
This is how C++ has been since the beginning, and is now “widely regarded as a bad move.” If we could do it all over again, we’d drop the explicit specifier for an implicit keyword that has the inverse behavior.
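Here is a small sketch of that trap. The address type is a placeholder I made up so the example is self-contained, and safe_user is a hypothetical variant that shows what adding explicit changes.

```cpp
#include <optional>
#include <string>
#include <type_traits>

// Placeholder type invented for this sketch.
struct address { std::string street; };

struct user {
    // Callable with one argument, so it doubles as an implicit conversion.
    user(std::string, std::optional<address> = std::nullopt) {}
};

struct safe_user {
    // Same signature, but conversions must now be requested explicitly.
    explicit safe_user(std::string, std::optional<address> = std::nullopt) {}
};

// Any std::string silently becomes a user...
static_assert(std::is_convertible_v<std::string, user>);

// ...but not a safe_user, which can still be constructed on purpose.
static_assert(!std::is_convertible_v<std::string, safe_user>);
static_assert(std::is_constructible_v<safe_user, std::string>);
```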
Building a Better bool, using… Concepts?
C++ differs from many other languages in that program-defined types play in the type system almost identically to the language’s builtin types. There is no additional overhead (e.g. garbage collection, boxing, dynamic dispatch) involved unless it is requested.
We also get to define how we interconvert between other types. This leads to a question: Can we make a better bool?
The answer has been “yes” for a long time, but with C++ Concepts, we can write one that is more concise, simpler, and better behaved than ever before. I present a better bool:
// Assumed setup: the article writes the std concepts unqualified, so they
// are brought into scope here.
#include <compare>
#include <concepts>
using std::constructible_from;
using std::convertible_to;
using std::same_as;

struct boolean {
    bool _val;

    // [1]
    template <convertible_to<bool> T>
    explicit(!same_as<T, bool>)
    constexpr boolean(T b) noexcept
        : _val(b) {}

    // [2]
    template <constructible_from<bool> T>
    explicit(!same_as<bool, T>)
    constexpr operator T() const noexcept
        { return T(_val); }

    // [3]
    constexpr boolean operator!() const noexcept
        { return !_val; }

    // [4]
    constexpr boolean operator==(bool other) const noexcept
        { return bool(*this) == other; }
    constexpr boolean operator!=(bool other) const noexcept
        { return bool(*this) != other; }

    // [5]
    constexpr auto operator<=>(bool other) const noexcept
        { return bool(*this) <=> other; }
};
Let’s walk through this thing.
1: The Converting Constructor
template <convertible_to<bool> T>
explicit(!same_as<T, bool>)
constexpr boolean(T b) noexcept;
This is a function template constrained to accept any type that is convertible to bool, with a conditional explicit depending on whether the argument is a bool. If the argument is a bool, we allow the conversion to be implicit, otherwise it must be an explicitly requested conversion.
This raises the question: Why not a non-template constructor with parameter type bool? The answer is that we would then be allowing any type that implicitly converts to bool to implicitly convert to boolean.
For example, if the constructor were boolean(bool b), then writing boolean v = 12 would see the compiler implicitly convert the literal 12 to a bool, then convert that bool to a boolean.
2: The Conversion Operator
template <constructible_from<bool> T>
explicit (!same_as<T, bool>)
constexpr operator T() const noexcept;
This member function handles requests to explicitly convert the boolean to another type. It also has a conditional explicit: If the requested type is bool, we will allow the implicit conversion, otherwise the conversion must be explicit.
This avoids the following issue:
void eat_cookies(int count, bool leave_crumbs);

void santa(int num_cookies) {
    bool leave_crumbs = num_cookies > 4;
    eat_cookies(leave_crumbs, num_cookies); // !!
}
The above code compiles without error, but is almost certainly not what we intended. Built-in bool will implicitly convert to int, with value 0 for false and 1 for true, and int will just as happily convert back to bool, so the swapped arguments go unnoticed.
On the other hand, boolean will fail:
void eat_cookies(int count, boolean leave_crumbs);

void santa(int num_cookies) {
    boolean leave_crumbs = num_cookies > 4;
    eat_cookies(leave_crumbs, num_cookies); // Error!
}
<source>:43:17: error: cannot convert 'boolean' to 'int'
43 | eat_cookies(leave_crumbs, num_cookies); // Error
| ^~~~~~~~~~~~
| |
| boolean
<source>:39:22: note: initializing argument 1 of 'void eat_cookies(int, bool)'
39 | void eat_cookies(int count, bool leave_crumbs);
| ~~~~^~~~~
<source>:43:31: error: could not convert 'num_cookies' from 'int' to 'boolean'
43 | eat_cookies(leave_crumbs, num_cookies); // Error
| ^~~~~~~~~~~
| |
| int
And we are forced to fix our code or insert a cast if it is what we really meant to do.
The templated-ness of this operator means that we can explicitly convert to anything that is explicitly constructible from bool. This means that a type with an explicit constructor taking bool as its only parameter is fair game, and the conversion will work just as if the constructor were given a plain bool.
3: Negation
constexpr boolean operator!() const noexcept;
This is a simple negation operator. It will negate the boolean, but keep the type as boolean.
4: Equality Checks
constexpr boolean operator==(bool) const noexcept;
constexpr boolean operator!=(bool) const noexcept;
These define equality and inequality for boolean objects. One may note that we have only provided versions that accept bool on the right-hand side. We are able to do this for two reasons:
- A boolean on the right-hand side will implicitly convert to bool, satisfying boolean == boolean.
- With C++20, the compiler is allowed to assume that equality comparison is symmetric, so it may also try the expression with the operands reversed. Thus, bool == boolean can be resolved in terms of boolean == bool. (One caveat: the standard only accepts a rewritten operator== candidate whose return type is bool, so a compiler may reject the reversed form when operator== returns boolean, as ours does.)
Those aware of the operator-rewrite rules may also note that the compiler is allowed to rewrite a != b into !(a == b), meaning that we only need to provide a single operator== to provide all four of a == b, b == a, a != b, and b != a (a drastic improvement from C++17 and earlier!).
However, we can’t get away with that here. The compiler will only rewrite boolean != boolean in terms of !(boolean == boolean) if operator== returns bool. Ours returns boolean, which breaks that assumption. Thus, we must provide our own operator!=. (We could just have operator== return plain bool, but I want to keep things in terms of boolean for as much as possible.)
5: Ordering Operator
constexpr auto operator<=>(bool other) const noexcept;
The three-way-comparison operator allows us to provide all four relational operators (<, >, <=, and >=) with only a single member function! Mighty convenient!
There are actually two ways we could define the comparison operators for boolean: The first is the one we’ve written above, and the second is to provide each comparison operator individually.
The reason we might want to provide them individually is that boolean [relop] boolean defined in terms of <=> will have type bool, not type boolean. Thus, comparing two boolean objects will actually drop us back to regular bool. For laziness reasons, I’ve just gone with <=>, since it is quite rare to use comparison operators on boolean values directly.
Another Solution
I’ve often heard people use “concepts give us better error messages” as the compelling reason for the feature. I was skeptical beforehand, and now I’m definitely not in that camp. The error messages from concepts are at least as verbose as those of SFINAE-abuse, but certainly more intuitive to the average programmer.
To me, the compelling use for concepts is all about constraining overloads and specializations. Never before has it been so easy to do so.
I’ve written all about this “better bool” that doesn’t implicitly convert, but we can actually solve the initial problem from this post in a much simpler fashion, also using concepts and constraints:
void foo(std::string);          // [1]
void foo(same_as<bool> auto b); // [2]

void meow() {
    foo(std::string("I am a string")); // Calls [1]
    foo("lol nope");                   // Calls [1]
    foo(true);                         // Calls [2]
}
Done.
Of course, this has its own downsides, as foo(same_as<bool> auto) is now a function template and must be defined inline. foo(boolean) could remain defined in another translation unit. It’s up to you to decide what’s appropriate in your codebase.