A Macro-Based Terse Lambda Expression
If you’ve read my past post, Now I Am Become Perl, you know that my wildest dream feature for C++ would be an expression-lambda expression. That is, a concise lambda expression that consists of a single expression used as the lambda’s return value, plus some extras. Imagine a syntax like this:
void foo(range auto people) {
range auto young_people_names =
people
| filter([] = _1.age() < 15)
| transform([] = _1.name());
// ...
}
In this imagined syntax, the form [<capture-list>] "=" <expression> creates an expression-lambda expression roughly equivalent to the following:
[<capture-list>] (<magic>) mutable
noexcept(noexcept(<expression>))
-> decltype(<expression>)
requires requires { <expression>; }
{ return <expression>; }
For example, the above expression-lambda [] = _1.age() < 15 is roughly equivalent to the following long-form lambda expression:
[] (auto&& _1) mutable
noexcept(noexcept(_1.age() < 15))
-> decltype(_1.age() < 15)
requires requires { _1.age() < 15; }
{ return _1.age() < 15; }
That’s quite a lot of typing saved!
Can we simulate such a feature using only a preprocessor macro? Kind of.
Note: This solution is originally based on the macro defined by Quincunx271/TerseLambda. This version improves upon it by making use of new C++20 features.
The Naive Solution
Let’s declare a very simple macro that implements something akin to an expression-lambda:
#define TerseLambda(...)
[&] (auto&& _1) mutable -> decltype(auto) { return __VA_ARGS__; }
(Note: end-of-line continuations have been omitted from these code samples to make syntax highlighting play nice. In the real implementation, all these newlines need to be escaped for the preprocessor definition.)
This is… okay, but entirely insufficient. Firstly: We are missing the noexcept and requires clauses. While not strictly necessary, having our call operator constrained only to parameters for which the given expression is valid is hugely beneficial (and sometimes mandatory) when passing such an expression into generic APIs. This is simple enough to fix:
#define TerseLambda(...)
[&] (auto&& _1) mutable
noexcept(noexcept(__VA_ARGS__))
-> decltype(auto)
requires requires { __VA_ARGS__; }
{ return __VA_ARGS__; }
Now our generated terse expression-lambdas will have a correct noexcept and a requires clause.
However, we have a major downside: we must accept exactly one argument. If we want to make a terse lambda that accepts zero or more than one argument, we need to get trickier. In the above sample compiler-rewrite, I wrote <magic> in place of the lambda expression’s parameter list. This is because we need to somehow synthesize the placeholder arguments _1, _2, _3, etc. on-demand, depending on how many arguments the caller has provided, at the call site (not at the lambda definition).
Variadic Terse Lambdas
It is entirely possible to declare a lambda expression with a variadic parameter set:
[](auto&&... _args) ...
But how can we make use of this to declare our placeholders? One might think that we could get away with defaulted parameters:
struct nothing_t {};
// ...
[](auto&& _1 = nothing_t{},
auto&& _2 = nothing_t{},
auto&& _3 = nothing_t{}) ...
and a placeholder for an unprovided argument will be an instance of nothing_t.
However: This does not work. The compiler is unable to deduce the types of the parameters from their default values. I am not sure why there is a rule that fails this deduction, but it would be a half-solution anyway. Suppose I wanted a terse-lambda like this:
foo(TerseLambda(bar("some string", _args...)));
Where we call bar with "some string" followed by every other argument passed to the call operator. If we use explicit placeholders in the parameter list, there is no _args to pass in to bar()!
To start, we can note that we never specified that the return statement is the only thing in the lambda body. We can declare our placeholders as local variables!
#define TerseLambda(...)
[](auto&&... _args) mutable {
auto&& _1 = nth_arg<0>(FWD(_args)...);
auto&& _2 = nth_arg<1>(FWD(_args)...);
auto&& _3 = nth_arg<2>(FWD(_args)...);
auto&& _4 = nth_arg<3>(FWD(_args)...);
return (__VA_ARGS__);
}
In the above, within the expansion of __VA_ARGS__ in the return statement, the _1, _2, _3, and _4 identifiers will refer to the first, second, third, and fourth lambda parameters respectively. It takes a small bit of code to implement nth_arg:
struct nothing_t {};
// Base case: No argument provided
template <int>
constexpr nothing_t nth_arg() noexcept { return {}; }
template <int N, typename Head, typename... Tail>
constexpr decltype(auto)
nth_arg(Head&& head, Tail&&... tail) noexcept {
// If we are asking for the first arg, return that
if constexpr (N == 0) {
return FWD(head);
} else {
// Recurse
return nth_arg<N-1>(FWD(tail)...);
}
}
In the above, if no arguments are passed to nth_arg, then we call the base case and nothing_t is returned. We’ll use nothing_t to represent that an argument has not been provided for the placeholder parameter.
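The listings use a FWD macro without showing it. A minimal sketch of nth_arg together with one plausible definition of FWD follows; the FWD definition is an assumption (a common perfect-forwarding shorthand), not something taken from the original code.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Assumed definition of FWD: perfect-forwards a named forwarding reference
// without spelling out its type.
#define FWD(...) static_cast<decltype(__VA_ARGS__)&&>(__VA_ARGS__)

struct nothing_t {};

// Base case: we ran out of arguments before reaching index N.
template <int>
constexpr nothing_t nth_arg() noexcept { return {}; }

template <int N, typename Head, typename... Tail>
constexpr decltype(auto) nth_arg(Head&& head, Tail&&... tail) noexcept {
    if constexpr (N == 0) {
        return FWD(head);                     // found it: forward it through
    } else {
        return nth_arg<N - 1>(FWD(tail)...);  // drop the head and recurse
    }
}

// An lvalue argument comes back as an lvalue reference.
static_assert(std::is_same_v<decltype(nth_arg<0>(std::declval<int&>(), 2.0)), int&>);
// An index past the end yields the nothing_t sentinel.
static_assert(std::is_same_v<decltype(nth_arg<2>(1, 2)), nothing_t>);
```

Since the result is a reference into the argument list whenever the index is in range, binding it with auto&& inside the lambda body keeps the original value category and avoids copies.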
And now, our TerseLambda macro works! Kind of.
Missing noexcept and requires
Obviously, our TerseLambda macro is missing its noexcept specifier and its constraints. Easy enough to add, right?
#define TerseLambda(...)
[](auto&&... _args) mutable
noexcept(noexcept(__VA_ARGS__))
-> decltype(auto)
requires requires { __VA_ARGS__; }
{
auto&& _1 = nth_arg<0>(FWD(_args)...);
auto&& _2 = nth_arg<1>(FWD(_args)...);
auto&& _3 = nth_arg<2>(FWD(_args)...);
auto&& _4 = nth_arg<3>(FWD(_args)...);
return (__VA_ARGS__);
}
Not so fast!
Imagine what happens when we actually use our TerseLambda. This:
TerseLambda(_1.name())
Becomes this:
[](auto&&... _args) mutable
noexcept(noexcept(_1.name()))
-> decltype(auto)
requires requires { _1.name(); }
{
auto&& _1 = nth_arg<0>(FWD(_args)...);
auto&& _2 = nth_arg<1>(FWD(_args)...);
auto&& _3 = nth_arg<2>(FWD(_args)...);
auto&& _4 = nth_arg<3>(FWD(_args)...);
return (_1.name());
}
See the problem? Within the noexcept() specifier and within the requires clause, we refer to the name _1, but at this scope that doesn’t refer to anything.
tl.test.cpp:23:25: error: ‘_1’ was not declared in this scope; did you mean ‘std::placeholders::_1’?
23 | auto l = TerseLambda(_1 + 2);
| ^~
It’s not entirely clear, but this error message is referring to the _1 inside the noexcept specifier and the requires clause, not the actual _1 within the lambda body.
Even if we forgo the requires clause from C++20 and just stick to C++17, we’re still out of luck with regard to the noexcept specification. In C++17, the best we can do is the macro in the prior section, which is never noexcept and is completely unconstrained. This will work in many cases, but can subtly break when generic code checks the invocability of our generated closure type.
auto func = TerseLambda(_1.name());
using FuncType = decltype(func);
static_assert(std::invocable<FuncType, person>); // Okay: Expected
static_assert(std::invocable<FuncType, int>); // Passes: What??
Even though calling func(12) will be a hard compile-time error, inspecting whether FuncType(int) is valid will not yield false, because the compiler will not do deep inspection into the lambda body to validate it. The compiler will only check whether it is callable with the given number of arguments and that the function’s constraints are satisfied. In the case of a variadic TerseLambda with no constraints, it is always considered invocable with any arguments. (When the return type has to be deduced from the body, the check can instead fail with a hard error from inside that body, but it never cleanly reports false.) This is definitely not what we want!
C++20 Saves the Day
The C++20 requires-expression is a magical beast. It will solve both the noexcept and the function constraints. It provides us with a magic ability: We can introduce new identifiers as if they were variables for the scope of the requires-expression. Since noexcept and the requires clause expect a boolean constant expression, and the requires-expression evaluates to a bool, we can use requires in both cases.
The syntax of a requires-expression is mostly straightforward:
requires (<parameter-list>) {
<requirement-list>
}
where <parameter-list> is the same as with any function parameter list. We can declare variables here that are visible in the scope of the <requirement-list>.
We’ll start by solving the constraints on the generated lambda expression, since it is the most straightforward.
To declare the parameters of the requires-expression, we need to have the types of those parameters to declare. If we use auto&&... as the parameter list of the lambda expression, we can’t get at those types so easily. Fortunately, C++20 allows us to provide a template parameter list to a lambda expression, thus giving us a name for the variadic pack of types that have been passed to the call operator:
[&] <typename... TlArgs> (TlArgs&&...) { /* ... */ }
For reasons that will be made clear later, I will define the requires-expression as another macro:
#define TerseLambda_Requires(...)
requires ( /* ??? */ )
{ __VA_ARGS__; }
So what do we put in place of the ??? marker? Consider that we need to have the same identifiers available (with identical types) as the identifiers that will be visible in the expansion of the return statement. For our purposes, that is _args, _1, _2, _3, and _4. Fortunately, _args is already visible in the scopes where we need it. We only need to find out how to declare the placeholders _1, _2, etc.
#define TerseLambda_Requires(...)
requires (/* ??? */ _1,
/* ??? */ _2,
/* ??? */ _3,
/* ??? */ _4)
{ __VA_ARGS__; }
We need to know what type nth_arg is going to return within the lambda body. Fortunately, we have decltype():
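// The type that nth_arg<N> returns when called with arguments of types Args...
template <int N, typename... Args>
using nth_arg_t = decltype(nth_arg<N>(declval<Args>()...));
// TlArgs is the template parameter pack of the enclosing lambda expression
#define TerseLambda_Requires(...)
requires (nth_arg_t<0, TlArgs...> _1,
nth_arg_t<1, TlArgs...> _2,
nth_arg_t<2, TlArgs...> _3,
nth_arg_t<3, TlArgs...> _4)
{ __VA_ARGS__; }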
That’s all there is to it!
Putting it All Together
Here is how we use TerseLambda_Requires with our TerseLambda macro for both the constraints and the noexcept specifier:
#define TerseLambda(...)
[&]<typename... TlArgs>
(TlArgs&&... _args) mutable
noexcept(TerseLambda_Requires({ __VA_ARGS__ } noexcept))
-> decltype(auto)
requires TerseLambda_Requires(__VA_ARGS__)
{
auto&& _1 = nth_arg<0>(FWD(_args)...);
auto&& _2 = nth_arg<1>(FWD(_args)...);
auto&& _3 = nth_arg<2>(FWD(_args)...);
auto&& _4 = nth_arg<3>(FWD(_args)...);
return (__VA_ARGS__);
}
For TerseLambda(_1.name()), the requires-clause above expands as:
requires requires(nth_arg_t<0, TlArgs...> _1,
nth_arg_t<1, TlArgs...> _2,
nth_arg_t<2, TlArgs...> _3,
nth_arg_t<3, TlArgs...> _4)
{ _1.name(); }
which will only be valid when the first argument to the call operator has a callable .name() method.
As for the noexcept specifier, it expands as:
noexcept(
requires(nth_arg_t<0, TlArgs...> _1,
nth_arg_t<1, TlArgs...> _2,
nth_arg_t<2, TlArgs...> _3,
nth_arg_t<3, TlArgs...> _4)
{ { _1.name() } noexcept; }
)
Note the extra { ... } noexcept around the return-value expression: This is a syntax in requires-expressions that specifies that the expression within the braces be both valid and noexcept(true). If any of the requirements listed in a requires-expression fail, then the entire requires-expression evaluates to false. Since we are using it as the boolean parameter to a noexcept specifier, if the inner expression _1.name() is noexcept(false), then the closure type’s call operator is noexcept(false). Note that this evaluation happens only at the call site:
struct person {
string name() const noexcept;
};
struct location {
string name() const; // Not noexcept!
};
// ...
person p;
location loc;
auto func = TerseLambda(_1.name());
// Okay:
static_assert(noexcept(func(p)));
// Also okay:
static_assert(!noexcept(func(loc))); // 'func(loc)' might throw
Because our generated closure object has proper constraints, the closure may also be used with generic APIs:
auto call_it(invocable<string> auto func) { // [1]
string s = get_string();
return func(s);
}
auto call_it(invocable<int> auto func) { // [2]
return func(44);
}
// ...
auto get_length = TerseLambda(_1.length());
auto more_than_four = TerseLambda(_1 > 4);
call_it(get_length); // Calls [1]
call_it(more_than_four); // Calls [2]
Without the constraints, the above calls to call_it() would be ambiguous since the lambda would be arbitrarily invocable.
Gotchas!
There are two quirks (of which I am aware) with the expression-lambda macro defined here:
Nullary Functions
If you define an expression-lambda that doesn’t use any parameters and simply returns the value of some expression:
// Creates a function that always returns 42
auto func = TerseLambda(42);
then the resulting function object is std::invocable with anything, because it doesn’t consult the parameters and is still variadic:
func("egg salad"); // Okay: Returns 42
This would make its usage with call_it in the prior example ambiguous, since it is both invocable<string> and invocable<int>.
If we want to define a terse lambda that simply returns an unparameterized expression, we’d be better off defining a new macro:
#define TerseLambda_Just(...)
[&] () mutable
noexcept(noexcept(__VA_ARGS__))
-> decltype(auto)
requires requires { (__VA_ARGS__); }
{ return __VA_ARGS__; }
// ...
auto func = TerseLambda_Just(42);
func(); // Okay.
func("hello"); // Invalid!
In this case, we don’t need to do any wild dance for the placeholders, since we don’t accept any arguments.
Reference Return Values
Unlike a regular lambda expression, whose return type is deduced via auto rules, our expression-lambda deduces via decltype(auto). When deducing as auto, we strip off reference types and const/volatile qualifiers:
struct person {
const string& name() const;
static person at_address(address);
};
// ...
person p;
auto n1 = p.name();
decltype(auto) n2 = p.name();
In the above, n1, deduced via auto, will have type string, and the return value of p.name() will be copy-constructed into n1. On the other hand, n2, deduced with decltype(auto), will have type const string&, and the return value of p.name() will be captured by reference.
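The same difference shows up in function return types, which is the case that matters for the macro. A minimal sketch, using a hypothetical person type with an inline name_ member and two hypothetical helper functions:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the person type above.
struct person {
    std::string name_ = "Ann";
    const std::string& name() const { return name_; }
};

// Plain auto deduction (what a regular lambda does): the result is a copy.
inline auto name_by_auto(const person& p) { return p.name(); }

// decltype(auto) deduction (what the macro does): the reference is preserved.
inline decltype(auto) name_by_decltype_auto(const person& p) { return p.name(); }

static_assert(std::is_same_v<
    decltype(name_by_auto(std::declval<const person&>())),
    std::string>);
static_assert(std::is_same_v<
    decltype(name_by_decltype_auto(std::declval<const person&>())),
    const std::string&>);
```

Here the reference returned by name_by_decltype_auto is safe only because the person outlives the call.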
Now observe the following:
auto name_at_address =
TerseLambda(person::at_address(_1).name());
In this code sample, calling name_at_address() will result in undefined behavior. The lambda body is effectively:
[](args...) -> decltype(auto) {
address& _1 = nth_arg<0>(args...);
decltype(auto) __tmp = person::at_address(_1);
decltype(auto) __ret = __tmp.name();
return __ret;
}
The rule of decltype(auto) return-type deduction is that the return type is the decltype() of the return-expression. In this case, __ret is like n2 in the prior example, and has a declared type of const string&. Thus, the return type of the call, as decltype(__ret), is const std::string&.
Note that the lifetime of the object that __ret refers to is that of the result of __tmp.name(), which itself is (almost certainly) tied to the lifetime of __tmp, which is declared as a local variable person __tmp. Thus, the lifetime of the returned const std::string& is tied to the lifetime of the local variable __tmp, which is destroyed as soon as the call returns! We are returning a reference to a subobject of a local variable! Bad!
The C++ specification has a special construct that would help us here: DECAY_COPY(x). However, DECAY_COPY(x) is exposition-only, and does not actually exist in the language (yet). The semantics of DECAY_COPY(x) are to create a new object of the value type of x (with references and cv-qualifiers stripped), initialized from x via the copy or move constructor of that type.
We can implement a DECAY_COPY function template, actually:
template <typename T,
typename Ret = std::remove_cvref_t<T>>
constexpr Ret decay_copy(T&& arg) noexcept(noexcept(Ret(FWD(arg)))) {
return FWD(arg);
}
This function is guaranteed to return a value type (not a reference) by copying or moving the given arg as the first parameter of the copy/move constructor of the underlying type of arg.
We can use decay_copy with our terse lambda to enforce a non-reference return type:
auto name_at_address =
TerseLambda(decay_copy(person::at_address(_1).name()));
name_at_address(some_address()); // Okay!
Note that a future version of C++ may see a language-supported version of DECAY_COPY in the form of an auto(x) expression.
Other Considerations
As some homework, consider the following:
- Our TerseLambda always uses a default by-reference capture. This may not be wanted. Consider how to use an extra level of indirection to allow customizing this macro’s capture list.
- Consider how TerseLambda might behave better than a naively-hand-written lambda expression with generic APIs.
Using This
While it isn’t as awesome as a language-supported version of a terse lambda syntax, I’ve found it pretty fun to use myself.
With recent updates to MSVC, this terse lambda macro now works on all major C++ compilers.
A more thoroughly written version of TerseLambda is available in neo-fun as NEO_TL, and is currently available to pull via dds. Try it out. See if you can break it. Send me bug reports. Have fun!