Now I Am Become Perl —
Destroyer of verbosity.
A Defence of Terseness
Perl gets picked on for its syntax. It is able to represent very complex programs with minimalist tokens. A jumble of punctuation can serve to represent an intricate program. This is trivial terseness in comparison to programming languages like APL (or its later ASCII-suitable descendants, such as J), where not a single character is wasted.
The Learning Curb
Something can be said for terseness. Rust, having chosen fn to denote functions, seems to have hit a balance in that regard. There is very little confusion over what fn means these days, and a simple explanation can immediately alleviate any confusion. Don’t confuse initial confusion with permanent confusion. Once you get over that initial “curb” of confusion, you don’t have to worry any more.
Foreign != Confusing
You’ll also find that when you encounter a new syntax you don’t immediately understand, you wish for something much simpler. Non-C++ programmers, for example, will raise an eyebrow at the following snippet:
[&, foo](auto&& item) mutable -> int { return item + foo.bar(something); }
I remember my first encounter with C++ lambdas, and I absolutely hated the syntax. It was foreign and unfamiliar, but other than that, my complaints stopped. I could have said “This is confusing,” but after having written C++ lambda expressions for years the above syntax has become second nature and very intuitive. Do not confuse familiarity with simplicity.
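For readers still on the unfamiliar side of that curb, here is a compilable sketch that labels each piece of such a lambda. The counter type, its bar() member, and the demo() function are hypothetical stand-ins invented for illustration, and the capture list uses the standard spelling [&, foo] for "everything by reference, foo by copy."

```cpp
// Hypothetical type standing in for whatever `foo` is in the snippet.
struct counter {
    int calls = 0;
    // Non-const member: it modifies the object it is called on.
    int bar(int step) {
        calls += 1;
        return calls * step;
    }
};

inline int demo() {
    counter foo;
    int something = 10;

    auto add = [&, foo]        // capture list: `something` by reference, `foo` by copy
               (auto&& item)   // generic parameter: accepts any argument type
               mutable         // lets the body modify the captured copy of `foo`
               -> int          // explicit return type
    { return item + foo.bar(something); };

    int first = add(1);   // the copy's calls becomes 1: 1 + 1 * 10
    int second = add(1);  // the copy's calls becomes 2: 1 + 2 * 10

    // The original `foo` is untouched because the lambda works on its own copy.
    return first + second + foo.calls;
}
```

Each piece does one small job, which is why the syntax stops looking like noise once the pieces are known.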
Explicit is Better than Implicit…
…except when it is needlessly verbose.
Consider the following code:
template <typename T, typename U, int N>
class some_class {};
Pretty straightforward, right?
Now consider this:
class<T, U, int N> some_class {};
Whoa… that’s not C++!
Sure, but it could be, if someone were convinced enough that it warranted a proposal, but I doubt it will happen any time soon.
So, you know it isn’t valid C++, but do you know what the code means? I’d wager that the second example is quite clear to almost all readers. It’s semantically identical to the former example, but significantly terser. It’s visually distinct from any existing C++ construct, yet when shown the two “equivalent” code samples side-by-side you can immediately cross-correlate them to understand what I’m trying to convey.
There’s a lot of bemoaning the verbosity of C++ class templates, especially in comparison to the syntax of generics in other languages. While they don’t map identically, a lot of the template syntax is visual noise that was inserted to be “explicit” about what was going on, so as not to confuse a reader that didn’t understand how template syntax works.
The template syntax, despite being an expert-friendly feature, uses a beginner-friendly syntax. As someone who writes a lot of C++ templates, I’ve often wished for terseness in this regard.
foo and bar considered harmful.
Consider this:
auto foo = frombulate();
std::sort(
    foo.begin(),
    foo.end(),
    [](auto&& lhs, auto&& rhs) {
        return lhs.bar() < rhs.bar();
    }
);
… What?
What does the code even do? Obviously auto is harmful. It’s completely obscuring the meaning of our code! Let’s fix that by adding explicit types:
std::vector<data::person> foo = frombulate();
std::sort(
    foo.begin(),
    foo.end(),
    [](const data::person& lhs, const data::person& rhs) {
        return lhs.bar() < rhs.bar();
    }
);
Looking at the API for data::person, we can see that bar() is a deprecated alias of name(), and frombulate() is deprecated in favor of get_people(). And using the name foo to refer to a sequence of data::person seems silly. We have an English plural: people. Okay, let’s fix all those things too:
std::vector<data::person> people = get_people();
std::sort(
    people.begin(),
    people.end(),
    [](const data::person& lhs, const data::person& rhs) {
        return lhs.name() < rhs.name();
    }
);
Perfect! We now know exactly what we’re doing: sorting a list of people by name.
Crazy idea, though… Let’s put those autos back in and see what happens:
auto people = get_people();
std::sort(
    people.begin(),
    people.end(),
    [](auto&& lhs, auto&& rhs) {
        return lhs.name() < rhs.name();
    }
);
Oh no! Our code has suddenly become unreadable again and… oh.
Oh wait.
No, it’s just fine. We can see that we’re sorting a list of people by their name. No explicit types needed. We can see perfectly well what’s going on here. Using foo and bar while demonstrating why some syntax/semantics are bad is muddying the water. No one writes foo and bar in real production-ready code. (If you do, please don’t send me any pull requests.)
Even Terser?
std::sort in the above example takes an iterator pair to represent a “range” of items to iterate over. Iterators are pretty cool, but the case of “iterate the whole thing” is common enough to warrant “we want ranges.” Dealing with iterables should be straightforward and simple. With ranges, the iterator pair is extracted implicitly, and we might write the above code like this:
auto people = get_people();
std::sort(
    people,
    [](auto&& lhs, auto&& rhs) {
        return lhs.name() < rhs.name();
    }
);
That’s cool! And we could make it even shorter (fitting the whole sort() call on a single line) using an expression lambda:
auto people = get_people();
std::sort(people, [][&1.name() < &2.name()]);
What? You haven’t seen this syntax before? Don’t worry, you’re not alone: I made it up. The &1 means “the first argument”, and &2 means “the second argument.”
Note: I’m going to be using range-based algorithms for the remainder of this post, just to follow the running theme of terseness.
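The range-based sort call used in this post can be approximated in today's C++ with a tiny wrapper. The sketch below is only an illustration: sort_range and sorted_people are hypothetical helper names, and data::person and get_people() are minimal stand-ins for the types the post assumes. C++20's std::ranges::sort accepts a range directly, so the helper is only needed on older standards.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

namespace data {
// Hypothetical stand-in for the post's data::person.
class person {
public:
    explicit person(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }

private:
    std::string name_;
};
}  // namespace data

// Hypothetical stand-in for the post's get_people().
inline std::vector<data::person> get_people() {
    return {data::person{"Grace"}, data::person{"Ada"}, data::person{"Linus"}};
}

// Hypothetical helper: extracts the iterator pair so that callers
// only have to pass the container itself.
template <typename Range, typename Compare>
void sort_range(Range& range, Compare comp) {
    using std::begin;
    using std::end;
    std::sort(begin(range), end(range), std::move(comp));
}

// The running example, written against the helper.
inline std::vector<data::person> sorted_people() {
    auto people = get_people();
    sort_range(people, [](auto&& lhs, auto&& rhs) { return lhs.name() < rhs.name(); });
    return people;
}
```

Calling sorted_people() yields the three people ordered by name. The call site reads the same as the range-based form in the post, minus the std:: spelling.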
A Modest Proposal: Expression Lambdas
If my attempt has been successful, you did not recoil in horror and disgust at the sight of my made-up “expression lambda” syntax:
[][&1.name() < &2.name()]
Here’s what I hope:
- You are over the “learning curb” as you’ve seen how the syntax corresponds to an earlier syntax. (The “expression lambda” is roughly equivalent to the lambda in the prior example).
- You have seen how a prior “foreign” example (“terse” templates) can be understandable, even if not perfect.
- You know exactly what it means because the example does not simply use “dummy” identifiers (foo, bar, baz, etc.) and actually acts in a real-world-use-case capacity.
Yes, the lead-in paragraphs were me buttering you up in preparation for me to unveil the horror and beauty of “expression lambdas.”
Prior Art?
But Vector, this is just Abbreviated Lambdas!
I am aware of the abbreviated lambdas proposal, and I am aware that it was shot down because (paraphrasing) “they did not offer sufficient benefit for their added cost and complexity.”
Besides that, “expression lambdas” are not abbreviated lambdas. Rather, the original proposal document cites this style as “hyper-abbreviated” lambdas. The original authors note that their abbreviated lambda syntax “is about as abbreviated as you can get, without loss of clarity or functionality.” I take that as a challenge.
For one, I’d note that all their examples use simplistic variable names, like a, b, x, y, args, and several others. The motivation for the abbreviated lambda is to gain the ability to wield terseness where verbosity is unnecessary. Even in my own example, I named my parameters lhs and rhs to denote their position in the comparison, yet there is very little confusion as to what was going on. I could as well have named them a and b. We understood from the context what they were. The naming of parameters when we have such useful context clues is unnecessary!
I don’t want abbreviated lambdas. I’m leap-frogging it and proposing hyper-abbreviated lambdas, but I’m going to call them “expression lambdas,” because I want to be different (and I think it’s a significantly better name).
Use-case: Calling an overload-set
C++ overload sets live in a weird semantic world of their own. They are not objects, and you cannot easily create an object from one. For additional context, see Simon Brand’s talk on the subject. There are several proposals floating around to fill this gap, but I contend that “expression lambdas” can solve the problem quite nicely.
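To make the problem concrete, here is a minimal sketch of an overload set refusing to act like an object, along with the usual workaround of wrapping it in a generic lambda. The twice overloads and twice_all are hypothetical names invented for this illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical overload set: the name `twice` refers to two functions.
inline int twice(int x) { return 2 * x; }
inline double twice(double x) { return 2.0 * x; }

inline std::vector<int> twice_all(const std::vector<int>& in) {
    std::vector<int> out;

    // This does not compile: `twice` names an overload set, not an object,
    // so the type of the callable argument cannot be deduced.
    // std::transform(in.begin(), in.end(), std::back_inserter(out), twice);

    // Wrapping the call in a generic lambda defers overload resolution
    // until the argument type is known.
    std::transform(in.begin(), in.end(), std::back_inserter(out),
                   [](auto&& x) { return twice(x); });
    return out;
}
```

The lambda does nothing except forward the call, which is exactly the kind of ceremony a terser syntax could remove.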
Suppose I have a function that takes a sequence of sequences. I want to iterate over each sequence and find the maximum-valued element within. I can use std::transform and std::max_element to do this work:
template <typename SeqOfSeq>
auto find_maximums(SeqOfSeq& s) {
    std::vector<typename SeqOfSeq::value_type::const_iterator> maximums;
    std::transform(s,
                   std::back_inserter(maximums),
                   std::max_element);
    return maximums;
}
Oops! I can’t pass std::max_element because it is an overload set, including function templates. How might an “expression lambda” help us here? Well, take a look:
template <typename SeqOfSeq>
auto find_maximums(SeqOfSeq& s) {
    std::vector<typename SeqOfSeq::value_type::const_iterator> maximums;
    std::transform(s,
                   std::back_inserter(maximums),
                   [][std::max_element(&1)]);
    return maximums;
}
If you follow along, you can infer that the special token sequence &1 represents “Argument number 1” to the expression closure object.
What if we want to use a comparator with our expression lambda?
template <typename SeqOfSeq, typename Compare>
auto find_maximums(SeqOfSeq& s, Compare&& comp) {
    std::vector<typename SeqOfSeq::value_type::const_iterator> maximums;
    std::transform(s,
                   std::back_inserter(maximums),
                   [&][std::max_element(&1, comp)]);
    return maximums;
}
Cool. We capture like a regular lambda [&] and pass the comparator as an argument to max_element. What does the equivalent with regular lambdas look like?
template <typename SeqOfSeq, typename Compare>
auto find_maximums(SeqOfSeq& s, Compare&& comp) {
    std::vector<typename SeqOfSeq::value_type::const_iterator> maximums;
    std::transform(s,
                   std::back_inserter(maximums),
                   [&](auto&& arg) -> decltype(std::max_element(arg, comp)) {
                       return std::max_element(arg, comp);
                   });
    return maximums;
}
That’s quite a bit more. And yes, that decltype(<expr>) is required for proper SFINAE when calling the closure object. It may not be used in this exact context, but it is useful in general.
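A small sketch can show what "proper SFINAE" buys here. It assumes a hypothetical overload set describe and a hand-rolled detection trait can_call, both invented for illustration (std::is_invocable does the same job as the trait in C++17). Because the call expression appears in the trailing return type, asking whether the closure is callable with an unsupported argument produces a quiet "no" instead of a hard error.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical overload set, used only for illustration.
inline int describe(int) { return 1; }
inline int describe(const std::string&) { return 2; }

// Hand-rolled detection trait: true when an F can be called with an Arg.
template <typename F, typename Arg, typename = void>
struct can_call : std::false_type {};

template <typename F, typename Arg>
struct can_call<F, Arg, decltype(void(std::declval<F>()(std::declval<Arg>())))>
    : std::true_type {};

// The trailing decltype puts the call expression into the signature, so an
// ill-formed call removes operator() from consideration during substitution.
const auto describe_fn =
    [](auto&& arg) -> decltype(describe(std::forward<decltype(arg)>(arg))) {
        return describe(std::forward<decltype(arg)>(arg));
    };

struct unrelated {};

static_assert(can_call<decltype(describe_fn), int>::value, "int overload is found");
static_assert(can_call<decltype(describe_fn), std::string>::value, "string overload is found");
static_assert(!can_call<decltype(describe_fn), unrelated>::value, "no overload, rejected quietly");
```

Without the trailing return type, the last check would have to instantiate the lambda body to deduce its return type, and the failed call to describe would be a hard compile error rather than a false result.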
What about variadics?
Simple:
[][some_function(&...)]
What about perfect forwarding?
Well… we’re still in the boat of using std::forward<decltype(...)> on that one. Proposals for a dedicated “forward” operator have been shot down repeatedly. As someone who does a lot of perfect forwarding, I would love to see a dedicated operator (I’ll throw up the ~> spelling for now).
The story isn’t much better for current generic lambdas, though:
[&](auto&&... args) -> decltype(do_work(std::forward<decltype(args)>(args)...)) {
    return do_work(std::forward<decltype(args)>(args)...);
}
“Expression lambdas” would face a similar ugliness:
[&][do_work(std::forward<decltype(&...)>(&...))]
At least it can get away from the -> decltype(...) part.
If we had a “forwarding operator”, the code might look something like this:
[&](auto&&... args) -> decltype(do_work(~>args...)) {
    return do_work(~>args...);
}
And this for “expression lambdas”:
[&][do_work(~>&...)]
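Until something like ~> exists, one workaround seen in practice is a short macro around std::forward. The sketch below assumes a hypothetical macro name FWD and a hypothetical do_work overload pair that reports which value category it received. It is an illustration of the idea, not a proposal.

```cpp
#include <string>
#include <utility>

// Hypothetical shorthand macro; not part of the standard library.
#define FWD(x) std::forward<decltype(x)>(x)

// Hypothetical overloads that report the value category they were given.
inline std::string do_work(const std::string&) { return "lvalue"; }
inline std::string do_work(std::string&&) { return "rvalue"; }

// Generic lambda that perfectly forwards its arguments to do_work,
// keeping the SFINAE-friendly trailing return type.
const auto call_do_work = [](auto&&... args) -> decltype(do_work(FWD(args)...)) {
    return do_work(FWD(args)...);
};
```

Passing a named std::string selects the lvalue overload, while passing a temporary or a std::move'd string selects the rvalue overload. The macro trims the noise, but the parameter list and the repeated expression remain.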
Are we Perl yet?
Tell me if and why you love or hate my “expression lambda” concept.