C++ Can't Abandon Raw Pointers
…yet
This post was born from a Reddit thread regarding observer_ptr.
For those unfamiliar, observer_ptr is the “dumb” smart pointer. It simply provides some of the semantics of a T* in a class type.
Why would I want a class to wrap T* when I can just use T*?
Here are a few good reasons to use an observer_ptr:
- observer_ptr<T> captures your intent, just like unique_ptr, shared_ptr, and T&. In this case, your intent is to have a non-owning reference to an object.
- observer_ptr<T> can be instrumented with runtime debug checks if you try to use it in an invalid manner. This alone is my favorite reason.
- observer_ptr<T> provides neither operator[] nor pointer arithmetic.
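To make those three points concrete, here is a minimal sketch of what such a wrapper could look like. The name observing_ptr is made up for this illustration, and this is not the interface from the actual proposal; it only shows where the debug checks would go and what is deliberately left out.

```cpp
#include <cassert>

// Hypothetical sketch for illustration only; not the proposed observer_ptr.
template <typename T>
class observing_ptr {
    T* ptr_ = nullptr;  // non-owning: never deleted by this class

public:
    observing_ptr() noexcept = default;
    explicit observing_ptr(T* p) noexcept : ptr_(p) {}

    T* get() const noexcept { return ptr_; }
    explicit operator bool() const noexcept { return ptr_ != nullptr; }

    // Debug builds catch a null dereference right at the call site.
    T& operator*() const noexcept {
        assert(ptr_ != nullptr && "dereferencing a null observing_ptr");
        return *ptr_;
    }
    T* operator->() const noexcept {
        assert(ptr_ != nullptr && "dereferencing a null observing_ptr");
        return ptr_;
    }

    // Deliberately absent: operator[], operator++, operator+, and friends.
};
```

Because there is no operator[] or arithmetic, code that treats the handle as an array simply fails to compile.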
The thread in question was regarding the name “observer_ptr”. The name doesn’t quite roll off the tongue, and it may cause confusion with concepts like the observer pattern, to which it is unrelated.
I like the idea of observer_ptr. I hope the proposal goes through, but I’d like to address an orthogonal issue and suggest a completely new pointer/reference type that I haven’t seen yet:
optional_ref<T>
Or maybe just
opt_ref<T>
It will fill some gaps we have in our reference/pointer toolbox.
Why Do We Keep Using T*?
Some modern C++ evangelists say “Never use raw pointers!” Other evangelists admit that they have a valid purpose and instead limit themselves to “Never use raw new or delete.”
I’m split between the two camps. I’d rather see the T* syntax be relegated to low-level libraries and never used again, but I admit it still has valid use cases with no good alternatives.
Why T* Is Evil
A huge problem with raw pointers is overloaded semantics. A raw pointer T* may be used as:
- A reference to a T
- A reference to a T that can be “re-bound”
- A reference to an array of T
- A reference to an array of T that can be “re-bound”
- An optional reference to a T
- An optional reference to an array of T
This is a big problem with using T*, especially in code that follows certain style guides mandating its use. You know the one.
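As a rough illustration of that overloading, the three hypothetical functions below all accept the same pointer type, yet each one assumes a different meaning. Nothing in the signatures tells the caller which meaning applies; only the comments do.

```cpp
#include <cstddef>

// Hypothetical functions for illustration: same parameter type, three contracts.

// "Reference to a single int": p must not be null.
int read_one(const int* p) { return *p; }

// "Reference to an array": p points at the first of n ints.
int sum_many(const int* p, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

// "Optional reference": nullptr means "no value was given".
int read_or(const int* p, int fallback) { return p ? *p : fallback; }
```

Passing nullptr to read_one, or a pointer to a single int to sum_many with n greater than one, compiles without complaint and is undefined behavior at run time.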
Modern C++ provides us with the following alternatives for each use case of T*, each with a distinct, meaningful purpose:
1. We have T& to represent non-nullable references. It’s one of my favorite C++ features. Implicitly nullable references in other languages are complete insanity. T& should be the first choice when choosing a reference type.

2. We have std::reference_wrapper<T>. It may be a mouthful and a lot to type, but where it really shines is when used in containers or as a data member of a class.

   Using a T& as a class member is difficult. It prevents implementation of class assignment and effectively makes the class “immovable.” Maybe this is what you want: Go for it.

   Using a T& in many containers simply doesn’t work. Since many containers perform assignments internally, we reach the same issue as using T& as a data member.

   In old C++ we’d reach for T* to answer this problem, but now we have a better solution: std::reference_wrapper<T>. It has no default constructor, so one won’t accidentally leave it “un-bound” like one might with a T*. It provides a never-null guarantee like a real T&, and it automatically converts to and from a T&, so passing and returning it as a T& is completely invisible and unsurprising.

   The only downside is that you can’t use the regular . operator to get at the members of T. You need to call .get() or bind it to a real T&. Not a big problem for the upsides, in my opinion. Maybe someday we can overload operator. too.

3. For references to sequences, we have a few alternatives, the only standard one being iterator pairs. Using iterator pairs is great if you want to support non-contiguous ranges with no performance penalty on contiguous ones.

   Soon, we’ll also have Ranges, which are even better.

   For contiguous sequences, let’s hope for span<T> or array_view<T>.

4. Iterator pairs and span<T>/array_view<T> solve the rebinding problem nicely. Nothing more to say.

5. All we have today is T* or… (shudder) std::optional<std::reference_wrapper<T>>. This is where an optional_ref<T> could save the day.

6. Optional arrays can be conveyed with std::optional of another type.
We’re really missing two big cases: (4) and (5).
(4) is solved by gsl::span, and will hopefully have a standard counterpart in the future.
(5) has an ugly solution: std::optional<std::reference_wrapper<T>>. Can’t want.
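For the std::reference_wrapper<T> case in particular, a short sketch may help. The helper names longer_than and mark_all are invented for this example; the point is only that the wrapper can live in a std::vector where a plain T& cannot, and that member access goes through .get().

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: collects references to the strings longer than `limit`.
std::vector<std::reference_wrapper<std::string>>
longer_than(std::vector<std::string>& words, std::size_t limit) {
    std::vector<std::reference_wrapper<std::string>> result;
    for (std::string& w : words) {
        // std::string& converts implicitly to std::reference_wrapper<std::string>.
        if (w.size() > limit) result.push_back(w);
    }
    return result;
}

// Hypothetical helper: appends "!" to every referenced string.
void mark_all(const std::vector<std::reference_wrapper<std::string>>& refs) {
    for (const auto& r : refs) {
        r.get() += "!";  // members of the referenced object need .get()
    }
}
```

The wrappers refer to the original strings, so mark_all changes the elements of the vector that was passed to longer_than.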
A Brief and Horribly Incomplete History Lesson on std::optional<T>
std::optional<T> is one of my favorite library additions ever. In technical terms, it creates a type with the domain of all T values plus a null T value. Even if T is conceptually infinite (like std::string, where there are infinite possible values), we can construct a value outside the domain of T, which we represent in code with std::nullopt.
It’s a way to say “Maybe there’s a T. Or there’s this sentinel ‘null’ value”.
Before std::optional was std::optional, it was boost::optional. It had very similar semantics, with two notable differences:
- std::nullopt is boost::none.
- boost::optional supports references. (!!!)
Why did std::optional drop support for reference type parameters?
The problem with having a std::optional<T&> boils down to code like this:
int a = 1729;
int b = 42;
std::optional<int&> int_ref = a;
int_ref = b;
cout << a; // <-- What should this print?
There are two semantics that might be taken here:
1. int_ref = b re-binds the reference of int_ref to refer to b.
2. int_ref = b assigns through to a.
In case (1), the value of a is unaffected by the assignment, so it prints 1729. In case (2), the value of b is assigned to a, and we print 42.
boost::optional takes the first route and re-binds the reference.
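Both behaviors already exist in standard C++ today, which makes them easy to compare side by side. In this sketch (function names are made up), a plain int& shows assign-through, while std::reference_wrapper<int> shows rebinding, the same route boost::optional takes for references.

```cpp
#include <functional>

// Case (2), assign-through: what a plain int& does.
int assign_through() {
    int a = 1729;
    int b = 42;
    int& ref = a;
    ref = b;   // writes the value of b into a
    return a;  // a is now 42
}

// Case (1), rebind: what std::reference_wrapper<int> does.
int rebind() {
    int a = 1729;
    int b = 42;
    std::reference_wrapper<int> ref = a;
    ref = b;   // ref now refers to b; a is untouched
    return a;  // a is still 1729
}
```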
As an aside, I used to ardently believe that case (1) is the only sensible behavior to expect. Even while writing this post, I found an insightful comment from Tony van Eerd on the exact topic. Even though behavior (2) is possible, I’d be extremely surprised to see it in practice had std::optional<T> been given valid behavior for reference types. I still think std::optional<T&> should have gone through with rebind semantics, but what’s done is done.
This reference-binding confusion (along with some concerns about total ordering of std::optional) was a big hold-up that prevented us from getting std::optional in C++14 as originally hoped.
What about std::optional<std::reference_wrapper<T>>?
A few things:
- operator* and operator-> on std::optional<std::reference_wrapper<T>> give you a std::reference_wrapper<T>, not a T&. If you’re immediately passing the object as a parameter this isn’t a problem, but it makes usage code ugly with calls to .get() everywhere.
- It has more overhead than a raw pointer T* (without some standard-library trickery). std::reference_wrapper<T> can be implemented using a simple raw pointer, and the simplest std::optional<T> implementation consumes storage sized and aligned for a T (think std::aligned_storage) plus a bool to represent the state. A smart standard library could detect the reference_wrapper parameter and instead use a regular pointer with nullptr being the “dis-engaged” state.
- It’s so much typing, and we’re lazy.
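To show the first point in practice, here is a small sketch. Widget, maybe_widget, and name_or are hypothetical names introduced just for this example.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical type for illustration.
struct Widget {
    std::string name;
};

// Hypothetical alias, because nobody wants to type this twice.
using maybe_widget = std::optional<std::reference_wrapper<Widget>>;

std::string name_or(maybe_widget w, const std::string& fallback) {
    if (!w) return fallback;
    // operator-> reaches the reference_wrapper, not the Widget,
    // so .get() is still needed before touching a member.
    return w->get().name;
}
```

Every member access ends up spelled w->get().member (or (*w).get().member), which is the noise the first bullet complains about.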
And worst of all:
It’s not constructible from a T&&.
This is an intentional design of std::reference_wrapper, to prevent this code:
std::reference_wrapper<const std::string> my_str = std::string("Hello!");
// my_str now points into the void
Unfortunately, that also prevents this code from working:
void lang_ref(const MyClass&);
void lib_ref(std::reference_wrapper<const MyClass>);

void bar() {
    lang_ref(MyClass{}); // <-- Okay
    lib_ref(MyClass{});  // <-- ERROR!
}
The reference_wrapper<const MyClass> parameter will not bind to a MyClass&& in the above example. This isn’t usually a problem because using std::reference_wrapper<T> as a function parameter is rarely what you actually want.
Because of this prohibition on r-values for reference_wrapper binding, we similarly can’t do this:
void with_optional_ref(std::optional<std::reference_wrapper<const MyClass>>);

void bar() {
    with_optional_ref(MyClass{}); // <-- ERROR!
}
optional<T> is only convertible from a U if U is implicitly convertible to T. For a U of MyClass&& and a T of reference_wrapper<const MyClass>, this convertibility is not allowed, so the U cannot convert to optional<T>.
Dang.
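The chain of reasoning above can be checked at compile time with std::is_convertible_v. This is only a sketch: MyClass here is an empty stand-in, and the cref alias and the constant names are invented for the example.

```cpp
#include <functional>
#include <optional>
#include <type_traits>

struct MyClass {};  // empty stand-in for illustration

using cref = std::reference_wrapper<const MyClass>;  // hypothetical alias

// An lvalue converts to the wrapper, and therefore to the optional.
constexpr bool lvalue_to_wrapper = std::is_convertible_v<MyClass&, cref>;
constexpr bool lvalue_to_optional =
    std::is_convertible_v<MyClass&, std::optional<cref>>;

// An rvalue is rejected by the wrapper, so the optional rejects it too.
constexpr bool rvalue_to_wrapper = std::is_convertible_v<MyClass&&, cref>;
constexpr bool rvalue_to_optional =
    std::is_convertible_v<MyClass&&, std::optional<cref>>;

static_assert(lvalue_to_wrapper && lvalue_to_optional);
static_assert(!rvalue_to_wrapper && !rvalue_to_optional);
```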
An Alternative: opt_ref<T>
An extremely primitive implementation of an “optional reference” type might look like this:
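#include <cassert>   // assert
#include <memory>    // std::addressof
#include <optional>  // std::nullopt_t

template <typename T>
class opt_ref {
    T* _ptr = nullptr;

public:
    opt_ref() noexcept = default;
    // Implicit on purpose, so it binds to plain expressions of type T.
    opt_ref(T& reference) noexcept
        : _ptr(std::addressof(reference)) {}
    opt_ref(std::nullopt_t) noexcept
        : _ptr(nullptr) {}

    explicit operator bool() const noexcept { return _ptr != nullptr; }

    // const-qualified: looking through the handle does not modify the handle.
    T& operator*() const noexcept {
        assert(_ptr != nullptr && "Dereferencing null opt_ref");
        return *_ptr;
    }
    T* operator->() const noexcept {
        assert(_ptr != nullptr && "Dereferencing null opt_ref");
        return _ptr;
    }
};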
Here I’ve chosen the name opt_ref for brevity, and because I think it suffices to explain its purpose.
This definition affords a few niceties over both optional<reference_wrapper<T>> and T* as an “optional reference” type.
Over optional<reference_wrapper<T>>:
- It’s convertible from a T&& for the case of opt_ref<const T>. This can be dangerous if misused, but that’s what C++ is, right?
- The operator* and operator-> both return the underlying T&. Nice.
- The same size overhead as a T*.
- It’s less to type. A lot less.
Over T*:
- None of the overloaded semantics. It is clear in what it represents.
- It will bind to plain expressions. No need to invoke operator& or std::addressof.
- We can put debug assertions on its operator* and operator->.
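Here is roughly what that looks like at a call site. The block repeats a condensed copy of the opt_ref sketch so it stands alone; greet is a hypothetical function invented for this example.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>

// Condensed copy of the opt_ref sketch, repeated so this example is self-contained.
template <typename T>
class opt_ref {
    T* _ptr = nullptr;

public:
    opt_ref() noexcept = default;
    opt_ref(T& reference) noexcept : _ptr(std::addressof(reference)) {}
    opt_ref(std::nullopt_t) noexcept {}
    explicit operator bool() const noexcept { return _ptr != nullptr; }
    T& operator*() const noexcept {
        assert(_ptr != nullptr && "Dereferencing null opt_ref");
        return *_ptr;
    }
    T* operator->() const noexcept {
        assert(_ptr != nullptr && "Dereferencing null opt_ref");
        return _ptr;
    }
};

// Hypothetical function taking a real optional reference parameter.
std::string greet(opt_ref<const std::string> name = std::nullopt) {
    if (!name) return "Hello, stranger";
    return "Hello, " + *name;  // operator* yields const std::string& directly
}
```

A caller can write greet(user) for a named string, greet(std::string{"Bob"}) for a temporary, or greet(std::nullopt) for no name at all. The temporary case is safe here only because the temporary outlives the call; storing the opt_ref beyond that would dangle.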
“Why Should I Care?”
It may be difficult to see utility in such a tiny class template. It doesn’t solve a huge swath of problems we face. It doesn’t fundamentally change the way one writes code. It’s just a silly wrapper around a pointer.
But that’s the thing: Everything is just a wrapper around pointers. Some are just bigger than others. std::vector? Just a wrapper around some pointers. std::unique_ptr? Just a wrapper around a pointer. std::reference_wrapper? Just a wrapper around a pointer.
C++ is about building abstractions. The abstractions lead to better applications. After all, libraries are useless until they end up as part of an application.
Libraries like Jonathan Müller’s output_parameter<T> and type_safe library, Niall Douglas’s Outcome (hopefully someday std::expected), or std::chrono are the real bread and butter of C++. These aren’t enormous all-encompassing frameworks that implement half the universe. They’re just little libraries that build upon lower-level primitives to reduce errors and create expressive, meaningful code.
I won’t claim opt_ref is anything as fantastic as the libraries I just mentioned. It’s a concept anyone could have easily come up with and implemented in five minutes like I did.
It was five minutes well spent. For the relevant project, I swapped out all places using a T* and changed them to opt_ref<T>. The payoff was fast, and I could now have real optional reference parameters. It saved me time debugging as well: when I accidentally dereferenced one without checking, I saw an assertion fire exactly where the code hit, giving me a clear and meaningful error message in my stderr output.
Finally, I ask: Why keep using pointers when we have better alternatives?