I’ve been developing software professionally in C++ for a few years, and there’s a pattern in the C++ world that has bothered me for quite some time: the use of std::pair from the <utility> header of the C++ standard library in non-generic code.
TL;DR
Using C++’s std::pair outside of generic code needlessly obscures what the pair represents. Simple structs with descriptive member names make the code self-documenting instead.
What is std::pair?
For the uninitiated, it’s worth starting with what std::pair is. std::pair is a class template in <utility> intended to provide a generic pairing of two objects. Note the emphasis on “generic”; we’ll come back to why later. For now, we can think of a pair as being defined like this in C++:
template <class T, class U>
struct pair {
T first;
U second;
};
Once you read past the template boilerplate, this is essentially how libstdc++ implements std::pair. std::pair also serves as the element type of the standard’s map containers: std::map<Key, T>, for example, stores its elements as std::pair<const Key, T>.
So what’s the problem?
If std::pair is good enough for the standard, then it’s good enough for you, right? No. Well, maybe. It depends. The problem isn’t really std::pair itself; on its own, it’s a fine little data structure. The problem starts when std::pair is used as an ad hoc mapping between pieces of concrete, domain-specific data. That differs from the standard’s own use of std::pair, which stores or returns a generic pairing of two values, as in std::map. std::map stores a collection of key-to-value mappings of arbitrary types, so each mapping (each pair) must be generic as well. Since the pairing is generic, we need a generic way to refer to the first and second values of each mapping.
That reasoning is why std::pair names its member variables first and second: the names say only which element of the pair you are referring to. But what do first and second represent? We know their types, but not what those types mean. Given a std::pair<int, int>, neither the type nor the member names tell us what either int is meant to represent.
An example
Let’s work through a basic example to see where this could be problematic. Suppose we are writing a game and in the game, we need to fetch the current player’s score and keep track of it. Here’s an implementation that could work:
std::pair<std::string, int> getPlayerNameToScore(const std::string& playerName)
{
// look up the score somehow and return it
int score = ...;
return {playerName, score};
}
// client code
auto playerNameToScore = getPlayerNameToScore(getCurrentPlayerName());
This code ticks most of the boxes that we generally look for when evaluating code: it is well documented, the code is concise, it is easy to read, it applies const-correctness where possible, and it probably works. Once we dig in a little deeper, we start to notice a few problems.
First, my careful naming has exposed a design problem: redundancy. I’ve had to state, both in the function name and in the name of the returned object, that it maps a player name to a score. Had I named the returned value “score” or something similar instead, the code would have been less readable. This redundancy spreads every time std::pair<std::string, int> is used in the code. Second, I’ve shifted onto the reader the burden of remembering that the returned std::pair<std::string, int> represents a player’s name mapped to their score and nothing else. After all, the members of that pair do nothing to remind the reader what they are supposed to represent. Finally, I’ve introduced an unnecessarily tight coupling between a player and their score. What if I later want to track another value, such as the player’s highest score so far? The return type would have to change, and so would every piece of code that uses it.
The solution
If we know that every player has a score, and that the player and their score are always referenced together, then we can redesign the code to keep data that belongs together in one place and to avoid repeating ourselves (the DRY principle):
struct Player {
std::string name;
int score;
};
This simple redesign solves all of those problems:
- Eliminated redundancy. Every name in this structure is meaningful and distinct.
- Deleted code. We no longer need the getPlayerNameToScore function at all. Why? Because if we know who the player is, we know their score by design.
- Increased documentation. By keeping commonly used data together like this, we’re telling readers that a “Player” has a name and a score. Moreover, readers now know what the std::string and int in the structure represent, thanks to the descriptive names assigned to each. Reading player.score is much clearer than reading player.second and having to remember that it means the score.
- Insulated code from changes. We have also raised the level of abstraction of our code. Instead of passing around player names and scores, our code can pass around Player objects. This insulates it from changes to the Player structure: adding a member does not change the signature of any function that takes a Player.