New Age C++

More C++ Classes: Copy and Move Semantics

From a purely object-oriented perspective, "copy semantics" is the right way to preserve control over object ownership. But in those scenarios where ownership becomes irrelevant, C++11 "move semantics" is an efficient complement.

Continuing the coverage of C++ classes, this month I'll look at copy and move semantics. Consider the following code:

T t1, t2;                               // objects t1 and t2 live on the stack
t2 = t1;                                // t1 is copied to t2
unique_ptr<T> t_ptr{ new T() };         // the object pointed to by t_ptr
                                        // lives on the heap

The operator new is only used when the object must be created on the heap; in this case, new returns a pointer to the object. Assignments between stack objects (operator "=") copy the right side into the left. This is different from C# and Java. In those languages, when T is a class, t1 and t2 refer to the same instance after the operation. C++ classes, instead, have "copy semantics".

By default, all classes get:

  • An overridable copy constructor, which creates an object by copying class fields from another one.
  • An overridable copy assignment (operator "="), which copies the right-side object fields to their left-side counterparts.

From a purely object-oriented perspective, "copy semantics" is the right way to preserve control over object ownership. Once an assignment is made, both source and destination can be altered without impacting each other.
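A minimal sketch of that independence follows. The account type and the copies_are_independent() function are hypothetical names introduced only for illustration; the class relies entirely on the compiler-generated copy constructor and copy assignment.

```cpp
#include <cassert>
#include <string>

// Hypothetical class, for illustration only. It declares no copy operations,
// so the compiler-generated copy constructor and copy assignment are used.
struct account {
    std::string owner;
    int balance;
};

// Returns true if the copies stay independent from their source.
bool copies_are_independent() {
    account a1{"Ada", 100};
    account a2 = a1;      // copy constructor: a2 is a brand-new object
    account a3{};
    a3 = a1;              // copy assignment: members copied into existing a3

    a1.balance = 0;       // changing the source...
    a2.owner = "Grace";   // ...or one of the copies...

    // ...leaves the other objects untouched.
    return a1.owner == "Ada" && a2.balance == 100 &&
           a3.owner == "Ada" && a3.balance == 100;
}
```

Calling copies_are_independent() from main() should return true, since each object holds its own copy of every member.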

In C#, Java and Objective-C, after an assignment like "t1 = t2" both variables refer to the same object, so a state change made through one is visible through the other. A C# class can implement the interface System.ICloneable when shared ownership isn't what the developer wants.

T t2 = (T)t1.Clone();	// you define how the method Clone() creates a new instance based on t1

Overriding C++ Copy Semantics
You may have reasons to redefine the default copy semantics. For instance, in this generic binary tree container:

template <typename T>
class tree {
public: ...
private:
  class node {		// node is a nested type
  public: ...
  private:
    T value_;
    unique_ptr<node> left_branch_, right_branch_;
  };
  unique_ptr<node> root_;
};

The tree is a recursive, node-based structure. Each inserted element becomes a node: either the root itself, or a node somewhere in its left or right branch. This tree implementation is dynamic: rather than containing nodes, it contains pointers to nodes.

The default copy semantics between trees doesn't compile because unique_ptr has its copy semantics deleted by design (hence the "unique" designation). I could use error-prone raw pointers and it would compile fine; the default copy semantics would then copy the pointers, though, rather than the instances they point to. Consequently, both trees would share the same nodes, rather than holding identical copies.
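Both behaviors can be checked with a small sketch. The owning_node and raw_node types and the function shallow_copy_shares_nodes() are hypothetical, introduced only for this illustration; they are not part of the article's tree.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical node types, for illustration only.
struct owning_node {
    int value;
    std::unique_ptr<owning_node> next;  // unique_ptr is move-only
};

struct raw_node {
    int value;
    raw_node* next;                     // raw pointer: only the address is copied
};

// A unique_ptr member makes the implicit copy constructor deleted.
static_assert(!std::is_copy_constructible<owning_node>::value,
              "owning_node cannot be copied");
static_assert(std::is_copy_constructible<raw_node>::value,
              "raw_node can be copied, but only shallowly");

// Returns true if a default copy of raw_node shares the pointed-to node.
bool shallow_copy_shares_nodes() {
    raw_node tail{2, nullptr};
    raw_node head{1, &tail};
    raw_node copy = head;     // default (shallow) copy
    copy.next->value = 42;    // modifies the node both objects point to
    return head.next == copy.next && head.next->value == 42;
}
```

The static_assert lines fail at compile time if either assumption is wrong, and shallow_copy_shares_nodes() returns true because head and copy point to the same tail node.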

The default copy semantics performs a so-called "shallow copy", as opposed to a "deep copy", which duplicates the objects referred to by pointers. I redefined the copy behavior in Listing 1.
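As a rough idea of what a deep copy involves, here is a simplified sketch. It is not the article's Listing 1: the int_tree class, its clone() helper and the function deep_copy_is_independent() are hypothetical names, and the tree stores plain int values to keep the example short.

```cpp
#include <cassert>
#include <memory>

// Simplified, hypothetical int-only tree; not the article's Listing 1.
class int_tree {
    struct node {
        int value;
        std::unique_ptr<node> left, right;
        explicit node(int v) : value(v) {}
    };
    std::unique_ptr<node> root_;

    // Recursively duplicates the subtree rooted at n.
    static std::unique_ptr<node> clone(const node* n) {
        if (!n) return nullptr;
        std::unique_ptr<node> copy(new node(n->value));
        copy->left = clone(n->left.get());
        copy->right = clone(n->right.get());
        return copy;
    }

public:
    int_tree() = default;

    // Copy constructor: clones every node instead of sharing pointers.
    int_tree(const int_tree& other) : root_(clone(other.root_.get())) {}

    // Copy assignment: build a full copy first, then swap the roots.
    int_tree& operator=(const int_tree& other) {
        if (this != &other) {
            int_tree tmp(other);
            root_.swap(tmp.root_);
        }
        return *this;
    }

    // Unbalanced insertion; duplicates are ignored.
    void insert(int v) {
        std::unique_ptr<node>* slot = &root_;
        while (*slot) {
            if (v == (*slot)->value) return;
            slot = (v < (*slot)->value) ? &(*slot)->left : &(*slot)->right;
        }
        slot->reset(new node(v));
    }

    bool contains(int v) const {
        const node* n = root_.get();
        while (n) {
            if (v == n->value) return true;
            n = (v < n->value) ? n->left.get() : n->right.get();
        }
        return false;
    }
};

// Returns true if changing the copy leaves the source tree untouched.
bool deep_copy_is_independent() {
    int_tree a;
    a.insert(2); a.insert(1); a.insert(3);
    int_tree b = a;     // deep copy: b owns its own nodes
    b.insert(4);        // only b changes
    return a.contains(3) && !a.contains(4) &&
           b.contains(1) && b.contains(4);
}
```

Building a temporary and swapping in the assignment operator is one design choice among several; it keeps the left-side tree intact if cloning throws midway.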

Copy Semantics and Rvalues
Prior to C++11, C++ developers had a problem. Certain functions, like the one here, create and return objects:

tree<int> create_millionaire_tree() {
  tree<int> t;
  ... // insert a million elements
  return t;
}

tree<int> t1 = create_millionaire_tree();

The last statement invokes the copy constructor to create t1 from the result of create_millionaire_tree(). Unless the compiler manages to elide that copy, a million elements are copied unnecessarily. The problem occurs because create_millionaire_tree lacks access to t1, so it creates a temporary tree t. When it finishes, the copy constructor begins, but the returned t is an rvalue whose name is out of scope. A possible workaround is to have the function receive t1 by reference:

void populate_millionaire_tree(tree<int>& t) {
  ... // empty t and insert a million elements into it
}

tree<int> t1;
populate_millionaire_tree(t1);

How C++11's Move Semantics Changes the Game
C++11 solves this dilemma by letting you keep the original function create_millionaire_tree while avoiding unnecessary copies. The double ampersand ("&&") in a parameter declaration denotes an rvalue reference: you could think of it as "the result of a function being received as an argument". Listing 2 shows how "move semantics" leverages rvalue references to give rvalues special treatment in constructions and assignments.
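To see how "&&" selects a different overload for rvalues, consider this small sketch. The functions which(), make_name() and overloads_behave_as_expected() are hypothetical names used only here; they are not taken from Listing 2.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overloads: the one chosen depends on whether the argument
// is an lvalue (a named object) or an rvalue (such as a function result).
std::string which(const std::string&) { return "lvalue"; }
std::string which(std::string&&)      { return "rvalue"; }

// Hypothetical factory returning a temporary.
std::string make_name() { return "temporary"; }

bool overloads_behave_as_expected() {
    std::string s = "named";
    return which(s) == "lvalue"               // named object
        && which(make_name()) == "rvalue"     // function result
        && which(std::move(s)) == "rvalue";   // lvalue cast to an rvalue
}
```

A move constructor and a move assignment operator are overloads of this second kind: they take T&& and are therefore picked whenever the source is a temporary.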

A move constructor that performs a member-wise move is implicitly generated only if the class has no user-declared destructor, copy constructor, copy assignment operator or move assignment operator (an analogous rule applies to the move assignment operator). In all other cases, you must explicitly declare and define move semantics to have them available; otherwise, copy semantics will handle both rvalues and lvalues, as in previous versions of C++.
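The rule can be observed with a member that counts how it gets constructed. This is only a sketch: tracker, plain, with_destructor and implicit_move_rules_hold() are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <utility>

// Hypothetical member type that counts how it gets constructed.
struct tracker {
    static int copies;
    static int moves;
    tracker() = default;
    tracker(const tracker&) { ++copies; }
    tracker(tracker&&) noexcept { ++moves; }
};
int tracker::copies = 0;
int tracker::moves = 0;

// No user-declared special members: a move constructor is generated.
struct plain {
    tracker t;
};

// A user-declared destructor suppresses the implicit move constructor;
// rvalues then fall back to the (still generated) copy constructor.
struct with_destructor {
    tracker t;
    ~with_destructor() {}
};

bool implicit_move_rules_hold() {
    tracker::copies = tracker::moves = 0;
    plain p1;
    plain p2(std::move(p1));                 // member-wise move
    (void)p2;
    bool plain_moved = (tracker::moves == 1 && tracker::copies == 0);

    tracker::copies = tracker::moves = 0;
    with_destructor d1;
    with_destructor d2(std::move(d1));       // silently copies instead
    (void)d2;
    bool fallback_copied = (tracker::moves == 0 && tracker::copies == 1);

    return plain_moved && fallback_copied;
}
```

Note that the second case compiles without any warning about the lost move, which is why it pays to declare move operations explicitly once a class has a destructor or copy operations of its own.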

Listing 3 Code Notes
Listing 3 implements a binary tree with copy and move semantics. I recommend the Code::Blocks IDE with the MinGW toolchain (Visual Studio doesn't implement many C++11 features used in this sample).

The main function plays with a few tree variables, constructing some variables from others or assigning them. You'll see console traces at all times that explain what's going on. You can disable or enable move semantics by commenting or uncommenting the line:

#define MOVE_SEMANTICS

in unbalanced_binary_tree.h. When run with move semantics, you'll see that this line in the function main():

integer_tree = make_word_size_tree(lincoln_tree);

generates the following output:

Leaving function make_word_size_tree().
Beginning tree move-assignment...
Clearing out previous content at left-side...
Moving root to the left-side.
Deleting tree...

Run the sample again, this time commenting out the #define. Compare the output:

Leaving function make_word_size_tree(). Local tree<unsigned> t about to lose visibility...
Beginning tree copy-assignment...
Clearing out previous content at left-side...
Node(1) copy-constructed.
Node(2) copy-constructed.
(...)
Node(4) copy-constructed.
Deleting tree...
Destroying node(4).
Destroying node(5).
(...)
Destroying node(1).

Manage Instance Member Ownership, without Micromanaging
C++ copies instances during construction and assignment when it gets another instance as the argument. In their default version, copy semantics are shallow: pointer members are copied, but not the instances they point to. However, the developer can override these semantics to keep control of instance state ownership through deep copies. Avoid copy semantics when the source is a temporary instance, accessible through an rvalue that some function returns; define move semantics for those cases.
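That advice can be sketched with a small container that counts its own copies and moves. The bag class, make_bag() and assignment_from_temporary_moves() are hypothetical stand-ins for the article's tree and its functions, not code from the listings.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container standing in for the article's tree.
class bag {
public:
    static int copies;   // deep copies performed so far
    static int moves;    // moves performed so far

    bag() = default;
    bag(const bag& other) : items_(other.items_) { ++copies; }
    bag(bag&& other) noexcept : items_(std::move(other.items_)) { ++moves; }

    bag& operator=(const bag& other) {
        if (this != &other) { items_ = other.items_; ++copies; }
        return *this;
    }
    bag& operator=(bag&& other) noexcept {
        if (this != &other) { items_ = std::move(other.items_); ++moves; }
        return *this;
    }

    void add(int v) { items_.push_back(v); }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<int> items_;
};
int bag::copies = 0;
int bag::moves = 0;

// Returns a locally built bag: the result is moved or elided, never deep-copied.
bag make_bag() {
    bag b;
    for (int i = 0; i < 1000; ++i) b.add(i);
    return b;
}

bool assignment_from_temporary_moves() {
    bag::copies = bag::moves = 0;
    bag target;
    target = make_bag();      // rvalue on the right: move assignment
    bag other;
    other = target;           // lvalue on the right: copy assignment
    return bag::copies == 1 && bag::moves >= 1 &&
           target.size() == 1000 && other.size() == 1000;
}
```

The exact number of moves depends on whether the compiler elides the return of b, so the check only requires at least one move; the single deep copy comes from the assignment between the two named objects.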

I’ll see you next time!

About the Author

Diego Dagum is a software architect and developer with more than 20 years of experience. He can be reached at [email protected].
