New Age C++

More C++ Classes: Copy and Move Semantics

From a purely object-oriented perspective, "copy semantics" is the right way to preserve control over object ownership. But in those scenarios where ownership becomes irrelevant, C++11 "move semantics" is an efficient complement.

Continuing the coverage of C++ classes, this month I'll look at copy and move semantics. Consider the following code:

T t1, t2;                         // objects t1 and t2 live on the stack
t2 = t1;                          // t1 is copied to t2
unique_ptr<T> t_ptr{ new T() };   // the object pointed to by t_ptr
                                  // lives on the heap

The operator new is used only when the object must be created on the heap; in that case, new returns a pointer to the object. Assignment between objects (operator "=") copies the right side into the left. This differs from C# and Java, where t1 and t2 would refer to the same instance after the operation. C++ classes, instead, have "copy semantics".

By default, all classes get:

  • An overridable copy constructor, which creates an object by copying class fields from another one.
  • An overridable copy assignment (operator "="), which copies the right-side object fields to their left-side counterparts.

From a purely object-oriented perspective, "copy semantics" is the right way to preserve control over object ownership. Once an assignment is made, both source and destination can be altered without impacting each other.
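
As a minimal sketch of that independence, the snippet below copies a small value type and then changes source and copies separately. The type account and the function copies_are_independent are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical value type, used only for this illustration.
struct account {
  std::string owner;
  int balance;
};

// Returns true when every copy stays independent of its source.
bool copies_are_independent() {
  account a1{"Ann", 100};   // lives on the stack
  account a2 = a1;          // implicit copy constructor
  account a3{"", 0};
  a3 = a1;                  // implicit copy assignment

  a1.balance = 0;           // changing the source...
  a2.owner = "Bob";         // ...or one copy affects no other object

  // Heap allocation is a separate, explicit decision.
  std::unique_ptr<account> on_heap{new account(a1)};
  on_heap->balance = 7;     // a1 is untouched by this, too

  return a1.owner == "Ann" && a1.balance == 0 &&
         a2.owner == "Bob" && a2.balance == 100 &&
         a3.owner == "Ann" && a3.balance == 100;
}
```

Calling copies_are_independent() from main() should return true: each object keeps its own state after the copy.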

In C#, Java and Objective-C, assignments like "t1 = t2" affect both variables when the state of the object to which they refer changes. A C# class must implement the interface System.ICloneable when shared ownership isn't what the developer wants.

T t2 = (T)t1.Clone();	// you define how the method Clone() creates a new instance based on t1

Overriding C++ Copy Semantics
You may have reasons to redefine the default copy semantics. For instance, in this generic binary tree container:

template <typename T>
class tree {
public: ...
private:
  class node {		// node is a nested type, not a template itself
  public: ...
  private:
    T value_;
    unique_ptr<node> left_branch_, right_branch_;
  };
  unique_ptr<node> root_;
};

The tree is a recursive, node-based structure. Each inserted element becomes a node: the root itself, or a node somewhere in its left or right branch. This tree implementation is dynamic: rather than containing nodes, it contains pointers to nodes.

The default copy semantics between trees doesn't compile because unique_ptr has its copy semantics deleted by design (hence the "unique" designation). I could use error-prone raw pointers and it would compile fine; the default copy semantics would then copy the pointers, though, rather than the instances they point to. Consequently, both trees would share the same nodes rather than hold identical copies.
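
Both observations can be checked with a small sketch. The types owning_node and raw_node are hypothetical, reduced to a single branch for brevity; they are not the tree from the listings.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical node types, reduced to one branch for brevity.
struct owning_node {
  int value;
  std::unique_ptr<owning_node> next;  // sole owner of the next node
};

struct raw_node {
  int value;
  raw_node* next;                     // raw pointer: no ownership expressed
};

// unique_ptr deletes its copy operations, so the implicit copy
// constructor of any class holding one is deleted as well.
static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value,
              "unique_ptr is not copyable");
static_assert(!std::is_copy_constructible<owning_node>::value,
              "owning_node is not copyable by default");
static_assert(std::is_copy_constructible<raw_node>::value,
              "raw_node is copyable by default");

// Returns true when the default copy leaves both objects sharing a node.
bool shallow_copy_shares_nodes() {
  raw_node leaf{2, nullptr};
  raw_node a{1, &leaf};
  raw_node b = a;        // default copy: only the pointer is copied
  b.next->value = 99;    // modifies the node that a points to as well
  return a.next == b.next && a.next->value == 99;
}
```

The static_assert lines fail at compile time if the claims don't hold, and shallow_copy_shares_nodes() should return true.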

The default copy semantics performs a so-called "shallow copy", as opposed to a "deep copy", which duplicates the objects referred to by pointers. I redefined the copy behavior in Listing 1.
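
For readers without the listing at hand, here is a rough sketch of what a deep copy could look like. It is not the article's listing: int_tree, node and their members are hypothetical, and the tree stores plain int values to stay short.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified node (an assumption, not the article's listing).
struct node {
  int value;
  std::unique_ptr<node> left, right;

  explicit node(int v) : value(v) {}

  // Deep copy: recursively clone both branches.
  node(const node& other)
      : value(other.value),
        left(other.left ? new node(*other.left) : nullptr),
        right(other.right ? new node(*other.right) : nullptr) {}
};

class int_tree {
 public:
  int_tree() = default;

  // Copy constructor: clone every node reachable from the root.
  int_tree(const int_tree& other)
      : root_(other.root_ ? new node(*other.root_) : nullptr) {}

  // Copy assignment: build the clone first, then drop the old nodes.
  int_tree& operator=(const int_tree& other) {
    if (this != &other)
      root_.reset(other.root_ ? new node(*other.root_) : nullptr);
    return *this;
  }

  void insert(int v) {
    std::unique_ptr<node>* slot = &root_;
    while (*slot)
      slot = v < (*slot)->value ? &(*slot)->left : &(*slot)->right;
    slot->reset(new node(v));
  }

  bool contains(int v) const {
    const node* n = root_.get();
    while (n && n->value != v)
      n = v < n->value ? n->left.get() : n->right.get();
    return n != nullptr;
  }

  const node* root() const { return root_.get(); }

 private:
  std::unique_ptr<node> root_;
};

// Returns true when the copy owns its own nodes.
bool deep_copy_is_independent() {
  int_tree a;
  a.insert(2); a.insert(1); a.insert(3);
  int_tree b = a;   // deep copy
  b.insert(4);      // only b changes
  return a.root() != b.root() && b.contains(1) && b.contains(3) &&
         b.contains(4) && !a.contains(4);
}
```

The recursion in node's copy constructor is the simplest way to express the clone; a very deep, unbalanced tree might call for an iterative version instead.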

Copy Semantics and Rvalues
Prior to C++11, C++ developers had a problem. Certain functions, like the one here, create and return objects:

tree<int> create_millionaire_tree() {
  tree<int> t;
  ... // insert a million elements
  return t;
}

tree<int> t1 = create_millionaire_tree();

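The last statement initializes t1 from the result of create_millionaire_tree(). Unless the compiler elides the copy (the return value optimization, which compilers are permitted but not required to apply here), this invokes the copy constructor and a million elements are copied unnecessarily. The copy cannot be elided at all when the result is assigned to an already-constructed tree rather than used to initialize a new one. The problem occurs because create_millionaire_tree lacks access to t1, so it builds a local tree t. When the function finishes, copying begins, but the returned t is an rvalue: a temporary whose name is already out of scope. A possible workaround is to have the function receive t1 by reference: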

void populate_millionaire_tree(tree<int>& t) {
  ... // empty t and insert a million elements into it
}

tree<int> t1;
populate_millionaire_tree(t1);	// pass t1 itself; a reference parameter takes no address-of operator
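
To make the cost of copying from a temporary concrete, the following sketch counts copy operations in a container that has copy semantics only, as pre-C++11 classes did. The names copy_only_bag, make_bag and copy_assignments_from_temporary are hypothetical, and a std::vector stands in for the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container with copy semantics only. Declaring the copy
// operations suppresses the implicit move operations.
class copy_only_bag {
 public:
  copy_only_bag() {}
  copy_only_bag(const copy_only_bag& other) : items_(other.items_) {
    ++copy_constructions;
  }
  copy_only_bag& operator=(const copy_only_bag& other) {
    items_ = other.items_;   // every element is duplicated
    ++copy_assignments;
    return *this;
  }
  void add(int v) { items_.push_back(v); }
  std::size_t size() const { return items_.size(); }

  static int copy_constructions;
  static int copy_assignments;

 private:
  std::vector<int> items_;
};
int copy_only_bag::copy_constructions = 0;
int copy_only_bag::copy_assignments = 0;

copy_only_bag make_bag(int n) {
  copy_only_bag b;
  for (int i = 0; i < n; ++i) b.add(i);
  return b;   // the compiler may elide this copy, but need not
}

// Returns how many copy assignments one assignment from a temporary costs.
int copy_assignments_from_temporary() {
  copy_only_bag::copy_constructions = 0;
  copy_only_bag::copy_assignments = 0;
  copy_only_bag target;
  target = make_bag(1000);   // the temporary is deep-copied, then destroyed
  return copy_only_bag::copy_assignments;
}
```

The function should return 1: the temporary's thousand elements are copied into target even though the temporary is discarded immediately afterwards. The value of copy_constructions depends on whether the compiler elides the copy in make_bag, so the sketch doesn't rely on it.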

How C++11's Move Semantics Changes the Game
C++11 solves this dilemma by letting you keep the original function create_millionaire_tree while avoiding unnecessary copies. A double ampersand ("&&") in a parameter declaration denotes an rvalue reference: you could think of it as "the result of a function being received as an argument". Listing 2 shows how "move semantics" leverages rvalue references to give rvalues special treatment in constructions and assignments.
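
A small sketch can show how overload resolution tells the two kinds of argument apart. The functions which and make_text are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload pair: one for lvalues, one for rvalues.
std::string which(const std::string&) { return "lvalue"; }
std::string which(std::string&&) { return "rvalue"; }

std::string make_text() { return "temporary"; }

// Returns true when each call binds to the expected overload.
bool overloads_pick_expected_reference() {
  std::string named = "named";
  return which(named) == "lvalue" &&            // a named object
         which(make_text()) == "rvalue" &&      // a function result
         which(std::move(named)) == "rvalue";   // an lvalue cast to an rvalue
}
```

Note that std::move moves nothing by itself; it only casts its argument so that the "&&" overload is selected.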

A move constructor is implicitly generated, performing a member-wise move, only if the class has no user-declared destructor, copy constructor, copy assignment operator or move assignment operator (a corresponding rule applies to the move assignment operator). In all other cases, you must explicitly declare and define move semantics to have them available; otherwise copy semantics will handle both rvalues and lvalues, as in previous versions of C++.
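
The effect of this rule can be observed with a member that records how it was constructed. This is a sketch with hypothetical types (tracer, plain_holder, holder_with_dtor), not code from the listings.

```cpp
#include <cassert>
#include <utility>

// Hypothetical member type that remembers how it was constructed.
struct tracer {
  enum origin { fresh, copied, moved };
  origin how;
  tracer() : how(fresh) {}
  tracer(const tracer&) : how(copied) {}
  tracer(tracer&&) : how(moved) {}
};

// No user-declared special members: a move constructor is generated.
struct plain_holder {
  tracer t;
};

// A user-declared destructor suppresses the implicit move constructor;
// rvalues then fall back to the implicit copy constructor.
struct holder_with_dtor {
  tracer t;
  ~holder_with_dtor() {}
};

// Returns true when only plain_holder actually moves its member.
bool destructor_suppresses_implicit_move() {
  plain_holder a;
  plain_holder a2 = std::move(a);        // member-wise move
  holder_with_dtor b;
  holder_with_dtor b2 = std::move(b);    // silently copies instead
  return a2.t.how == tracer::moved && b2.t.how == tracer::copied;
}
```

The second case compiles without complaint, which is why a missing move constructor tends to show up as a performance problem rather than as an error.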

Listing 3 Code Notes
Listing 3 implements a binary tree with copy and move semantics. I recommend the Code::Blocks IDE with the MinGW toolchain (at the time of writing, Visual Studio doesn't implement many of the C++11 features used in this sample).

The main function works with a few tree variables, constructing some from others or assigning one to another. Console traces explain what's going on at each step. You can disable or enable move semantics by commenting out or uncommenting the line:

#define MOVE_SEMANTICS

in unbalanced_binary_tree.h. When run with move semantics, you'll see that this line in the function main():

integer_tree = make_word_size_tree(lincoln_tree);

generates the following output:

Leaving function make_word_size_tree().
Beginning tree move-assignment...
Clearing out previous content at left-side...
Moving root to the left-side.
Deleting tree...

Run the sample again, this time with the #define commented out. Compare the output:

Leaving function make_word_size_tree(). Local tree<unsigned> t about to lose visibility...
Beginning tree copy-assignment...
Clearing out previous content at left-side...
Node(1) copy-constructed.
Node(2) copy-constructed.
(...)
Node(4) copy-constructed.
Deleting tree...
Destroying node(4).
Destroying node(5).
(...)
Destroying node(1).

Manage Instance Member Ownership, without Micromanaging
C++ copies an instance during construction or assignment when it receives another instance as the argument. In their default version, copy semantics are shallow: pointer members are copied, but not the instances they point to. However, the developer can override these semantics to keep control of instance state ownership through deep copies. Avoid copy semantics when the source is a temporary instance, accessible through an rvalue that some function returns; define move semantics for those cases.
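
As a closing sketch of that last recommendation, the class below defines a move constructor and a move assignment operator that take over the source's nodes instead of duplicating them. The names int_list, node, make_list and move_transfers_ownership are hypothetical; a singly linked list stands in for the tree to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical list node; each node owns the next one.
struct node {
  int value;
  std::unique_ptr<node> next;
  node(int v, std::unique_ptr<node> n) : value(v), next(std::move(n)) {}
};

class int_list {
 public:
  int_list() = default;

  // Move constructor: take over the source's nodes; no node is copied.
  int_list(int_list&& other) noexcept
      : head_(std::move(other.head_)), size_(other.size_) {
    other.size_ = 0;
  }

  // Move assignment: release the current nodes, then take over the source's.
  int_list& operator=(int_list&& other) noexcept {
    if (this != &other) {
      head_ = std::move(other.head_);
      size_ = other.size_;
      other.size_ = 0;
    }
    return *this;
  }

  void push_front(int v) {
    head_.reset(new node(v, std::move(head_)));
    ++size_;
  }

  const node* head() const { return head_.get(); }
  std::size_t size() const { return size_; }

 private:
  std::unique_ptr<node> head_;
  std::size_t size_ = 0;
};

int_list make_list(int n) {
  int_list l;
  for (int i = 0; i < n; ++i) l.push_front(i);
  return l;   // moved or elided, never deep-copied
}

// Returns true when the nodes change owner without being duplicated.
bool move_transfers_ownership() {
  int_list a = make_list(3);
  const node* first = a.head();
  int_list b;
  b = std::move(a);   // move assignment
  return b.head() == first && b.size() == 3 &&
         a.head() == nullptr && a.size() == 0;
}
```

Because the class declares move operations and no copy operations, it is move-only; a real container would usually provide the deep-copy operations alongside them.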

I’ll see you next time!

About the Author

Diego Dagum is a software architect and developer with more than 20 years of experience. He can be reached at email@diegodagum.com.


Reader Comments:

Tue, Dec 4, 2012 lightness1024

This statement "The last assignment invokes the copy-constructor to create t1 based on the result of create_millionaire_tree()." Is wrong. Before C++11 a thingie known as Return Value Optimization existed. And it is not compiler specific, it is guaranteed by standard. So please revise your article, to make it correct you just need to separate declaration from initialization. Cheers

Fri, Sep 7, 2012 Diego Dagum Kirkland, WA

Answer: only an instance of Tree will live in the stack, but none of its nodes. Not even the root as I erroneously said before. The way its designed, class Tree holds a unique_ptr to root (a Node instance), which in turn holds two unique_ptrs to its branches. The instantiation of all Tree nodes are made by calling new Node and giving the result to some unique_ptr, it's valid to say that all Nodes live in the heap. The stack will only have a Tree containing a unique_ptr pointing to a root living in the heap.

Fri, Aug 31, 2012 Diego Dagum Kirkland, WA

(Sorry about my recent typo: I wanted to mean "only one (node) will live in the stack")

Fri, Aug 31, 2012 Diego Dagum Kirkland, WA

Greetings, Emerth. From the million nodes in that tree, only one will leave in the stack while the remaining 999,999 will be in the heap. Can you see why? I'll leave it as a quiz for a couple of days, and promise to explain why it won't happen that the contained elements aren't in the stack.

Wed, Aug 29, 2012 emerth

"tree create_millionaire_tree() { tree t; ... // insert a million elements return t; }" Uhhh, maybe don't create your great big data containers on the stack?

Mon, Aug 20, 2012 sarah

very informative post indeed .being enrolled in http://www.wiziq.com/course/5776-object-oriented-programming-with-c i was looking for such articles online to assist me and your article helped me a lot. i really like that you are providing such information.

Mon, Aug 13, 2012 Diego Dagum Kirkland, WA

Well observed, Greg!! I really appreciate it. I'll be posting an update based on your comment soon. Thanks for helping me improve this article.

Mon, Aug 13, 2012 GregM

"Unlike copy semantics, which are implicitly defined, you must explicitly declare and define move semantics to have them available. Otherwise, copy semantics will just deal with both rvalues and lvalues, like in previous versions of C++." That is not quite accurate. A move constructor or move assignment operator will be implicitly generated that will do a member-wise move if there is no user-defined destructor, and no user-defined copy/move constructors/assignment operators. This was decided as a compromise between the safety of never providing them and the utility of always providing them.
