New Age C++

Improving C++ Encapsulation with the Pimpl Idiom

.NET and Java developers are often perplexed by the indiscreet way C++ discloses private class details. Pimpl (pointer-to-implementation) solves this problem by keeping secrets hidden from peepers.

I mentioned in a recent column that a class's declaration and its implementation usually go in different files. The declaration describes the class structure in an .h file (header) that is included by any code that uses the class. The definitions of its member functions go in .cpp files and implement its behavior.

Like managed languages, C++ classifies member visibility in three categories: public, protected and private. However, every member must appear in the header declaration, even those accessible only to the class itself. Because of this, C++ encapsulation may seem strange.

There's a reason for this contradiction: object layout must be resolved at compile time, because everything has to be wired up before the program runs. Listing 1 shows that if a "B" class member is of a certain type "A", then each "B" instance will contain an instance of "A". In managed languages, "B" would merely hold a reference to an instance of "A".
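The containment described above can be checked directly. The following is a minimal sketch, not the column's Listing 1: the fields of A and B and both helper functions are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>   // offsetof

struct A {           // hypothetical type with two fields
    int x;
    int y;
};

struct B {           // B embeds a whole A; it does not point at one
    char tag;
    A a;
};

// Both checks need A's complete declaration: without it the compiler
// could compute neither the size of B nor the offset of B::a.
static_assert(sizeof(B) >= sizeof(char) + sizeof(A), "B stores an A inline");
static_assert(offsetof(B, a) >= sizeof(char), "B::a sits after B::tag");

// Returns true when b.a occupies bytes inside b's own storage.
inline bool a_lives_inside(const B& b) {
    const char* begin  = reinterpret_cast<const char*>(&b);
    const char* member = reinterpret_cast<const char*>(&b.a);
    return member >= begin && member + sizeof(A) <= begin + sizeof(B);
}

// Usage: the embedded A is part of B, so no separate allocation exists.
inline void check_embedding() {
    B b{};
    assert(a_lives_inside(b));
}
```

Because the size of B depends on the size of A, any code that creates a B must see the full declaration of A, private members included.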

Listing 1 demonstrates why entire structures must be disclosed in header declarations: the compiler needs them to compute object sizes and field offsets. This doesn't convince managed-language developers, though, who wonder if there's a better way in C++. There is. It's known as the Pimpl Idiom.

The Pimpl Idiom
Pimpl takes its name from the one developers commonly give to a class's private pointer to its hidden implementation. pImpl (lowercase p, capital I) follows Hungarian notation: if a class field is a pointer to, say, a city, the field is named pCity and read "pointer to city".

To understand the Pimpl idiom, we'll review forward declarations first. In its basic form, a type is forward-declared with

class my_type; // alternatively struct

This just asks the compiler to accept "my_type" as the name of a type that will be defined later (though possibly not in the same translation unit). The Pimpl idiom leverages forward declarations to hide details this way:

// some_public_class.h
#include <memory>

class some_public_class {
  struct some_private_impl; // forward-declared nested type
  std::unique_ptr<some_private_impl> pImpl;
public:
  some_public_class();
  ~some_public_class(); // defined in the .cpp, where some_private_impl is complete
  void some_function(int i);
   … // other public members follow
};

A source file that includes some_public_class.h compiles without knowing anything about the contents of some_private_impl. The only data member some_public_class actually has is a unique_ptr named pImpl. The destructor is declared in the header but defined in the .cpp file, because unique_ptr needs the complete type in order to delete it. some_private_impl itself is declared and defined in some_public_class.cpp (not in the header):

// some_public_class.cpp
#include "some_public_class.h"

struct some_public_class::some_private_impl {
   … // implementation follows
};

// constructor
some_public_class::some_public_class() : pImpl{new some_private_impl{}} {}

// destructor: some_private_impl is a complete type at this point
some_public_class::~some_public_class() = default;

void some_public_class::some_function(int i) {
  … // implementation delegating to pImpl
}

The nested type that models the implementation could be a class instead of a struct. I prefer struct because its fields are public by default. If I chose a class, I'd have to explicitly declare them as public members. When defining some_public_class functions, I can freely access the fields of the private inner struct as if they were part of the public class.
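To make the point about direct field access concrete, here is a minimal single-file sketch. The class name counter, its members and the usage_example function are hypothetical; in a real project the two marked parts would live in separate files.

```cpp
#include <cassert>
#include <memory>

// --- would live in counter.h ---
class counter {
    struct impl;                      // forward-declared nested type
    std::unique_ptr<impl> pImpl;
public:
    counter();
    ~counter();                       // defined below, once impl is complete
    void add(int i);
    int total() const;
};

// --- would live in counter.cpp ---
struct counter::impl {
    int sum = 0;                      // public by default in a struct
    int calls = 0;
};

counter::counter() : pImpl{new impl{}} {}
counter::~counter() = default;

void counter::add(int i) {
    pImpl->sum += i;                  // direct field access, no forwarding call
    ++pImpl->calls;
}

int counter::total() const { return pImpl->sum; }

// Usage: clients only ever touch the public interface.
inline int usage_example() {
    counter c;
    c.add(2);
    c.add(3);
    assert(c.total() == 5);
    return c.total();
}
```

Note that impl has no member functions of its own: the public class reads and writes its fields directly, so nothing is forwarded.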

Listing 2 shows an example of a message queue internally backed by a plain array of 1,000 slots.

main.cpp contains the main program that uses my_queue. It doesn't know anything about the 1,000-entry array. It only knows that my_queue contains a field, pImpl, which is a unique_ptr to some type queue_impl whose internal structure is unknown.
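For readers without the listing at hand, an array-backed queue along these lines might look like the sketch below. It is not the column's Listing 2: the push, pop and empty functions, the ring-buffer bookkeeping and the capacity constant are assumptions made for illustration; only the names my_queue, queue_impl and pImpl come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// --- my_queue.h: everything a client such as main.cpp gets to see ---
class my_queue {
    struct queue_impl;
    std::unique_ptr<queue_impl> pImpl;
public:
    my_queue();
    ~my_queue();
    void push(const std::string& msg);   // hypothetical API
    std::string pop();
    bool empty() const;
};

// --- my_queue.cpp: hidden from clients ---
constexpr std::size_t capacity = 1000;   // assumed fixed size

struct my_queue::queue_impl {
    std::string slots[capacity];         // plain array used as a ring buffer
    std::size_t head = 0;                // index of the oldest message
    std::size_t count = 0;               // number of stored messages
};

my_queue::my_queue() : pImpl{new queue_impl{}} {}
my_queue::~my_queue() = default;

void my_queue::push(const std::string& msg) {
    assert(pImpl->count < capacity);     // precondition: queue not full
    pImpl->slots[(pImpl->head + pImpl->count) % capacity] = msg;
    ++pImpl->count;
}

std::string my_queue::pop() {
    assert(pImpl->count > 0);            // precondition: queue not empty
    std::string msg = std::move(pImpl->slots[pImpl->head]);
    pImpl->head = (pImpl->head + 1) % capacity;
    --pImpl->count;
    return msg;
}

bool my_queue::empty() const { return pImpl->count == 0; }
```

Everything below the my_queue.h marker, including the array and its size, is invisible to code that only includes the header.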

There's an immediate benefit in this obscurity: if the private implementation changes, clients don't need to be recompiled, only relinked. Listing 3 offers an alternative implementation backed by a standard library list.

Notice how copy and move semantics are implemented. unique_ptr plays a compelling role in the implementation of the latter, allowing me to simply fall back on the defaulted C++11 implementation.

At the time of this writing, however, Visual Studio 2012 doesn't support defaulted functions (= default). I recommend the Code::Blocks IDE and the MinGW toolchain to test these snippets.

It's also worth mentioning that the Visual C++ team is starting to catch up with competitors like Clang or GCC in terms of C++11 conformance. Take a look at a recent compiler refresh published at the beginning of November: It's still in an alpha stage (or Community Technology Preview, in Microsoft's own terminology). Consequently, this refresh isn't part of the recent Visual Studio 2012 Update 1, released a few days ago. There isn't yet an official date for the production-ready compiler upgrade.
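Returning to the copy and move semantics mentioned earlier, a list-backed queue could handle them roughly as follows. This is a sketch rather than the column's Listing 3: the member functions and the copy-and-swap assignment are assumptions, and it needs a compiler that supports = default.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <utility>

// --- my_queue.h ---
class my_queue {
    struct queue_impl;
    std::unique_ptr<queue_impl> pImpl;
public:
    my_queue();
    ~my_queue();
    my_queue(const my_queue& other);             // deep copy, written by hand
    my_queue& operator=(const my_queue& other);
    my_queue(my_queue&&) noexcept;               // defaulted in the .cpp part
    my_queue& operator=(my_queue&&) noexcept;
    void push(const std::string& msg);           // hypothetical API
    std::string pop();
    bool empty() const;
};

// --- my_queue.cpp ---
struct my_queue::queue_impl {
    std::list<std::string> items;
};

my_queue::my_queue() : pImpl{new queue_impl{}} {}
my_queue::~my_queue() = default;

// Copy: unique_ptr cannot be copied, so clone the pointee instead.
// A moved-from source holds a null pImpl, hence the check.
my_queue::my_queue(const my_queue& other)
    : pImpl{other.pImpl ? new queue_impl(*other.pImpl) : nullptr} {}

my_queue& my_queue::operator=(const my_queue& other) {
    my_queue tmp(other);                         // copy-and-swap
    pImpl.swap(tmp.pImpl);
    return *this;
}

// Move: unique_ptr already transfers ownership, so the defaults suffice.
// A moved-from queue may only be destroyed or assigned to.
my_queue::my_queue(my_queue&&) noexcept = default;
my_queue& my_queue::operator=(my_queue&&) noexcept = default;

void my_queue::push(const std::string& msg) { pImpl->items.push_back(msg); }

std::string my_queue::pop() {
    assert(!pImpl->items.empty());               // precondition: not empty
    std::string msg = std::move(pImpl->items.front());
    pImpl->items.pop_front();
    return msg;
}

bool my_queue::empty() const { return pImpl->items.empty(); }
```

The header part is the same shape as for an array-backed version, which is why swapping one implementation for the other requires relinking but no client recompilation.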

Implementation Challenges
While this idiom hides implementation details and prevents unnecessary recompilation, you may hear criticism from those who try it:
  • It's hard to implement because the more API functions the public class has, the more delegation points you have to write.
    Response: This isn't necessarily true: my two queue implementations are passive structs with no behavior. I didn't delegate calls to my_queue into calls to queue_impl. I used queue_impl fields as if they had always belonged to my_queue.
  • Function calls are slower, since the public class becomes a proxy for its nested implementation.
    Response: Again, this assumes that the internal implementation is a standalone class and my_queue a mere go-between. That wasn't my case, so I disagree.
  • Awkward inheritance. If you wanted to subclass the queue, which class should you derive from? my_queue or queue_impl? Perhaps both?
    Response: Subclassing is indeed difficult, but certainly not impossible. Check Listing 3: I substituted queue_impl and relinked the code base. Admittedly, I settled the issue at compile time by substitution rather than through inheritance. Inheritance would have let me instantiate either the superclass or its subclass interchangeably. I could also keep several queue_impl variants side by side and pick among them at runtime, rather than at compile time, using patterns like Strategy, Abstract Factory or Decorator; I'd choose whichever fits best.
  • There's no room for generic programming. For instance, my_queue only works with standard strings. If I wanted to make the queue generic over its element type, it wouldn't be possible to keep the definition hidden.
    Response: This is a known issue with generic programming in C++, although it can be overcome as well. I'll demonstrate in my next article.
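The runtime alternative mentioned in the inheritance response can be sketched with a Strategy-style interface. Everything here is an assumption made for illustration: the backing selector, the container_backed template and the make_impl factory are hypothetical, and queue_impl is forward-declared at namespace scope (rather than nested) so that the hidden concrete types can derive from it.

```cpp
#include <cassert>
#include <deque>
#include <list>
#include <memory>
#include <string>
#include <utility>

// --- my_queue.h ---
struct queue_impl;                         // forward-declared strategy interface

class my_queue {
    std::unique_ptr<queue_impl> pImpl;
public:
    enum class backing { list, deque };    // hypothetical runtime selector
    explicit my_queue(backing kind);
    ~my_queue();
    void push(const std::string& msg);
    std::string pop();
    bool empty() const;
};

// --- my_queue.cpp ---
struct queue_impl {
    virtual ~queue_impl() = default;
    virtual void push(const std::string& msg) = 0;
    virtual std::string pop() = 0;
    virtual bool empty() const = 0;
};

namespace {
// One concrete strategy per container; both coexist in the same binary.
template <class Container>
struct container_backed : queue_impl {
    Container items;
    void push(const std::string& msg) override { items.push_back(msg); }
    std::string pop() override {
        assert(!items.empty());            // precondition: not empty
        std::string msg = std::move(items.front());
        items.pop_front();
        return msg;
    }
    bool empty() const override { return items.empty(); }
};

std::unique_ptr<queue_impl> make_impl(my_queue::backing kind) {
    if (kind == my_queue::backing::list)
        return std::unique_ptr<queue_impl>(
            new container_backed<std::list<std::string>>);
    return std::unique_ptr<queue_impl>(
        new container_backed<std::deque<std::string>>);
}
}  // namespace

my_queue::my_queue(backing kind) : pImpl{make_impl(kind)} {}
my_queue::~my_queue() = default;
void my_queue::push(const std::string& msg) { pImpl->push(msg); }
std::string my_queue::pop() { return pImpl->pop(); }
bool my_queue::empty() const { return pImpl->empty(); }
```

Unlike the passive-struct approach, this variant does forward every call through a virtual function, so the delegation and call-overhead criticisms apply to it. That is the price of choosing the implementation at runtime.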

Keep it Hidden
I've shown how the structural implementation of a class can stay hidden from the code that includes its header file. This improves not only encapsulation; it also cuts superfluous dependencies that would otherwise force unnecessary recompilation cascades.

About the Author

Diego Dagum is a software architect and developer with more than 20 years of experience. He can be reached at [email protected].
