Modern C++

Applying the Range-for Statement in C++

Want to iterate through a sequence with C++11? If so, the easiest way will be to use the range-for statement – Kenny Kerr shows you how.

The simplest way to iterate through a sequence with C++11 is to use the range-for statement. Next to auto and nullptr, it's one of the simplest constructs introduced by the C++11 standard. However, to appreciate where it can take us, we need to remember where we've been.

Imagine a container with some intriguing elements:

auto c = vector<int> { 1, 2, 3, 4, 5 };

I can use subscripting to access each element with a traditional for statement:

for (auto i = 0u; i != c.size(); ++i)
{
  printf("%d\n", c[i]);
}

This works and is very efficient. It's also such a common pattern for hardened C++ developers that we can write it perfectly without thinking. Still, it's tedious and we do occasionally get it wrong. The bigger concern is that it doesn't generalize to containers and sequences in general. For that, the standard library provides the iterator model:

for (auto i = begin(c); i != end(c); ++i)
{
  printf("%d\n", *i);
}

Here I'm still using a for statement, but "i" is now an iterator. Iterators are wonderful abstractions akin to pointers in that they provide indirect access to an element. Given two iterators, a sequence may be described as a half-open range. The first iterator points to the first element in the sequence, while the second iterator points to the one-beyond-the-last element. That's quite a mouthful, so the standard library just uses the terms "begin" and "end."

Notice in the for statement above that there isn't a whole lot that's required of iterators -- not in this example, anyway. The prefix form of the ++ operator -- the increment operator -- is needed to logically move the iterator forward so that it points or refers to the next element in the sequence. The != operator -- the not-equal or inequality operator -- is used to test whether the iterator has reached the end of the sequence. This is used in favor of the < operator -- the less-than operator -- as support for the < operator is not as widely available and != expresses more directly what it is that we're trying to determine. Finally, the * operator -- the access or dereference operator -- is used to refer directly to the element that the iterator points to.

As long as the begin and end functions return suitable iterators, this form of iteration may be applied to most any sequence of elements. The sequence can be sequential or associative. It can be ordered or unordered. It can even be a sequence consisting of an indeterminate number of elements. The range-for statement is built on this simple abstraction:

for (auto e : c)
{
  printf("  %d\n", e);
}

This loop uses the range-for iteration statement to access each element in the container. As much as we might pat ourselves on the back for being able to write the equivalent long forms of this loop, it's elegant and deliberately simple. It has a simple set of requirements and is itself the simplest way to loop through a sequence of elements.

What makes it interesting has more to do with the iterators that it relies on than the range-for construct itself. By default, the compiler first looks for begin and end methods -- member functions -- that may be used to define a range. In the example above, I'm iterating over a standard vector of integers, and the standard vector container happens to provide suitable begin and end methods. The compiler is thus happy to use those. If that fails, the compiler next looks for a suitable pair of begin and end functions in the enclosing scope. This is often viewed as a way for the range-for statement to iterate over a built-in array. In practice, the compiler can and may well implement the range-for statement without any help.

On the other hand, the fact that the compiler will consider non-member begin and end functions opens the door for some interesting possibilities. Consider the standard multimap container:

auto c = multimap<int, int>
{
  { 1, 101 },
  { 2, 201 },
  { 2, 202 },
  { 3, 303 },
};

A multimap is an associative container that stores an ordered sequence of key-value pairs. The "multi" part of its name implies that the keys need not be unique. I can easily use the range-for statement to iterate over its elements:

for (auto const & e : c)
{
  printf("%d, %d\n", e.first, e.second);
}

But what if I want to iterate over a sub-sequence? I might want to limit the iteration to those elements whose key is 2 using the equal_range method. It would be great if I could write this:

for (auto const & e : c.equal_range(2))
{
  printf("%d, %d\n", e.first, e.second);
}

Unfortunately, that won't work since equal_range returns a standard pair of iterators and, while those iterators do represent a valid range or sequence, the compiler doesn't know that. The standard pair class template doesn't provide begin and end methods. The enclosing scope, the std namespace in the case of the pair class, doesn't provide begin and end functions specialized or overloaded for the pair class template. The rationale was that not all pairs of iterators represent a valid range, but you can certainly define such functions yourself:

namespace std
{
  template <typename T>
  auto begin(pair<T, T> const & range) -> T
  {
    return range.first;
  }

  template <typename T>
  auto end(pair<T, T> const & range) -> T
  {
    return range.second;
  }
}

Given these functions, this range-for statement works perfectly:

for (auto const & e : c.equal_range(2))
{
  printf("%d, %d\n", e.first, e.second);
}

It's certainly easy to work with preexisting iterators, but there are many cases where valid iterators may not be available. In such cases, it may require a bit more work up front to get the range-for statement to work. Consider the spell-checking example from my Modern C++ column back in August. One of the possible corrective actions provided by the Spell Checking API took the form of a list of suggestions. I might use the spell checker object to retrieve a list of suggestions as follows:

auto suggestions = ComPtr<IEnumString> {};

HR(checker->Suggest(word.c_str(),
                    suggestions.GetAddressOf()));

The challenge comes when I try to enumerate the list. I ended up having to writing a rather quirky loop that looked something like this:

for (;;)
{
  wchar_t * suggestion { nullptr };

  if (S_OK != suggestions->Next(1, &suggestion, nullptr))
  {
    break;
  }

  // Use suggested word here ...
  
  CoTaskMemFree(suggestion);
}

This is quite tedious and error-prone. I must remember to call the CoTaskMemFree function to free the string and need to consider exception safety. It sure would be nice if I could simply write this:

for (auto suggestion : suggestions)
{
  printf("%S\n", suggestion);
}

Surely, that's not possible, right? Not so fast! Given what we know about how the compiler implements the range-for statement, can't we provide the necessary scaffolding to satisfy the compiler? Let's give it a try. I'll start by defining an iterator class.

class EnumStringIterator
{
   IEnumString * m_sequence;
   wchar_t * m_string;
  // ...
};

Care must be taken to ensure that iterators are efficient. It's one thing to use a lovely abstraction such as a range-for statement, but it should not introduce additional overhead. The iterator will assume that the backing range will remain valid and won't assume additional references. The iterator will also disable copy semantics and provide move semantics to ensure correct ownership behavior of the resource pointed to by the m_string member variable.

A normal and explicit constructor is used for defining the backing sequence:

explicit EnumStringIterator(IEnumString * sequence) throw()
  : m_sequence { sequence },
    m_string {}
{
}

A destructor then takes care of freeing the "element," according to the requirements of the IEnumString interface:

~EnumStringIterator()
{
  if (m_string)
  {
    CoTaskMemFree(m_string);
  }
}

A typical move constructor and move assignment ensures proper ownership is transferred as needed. You can check out the complete class definition at the end of this article, but it's quite straightforward. As I mentioned, the * operator is used to refer to the element being pointed to, in this case the string:

auto operator*() const throw() -> wchar_t const *
{
  ASSERT(m_sequence);
  ASSERT(m_string);

  return m_string;
}

I'll also define the == and != operators simply to check whether the end of the sequence has been reached. Technically, only the latter is required, but I like to define these two together:

auto operator==(EnumStringIterator const & other) const -> bool
{
  ASSERT(m_sequence == other.m_sequence);
  return m_string == other.m_string;
}

auto operator!=(EnumStringIterator const & other) const -> bool
{
  ASSERT(m_sequence == other.m_sequence);
  return m_string != other.m_string;
}

It's the ++ operator that does most of the heavy lifting. It needs to conditionally free any existing string and then retrieve the next string from the sequence, if any.

auto operator++() throw() -> EnumStringIterator &
{
  ASSERT(m_sequence);

  if (m_string)
  {
    CoTaskMemFree(m_string);
  }

  if (S_OK != m_sequence->Next(1, &m_string, nullptr))
  {
    m_string = nullptr;
   }

  return *this;
}

Given that the sequence in this case is just a ComPtr<IEnumString>, I can simply provide begin and end functions to produce the necessary half-open range:

auto begin(ComPtr<IEnumString> const & sequence) -> EnumStringIterator
{
  auto i = EnumStringIterator { sequence.Get() };
  ++i;
   return i;
}

auto end(ComPtr<IEnumString> const & sequence) -> EnumStringIterator
{
  return EnumStringIterator { sequence.Get() };
}

And that's about it. I can now write the following range-for loop, and the compiler is quite happy to chew on it for me:

for (auto suggestion : suggestions)
{
  printf("%S\n", suggestion);
}

The amazing thing is that this code is no more costly than the hand-rolled implementation introduced in my previous column. It's also much simpler and safer, since the iterator ensures that each string is freed in due course. Listing 1 shows the complete iterator class. It's certainly a minimal iterator implementation, but that's all that's required for the range-for statement to work. You can also take these same techniques and apply them to a variety of other logical sequences to produce a more elegant and modern programming model.

Listing 1: The Complete Iterator Class
class EnumStringIterator
{
  IEnumString * m_sequence;
  wchar_t * m_string;

  EnumStringIterator(EnumStringIterator const &);
  auto operator=(EnumStringIterator const &) -> EnumStringIterator &;

public:

  explicit EnumStringIterator(IEnumString * sequence) throw()
    : m_sequence { sequence },
      m_string {}
  {
  }

  EnumStringIterator(EnumStringIterator && other) throw()
    : m_sequence { other.m_sequence },
      m_string { other.m_string }
  {
    other.m_sequence = nullptr;
    other.m_string = nullptr;    
  }

  ~EnumStringIterator()
  {
    if (m_string)
    {
      CoTaskMemFree(m_string);
    }
  }

  auto operator=(EnumStringIterator && other) throw() -> EnumStringIterator &
  {
    if (this != &other)
    {
      m_sequence = other.m_sequence;
      m_string = other.m_string;

      other.m_sequence = nullptr;
      other.m_string = nullptr;
    }

    return *this;
  }

  auto operator*() const throw() -> wchar_t const *
  {
    ASSERT(m_sequence);
    ASSERT(m_string);

    return m_string;
  }

  auto operator++() throw() -> EnumStringIterator &
  {
    ASSERT(m_sequence);

    if (m_string)
    {
      CoTaskMemFree(m_string);
    }

    if (S_OK != m_sequence->Next(1, &m_string, nullptr))
    {
      m_string = nullptr;
    }

    return *this;
  }

  auto operator==(EnumStringIterator const & other) const -> bool
  {
    ASSERT(m_sequence == other.m_sequence);

    return m_string == other.m_string;
  }

  auto operator!=(EnumStringIterator const & other) const -> bool
  {
    ASSERT(m_sequence == other.m_sequence);

    return m_string != other.m_string;
  }
};

auto begin(ComPtr<IEnumString> const & sequence) -> EnumStringIterator
{
  auto i = EnumStringIterator { sequence.Get() };
  ++i;
  return i;
}

auto end(ComPtr<IEnumString> const & sequence) -> EnumStringIterator
{
  return EnumStringIterator { sequence.Get() };
}

About the Author

Kenny Kerr is a computer programmer based in Canada, an author for Pluralsight and a Microsoft MVP. He blogs at kennykerr.ca and you can follow him on Twitter at twitter.com/kennykerr.

comments powered by Disqus

Featured

  • Build Your First AI Applications with Local AI

    "AI right now feels like a vast space which can be hard to jump into," says Craig Loewen, a senior product manager at Microsoft who is helping devs unsure about making that first daunting leap.

  • On Blazor Component Reusability - From Day 0

    "We want to try to design from Day One, even Day Zero, with reusability in mind," says Blazor expert Allen Conway in imparting his expertise to an audience of hundreds in an online tech event on Tuesday.

  • Decision Tree Regression from Scratch Using C#

    Dr. James McCaffrey from Microsoft Research presents a complete end-to-end demonstration of decision tree regression using the C# language. Unlike most implementations, this one does not use recursion or pointers, which makes the code easy to understand and modify.

  • Visual Studio's AI Future: Copilot .NET Upgrades and More

    At this week's Microsoft Ignite conference, the Visual Studio team showed off a future AI-powered IDE that will leverage GitHub Copilot for legacy app .NET upgrades, along with several more cutting-edge features.

  • PowerShell Gets AI-ified in 'AI Shell' Preview

    Eschewing the term "Copilot," Microsoft introduced a new AI-powered tool for PowerShell called "AI Shell," available in preview.

Subscribe on YouTube