Code Focused

Use Iterators in VB Now

TECHNOLOGY TOOLBOX: VB.NET

In Visual Studio 10, Visual Basic and C# will officially attempt to converge rather than diverge. Microsoft says the two languages will share the same major features, and it promises "language parity" and "an end to the sibling rivalry." For C#, this means the inclusion of long-standard VB features such as late binding and optional parameters. For VB, it means the inclusion of multi-statement lambdas and auto-implemented properties. This parity is not absolute: For example, C# won't have XML literals, making VB still the best language for working with XML. One thing VB is not likely to get in the next release is iterators.

Iterators in C# are a language feature that enables you to write a complete implementation for an enumerable object in a single method. An enumerable object is one that you can perform For Each loops or LINQ queries on. Arrays, Lists, and Collections are all common examples of enumerable objects. In fact, enumerable objects are so common in the .NET Framework that you rarely need to define your own. Still, it's important you understand the basics of what enumerable objects are and how to create them should the need ever arise.

Let's kick things off with some common examples for iterators you'll see in C#. The first is from the C# documentation:

public int[] items;

public IEnumerable BuildCollection()
{
    for (int i = 0; i < items.Length; i++)
    {
        yield return items[i];
    }
}

I hope you're laughing at that sample. If you aren't, some explanation might help you see the humor. The code shows a method that returns an IEnumerable: This is what an iterator is in C#. The "yield return" statement indicates what is returned when the code is enumerated with a For Each loop or similar. So, this code iterates over an array and returns each item in the array. In other words, it is superfluous and just adds overhead, as an array is already enumerable. That method could have just returned the array cast to an IEnumerable.
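For VB readers, the point that the array could simply be returned as an IEnumerable can be sketched as follows. This is a minimal illustration, not code from the C# documentation; the module name and the sample values in items are assumptions made up for the example.

```vbnet
Option Strict On

Imports System
Imports System.Collections

Module ArrayAsEnumerable
   ' Hypothetical data standing in for the "items" field of the C# sample.
   Private items() As Integer = {1, 2, 3}

   ' An array already implements IEnumerable, so it can be returned as is.
   Public Function BuildCollection() As IEnumerable
      Return items
   End Function

   Sub Main()
      For Each item As Object In BuildCollection()
         Console.WriteLine(item)
      Next
   End Sub
End Module
```

No loop and no iterator block are needed, because the For Each in Main works directly against the array.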

You might think that's just a one-off, bad-code example. Sadly, I have seen many cases where iterator blocks are superfluous. Superfluous code is an evil because it adds to the testing and maintenance burdens. But not all cases are superfluous. Bill Wagner in his December 2008 C# column ("What VB Devs Should Know About C#") showed a good example of creating an iterator that will give you the letters "a" to "z" as you enumerate it:

public static IEnumerable<char> Letters()
{
   char currentCharacter = 'a';
   do
   {
      yield return currentCharacter;
   } while (currentCharacter++ < 'z');
}

This code shows an important aspect of iterators: They're state machines. The variable currentCharacter is modified each time you enumerate the IEnumerable returned from Letters. To achieve the same result in VB could require some substantial work on your part, but there are, as they say, many ways to skin a cat (note: no felines were harmed in the production of this code). For example, you could create an array of Char, populate it, and return that:

Dim letters(0 To 25) As Char
For i As Int32 = 0 To 25
   letters(i) = ChrW(&H61 + i)
Next

This will work, but it's a bit messy. An alternative approach is to refactor this method. The body of the For i As Int32 = 0 To 25 loop evaluates one expression for each value in a range, so you can use the Enumerable.Range method as part of a LINQ expression:

Dim letters = From i In Enumerable.Range(0, 26) _
   Select ChrW(&H61 + i)

This is a little better, but there are other ways to approach this problem, as well. A String is in fact enumerable and implements IEnumerable(Of Char), so you can simplify everything to a single string declaration:

Dim letters = "abcdefghijklmnopqrstuvwxyz"

This alternative certainly seems far more maintainable and a lot more obvious too.

Of course, this example is a simple one. If it were more complex, such as fetching values from somewhere or calculating the values, the differences between the solutions would become much more meaningful. Both the first and last solutions depend on having the complete set of characters in memory to begin with, whereas the solution that uses the Enumerable.Range method produces each character only when you iterate over the letters. This is often referred to as "delayed evaluation."

Delayed evaluation, or "on-demand evaluation" as I prefer to call it, is an important aspect of iterators and especially of LINQ. It's also something you need to be wary of when translating C# code to VB or when you look for alternatives to iterators. Typically, LINQ expressions and utility methods such as Range provide solutions that have the desired, on-demand evaluation behavior.
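One way to see on-demand evaluation at work is to count how often the projection in the Enumerable.Range query actually runs. The sketch below is only an illustration; the ToLetter helper and the CallCount counter are hypothetical names introduced for the demonstration.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Linq
Imports Microsoft.VisualBasic

Module OnDemandDemo
   ' Counts how many times the projection has actually been evaluated.
   Public CallCount As Integer = 0

   ' Hypothetical helper: maps 0..25 to "a".."z" and records each call.
   Public Function ToLetter(ByVal i As Integer) As Char
      CallCount += 1
      Return ChrW(&H61 + i)
   End Function

   Sub Main()
      Dim letters = From i In Enumerable.Range(0, 26) _
                    Select ToLetter(i)

      ' Defining the query does not evaluate any element.
      Console.WriteLine(CallCount)

      ' Enumerating three items evaluates only those three.
      For Each c As Char In letters.Take(3)
         Console.Write(c)
      Next
      Console.WriteLine()
      Console.WriteLine(CallCount)
   End Sub
End Module
```

The counter should still be 0 after the query is defined and 3 after the loop, which is the behavior you lose if you populate an array up front.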

However, there are times when you will find it necessary to write an iterator. One such case occurred for me recently. I wanted to read from a stream line-by-line, search for a substring, and return the first couple of matches. Unfortunately, StreamReader doesn't have an iterator that returns each line. Without something that is enumerable, you can't form the basis of a LINQ query, so you're forced to write your query imperatively. That's not as bad as it might first sound; VB developers were doing this successfully for years before LINQ came along:

strLine = myStreamReader.ReadLine
Do While strLine IsNot Nothing
   If strLine.Contains(value) Then
      count += 1
      resultList.Add(strLine)
      If count >= 3 Then Exit Do
   End If
   strLine = myStreamReader.ReadLine
Loop

Building the query into the imperative procedural block does work, although it's kind of on the ugly side. I look at that code, and it screams to me that it needs to be beautified. I'm not the type of person who tends to talk about beautifying code -- the May 2007 On VB column headline, "Beautify Your Code," was created by an editor -- but an extension method is the kind of thing that can help you add some beauty to this code.

If you have an extension method named Lines that returns an IEnumerable with each item a line, then you can create a query like this:

Dim result = From line In myStreamReader.Lines _
    Where line.Contains(value) _
    Take 3

I think that approach is a clear winner for the most beautiful solution. The difficulty is in creating the Lines extension method. If you don't mind having a C# assembly as a reference, it's a simple task to create an extension method that's an iterator:

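using System.Collections.Generic;
using System.IO;

// Public so that a VB project referencing this assembly can see it.
public static class StreamExtensions
{
   public static IEnumerable<string> Lines(this TextReader rdr)
   {
      string line;
      // ReadLine returns null at the end of the stream.
      while ((line = rdr.ReadLine()) != null)
      {
         yield return line;
      }
   }
}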

This can take substantially more work in VB. You have to implement IEnumerable(Of T) and IEnumerator(Of T), which also includes the non-generic IEnumerable, IEnumerator, and IDisposable interfaces. That's a class (or classes) with at least seven members, eight if you include the constructor (see Figure 1). The good news is that most of these methods are typically implemented with the same boilerplate code. This means that you can streamline creating iterators in VB by using techniques such as snippets, templates, or generic classes.

You must implement three members for the IEnumerator interface: the MoveNext and Reset methods and Current, a ReadOnly property. Reset is seldom used. A For Each loop never calls it, so for most of your implementations you can Throw a NotImplementedException as the body of the Reset method. The important method is MoveNext. In the MoveNext method, you determine whether there are more items to return. If there are, you store the next item in a field that backs the Current property. This means the Current property needs to return only that field.
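The three IEnumerator members described above can be sketched with a deliberately trivial enumerator. The CountdownEnumerator class is a hypothetical example invented for illustration; it counts down from a starting value to 1.

```vbnet
Option Strict On

Imports System
Imports System.Collections

' Hypothetical example: enumerates start, start - 1, ... down to 1.
Public Class CountdownEnumerator
   Implements IEnumerator

   Private m_Next As Integer
   Private m_Current As Object

   Public Sub New(ByVal start As Integer)
      m_Next = start
   End Sub

   ' Current only hands back the field that MoveNext filled in.
   Public ReadOnly Property Current() As Object _
      Implements IEnumerator.Current
      Get
         Return m_Current
      End Get
   End Property

   ' MoveNext decides whether another item exists and stores it.
   Public Function MoveNext() As Boolean _
      Implements IEnumerator.MoveNext
      If m_Next <= 0 Then Return False
      m_Current = m_Next
      m_Next -= 1
      Return True
   End Function

   ' For Each does not call Reset, so this sketch leaves it unimplemented.
   Public Sub Reset() Implements IEnumerator.Reset
      Throw New NotImplementedException()
   End Sub
End Class

Module CountdownDemo
   Sub Main()
      Dim e As New CountdownEnumerator(3)
      While e.MoveNext()
         Console.WriteLine(e.Current)
      End While
   End Sub
End Module
```

All the real work sits in MoveNext; Current and Reset are boilerplate that rarely changes from one enumerator to the next.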

Next, add the generic interface IEnumerator(Of T). Then add Current As T, a strongly typed ReadOnly Property, as well as an IDisposable.Dispose implementation. At this point, you have two ReadOnly properties named Current that won't compile, or you have one named Current and the other named Current1. A good way to deal with this is to make the less strongly typed member Private. This allows you to give the member a name for internal use. You can still access the private member externally through a cast to the interface:

Private m_Current As T

Public ReadOnly Property Current() As T _
   Implements IEnumerator(Of T).Current
   Get
      Return m_Current
   End Get
End Property

Private ReadOnly Property IEnumerator_Current() As Object _
   Implements IEnumerator.Current
   Get
      Return Current
   End Get
End Property

You must also implement IEnumerable(Of T). This includes a single function (GetEnumerator), in which you return your IEnumerator(Of T) implementation. As part of the IEnumerable(Of T) interface, you must also implement the non-generic IEnumerable interface. IEnumerable also has a GetEnumerator function, but its return type is the non-generic IEnumerator. As you did with the Current property and IEnumerator, make the IEnumerable GetEnumerator method Private and have it call the more strongly typed generic method:

Public Function GetEnumerator() As IEnumerator(Of T) _
   Implements IEnumerable(Of T).GetEnumerator
   'TODO: add implementation here
End Function

Private Function IEnumerable_GetEnumerator() As IEnumerator _
   Implements IEnumerable.GetEnumerator
   Return GetEnumerator()
End Function

The code you put inside the GetEnumerator method deserves careful consideration. You need to think through how you expect these implementations to be used, as well as how they will behave in cases such as multi-threaded environments or when changes occur. Enumerators are normally considered to be a snapshot. For example, List(Of T)'s enumerator throws an InvalidOperationException if the list is changed during a For Each iteration. Other iterators might just ignore changes. In the case of a stream, your decision is a bit tougher.
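The List(Of T) behavior mentioned above is easy to observe for yourself. The following is a small sketch; the module and function names are hypothetical.

```vbnet
Option Strict On

Imports System
Imports System.Collections.Generic

Module ListChangeDemo
   ' Returns True if changing the list inside For Each raised the exception.
   Public Function ChangeDuringEnumeration() As Boolean
      Dim numbers As New List(Of Integer)
      numbers.Add(1)
      numbers.Add(2)
      numbers.Add(3)
      Try
         For Each n As Integer In numbers
            numbers.Add(n * 10)   ' modifies the list mid-iteration
         Next
      Catch ex As InvalidOperationException
         Return True
      End Try
      Return False
   End Function

   Sub Main()
      Console.WriteLine(ChangeDuringEnumeration())
   End Sub
End Module
```

The exception surfaces on the next MoveNext call after the change, which is the fail-fast behavior you might want to imitate in your own enumerators.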

You change a stream's position as you read from it. For example, if there are two iterators reading from the one stream, the results could be unpredictable, changing depending on threading behaviors. There are two ways around this problem: Use a null stream and have the iterators return no data, or throw an exception in GetEnumerator if it has been called previously. Throwing an exception makes it easier to debug what went wrong. Whichever choice you make, take the time to document it by completing the XML comments on the method:

''' <summary>
'''    Gets an enumerator to read lines
'''    from a stream</summary>
''' <returns>IEnumerator(Of String)</returns>
''' <remarks>Can only be called once due to the 
'''  underlying stream.
'''  Not thread safe.</remarks>
''' <exception cref="InvalidOperationException">
''' thrown if GetEnumerator is called more than once
''' </exception>
Public Function GetEnumerator() As IEnumerator(Of String) _
   Implements IEnumerable(Of String).GetEnumerator
   Static iFirstTime As Int32
   If iFirstTime = 0 AndAlso _
     Threading.Interlocked.Increment(iFirstTime) = 1 Then
      Return Me
   Else
      Throw New InvalidOperationException( _
       "GetEnumerator can only be called once on a stream")
   End If
End Function

There are a couple of points worth discussing in this code. The code relies on a Static local variable inside the method to check whether the method has been called previously, and uses Threading.Interlocked.Increment to change the value. Because GetEnumerator is an instance method, VB keeps a separate copy of that Static variable for each instance of the class. This ensures that the method can be called only once per instance, even in a thread-race condition. If it is called more than once, an InvalidOperationException is thrown. Note that the method returns a Me reference when called for the first time. This is because the class implements both the IEnumerable(Of T) interface and the IEnumerator(Of T) interface. You can separate them into different classes, but often it's useful to have them in a single class because you generally need to store some shared state in them (see Listing 1 for a complete example of a StreamReaderLineEnumerator).
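To show how the pieces discussed so far might fit together, here is a minimal sketch of a single class that acts as both enumerable and enumerator, plus a Lines extension method. It is not the article's Listing 1: the LineEnumerator name is hypothetical, the once-only flag is kept in a field rather than a Static local, and the sketch assumes the caller owns and disposes the reader.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Collections
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Runtime.CompilerServices
Imports System.Threading

' Hypothetical sketch: one class serving as enumerable and enumerator.
Public Class LineEnumerator
   Implements IEnumerable(Of String), IEnumerator(Of String)

   Private ReadOnly m_Reader As TextReader
   Private m_Current As String
   Private m_Started As Integer   ' 0 until GetEnumerator is first called

   Public Sub New(ByVal reader As TextReader)
      m_Reader = reader
   End Sub

   Public Function GetEnumerator() As IEnumerator(Of String) _
      Implements IEnumerable(Of String).GetEnumerator
      If m_Started = 0 AndAlso _
         Interlocked.Increment(m_Started) = 1 Then Return Me
      Throw New InvalidOperationException( _
         "GetEnumerator can only be called once on a stream")
   End Function

   Private Function IEnumerable_GetEnumerator() As IEnumerator _
      Implements IEnumerable.GetEnumerator
      Return GetEnumerator()
   End Function

   Public ReadOnly Property Current() As String _
      Implements IEnumerator(Of String).Current
      Get
         Return m_Current
      End Get
   End Property

   Private ReadOnly Property IEnumerator_Current() As Object _
      Implements IEnumerator.Current
      Get
         Return m_Current
      End Get
   End Property

   ' ReadLine returns Nothing at the end of the stream.
   Public Function MoveNext() As Boolean Implements IEnumerator.MoveNext
      m_Current = m_Reader.ReadLine()
      Return m_Current IsNot Nothing
   End Function

   Public Sub Reset() Implements IEnumerator.Reset
      Throw New NotImplementedException()
   End Sub

   ' Assumption: the reader's owner disposes it, so nothing to release here.
   Public Sub Dispose() Implements IDisposable.Dispose
   End Sub
End Class

Public Module StreamExtensions
   <Extension()> _
   Public Function Lines(ByVal rdr As TextReader) As IEnumerable(Of String)
      Return New LineEnumerator(rdr)
   End Function
End Module

Module LinesDemo
   Sub Main()
      Dim rdr As New StringReader( _
         "apple" & Environment.NewLine & "banana" & Environment.NewLine & "grape")
      Dim result = From line In rdr.Lines() _
                   Where line.Contains("ap") _
                   Take 3
      For Each match As String In result
         Console.WriteLine(match)
      Next
   End Sub
End Module
```

With the extension method in scope, the LINQ query from earlier in the article works against any TextReader, and a second enumeration of the same sequence fails loudly instead of silently returning nothing.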

The VB version is obviously lengthier than the C# version, but the VB version has some significant benefits. First, your implementation is likely to be more robust because you've been forced to look at the implementation details. Second, the C# version doesn't prevent GetEnumerator from being called more than once on the same stream. This could result in some obscure threading problems. It could also produce seemingly empty result sets for files if the iterator is called twice. Or, it could potentially result in some object-disposed exceptions occurring that aren't explicit in describing the real problem. The C# version does provide a quick-and-dirty way to write the iterators: The quick part is inviting, but it's often the dirty part that we spend a lot more time cleaning up after.

It's important to note that iterators in C# aren't always dirty. Sometimes they do provide an elegant and clean way to express an IEnumerable. And it's likely VB will get a similar syntax in the not-too-distant future. In VB10, you will be able to use multi-statement lambdas that capture surrounding variables. If you're willing to accept the same limitations that iterators have in C#, then you can use a lambda as the basis for the MoveNext method and combine this with a generic template to give you the same functionality from VB.

To do this, you need to define a delegate in your generic template that has the current item as a ByRef parameter:

Public Delegate Function MoveNextFunc( _
   ByRef nextItem As T) As Boolean

Your generic iterator takes a delegate in its constructor, stores a reference to it, and calls that in the MoveNext method. Once you define the GenericIterator class (see Listing 2), you can use this much as you use inline iterators in C#. For example, you can create a line-by-line iterator with a StreamReader:

Dim lines = New GenericIterator(Of String) _
   (Function(ByRef nextItem As String) As Boolean
      nextItem = rdr.ReadLine
      Return nextItem IsNot Nothing
   End Function)
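For readers without the download at hand, a GenericIterator along these lines might look like the sketch below. It is an assumption about the shape of such a class, not the article's Listing 2, and it omits the once-only guard and any cleanup logic. It requires VB10 for the multi-statement lambda.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Collections
Imports System.Collections.Generic
Imports System.IO

' Hypothetical sketch; the article's Listing 2 may differ.
Public Class GenericIterator(Of T)
   Implements IEnumerable(Of T), IEnumerator(Of T)

   ' Returns True and sets nextItem while items remain; False when done.
   Public Delegate Function MoveNextFunc(ByRef nextItem As T) As Boolean

   Private ReadOnly m_Callback As MoveNextFunc
   Private m_Current As T

   Public Sub New(ByVal callback As MoveNextFunc)
      m_Callback = callback
   End Sub

   Public Function GetEnumerator() As IEnumerator(Of T) _
      Implements IEnumerable(Of T).GetEnumerator
      Return Me
   End Function

   Private Function IEnumerable_GetEnumerator() As IEnumerator _
      Implements IEnumerable.GetEnumerator
      Return Me
   End Function

   Public ReadOnly Property Current() As T _
      Implements IEnumerator(Of T).Current
      Get
         Return m_Current
      End Get
   End Property

   Private ReadOnly Property IEnumerator_Current() As Object _
      Implements IEnumerator.Current
      Get
         Return m_Current
      End Get
   End Property

   ' The delegate does the real work and fills m_Current through ByRef.
   Public Function MoveNext() As Boolean Implements IEnumerator.MoveNext
      Return m_Callback.Invoke(m_Current)
   End Function

   Public Sub Reset() Implements IEnumerator.Reset
      Throw New NotImplementedException()
   End Sub

   Public Sub Dispose() Implements IDisposable.Dispose
   End Sub
End Class

Module GenericIteratorDemo
   Sub Main()
      Dim rdr As New StringReader("one" & Environment.NewLine & "two")
      Dim lines = New GenericIterator(Of String)( _
         Function(ByRef nextItem As String) As Boolean
            nextItem = rdr.ReadLine()
            Return nextItem IsNot Nothing
         End Function)
      For Each item As String In lines
         Console.WriteLine(item)
      Next
   End Sub
End Module
```

Because the lambda captures rdr, all the state lives in the closure, much as it does inside a C# iterator block.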

The lack of a formal syntax for iterators in VB doesn't mean you can't create iterators in VB. In fact, the download for this article includes a snippet, templates, and examples of how you can write iterators easily in the current version of VB. The download also includes an example that illustrates how to use a generic template and lambdas in VB10. Using the downloadable snippet or templates in the current version of VB, you can quickly create iterators that are more flexible and more robust than what the C# iterator syntax allows for.

As LINQ and PLINQ become more a part of your programming, your reliance on iterators will increase, so now's a good time to learn how to write them properly. Being exposed to the implementation details will give you a better understanding of iterators and their pitfalls, which means you'll be less likely to yield to the temptation of using them poorly when the day comes that VB too adds some syntactic sugar for creating iterators.

So grab these downloads and start writing your own iterators today.

About the Author

Bill McCarthy is an independent consultant based in Australia and is one of the foremost .NET language experts specializing in Visual Basic. He has been a Microsoft MVP for VB for the last nine years and sat in on internal development reviews with the Visual Basic team for the last five years where he helped to steer the language’s future direction. These days he writes his thoughts about language direction on his blog at http://msmvps.com/bill.
