VSM Cover Story

Implement Custom Generic Collections

Take advantage of custom generic collections to manipulate data more easily in real-world applications.

Technology Toolbox: Visual Basic, C#

Manipulating data is central to almost everything you do as a developer. You know the routine: Pull the data from a database, store it in a DataSet, and display it to the user. Often, the application requires that you store the data as classes in an array or collection.

The introduction of .NET gave you new ways to manipulate data, including ArrayLists in .NET 1.x and generics in .NET 2.0. These new tools were welcome additions to the developer toolbox, but both have been somewhat abused in demos and short article samples.

Interesting demos are one thing; the real world is another. Many of the samples you see for manipulating data within .NET—whether for ArrayLists in .NET 1.x or generics in .NET 2.0—have fundamental holes that can make them next-to-useless in a real-world project. It's something we're used to as developers, of course. You go to a conference, you see an amazingly slick demo that helps you solve a particular problem, but when you go to try it out yourself you begin to discover the myriad ways in which the short demo you witnessed simply doesn't live up to expectations when building practical business applications.

In many cases, it's not the fault of the tool, but of the people who want to show off what the tool can do. To make a point, people often opt for something sharp and flashy that over-simplifies how you should actually use a given tool. So you see absurdly simple examples of how to build a Web service that distort the real work involved in creating something robust enough for a business environment. Similarly, some of the examples that purport to show off generics suffer from overly simplistic code snippets that illustrate the general utility of generics, but aren't remotely adequate for implementing any real-world solutions.

That said, generics are a great tool for manipulating data that you need to store in an array or collection, especially when you can implement custom generic collections that give you both added robustness and more maintainable code. I'll walk you through the process of implementing custom generic solutions, pointing out a couple of the important caveats you need to keep in mind along the way. I'll also discuss some of the alternative approaches that aren't as robust, illustrating how they don't live up to the demands of most business applications. As an added bonus, I'll provide examples in both VB.NET and C#, so you can try these solutions using your preferred .NET language.

The sample class I use here tracks basic baseball statistics. I admit that tracking batting averages isn't a common business problem, unless you happen to be a general manager for a baseball team or perhaps work for the Elias Sports Bureau, but you can easily adapt this example to something more prosaic, such as collections of employees or customer accounts. How you adapt the code depends on your own circumstances, but the class illustrates the important points of implementing custom generic collections (see Figure 1).

Don't Try This at Work
It might be helpful to begin by looking at a couple of examples where you need to store data in an array or collection. In .NET 1.x, a common technique was to add objects to an ArrayList. An almost equally common objection to this approach was that it isn't "type safe." Consider an implementation of a sample class using ArrayLists (see Listing 1).

The class itself is rather simple. The constructor for the class is overloaded to accept basic data, advanced data, or no data at all. Using the class is also simple:

' VB
Dim plyrs As ArrayList = New ArrayList(10)
plyrs.Add(New Player("Blalock", 0.266, 0.401, 591))
plyrs.Add(New Player("Young", 0.314, 0.459, 691))
plyrs.Add(New Player("Teixeira", 0.282, 0.514, 628))
plyrs.Add(New Player("Kinsler", 0.286, 0.454, 423))
plyrs.Add(New Employee())

For Each p As Player In plyrs
   Dim s As String = p.LastName
Next

// C#
ArrayList players = new ArrayList(10);
players.Add(new Player("Blalock", 0.266, 0.401, 591));
players.Add(new Player("Young", 0.314, 0.459, 691));
players.Add(new Player("Teixeira", 0.282, 0.514, 628));
players.Add(new Player("Kinsler", 0.286, 0.454, 423));
players.Add(new Employee());
foreach (Player p in players)
{
   String s = p.LastName;
}

You can add new Player objects to the ArrayList, as well as an instance of an Employee class. Note that this code compiles even though Employee is unrelated to Player and might not expose a LastName property. The problem arises at run time, when iterating through the ArrayList. When the iteration hits the Employee instance, the implicit cast to the loop variable p fails and raises an InvalidCastException. All is not lost; the fix is to write a custom ArrayList that restricts someone who uses the class to adding only a single type to the collection.
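One way to build that kind of restricted collection in .NET 1.x is to derive from CollectionBase and validate every incoming value. The following is only a minimal sketch under that assumption: the PlayerCollection name and the cut-down Player class are hypothetical stand-ins, because Listing 1 isn't reproduced here.

```csharp
using System;
using System.Collections;

// Hypothetical stand-in for the article's Player class (Listing 1 is not shown).
public class Player
{
    private string _lastName;
    public Player(string lastName) { _lastName = lastName; }
    public string LastName { get { return _lastName; } }
}

// A .NET 1.x-style collection that accepts only Player instances.
public class PlayerCollection : CollectionBase
{
    // The typed Add method stops most mistakes at compile time.
    public int Add(Player player)
    {
        return List.Add(player);
    }

    public Player this[int index]
    {
        get { return (Player)List[index]; }
        set { List[index] = value; }
    }

    // CollectionBase calls OnValidate before it inserts, sets, or removes a value,
    // so this also catches callers that go through the untyped IList interface.
    protected override void OnValidate(object value)
    {
        base.OnValidate(value);   // rejects null
        if (!(value is Player))
            throw new ArgumentException("Only Player objects are allowed.", "value");
    }
}
```

With this in place, calling Add with an Employee no longer compiles, and slipping one in through a cast to IList fails immediately with an ArgumentException instead of surfacing later inside a loop.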

Of course, you might be saying: Hey, we have .NET 2.0 now, so generics make this all a moot point. You would be correct, too. In .NET 2.0, you can use a generic collection to specify what type you accept in your collection. For example, this code uses a standard generic List<> construct to create a type-specific collection:

' VB
Dim playerList As List(Of Player) = _
    New List(Of Player)()
playerList.Add(New Player())
playerList.Add(New Player())
playerList.Add(New Player())
playerList.Add(New Employee())

// C#
List<Player> players = new List<Player>();
players.Add(new Player());
players.Add(new Player());
players.Add(new Player());
players.Add(new Employee());

I should add that List<> is technically a list class rather than one of the framework's Collection classes, but it is in the System.Collections.Generic namespace, and it performs like a collection. So, for the sake of convenience, I'll refer to it as a collection.

In this case, Visual Studio's compiler generates an error when you try to compile this code because the addition of an instance of an Employee class is not valid for the players list.

But consider this twist, based on the fact that your users often have trouble making up their minds. You create this great form, gather some data, and display it on your beautifully laid out form. Then the user adds new players to the collection, removes a few others, but then decides to cancel all those changes and return to the previous state. Or, assume the user accepts the work just performed, and you remove the newly deleted Players from the List<>.

Now you face the question: How do you know which players to delete when you try to update the database? The items are now gone from your List<> object. The same problem exists for newly added items: How do you know which ones were added when you have a List<> object?

Build Your Own Generic
This is one of those cases where the neat demo just doesn't make it in the real world. Obviously, the List<> generic is good for basics, but not for more advanced data management. Addressing any of these scenarios is a lot of work. Fortunately, you have many ways to address this particular problem. One technique is to add IsNew and IsDeleted properties to your Player class and set these properties appropriately when the objects are created or deleted in the List<>. That can work, but I'll show you an interesting alternative approach, where you customize your collection with additional code to handle the addition and deletion of Player objects.

Consider the fact that a List<> class is nothing more than a container for instances of a single class. There is no mechanism built in to handle adding methods or properties to the List<>. To get the desired result, you need to write a class that is based on one of the .NET 2.0 generic constructs.

Initially, I fought hard to make generics work for me by deriving my class from a generic construct. It never worked as smoothly or as cleanly as I wanted it to. After several more iterations, I realized that I could create my own generic class, derived from a tried and true friend, CollectionBase:

' VB
Public Class CollectionEx(Of T)
   Inherits System.Collections. _
      CollectionBase
End Class

// C# 
public class CollectionEx<T> : 
System.Collections.CollectionBase
{
}

You need to add the basic collection functionality a user would expect to this class. The CollectionBase class exposes an IList that contains the list of elements stored in the instance. The methods of the IList consist of the common methods a user would expect of your CollectionEx class, such as Insert, Add, and Remove. Implementing the VB version of the code is simple enough:

Public Function Contains(ByVal _
   item As T) As Boolean
   Return MyBase.List.Contains(item)
End Function

Public Function IndexOf(ByVal _
   item As T) As Integer
   Return MyBase.List.IndexOf(item)
End Function

Public Sub Insert(ByVal _
   index As Integer, _
   ByVal item As T)
   MyBase.List.Insert(index, item)
End Sub

Default Public Property Item(ByVal _
   index As Integer) As T
   Get
      Return CType( _
         MyBase.List(index), T)
   End Get

   Set(ByVal Value As T)
      List(index) = Value
   End Set
End Property

Public Function Add(ByVal _
   value As T) As Integer
   Return MyBase.List.Add(value)
End Function

Public Sub Remove(ByVal item As T)
   MyBase.List.Remove(item)
End Sub

The C# code is a bit more succinct:

public bool Contains(T item)
{
   return base.List.Contains(item);
}

public int IndexOf(T item)
{
   return base.List.IndexOf(item);
}

public void Insert(int index, T item)
{
   base.List.Insert(index, item);
}

public virtual int Add(T value)
{
   return base.List.Add(value);
}

public void Remove(T item)
{
   base.List.Remove(item);
}

The collection also incorporates other standard members, such as Count, Clear, and Capacity. CollectionBase supplies these for you. Any additional members you write follow the same pattern as Insert, Add, and Remove, although some, such as sorting, pass the call on to the InnerList object (an ArrayList) instead of the IList object of the CollectionBase.

Keep Track of Added and Deleted Items
We've covered the basics, so let's reconsider the initial problem of having your class keep track of added and deleted items. Start by adding two private ArrayLists to your CollectionEx class. You use these ArrayLists to keep a record of what the user adds and removes (see Listing 2).

This code also illustrates how to implement methods to add new objects to the collection, how to remove objects from the collection, and how to reset the collection to its original state. Note the additional step required to remove an item from your collection. Calling the Remove method on a standard collection and passing an object that isn't contained in the collection is perfectly valid because it results in no change to the collection. However, you do need to avoid marking an item that never existed as deleted in this custom generic collection. This means you must first check to see whether the object exists in your internal list before adding it to your deleted list. The sample class also includes a new set of methods that deal with the status of your collection: Has the collection changed? Or, to put it in common programming parlance: Is it dirty?

This brings us to one of the more difficult aspects of implementing a solution that relies on custom collections. To wit, when is a collection considered "dirty"? You must address this issue in several instances, including when you populate a collection initially. Calling the Add method when you load a collection with data from any source marks the collection as dirty, so you must set the IsDirty property to false before you let your application use the collection.

You must also consider and address the situation where the user wants to cancel any changes to the collection and put the collection back to its original state. The code in Listing 2 also includes a Reset method that does precisely that. This method walks through the class's internal ArrayLists and either removes the added items or puts the deleted items back in the list. I have seen several implementations of custom lists that store the original list internally, using that list to reset a list to its original state. The problem with that approach is that a collection can hold hundreds or thousands of objects. Storing a duplicate of the entire list would require a rather large working set in your application. Typically, your users would be making fewer than 100 additions or deletions to a collection during any one session, so the sample in this article assumes that it's better to store only the changes, and rebuild the collection as required by performing simple adds and deletes.
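Listing 2 isn't reproduced here, so the following is only a minimal C# sketch of the change-tracking idea described above, not the article's actual code. The TrackedCollection name, the AcceptChanges method, and the choice to forget items that are added and then removed in the same session are all assumptions made for illustration.

```csharp
using System.Collections;

public class TrackedCollection<T> : CollectionBase
{
    // Records of what the user added and removed since the last AcceptChanges.
    private readonly ArrayList _added = new ArrayList();
    private readonly ArrayList _deleted = new ArrayList();
    private bool _isDirty;

    public bool IsDirty
    {
        get { return _isDirty; }
        set { _isDirty = value; }
    }

    public int Add(T item)
    {
        _added.Add(item);
        _isDirty = true;
        return List.Add(item);
    }

    public void Remove(T item)
    {
        // Never mark an item as deleted if it was not in the collection.
        if (!List.Contains(item)) return;

        List.Remove(item);
        if (_added.Contains(item))
            _added.Remove(item);    // added and removed in one session: nothing to track
        else
            _deleted.Add(item);
        _isDirty = true;
    }

    // Hypothetical helper: call after the initial load so that the
    // loaded items do not count as pending changes.
    public void AcceptChanges()
    {
        _added.Clear();
        _deleted.Clear();
        _isDirty = false;
    }

    // Undo pending changes by replaying them in reverse.
    public void Reset()
    {
        foreach (object item in _added) List.Remove(item);
        foreach (object item in _deleted) List.Add(item);
        AcceptChanges();
    }
}
```

In this sketch, Reset puts deleted items back at the end of the list rather than at their original positions, so re-sort afterward if order matters. The added and deleted lists are also what you would walk when deciding which rows to insert or delete in the database.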

Determine When Data Changes
The complexity of deciding whether something is dirty comes when changes are made to the data in the classes that the collection contains. Determining whether the data has been changed in any single data instance stored in your collection requires that you support an IsDirty property on the T type of your collection. Use a read-only IsDirtyChildren property to check whether any child object in your collection has been changed:

' VB
Public ReadOnly Property IsDirtyChildren() As Boolean
   Get
      Dim obj As T
      For Each obj In MyBase.List
         If obj.IsDirty Then
            Return True
         End If
      Next
      Return False
   End Get
End Property

// C#
public bool IsDirtyChildren
{
   get 
   {
      foreach (T obj in base.List)
      {
         if (obj.IsDirty)
            return true;
      }
      return false;
   }
}

Note that this code won't compile because the compiler doesn't know that T contains the property IsDirty. You can fix this by adding a constraint to the CollectionEx definition that tells the compiler that what you want to do is supported by the class you want to add:

' VB
<Serializable()> _
Public Class CollectionEx(Of T As ICollectionEx)
   Inherits System.Collections.CollectionBase

Public Interface ICollectionEx
   Property IsDirty() As Boolean
   Property IsDeleted() As Boolean
   Property IsNew() As Boolean
End Interface

// C#
[Serializable]
public class CollectionEx<T> : 
   System.Collections.CollectionBase 
   where T : ICollectionEx

public interface ICollectionEx
{
   bool IsDeleted
   {   get;set;    }
   bool IsDirty
   {   get;set;    }
   bool IsNew
   {   get;set;    }
}

Your code will compile once you make this change. Using the constraint requires that any class that you add to this collection implements the ICollectionEx interface. Once you define the constraint, the CollectionEx<> class can reference any methods or properties defined in the interface. Let's revisit the class definition from Listing 1, first using VB:

<Serializable()> _
Public Class Player
   Implements ICollectionEx

   Private _isDirty As Boolean = False
   Private _isNew As Boolean = False
   Private _isDeleted As Boolean = False

   Public Property IsDirty() As Boolean _
      Implements ICollectionEx.IsDirty
      Get
         Return _isDirty
      End Get
      Set(ByVal value As Boolean)
         _isDirty = value
      End Set
   End Property

   Public Property IsNew() As Boolean _
      Implements ICollectionEx.IsNew
      Get
         Return _isNew
      End Get
      Set(ByVal value As Boolean)
         _isNew = value
      End Set
   End Property

   Public Property IsDeleted() As Boolean _
      Implements ICollectionEx.IsDeleted
      Get
         Return _isDeleted
      End Get
      Set(ByVal value As Boolean)
         _isDeleted = value
      End Set
   End Property
End Class

And here's the same concept, this time using C#:

[Serializable]
public class Player : ICollectionEx
{
   private bool _isDirty = false;
   private bool _isNew = false;
   private bool _isDeleted = false;

   public bool IsDirty
   {
      get { return _isDirty; }
      set { _isDirty = value; }
   }
   public bool IsNew
   {
      get { return _isNew; }
      set { _isNew = value; }
   }
   public bool IsDeleted
   {
      get { return _isDeleted; }
      set { _isDeleted = value; }
   }
}

At this point, you've implemented the ICollectionEx interface and supplied the required properties. Note that you've also added the Serializable attribute to the class.

Copy, Find and Sort
The next step is to add more functionality to your collection class, which will make it an even more powerful tool in your arsenal. You can add a Copy feature quite easily using VB:

' Requires Imports System.IO and
' Imports System.Runtime.Serialization.Formatters.Binary
Public Function Copy() As CollectionEx(Of T)
   Dim rawBytes() As Byte

   ' Serialize this collection into memory
   Using streamWriter As New MemoryStream()
      Dim serializerWriter As New BinaryFormatter()
      serializerWriter.Serialize(streamWriter, Me)
      ' ToArray returns only the bytes written;
      ' GetBuffer can include unused capacity
      rawBytes = streamWriter.ToArray()
   End Using

   ' Deserialize the bytes into a new collection
   Using streamReader As New MemoryStream(rawBytes)
      Dim serializerReader As New BinaryFormatter()
      Return CType(serializerReader.Deserialize( _
         streamReader), CollectionEx(Of T))
   End Using
End Function

The C# version of this code is also easy to implement:

public CollectionEx<T> Copy()
{
   CollectionEx<T> list = null;

   byte[] rawBytes = null;
   using (MemoryStream streamWriter = 
      new MemoryStream())
   {
      BinaryFormatter serializerWriter = 
         new BinaryFormatter();
      serializerWriter.Serialize(streamWriter, this);
      // ToArray returns only the bytes written
      rawBytes = streamWriter.ToArray();
   }

   using (MemoryStream streamReader = 
      new MemoryStream(rawBytes))
   {
      BinaryFormatter serializerReader = 
         new BinaryFormatter();
      list = serializerReader.Deserialize(streamReader) 
         as CollectionEx<T>;
   }

   return list;
}

Most developers -- and I used to be among their number -- start a deep copy by looping through the existing collection and adding a copy of each object to a new collection. But using serialization instead means that you can create a deep copy simply by serializing the collection to memory and deserializing it back into a new collection class. This approach is simple, easy, and effective. For this to work, however, your class must add the [Serializable] attribute to its definition.

You've dispatched the Copy function with relative ease; next, you want to look at sorting your collection. The first difficulty is that, with a generic collection, you can't know in advance which criteria will be used to search or sort. As the author of the collection, you have no idea what types of classes your users will store in it or how they will want to sort or find items. Many examples assume a search on such good old standbys as a name, an ISBN, or even a price, but a truly generic collection calls for more creativity.

Classes hold data, and that data is normally accessed through properties. Each use of the collection might involve a different class or set of classes, and the only commonality among all of the potential classes is that they will all (one hopes) expose properties. For example, consider the Player class. You could sort the collection by LastName, SluggingPercentage, or BattingAverage because the class exposes all these characteristics as properties:

' VB
players = New CollectionEx(Of Player)(10)

...

players.Sort("Average", ListSortDirection.Ascending)

Public Sub Sort(ByVal propertyName As String, _
   ByVal sortDirection As ListSortDirection)
   InnerList.Sort(New Comparer(propertyName, _
      sortDirection))
End Sub
Public Sub Sort(ByVal propertyName As String)
   InnerList.Sort(New Comparer(propertyName))
End Sub

// C# 
CollectionEx<Player> players = 
   new CollectionEx<Player>(10);

...

players.Sort("FullName");

public void Sort(String propertyName, 
   ListSortDirection sortDirection)
{
   InnerList.Sort(new Comparer(propertyName, 
      sortDirection));
}
public void Sort(String propertyName)
{
   InnerList.Sort(new Comparer(propertyName));
}

The full implementation of the Comparer class is a bit more involved (see Listing 3). The real trick of this class is that it takes advantage of reflection, using the property name to get values from the objects that you use to make your comparisons (see sidebar "Using Reflection").
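Listing 3 isn't reproduced here. To give a rough idea of what a reflection-based comparer could look like, here is a minimal sketch; the PropertyComparer name and its details are assumptions for illustration rather than the article's Comparer class.

```csharp
using System;
using System.Collections;
using System.ComponentModel;
using System.Reflection;

// Compares two objects by the value of one named public property.
public class PropertyComparer : IComparer
{
    private readonly string _propertyName;
    private readonly ListSortDirection _direction;

    public PropertyComparer(string propertyName, ListSortDirection direction)
    {
        _propertyName = propertyName;
        _direction = direction;
    }

    public int Compare(object x, object y)
    {
        object valueX = GetValue(x);
        object valueY = GetValue(y);

        // Comparer.Default handles nulls and IComparable values of the same type;
        // swapping the arguments reverses the order for a descending sort.
        return _direction == ListSortDirection.Ascending
            ? Comparer.Default.Compare(valueX, valueY)
            : Comparer.Default.Compare(valueY, valueX);
    }

    // Look up the property by name at run time and read its value.
    private object GetValue(object source)
    {
        PropertyInfo property = source.GetType().GetProperty(_propertyName);
        if (property == null)
            throw new ArgumentException("No public property named " + _propertyName);
        return property.GetValue(source, null);
    }
}
```

Because it implements the non-generic IComparer interface, an instance can be passed straight to ArrayList.Sort, which is what the InnerList of a CollectionBase-derived class exposes. A real implementation would probably cache the PropertyInfo instead of looking it up on every comparison.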

You implement the find functionality much as you implement sorting. The IndexOf method of the basic IList finds a single exact object match. The Find method (see Listing 4) enables a user to find any objects in the collection that match the value of any single property of the object. Here is the VB version:

Dim ps As CollectionEx(Of Player) = _
   players.Find("LastName", "Fergus")
Dim sluggers As CollectionEx(Of Player) = _
   players.Find("SluggingPercentage", _
   0.650, FindRange.GreaterThan)

Public Enum FindRange
   Equal = 0
   GreaterThan
   LessThan
End Enum

Public Function Find(ByVal PropertyName _
   As String, ByVal ValueToFind As Object, _
   ByVal compareType As FindRange) _
   As CollectionEx(Of T)
   ' Collect matches in a new collection; adding
   ' to this one while iterating it would fail
   Dim matches As New CollectionEx(Of T)()
   For Each item As T In Me
      Dim pi As PropertyInfo = _
         item.GetType.GetProperty(PropertyName)
      Dim valueOfX As Object = _
         pi.GetValue(item, Nothing)

      ' CompareValues is a helper not shown here
      If CompareValues(valueOfX, ValueToFind, _
         compareType) Then
         matches.Add(item)
      End If
   Next
   Return matches
End Function

The C# version is straightforward:

players.Find("OnBasePercentage", 0.300, 
   FindRange.GreaterThan);
players.Find("LastName", "Fergus");
players.Find("ERA", 2.50, FindRange.LessThan);

public enum FindRange
{
   Equal = 0,
   GreaterThan,
   LessThan
}

This implementation also lets you specify whether you want to base your search on something more than just equality. For example, this Find code returns any player whose on-base percentage is better than .300 or who has an ERA of less than 2.50.
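The C# snippet above shows only the calls and the enum. As a rough illustration of what the method body might look like, here is a minimal sketch; the FindableCollection name and the CompareValues logic are assumptions, because Listing 4 isn't reproduced here.

```csharp
using System;
using System.Collections;
using System.Globalization;
using System.Reflection;

public enum FindRange { Equal = 0, GreaterThan, LessThan }

public class FindableCollection<T> : CollectionBase
{
    public int Add(T item) { return List.Add(item); }

    // Returns a new collection holding every item whose named property
    // compares to valueToFind in the requested way.
    public FindableCollection<T> Find(string propertyName,
        object valueToFind, FindRange range)
    {
        FindableCollection<T> matches = new FindableCollection<T>();
        foreach (T item in List)
        {
            if (item == null) continue;
            PropertyInfo property = item.GetType().GetProperty(propertyName);
            if (property == null) continue;   // item has no such property
            object actual = property.GetValue(item, null);
            if (CompareValues(actual, valueToFind, range))
                matches.Add(item);
        }
        return matches;
    }

    // Hypothetical helper: the article's own CompareValues is not shown.
    private static bool CompareValues(object actual, object target, FindRange range)
    {
        if (actual == null || target == null)
            return range == FindRange.Equal && actual == target;

        // Convert the target to the property's type so that, for example,
        // a double literal can be compared with a decimal property.
        object converted = Convert.ChangeType(target, actual.GetType(),
            CultureInfo.InvariantCulture);
        int result = Comparer.Default.Compare(actual, converted);
        switch (range)
        {
            case FindRange.GreaterThan: return result > 0;
            case FindRange.LessThan: return result < 0;
            default: return result == 0;
        }
    }
}
```

Returning a new collection rather than modifying the current one keeps Find free of side effects, so a search never marks the original collection as changed.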

Add the Nice Touches
As promised, the CollectionEx class now has a considerable amount of functionality to aid you in your next project. However, any good programmer knows there is always more that you can do to make the class better and more useful. For example, the CollectionBase class supports event-style notifications, in the form of protected On... methods you can override, that occur when items are added, removed, or changed, as well as when the collection is cleared. The final code includes overrides for all of the possible events added to your class. I'm betting you can find a good use for them. One use for a couple of the events is to handle tracking deletions and additions to your collection:

' VB
Protected Overrides Sub OnClear()
   MyBase.OnClear()
End Sub

Protected Overrides Sub OnClearComplete()
   MyBase.OnClearComplete()
End Sub

Protected Overrides Sub OnRemove(ByVal index _ 
   As Integer, ByVal value As Object)
   MyBase.OnRemove(index, value)
End Sub

Protected Overrides Sub OnRemoveComplete( _
   ByVal index As Integer, ByVal value As Object)
   MyBase.OnRemoveComplete(index, value)
End Sub

// C#
protected override void OnRemove(int index, 
   object value)
{
   base.OnRemove(index, value);
}

protected override void OnRemoveComplete(int index, 
   object value)
{
   _deletedList.Add(value);
   base.OnRemoveComplete(index, value);
}

protected override void OnInsertComplete(int index, 
   object value)
{
   base.OnInsertComplete(index, value);
   _addedList.Add(value);
}

The IList events come in pairs: "before" and "after." This pair of events gives you two chances to make changes to your data. In this code, the events enable you to add and delete objects from your internal lists. Putting the code in the events would enable you to remove the objects from the Remove and Add methods on your class. It's quite possible that you will find a use for IList events in your own applications, but trying to reset your collection back to its original list of objects is much trickier if you use events, so for this example you should keep the list maintenance code of the CollectionEx class in your Remove and Add methods.

When you store a collection of objects in an ArrayList, sooner or later you are likely to need those objects as a typed array instead. The code would look like this in .NET 1.x:

' VB
Dim pArray1 As Player() = _
   CType(plyrs.ToArray(GetType(Player)), Player())

// C#
ArrayList plyrs = new ArrayList(10);
Player[] pArray1 = 
   (Player[])plyrs.ToArray(typeof(Player));

However, taking advantage of generics makes it much easier to implement the ToArray method in your class:

' VB
Public Function ToArray() As T()
   Return CType(Me.InnerList.ToArray(GetType(T)), T())
End Function

Dim pArray1 As Player() = players.ToArray()

// C#
public T[] ToArray()
{
   return (T[])this.InnerList.ToArray(typeof(T));
}

Player[] pArray = players.ToArray();

Simply put, generics make the handling of collections of objects much easier to use and support. Using a mixture of collection objects and generics, you now have a functional class to use in your next project when you start handling and storing data. With a little more imagination and creativity, you can continue to add and expand the CollectionEx class as you feel necessary. VSM
