Code Focused

Inside Arrays

Drill down on how arrays work and learn some cool tips and tricks for taking advantage of them in .NET.

TECHNOLOGY TOOLBOX: VB.NET

Arrays are fundamental building blocks for in-memory storage. Arrays are pervasive in .NET, forming the backbone of many classes, including virtually all collections. Yet many myths surround arrays, their type safety, and their ability to have non-zero bounds. Where there are myths, there are often coding practices that fail to recognize true best practices.

This article explores the intricate details of arrays in .NET. I'll explain how they work, how they came to work as they do, and then conclude with some tips for safe and flexible working practices when utilizing arrays in .NET.

A bit of history often helps us understand why or how we arrived at the present, and arrays in .NET are no different in this regard. Arrays are built on the history of Windows programming. Simple arrays in C are purely blocks of memory with no bounds checking, which means it's possible for the programmer to inadvertently reference a block of memory he or she shouldn't. In COM and earlier versions of VB, the OLE SafeArray added type information and bounds for each dimension. VB6 wrapped the interface to a SafeArray with language constructs to attempt to ensure safety, but the underlying SafeArray structure could be hacked easily: The pointer to the data or the bounds could be modified so that the relation between what was allocated and what was actually being referenced was completely lost. That wasn't necessarily a bad thing: VB developers -- myself included -- leveraged this fact to simulate pointers in VB6 and earlier.

The evolution continued with .NET, which set out to provide feature-rich, yet extremely safe arrays. Safety in .NET arrays includes bounds checking, fixed-size allocation, and type checking. Microsoft also wanted arrays in .NET to have great performance characteristics. To this end, there are actually two distinct forms of arrays in .NET: the general form of array and a specialized form called an SZArray.

The SZArray is a Single-dimension array with a Zero base for the index. It is the high-performance array in .NET. All arrays in .NET derive from the System.Array class, but for the SZArray, most operations are compiled to special IL instructions (such as newarr, ldelem, stelem, and ldlen), and the class is seldom used. This is similar to how Integers work in .NET: The runtime provides intrinsic instructions, so the code can compile to straight IL instructions rather than object method calls. The other type of array in .NET, which for want of a better term I'll call the general form of array, can have multiple dimensions and non-zero lower bounds. Access to its elements goes through method calls (Get, Set, and Address) that the runtime supplies for each array type, rather than through the intrinsic IL instructions SZArrays enjoy.

This code sample illustrates different ways of creating arrays in .NET:

Dim a(0 To 9) As Int32

Dim b(0 To 9, 0 To 9) As Int32

Dim c As Array = _
   Array.CreateInstance(GetType(Int32), 10)

Dim d As Array = _
   Array.CreateInstance(GetType(Int32), 10, 10)

Dim e As Array = _
   Array.CreateInstance(GetType(Int32), _
   New Int32() {10}, New Int32() {1})

Dim f As Array = _
   Array.CreateInstance(GetType(Int32), _
   New Int32() {10, 10}, New Int32() {1, 1})

Arrays a and c are the only SZArrays. The arrays c and d are identical to a and b respectively, but the variables c and d are declared as System.Array, making access to the elements awkward: You have to use the SetValue method of System.Array to set a value and its GetValue method to retrieve an element's value. Both methods work in terms of Object, which causes boxing of value types and other performance issues. To make working with the arrays c and d easier, you cast them to their actual type; in this case, Int32() and Int32(,) respectively:

Dim tempC As Array = _
   Array.CreateInstance( _
   GetType(Int32), 10)
Dim c() As Int32 = _
   DirectCast(tempC, Int32())
Dim tempD As Array = _
   Array.CreateInstance( _
   GetType(Int32), 10, 10)
Dim d(,) As Int32 = _
   DirectCast(tempD, Int32(,))
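To see the distinction in practice, the sketch below checks which of the example arrays are SZArrays by comparing each array's runtime type with the vector type built from its element type. It relies on Type.MakeArrayType, which returns the SZArray type (such as Int32[]) when called with no arguments. The module and the IsSZArray helper are hypothetical names chosen for illustration.

```vbnet
Imports System

Module ArrayKinds
    ' Hypothetical helper: True when arr is an SZArray (a zero-based,
    ' single-dimension vector). MakeArrayType() with no argument returns
    ' the vector type, such as Int32[], for the given element type.
    Function IsSZArray(ByVal arr As Array) As Boolean
        Dim t As Type = arr.GetType()
        Return t Is t.GetElementType().MakeArrayType()
    End Function

    Sub Main()
        Dim a(0 To 9) As Int32
        Dim b(0 To 9, 0 To 9) As Int32
        Dim c As Array = Array.CreateInstance(GetType(Int32), 10)
        Dim e As Array = Array.CreateInstance(GetType(Int32), _
            New Int32() {10}, New Int32() {1})

        Console.WriteLine(IsSZArray(a))  ' True
        Console.WriteLine(IsSZArray(b))  ' False: two dimensions
        Console.WriteLine(IsSZArray(c))  ' True
        Console.WriteLine(IsSZArray(e))  ' False: lower bound of 1

        ' The cast from System.Array works because c really is an Int32().
        Dim typedC() As Int32 = DirectCast(c, Int32())
        typedC(0) = 5
        Console.WriteLine(c.GetValue(0))  ' 5
    End Sub
End Module
```

The typed variable typedC and the System.Array variable c refer to the same object, so writing through one is visible through the other.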

The arrays e and f are different from the other four arrays in that their lower bounds are 1 instead of 0. You can cast f to an Int32(,), just as you can with d:

Dim temp As Array = Array.CreateInstance(GetType(Int32), _
   New Int32() {10, 10}, New Int32() {1, 1})
Dim f(,) As Int32 = DirectCast(temp, Int32(,))

Unlike b and d, f has lower bounds of 1 for both dimensions; if you try to access element f(0,0), the runtime throws an IndexOutOfRangeException. Yet because f is declared as Int32(,), it can be used anywhere arrays b or d can be used. Therefore, you need to be aware that multi-dimensional arrays in .NET can have non-zero lower bounds and actively safeguard your code for that case. When dealing with multi-dimensional arrays, use VB's LBound and UBound functions or the array's GetLowerBound and GetUpperBound methods:

For row As Int32 = b.GetLowerBound(0) To b.GetUpperBound(0)
   For col As Int32 = b.GetLowerBound(1) To b.GetUpperBound(1)
      ' do work here with b(row, col)
   Next
Next

Or:

For row As Int32 = LBound(b, 1) To UBound(b, 1)
   For col As Int32 = LBound(b, 2) To UBound(b, 2)
      ' do work here with b(row, col)
   Next
Next

VB's LBound and UBound functions number the first dimension as 1 (the parameter is called Rank in IntelliSense), whereas the GetLowerBound and GetUpperBound methods number the dimensions from 0. For consistency, you should try to avoid switching between 1- and 0-based functions, which makes GetLowerBound and GetUpperBound preferable. You can still use LBound and UBound, but be aware of the different base for the dimension number.
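As a sketch of the bounds-safe pattern applied to an array shaped like f, the hypothetical Sum2D helper below never assumes a lower bound of 0. The same program shows the IndexOutOfRangeException raised by f(0, 0) and the difference in dimension numbering between LBound and GetLowerBound.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module BoundsSafe
    ' Hypothetical helper: sums a 2-D array without assuming zero lower bounds.
    Function Sum2D(ByVal m(,) As Int32) As Int64
        Dim total As Int64 = 0
        For row As Int32 = m.GetLowerBound(0) To m.GetUpperBound(0)
            For col As Int32 = m.GetLowerBound(1) To m.GetUpperBound(1)
                total += m(row, col)
            Next
        Next
        Return total
    End Function

    Sub Main()
        ' A 3 x 2 array with bounds (1 To 3, 1 To 2), created the same way as f.
        Dim f(,) As Int32 = DirectCast(Array.CreateInstance(GetType(Int32), _
            New Int32() {3, 2}, New Int32() {1, 1}), Int32(,))
        For row As Int32 = 1 To 3
            For col As Int32 = 1 To 2
                f(row, col) = row * 10 + col
            Next
        Next
        Console.WriteLine(Sum2D(f))  ' 129 (11+12+21+22+31+32)

        Try
            Console.WriteLine(f(0, 0))
        Catch ex As IndexOutOfRangeException
            Console.WriteLine("f(0, 0) is out of range")
        End Try

        ' LBound numbers dimensions from 1; GetLowerBound numbers them from 0.
        Console.WriteLine(LBound(f, 1) = f.GetLowerBound(0))  ' True
    End Sub
End Module
```

Because Sum2D reads its loop limits from the array itself, it works unchanged for b, d, and f.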

The key thing to remember is that any multi-dimensional array can have non-zero lower bounds. But what about a single-dimension array? In the first set of examples I gave, arrays a, c, and e are all single-dimension arrays. Both a and c are SZArrays. However, array e is an oddity in the .NET world: It has a single dimension, but with a non-zero lower bound. This single-dimension, non-zero lower bound array (SNZArray) is not interchangeable with an SZArray; it's a different type. Consider an array of Int32s: The type name of an SZArray is Int32[], whereas the type name of an SNZArray is Int32[*]. SNZArrays have no direct language support in either VB or C#, so in those languages you can work with SNZArrays only as System.Array. As such, no method written in C# or VB can be declared with a strongly typed SNZArray parameter or return type, because there is no way to specify Int32[*]. Hence, SNZArrays are an oddity in .NET; they exist, but they're rarely seen or used. (See the sidebar about whether this is a good idea or not.) For all practical purposes, there are only SZArrays and multi-dimensional arrays in VB.NET (or C#).
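The following sketch looks at an SNZArray from the outside. It assumes Type.MakeArrayType(1), which builds the rank-1 general array type rather than the vector type, and confirms that an array like e is not an Int32() and can only be reached through the Object-typed System.Array members. The module name is illustrative.

```vbnet
Imports System

Module SnzArrayDemo
    Sub Main()
        ' One dimension with a lower bound of 1: an SNZArray (Int32[*]).
        Dim e As Array = Array.CreateInstance(GetType(Int32), _
            New Int32() {10}, New Int32() {1})

        Console.WriteLine(e.GetType().Name)       ' Int32[*]
        Console.WriteLine(TypeOf e Is Int32())    ' False: not an SZArray
        ' MakeArrayType(1) builds the rank-1 general array type, Int32[*].
        Console.WriteLine(e.GetType() Is GetType(Int32).MakeArrayType(1))  ' True

        ' With no VB syntax for Int32[*], elements are reached through
        ' the Object-typed GetValue and SetValue members.
        e.SetValue(42, 1)
        Console.WriteLine(e.GetValue(1))          ' 42
        Console.WriteLine(e.GetLowerBound(0) & " To " & e.GetUpperBound(0))  ' 1 To 10
    End Sub
End Module
```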

Treating Arrays As Objects
An array in .NET is more than a block of elements: The array itself is an Object that also provides methods and interfaces. All arrays in .NET implement the ICollection, IEnumerable, and IList interfaces, but some members of those interfaces aren't supported. Both SZArrays and multi-dimensional arrays implement IEnumerable in its entirety. For ICollection, SZArrays support all members, whereas multi-dimensional arrays support all members except the CopyTo method. When it comes to IList, SZArrays support all members except Add, Insert, Remove, and RemoveAt. Multi-dimensional arrays support only the Clear, IsFixedSize, and IsReadOnly members of IList (see Table 1 for a complete list of which members are supported).

Of the IList members every array supports, IList.IsFixedSize always returns True and the IList.IsReadOnly property returns False: An array has a fixed size, but you can change any of its elements. Apart from the instance methods on arrays, the System.Array class has some useful Shared members such as the Sort, Reverse, BinarySearch, Find(Of T), and FindAll(Of T) methods, all of which are supported only for SZArrays. Multi-dimensional arrays have limited functionality in comparison to the rich set of interfaces and methods SZArrays support.
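Here is a short sketch of these rules, using only documented System.Array and IList members. An SZArray viewed as IList reports a fixed size but writable elements, rejects Add, and works with Array.Sort, whereas Array.Sort rejects a two-dimensional array with a RankException. The module name is illustrative.

```vbnet
Imports System
Imports System.Collections

Module NonGenericMembers
    Sub Main()
        Dim sz() As Int32 = {3, 1, 2}
        Dim list As IList = sz

        Console.WriteLine(list.IsFixedSize)  ' True
        Console.WriteLine(list.IsReadOnly)   ' False
        list(0) = 9                          ' elements can still be replaced

        Try
            list.Add(4)                      ' the size can't change
        Catch ex As NotSupportedException
            Console.WriteLine("IList.Add: NotSupportedException")
        End Try

        Array.Sort(sz)                       ' Shared helper on an SZArray
        Console.WriteLine(sz(0) & "," & sz(1) & "," & sz(2))  ' 1,2,9

        Dim md(,) As Int32 = {{4, 3}, {2, 1}}
        Try
            Array.Sort(md)                   ' multi-dimensional: rejected
        Catch ex As RankException
            Console.WriteLine("Array.Sort on a 2-D array: RankException")
        End Try
    End Sub
End Module
```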

As of .NET 2.0, SZArrays also support the generic interfaces ICollection(Of T), IEnumerable(Of T), and IList(Of T). Multi-dimensional arrays don't implement these generic interfaces. The CLR implements them for SZArrays through a hidden helper class at runtime, not by adding them to the base class System.Array. The omission of these interfaces from System.Array, and hence from multi-dimensional arrays, is symptomatic of a "leaky abstraction." Looking at multi-dimensional arrays, the non-generic IList interface has poor (if any) support in System.Array, so adding IList(Of T) to it would seem rather pointless. I'm guessing the CLR team felt it was more correct to branch SZArray out on its own, rather than break the common base it had in versions 1.0 and 1.1 of the runtime.
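A quick way to observe this split is to test both kinds of array against the generic and non-generic interfaces. The sketch below converts each array to Object before using TypeOf ... Is, so the checks happen at runtime. The module name is illustrative.

```vbnet
Imports System
Imports System.Collections
Imports System.Collections.Generic

Module GenericInterfaces
    Sub Main()
        Dim sz() As Int32 = {1, 2, 3}
        Dim md(,) As Int32 = {{1, 2}, {3, 4}}

        ' SZArrays pick up the generic interfaces from the runtime...
        Console.WriteLine(TypeOf CObj(sz) Is IList(Of Int32))  ' True
        ' ...multi-dimensional arrays only have the non-generic ones.
        Console.WriteLine(TypeOf CObj(md) Is IList(Of Int32))  ' False
        Console.WriteLine(TypeOf CObj(md) Is IList)            ' True

        ' Through IList(Of Int32), an SZArray is searched without boxing.
        Dim generic As IList(Of Int32) = sz
        Console.WriteLine(generic.IndexOf(3))                  ' 2
    End Sub
End Module
```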

The lack of generic interfaces in System.Array and the lack of support for them in multi-dimensional arrays mean that code must rely on the non-generic interfaces. For multi-dimensional arrays in VB.NET, it also means that a For Each loop uses the non-generic IEnumerable interface, which results in boxing of value types and the necessary casting from Object. (C# avoids this overhead by emitting nested For loops that iterate from lower bounds to upper bounds, rather than calling the Array's GetEnumerator method.)

For SZArrays, the outcome is a lot better, but there are some details you should be aware of. One problem is that these generic interface implementations are poorly documented, because the documentation-generation tools don't see the generic interfaces on System.Array. Another quirk, or so it would seem, is that IList and IList(Of T) differ in behavior for the Clear and IsReadOnly members: IList(Of T).IsReadOnly returns True, whereas IList.IsReadOnly returns False; IList(Of T).Clear is not supported, but IList.Clear is. To see why, you need to understand how these interfaces relate to one another.
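The quirk is easy to reproduce. In this sketch, the same Int32() is viewed through IList and through IList(Of Int32): The two views report different IsReadOnly values, and only the non-generic Clear, which zeroes the elements without changing Length, is allowed. The module name is illustrative.

```vbnet
Imports System
Imports System.Collections
Imports System.Collections.Generic

Module ReadOnlyQuirk
    Sub Main()
        Dim arr() As Int32 = {1, 2, 3}
        Dim nonGeneric As IList = arr
        Dim generic As IList(Of Int32) = arr

        Console.WriteLine(nonGeneric.IsReadOnly)  ' False
        Console.WriteLine(generic.IsReadOnly)     ' True

        generic(1) = 5                 ' the indexer still writes through
        Console.WriteLine(arr(1))      ' 5

        nonGeneric.Clear()             ' zeroes the elements; Length stays 3
        Console.WriteLine(arr(1) & " " & arr.Length)  ' 0 3

        Try
            generic.Clear()            ' ICollection(Of T).Clear would change Count
        Catch ex As NotSupportedException
            Console.WriteLine("IList(Of T).Clear: NotSupportedException")
        End Try
    End Sub
End Module
```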

The interfaces IList and IList(Of T) are not as similar as their names suggest. You might expect IList(Of T) to be a strongly typed version of IList, but the two actually have different sets of members, largely because of their different inheritance chains. IList inherits ICollection, which inherits IEnumerable. IList(Of T) inherits ICollection(Of T), which inherits IEnumerable(Of T) and IEnumerable (see Table 2 for a list of the main differences between the generic interfaces ICollection(Of T) and IList(Of T) and the non-generic interfaces ICollection and IList).

The ICollection(Of T) interface contains all the functionality for adding and removing items, for determining how many items there are, for determining whether an item already exists in the collection, and for iterating the collection (through IEnumerable(Of T)). This is significantly different from ICollection, which provides little more than a count and a means to iterate over the items. For ICollection(Of T), the IsReadOnly member indicates whether you can add or remove items. Likewise, if you call ICollection(Of T).Clear, then ICollection(Of T).Count should be zero because the items are all removed.

Arrays in .NET always have a fixed size, so the Add and Remove members can't be supported when an array is cast to ICollection(Of T). For the same reason, Clear can't be supported either: It would have to reduce Count to zero, which a fixed-size array can't do. And because Add and Remove aren't supported, ICollection(Of T).IsReadOnly must return True.

The confusion lies in the naming, because the non-generic IList's IsReadOnly member returns False. From the IList interface's perspective, you can modify the list items through the Item indexer, for example to sort the array. So the IList interface does provide a means of mutating the array, which means its IsReadOnly member should and does return False. The generic IList(Of T) is also mutable through its indexer, so it would seem logical for its IsReadOnly member to return False. Yet IList(Of T) inherits IsReadOnly from ICollection(Of T), and viewed as an ICollection(Of T), the same implementation should return True; one implementation can't satisfy both meanings. The problem is that the name for the ICollection(Of T) member shouldn't have been IsReadOnly; rather, it should have been CanAddOrRemove or IsFixedSize. For more history on this design flaw, see a blog entry by Brian Grunkemeyer and the corresponding Connect bug.

An important lesson to be learned from this debacle is that you need to be careful when naming interface members to ensure they stay within their specific realms and don't encroach on other interfaces or implementations that might derive from your interface. Sometimes this requires a bit of crystal ball gazing, or in this case, less tunnel vision.

Apart from this naming hiccup, the functionality of generic interfaces on arrays can be incredibly powerful. In fact, until VB 2010 and C# 4 added variance for generic interfaces and delegates, arrays were the only place where VB.NET or C# could use generic variance. You can cast an array of Kangaroos to an IList(Of Kangaroo), an IList(Of Marsupial), or an IList(Of Mammal). This feature was included because arrays in .NET have always supported array covariance for reference types. Array covariance means you can cast an array of a derived type to an array of its base type, such as casting an array of Kangaroo to an array of Marsupial.
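The following sketch demonstrates array covariance with a hypothetical Mammal, Marsupial, and Kangaroo hierarchy. These classes are assumptions for illustration, not part of any library. The conversion to Marsupial() is a reference conversion, and the same array can also be viewed as IList(Of Marsupial) or IList(Of Mammal).

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical class hierarchy, for illustration only.
Class Mammal
End Class

Class Marsupial
    Inherits Mammal
End Class

Class Kangaroo
    Inherits Marsupial
End Class

Module CovarianceDemo
    Sub Main()
        Dim roos() As Kangaroo = {New Kangaroo(), New Kangaroo()}

        ' Array covariance is a reference conversion: no copy is made.
        Dim marsupials() As Marsupial = roos
        Console.WriteLine(marsupials Is roos)  ' True

        ' The same covariance carries over to the generic interfaces.
        Dim asMarsupials As IList(Of Marsupial) = CType(roos, IList(Of Marsupial))
        Dim asMammals As IList(Of Mammal) = CType(roos, IList(Of Mammal))
        Console.WriteLine(asMammals.Count)                     ' 2
        Console.WriteLine(TypeOf asMarsupials(0) Is Kangaroo)  ' True
    End Sub
End Module
```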

This covariance does have a serious implication: Type safety cannot be fully determined until runtime. Consider an array of Kangaroo cast to an array of Marsupial. Through the Marsupial() reference, you could try to set one of the items to a Koala, but because the underlying array is an array of Kangaroo, the runtime throws an ArrayTypeMismatchException. This is an issue only when setting an item, and it applies to any code that uses either arrays or the generic interface IList(Of T). For arrays, you can easily check whether covariance is being used by examining the runtime type:

Sub WorkWithAnimals(ByVal items() As Animal)
   If items IsNot Nothing AndAlso _
      items.GetType() IsNot GetType(Animal()) Then
      ' array covariance is being used
      ' ...
   End If
End Sub

Similarly, if you are working with IList(Of T) and need to create or set an element, you should check to see whether the IList(Of T) is using variance on an array:

Sub WorkWithIList(Of T)(ByVal items As IList(Of T))
   If items IsNot Nothing AndAlso _
      items.GetType().IsArray AndAlso _
      items.GetType() IsNot GetType(T()) Then
      ' array covariance is being used
      ' ...
   End If
End Sub
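To see the runtime failure and the check together, the sketch below wraps the type test from WorkWithIList in a hypothetical UsesArrayCovariance function. The Marsupial, Kangaroo, and Koala classes are illustrative assumptions. Writing a Koala through an IList(Of Marsupial) that wraps a Kangaroo() compiles, but the CLR rejects the store with an ArrayTypeMismatchException.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical classes, for illustration only.
Class Marsupial
End Class

Class Kangaroo
    Inherits Marsupial
End Class

Class Koala
    Inherits Marsupial
End Class

Module CovarianceCheck
    ' The type test from WorkWithIList, returned as a Boolean.
    Function UsesArrayCovariance(Of T)(ByVal items As IList(Of T)) As Boolean
        Return items IsNot Nothing AndAlso _
            items.GetType().IsArray AndAlso _
            items.GetType() IsNot GetType(T())
    End Function

    Sub Main()
        Dim roos() As Kangaroo = {New Kangaroo()}
        Dim list As IList(Of Marsupial) = CType(roos, IList(Of Marsupial))

        Console.WriteLine(UsesArrayCovariance(list))  ' True
        Console.WriteLine(UsesArrayCovariance(Of Marsupial)( _
            New Marsupial() {New Koala()}))           ' False

        Try
            ' Compiles fine, but the underlying array only holds Kangaroo.
            list(0) = New Koala()
        Catch ex As ArrayTypeMismatchException
            Console.WriteLine("ArrayTypeMismatchException")
        End Try
    End Sub
End Module
```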

Tips and Tricks
OK, so far I've focused the article on how arrays are implemented in .NET. It's necessary to know this information if you want to use arrays correctly and formulate rules and guidelines for their safe use. In no particular order, here's a list of tips and tricks that can help you get the most out of arrays in .NET.

  1. Use the 0 To n syntax when declaring arrays. Consider this case:
    Dim x(0 To 9) As Int32

    Using the 0 To n syntax makes it obvious to everyone that the array is declared by its bounds, not its length: In VB, Dim x(9) creates ten elements, whereas new int[9] in C# creates nine. This is especially valuable if some of your team is more proficient in C# than VB.

  2. Use jagged arrays. Jagged arrays are arrays of arrays. This technique lets you work entirely with SZArrays, as well as obtain an object reference to an entire row. It also means your code will generally be more flexible because it can leverage the SZArray features. Begin by declaring a jagged array like this one:
    Dim x(0 To 9)() As Int32

    Next, loop through the outer array, dimensioning each row:

    For i As Int32 = 0 To x.Length - 1
       x(i) = New Int32(0 To 19) {}
    Next
  3. Check for null references before getting an array's length. Arrays declared as Dim x() As Int32 are null references (Nothing) until they are dimensioned.
  4. To create an empty array, declare the upper bound as -1. This follows from the definition that the upper bound is the length of the array minus one: Dim x(0 To -1) As Int32
  5. Try to avoid multi-dimensional arrays. Element access on them goes through method calls rather than intrinsic IL instructions, so their performance is typically poorer. You'll get better results by favoring SZArrays and jagged arrays.
  6. When working with multi-dimensional arrays, check the lower bounds and upper bounds as the array can have non-zero lower bounds.
  7. Favor the instance methods on System.Array over VB's built-in methods. For example, use x.GetLowerBound(0) rather than LBound(x, 1) because this helps you keep all your methods zero-based.
  8. Arrays are fixed size. This means operations such as ReDim Preserve create a new array and copy the items over. Try to avoid unnecessary or frequent resizing of arrays.
  9. Be aware that array covariance can mean that type safety cannot be determined until runtime.
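Several of these tips can be checked in a few lines. The sketch below, whose module and variable names are illustrative, covers tips 1 through 4 and tip 8: bounds-based declarations, a jagged array whose rows are separate objects, an undimensioned array, an empty array, and ReDim Preserve producing a new array.

```vbnet
Imports System

Module ArrayTips
    Sub Main()
        ' Tip 1: VB declares by upper bound, so both arrays hold 10 elements.
        Dim byBound(9) As Int32
        Dim explicitBounds(0 To 9) As Int32
        Console.WriteLine(byBound.Length & " " & explicitBounds.Length)  ' 10 10

        ' Tip 2: a jagged array is an SZArray of SZArrays; each row is an object.
        Dim grid(0 To 2)() As Int32
        For i As Int32 = 0 To grid.Length - 1
            grid(i) = New Int32(0 To 3) {}
        Next
        Dim firstRow() As Int32 = grid(0)  ' reference to a whole row
        firstRow(0) = 7
        Console.WriteLine(grid(0)(0))      ' 7

        ' Tips 3 and 4: an undimensioned array is Nothing; (0 To -1) is empty.
        Dim notYet() As Int32
        Dim empty(0 To -1) As Int32
        Console.WriteLine(notYet Is Nothing)  ' True
        Console.WriteLine(empty.Length)       ' 0

        ' Tip 8: ReDim Preserve allocates a new array and copies the items.
        Dim items() As Int32 = {1, 2, 3}
        Dim before() As Int32 = items
        ReDim Preserve items(0 To 4)
        Console.WriteLine(items Is before)    ' False
        Console.WriteLine(items.Length & " " & items(2))  ' 5 3
    End Sub
End Module
```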

Taken as a whole, this set of tips and tricks should help you get the most out of arrays in .NET. Arrays in .NET offer a lot more functionality than a simple in-memory data store; searching, sorting, and variance are just some of the features they offer. Used wisely, arrays provide rich and powerful functionality to meet a wide range of requirements. Apply these few rules, and you can use arrays with a great deal of confidence in their robustness and safety.
