Straighten Out Your Strings

Get better performance and productivity by relying on .NET's native string handling capabilities.

Technology Toolbox: VB .NET, VB6, VB5

The most common type of data to manipulate in most applications is textual data, which developers usually store in strings. String handling has always been a strong point for the Basic family of languages, and the .NET version of Visual Basic gives you quite a few new tricks for manipulating strings. Many capabilities that you once had to create yourself are now built directly into the .NET Framework.

Using Visual Basic to the fullest for string manipulation requires learning some new concepts and capabilities. Some programmers who come from classic versions of Visual Basic (VB6 and earlier) seem determined to avoid using the new features in .NET. In many cases, they are relying on the VB6 compatibility library to do string manipulation as they always have. That's too bad. If you're relying exclusively on the VB6 compatibility library for string manipulation, you're ignoring powerful techniques that are right at your fingertips.

The best part: It's easy to get more out of string handling in the .NET version of VB. That said, you might have to rethink how you approach string manipulation. I'll show you a couple of the better known tips for string manipulation, including how to exploit the StringBuilder that ships with .NET. I'll also show off some tips that are less well known, but also highly effective. All of the techniques presented work equally well in any .NET version of Visual Basic. Many of the techniques also work in C#, as long as you make the appropriate modifications to the syntax.

.NET's StringBuilder can help you achieve a significant performance boost when manipulating strings. The biggest issue you face when using strings in .NET if you come to the language from VB5 or VB6 is that strings in .NET look superficially like strings in VB6 and earlier. For example, you use the same syntax to create and manipulate them. But under the covers, strings in .NET differ significantly from those in VB6, and, in certain circumstances, the difference can have a big effect on performance if you try to code as you always have.

Strings in .NET are immutable, which means that they cannot be modified in place. Every time an operation is performed on a string to change its contents, a new string is created to hold the altered version.
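A minimal sketch of what immutability means in practice follows. The module and function names are hypothetical, chosen only for illustration. The call to ToUpperInvariant looks like it changes the string, but it actually hands back a new string and leaves the original alone:

```vbnet
Imports System

Module ImmutabilityDemo
    ' Returns True when the original string is untouched after a
    ' "modifying" call and the change shows up only in the new string.
    Function OriginalSurvives() As Boolean
        Dim sOriginal As String = "Strings in VB"
        ' ToUpperInvariant does not alter sOriginal; it returns a new string.
        Dim sUpper As String = sOriginal.ToUpperInvariant()
        Return sOriginal = "Strings in VB" AndAlso sUpper = "STRINGS IN VB"
    End Function

    Sub Main()
        Console.WriteLine(OriginalSurvives())   ' True
    End Sub
End Module
```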

The cost of creating all these new strings is difficult to measure when working with a small number of routine string operations, but the difference can be dramatic if one string must undergo repeated modifications. Some string operations that offered acceptable performance in VB6 are unacceptable in VB 2003 or VB 2005.

Fortunately, there's an easy way to address this. The .NET Framework includes a special class just for string processing called the StringBuilder. A StringBuilder makes it possible to manipulate a string in place and doesn't require the creation of a new copy for each operation.

The difference is especially noticeable when performing string concatenations. Assume that you want to create a loop that concatenates a string 15,000 times:

Dim sTest As String = ""
For i As Integer = 1 To 15000
     sTest = sTest + "Strings in VB"
Next

This code takes more than nine seconds to run on my reference machine (a 2.0 GHz laptop). Now perform the equivalent operation using StringBuilder:

Dim sTest As New System.Text.StringBuilder()
For i As Integer = 1 To 15000
     sTest.Append("Strings in VB")
Next

The StringBuilder is a class, so you must declare it with New, and you append a string using the StringBuilder's Append method. Functionally, the results are identical in the two concatenation operations, but the execution times are not. The StringBuilder code runs so fast that it's hard to time it. Typically, it runs in about one-hundredth of a second or less on my reference machine. The same operation using the old method took about a thousand times as long.
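If you want to check the difference on your own machine, a sketch along these lines may help. It assumes .NET 2.0 or later for the Stopwatch class, and the module and function names are hypothetical. Timings will vary with hardware and runtime version, so treat the numbers it prints as rough indications only. Note the call to ToString, which is how you get an ordinary String back out of a StringBuilder:

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text

Module ConcatTiming
    Const Iterations As Integer = 15000

    ' Builds the string with ordinary concatenation (a new string per pass).
    Function BuildWithConcat() As String
        Dim sTest As String = ""
        For i As Integer = 1 To Iterations
            sTest = sTest & "Strings in VB"
        Next
        Return sTest
    End Function

    ' Builds the same string by appending to a StringBuilder.
    Function BuildWithStringBuilder() As String
        Dim sbTest As New StringBuilder()
        For i As Integer = 1 To Iterations
            sbTest.Append("Strings in VB")
        Next
        Return sbTest.ToString()
    End Function

    Sub Main()
        Dim watch As Stopwatch = Stopwatch.StartNew()
        Dim sConcat As String = BuildWithConcat()
        Console.WriteLine("Concatenation: {0} ms", watch.ElapsedMilliseconds)

        watch = Stopwatch.StartNew()
        Dim sBuilt As String = BuildWithStringBuilder()
        Console.WriteLine("StringBuilder: {0} ms", watch.ElapsedMilliseconds)

        ' Both approaches should produce exactly the same text.
        Console.WriteLine("Same result: {0}", sConcat = sBuilt)
    End Sub
End Module
```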

Parse Your Strings
Many string-related functions in the VB compatibility library have exact equivalents in the .NET Framework. These equivalents are all methods of the String class (see Table 1). For these cases, the one you use is mostly a matter of taste because it's hard to measure any difference in performance. That said, .NET is object based, so I prefer to go with the new object-oriented syntax over the old VB function syntax.

One common operation in string manipulation is parsing, where you split a string into a set of subsections that are delimited by some character. For years, I kept handy a function I called Parse that did exactly that. I've retired my venerable Parse function because that capability is now built into the .NET String class.

The method .NET uses to parse a string into parts is called Split. It operates on a string, and requires a set of delimiter characters as an argument. It then returns an array of strings, where each array element is one of the subsections you split apart.

For example, assume you have a String variable named sSentence:

Dim sSentence As String = _
     "The quick red fox jumped over the lazy dog."

Now assume you want to split the sentence into its constituent words. You can do this with the Split operation using a space as the delimiter. This code parses the sentence into its individual words, then writes the output onto the Console (see Figure 1a):

Dim sWords() As String
Dim chrSeparators() As Char = _
     " ".ToCharArray
sWords = sSentence.Split(chrSeparators)
For iIndex As Integer = 0 _
     To sWords.GetUpperBound(0)
     Console.WriteLine(sWords(iIndex))
Next

The Split method can also take an argument for a maximum number of pieces you want returned:

sWords = sSentence.Split( _
     chrSeparators, 4)

This code writes The, quick, and red to separate lines, but leaves the rest of the sentence intact on its own line (see Figure 1b). This is quite useful when you need to parse a string for the first few elements, but later need to parse more of the string. Note that the value of the parameter you give to the Split method should be one greater than the number of pieces you want to work on.
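The "one greater" rule can be wrapped in a small helper. This is only a sketch; FirstPieces is a hypothetical name, not part of the Framework. It asks Split for one more piece than you need, so the last element of the result holds the unparsed remainder:

```vbnet
Imports System

Module SplitWithLimit
    ' Returns iCount parsed words followed by one element holding
    ' whatever is left of the string.
    Function FirstPieces(ByVal sText As String, _
                         ByVal iCount As Integer) As String()
        Dim chrSeparators() As Char = " ".ToCharArray()
        Return sText.Split(chrSeparators, iCount + 1)
    End Function

    Sub Main()
        Dim sParts() As String = FirstPieces( _
            "The quick red fox jumped over the lazy dog.", 3)
        For Each sPart As String In sParts
            Console.WriteLine(sPart)
        Next
        ' Prints:
        ' The
        ' quick
        ' red
        ' fox jumped over the lazy dog.
    End Sub
End Module
```

You can pass the last element back into the same helper later, when you need to parse more of the string.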

Split takes as many character separators as you want to pass it. You pass these separators in the form of a character array, which is an array of type Char. In the immediately preceding example, the character array only has one element, the space character. The line that declares the array of separators (chrSeparators) initializes the array using another new .NET method for strings, ToCharArray.

ToCharArray takes a string and changes it into a character array, with one character array element for each character in the string. Now change your sentence so it reads:

This-code splits.sentences!various ways

Next, alter the line where you declare your chrSeparators like this (see Figure 1c):

Dim chrSeparators() As Char = _
     " .!?-".ToCharArray

This code snippet is a little trickier, but as you can see, the character array now contains five different delimiters (space, period, exclamation point, question mark, and dash) that you can use to parse the sentence.

Split also returns an empty string in the array to account for cases where two delimiters are together in the string. Now try putting one more exclamation point in the sentence, right next to the first dash:

This-!code splits.sentences!various ways

This produces a corresponding change in output (see Figure 1d).

Parsing Intricacies
The dash and the exclamation point are both delimiters, so Split returns an empty string as the second element of the string array. This indicates that there are no characters between those two delimiters.

Two consecutive commas in comma-delimited records usually mean that there is a blank field in that part of the record. However, there are also times when you want to parse strings to generate output differently. Consider parsing words in a sentence. You don't want a blank word returned just because the sentence happened to contain two spaces in a row. What you want in that case is to consider any number of spaces (or tabs, or carriage returns, or any combination thereof) as a delimiter between words. To handle such a case, the Split method has overloads that let you leave empty strings out of the results array. For example, suppose you modify the line containing the Split method in the earlier example to look like this:

sWords = sSentence.Split( _
     chrSeparators, _
     StringSplitOptions.RemoveEmptyEntries)

In cases where two or more delimiters are together, they are now treated as one delimiter.
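To see the two behaviors side by side, here is a small sketch that counts the pieces Split returns for the sentence with the adjacent dash and exclamation point. It assumes .NET 2.0 or later, where the StringSplitOptions enumeration is available. CountPieces is a hypothetical helper name:

```vbnet
Imports System

Module EmptyEntriesDemo
    ' Splits on the five delimiters used earlier and reports how many
    ' array elements come back under the given option.
    Function CountPieces(ByVal sText As String, _
                         ByVal eOptions As StringSplitOptions) As Integer
        Dim chrSeparators() As Char = " .!?-".ToCharArray()
        Return sText.Split(chrSeparators, eOptions).Length
    End Function

    Sub Main()
        Dim sSentence As String = "This-!code splits.sentences!various ways"
        ' Default behavior keeps the empty string between "-" and "!".
        Console.WriteLine(CountPieces(sSentence, StringSplitOptions.None))               ' 7
        ' RemoveEmptyEntries drops it.
        Console.WriteLine(CountPieces(sSentence, StringSplitOptions.RemoveEmptyEntries)) ' 6
    End Sub
End Module
```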

So far, I've shown you how to break apart text. The inverse operation of splitting apart a string is joining strings together, with a delimiter between each pair, and getting one large string. The String class includes a Join method that helps you build larger strings from shorter ones:

' Get an array of words
Dim sSentence As String = _
     "The quick red fox jumped " & _ 
     "over the lazy dog."
Dim sWords() As String
Dim chrSeparators() _
     As Char = " ".ToCharArray
sWords = _
     sSentence.Split(chrSeparators)

Dim sNewSentence As String
sNewSentence = String.Join( _
     "<->", sWords)

This code uses the Split method to acquire an array of words. Next, it joins the words together with the string "<->" between each word (see Figure 1e). The string that goes between the words can be any number of characters. This kind of technique is especially effective from a performance standpoint if you have an array of items that needs to go into a delimited record.
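As an illustration of the delimited-record case, the following sketch joins an array of fields with commas. The field values and the ToRecord name are made up for the example, and it assumes that no field contains a comma itself. A real comma-delimited format would also need quoting rules, which this sketch does not attempt:

```vbnet
Imports System

Module JoinDemo
    ' Builds one comma-delimited record from an array of fields.
    Function ToRecord(ByVal sFields() As String) As String
        Return String.Join(",", sFields)
    End Function

    Sub Main()
        ' The empty second field shows up as two commas in a row.
        Dim sFields() As String = {"Smith", "", "Seattle", "WA"}
        Console.WriteLine(ToRecord(sFields))   ' Smith,,Seattle,WA
    End Sub
End Module
```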

If you want to replace part of a string with another string in VB6, you accomplish this with the Replace function. Note that this functionality was specific to VB6 only; VB5 and earlier didn't have a Replace function. You can still use the Replace function in .NET versions of VB, but .NET also introduces a new, object-oriented approach.

As you become immersed in object-oriented thinking, it becomes more natural to use the Replace method of the String class rather than a Replace function. Functionally, the two approaches yield the same results, but the syntax differs:

Dim s1 As String = _
   "Time flies like an arrow. Fruit flies like a banana."
Dim s2 As String

' First version, with VB Replace 
s2 = Replace(s1, "flies", "~~~~")

' Second version, with String.Replace 
s2 = s1.Replace("flies", "~~~~")

Both approaches yield the same output:

Time ~~~~ like an arrow. 
Fruit ~~~~ like a banana.

.NET Version Provides New Capabilities
There are a couple of significant differences between the VB6-compatible Replace function and the Replace method of a String. The Replace function includes an additional capability to control the position in the string where replacement begins, as well as the ability to specify how many replacements you want to make. Using these features requires only a couple of changes to your code:

s2 = Replace(s1, "flies", "~~~~", 3, 1)

The first extra parameter for the Replace function (the "3") tells it to begin at the third character. The second extra parameter tells the function to make only one replacement. Be aware that the function returns a string that begins at the start position, so the first two characters of the original are dropped from the result. This version of the call replaces only the first instance of "flies":

me ~~~~ like an arrow. 
Fruit flies like a banana.

On the other hand, the String.Replace method gives you the option of specifying what you want to replace and what you want to replace it with as a Char instead of a String:

Dim chrE As Char = "e"c
Dim chrSquiggle As Char = "~"c
s2 = s1.Replace(chrE, chrSquiggle)

Your output now looks like this:

Tim~ fli~s lik~ an arrow. 
Fruit fli~s lik~ a banana.

This offers a minor speed advantage. You could also use it in a simple loop to replace any of several different characters with a single character, but it turns out that regular expressions (accessed through the .NET Framework Regex class) are usually better for these kinds of changes.
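For example, a regular expression with a character class can replace any of several characters in one call. This is a minimal sketch; MaskVowels is a hypothetical name, and the choice of lowercase vowels as the characters to replace is just for illustration:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexReplaceDemo
    ' Replaces every lowercase vowel with a tilde.
    Function MaskVowels(ByVal sText As String) As String
        ' [aeiou] is a character class: it matches any one of these characters.
        Return Regex.Replace(sText, "[aeiou]", "~")
    End Function

    Sub Main()
        Console.WriteLine(MaskVowels("Fruit flies"))   ' Fr~~t fl~~s
    End Sub
End Module
```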

The VB6 compatibility library includes two functions for which there are no exact .NET equivalents–the Left and Right functions. These functions grab a certain number of characters on the left side or the right side of a string:

Dim s1 As String = "ABCDEFG"
Console.WriteLine(Microsoft.VisualBasic.Left(s1, 3))
Console.WriteLine(Microsoft.VisualBasic.Right(s1, 5))

Running these functions gives you this output:

ABC
CDEFG

Note that I have included the full, explicit namespace path for the Left and Right functions. That's necessary when working in any situation where "Left" and "Right" might be ambiguous, which is especially true when working with Windows Forms. Windows Forms have both Left and Right properties, so it's necessary to qualify the Left and Right functions to distinguish them from the Left and Right properties of the form.

That single annoyance has driven me to use the closest .NET equivalents. There are no exact equivalents, so what I use depends on the task at hand. If you're checking to see whether a string starts with a given set of characters, the classic code looks like this:

If Microsoft.VisualBasic.Left(s1, 3) = "ABC" Then

You can replace this with a method on the string class called StartsWith:

If s1.StartsWith("ABC") Then 

This is quite a bit more readable, so I highly recommend that you use this approach instead. There's also an equivalent for the other side of the string called EndsWith. It works exactly the same way:

If s1.EndsWith("CDEFG") Then
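One detail worth knowing: the single-argument forms of StartsWith and EndsWith compare using the current culture. If you want a plain character-by-character test, or a case-insensitive one, .NET 2.0 and later offer overloads that take a StringComparison value. The sketch below uses hypothetical helper names to show both:

```vbnet
Imports System

Module PrefixDemo
    ' Case-sensitive, culture-independent prefix test.
    Function HasPrefix(ByVal sText As String, _
                       ByVal sPrefix As String) As Boolean
        Return sText.StartsWith(sPrefix, StringComparison.Ordinal)
    End Function

    ' Same test, but "abc" also matches "ABC".
    Function HasPrefixIgnoreCase(ByVal sText As String, _
                                 ByVal sPrefix As String) As Boolean
        Return sText.StartsWith(sPrefix, StringComparison.OrdinalIgnoreCase)
    End Function

    Sub Main()
        Dim s1 As String = "ABCDEFG"
        Console.WriteLine(HasPrefix(s1, "ABC"))             ' True
        Console.WriteLine(HasPrefix(s1, "abc"))             ' False
        Console.WriteLine(HasPrefixIgnoreCase(s1, "abc"))   ' True
        Console.WriteLine( _
            s1.EndsWith("CDEFG", StringComparison.Ordinal)) ' True
    End Sub
End Module
```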

Work Around Drawbacks
Unfortunately, this approach doesn't work if you're trying to perform an assignment. For example, assume you want to cleave off the first three characters of a string and put them in another string. The classic approach looks something like this:

Dim s1 As String = "ABCDEFG"
Dim s2 As String
s2 = Microsoft.VisualBasic.Left(s1, 3)
' s2 now contains "ABC"

Without using the Left function, the best .NET equivalent is this:

Dim s1 As String = "ABCDEFG"
Dim s2 As String
s2 = s1.Substring(0, 3)
' s2 now contains "ABC"

Here you use the Substring method of the String class. It takes a starting position (0 in this code) where you want to extract a string, and then a length of string you want to extract (3 in the example). This isn't too bad as a replacement for Left, but the Substring replacement for the Right function is a bit messier:

Dim s1 As String = "ABCDEFG"
Dim s2 As String
s2 = s1.Substring(s1.Length - 5, 5)
' s2 now contains "CDEFG"

I don't like these Substring approaches because it's not at all obvious what the code is doing. So I still sometimes use Left and Right, because I think they make my code a lot more readable. But to keep down the annoyance factor, I typically put this line at the top of the module:

Imports VB = Microsoft.VisualBasic

Including this Imports statement helps you create cleaner code. Without the Imports statement, you have to code your Left and Right calls like this:

Dim s1 As String = "ABCDEFG"
Console.WriteLine( _
     Microsoft.VisualBasic.Left(s1, 3))
Console.WriteLine( _
     Microsoft.VisualBasic.Right(s1, 5))

Or, you can code them in this far more readable form:

Dim s1 As String = "ABCDEFG"
Console.WriteLine(VB.Left(s1, 3))
Console.WriteLine(VB.Right(s1, 5))
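Another option is to wrap Substring in your own small helpers. The sketch below uses hypothetical names (LeftPart and RightPart) and assumes the string passed in is not Nothing. It also smooths over one behavioral difference: Left and Right simply return the whole string when you ask for more characters than it contains, whereas Substring throws an ArgumentOutOfRangeException in that situation.

```vbnet
Imports System

Module SubstringHelpers
    ' Returns the first iLength characters, or the whole string
    ' if it is shorter than iLength.
    Function LeftPart(ByVal sText As String, _
                      ByVal iLength As Integer) As String
        If iLength >= sText.Length Then Return sText
        Return sText.Substring(0, iLength)
    End Function

    ' Returns the last iLength characters, or the whole string
    ' if it is shorter than iLength.
    Function RightPart(ByVal sText As String, _
                       ByVal iLength As Integer) As String
        If iLength >= sText.Length Then Return sText
        Return sText.Substring(sText.Length - iLength)
    End Function

    Sub Main()
        Dim s1 As String = "ABCDEFG"
        Console.WriteLine(LeftPart(s1, 3))     ' ABC
        Console.WriteLine(RightPart(s1, 5))    ' CDEFG
        Console.WriteLine(LeftPart("AB", 5))   ' AB
    End Sub
End Module
```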

The Substring method is functionally equivalent to the VB Mid function, with one important exception. Note that the example above used "0" as the start position for the string. That's because Substring considers the position of the character in a string to be zero-based, so the first character is at position 0.

By contrast, Mid is one-based, so the first character for this kind of string is at position 1. This can be confusing, and is a consequence of the changes made to classic VB to make it consistent with other .NET languages. It's best if you decide which you want to use (Mid or Substring) and stick with it.

I generally recommend that you use the .NET version even when there are nearly equivalent techniques in the VB compatibility library. For example, I recommend using Substring, but with one caveat. There is a little-known capability of Mid that Substring doesn't replicate.

Typically, you use Mid and Substring on the right side of an assignment:

Dim s1 As String = "ABCDEFGHIKLM"
Dim s2 As String
s2 = Mid(s1, 3, 5)
' s2 contains "CDEFG"
' Equivalent Substring works fine...
s2 = s1.Substring(2, 5)
' s2 contains "CDEFG"

However, Mid (Mid$ in the early days) also had the ability to be used on the left side of an assignment. This functionality dated all the way back to QuickBASIC:

Mid(s1, 3, 5) = "12345"
' s1 now contains "AB12345HIKLM"

This is a handy capability, and it still works in VB 2003 and VB 2005. Unfortunately, Substring doesn't work that way. However, there are some .NET Framework capabilities that you can use instead of this unusual Mid capability. For example, you can combine the Insert and Remove methods of the String class to accomplish the same task:

Dim s1 As String = "ABCDEFGHIKLM"
Dim s2 As String
' This line replicates Mid(s1, 3, 5) = "12345"
s2 = s1.Remove(2, 5).Insert(2, "12345")
' s2 now contains "AB12345HIKLM"

The Remove method takes a starting position and a length, and removes the characters indicated. The Insert method takes a starting position and a string, and inserts the characters in the string at the position indicated.
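The Remove and Insert pair matches the Mid statement only when the replacement text is exactly as long as the span you remove. The Mid statement never changes the length of the target string, so a replacement that runs past the end is cut off. The following sketch, built around a hypothetical OverwriteAt helper, adds that clamping. It assumes a zero-based start position that falls within the string:

```vbnet
Imports System

Module OverwriteDemo
    ' Overwrites characters in sTarget with sNew, beginning at the
    ' zero-based position iStart, without changing sTarget's length.
    Function OverwriteAt(ByVal sTarget As String, _
                         ByVal iStart As Integer, _
                         ByVal sNew As String) As String
        ' Never overwrite more characters than remain in the target.
        Dim iCount As Integer = _
            Math.Min(sNew.Length, sTarget.Length - iStart)
        Return sTarget.Remove(iStart, iCount) _
                      .Insert(iStart, sNew.Substring(0, iCount))
    End Function

    Sub Main()
        Console.WriteLine(OverwriteAt("ABCDEFGHIKLM", 2, "12345")) ' AB12345HIKLM
        ' Only two characters remain after position 5, so only "12" is used.
        Console.WriteLine(OverwriteAt("ABCDEFG", 5, "12345"))      ' ABCDE12
    End Sub
End Module
```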

This article provides a good starting point for learning about strings, but there is a lot more you can explore, including such techniques as String.Format, the IndexOf method, the Compare method, and various methods for padding and trimming strings. As with the techniques described in this article, there are some caveats with learning the .NET versions of these techniques, but also some real performance bonuses. It is well worth your time to look into them and how they work.
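As a quick taste of those methods, here is a short sketch with one call to each. The sample values are made up, and the overloads that take a StringComparison argument assume .NET 2.0 or later:

```vbnet
Imports System

Module MoreStringMethods
    Sub Main()
        ' Trim removes leading and trailing white space.
        Console.WriteLine("  total  ".Trim())                         ' total
        ' PadLeft pads on the left to a total width of 5.
        Console.WriteLine("42".PadLeft(5, "0"c))                      ' 00042
        ' IndexOf returns the zero-based position of the match.
        Console.WriteLine( _
            "ABCDEFG".IndexOf("CD", StringComparison.Ordinal))        ' 2
        ' String.Format fills numbered placeholders.
        Console.WriteLine( _
            String.Format("{0} has {1} items", "cart", 3))            ' cart has 3 items
        ' Compare returns a negative number when the first string sorts first.
        Console.WriteLine(String.Compare( _
            "apple", "banana", StringComparison.Ordinal) < 0)         ' True
    End Sub
End Module
```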
