In-Depth

String.Together Snazzy String Routines

String handling is one of the most basic -- and important -- capabilities that all developers must deal with. Learn how to take maximum advantage of the way VB .NET handles strings.

Technology Toolbox: VB .NET, VB6, VB5

At some point, every developer needs to manipulate strings, whether it's to convert data from one format to another or to alphabetize a list. Strings might be a basic part of programming, but working with them involves a lot of permutations, and some tasks turn out to be more difficult or more complex than they first appear.

This is particularly true for those developers who have transitioned or are thinking of transitioning to Visual Basic .NET from Classic Visual Basic, whether VB5, VB6, or some other version. Most of the Classic VB string functionality is still supported in VB .NET through the compatibility library, but using this library can set back your development as a .NET programmer. Relying on these old libraries can keep you from discovering the effective and highly useful new features that Microsoft provided with its new string handling classes.

Indeed, I find that I rarely use old VB string functions in my .NET development. The newer replacement string methods are more flexible and usually result in cleaner code. And I can write string-handling logic with fewer lines of code than before. If you're still using older VB string functions, or if you're writing a lot of code for loops and manual string construction, I can't emphasize enough how much more productive you'll be in .NET if you adopt the native string-handling techniques instead. Doing so will speed up your development and help you write tighter, more maintainable code.

I'll help get you started by walking you through some of the new capabilities in VB .NET's strings, pointing out similarities and differences, as well as places where you might run into trouble if you bring a Classic VB mentality with you to .NET. Note that this is a companion article to one that ran in the March 2007 issue of VSM ["Straighten Out Your Strings," VSM March 2007]. In that article, I detailed several techniques for dealing with strings, placing a special emphasis on comparing .NET's string handling capabilities to the classic VB equivalents. That article didn't come close to exhausting the interesting things you can do with strings in .NET, so I'm back this month with some more useful tips on string handling. No doubt, this is a rich enough subject that it would be possible to do another two or three articles after this one.

One of the best aspects of .NET's implementation of strings is that it can save you a lot of effort. I perform a lot of code reviews, and I often see developers formatting their strings the hard way, with a lot of manual code for the construction of a particular formatted string. I'm too lazy for that, and you should be, too.

Be Smart; Be Lazy
There's an entire subsystem in .NET that enables you to implement basic and advanced formatting with a minimal amount of code; much of that subsystem is accessible through the String class's Format method.

Covering all of the capabilities of the .NET string formatting would be an article in itself, but I'll cover some of the more important aspects you're likely to brush up against here. String.Format includes several overloads. Here's the simplest:

String.Format({FormattingString}, 
	{ObjectToFormat})

This method returns a formatted string based on two arguments. The first argument is a format string that specifies how you want the value formatted. It contains a placeholder such as {0:C}, where the 0 refers to the first value that follows and the code after the colon selects the format. The second argument holds the object or value you want to format, and it can be any of several data types. Numbers, dates, and enumerations are common types to reformat, and .NET includes special formatting codes for all of them.

Imagine a scenario where you need to format a number as a currency amount and as a percentage. Assume you have a form with a TextBox named TextBox1 and a couple of Label controls named Label1 and Label2. This code formats a number from TextBox1 and places the formatted results in the Label controls:

Dim s As String
Dim d As Decimal = CDec(TextBox1.Text)
s = String.Format("{0:C}", d)  ' Currency
Label1.Text = s
s = String.Format("{0:P}", d)  ' Percentage
Label2.Text = s

Using String.Format is largely a matter of finding the code that specifies the kind of formatting you want. This example uses the "C" code for currency and the "P" code for percentage. Note that "P" multiplies the value by 100, so an entry of 0.25 displays as 25 percent. There are codes for other formats as well, such as "E" for scientific notation and "X" for hexadecimal. (Consult the VB .NET help files for a complete list.)
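Here's a minimal sketch of a few more codes, written as a standalone console module (the module name is just for illustration). It passes CultureInfo.InvariantCulture to the IFormatProvider overload of String.Format so that the output doesn't depend on the machine's regional settings, and it shows two things the snippet above doesn't: a format string can hold several numbered placeholders, and a digit after a code sets the precision.

```vbnet
Imports System
Imports System.Globalization

Module FormatCodesDemo
    ' InvariantCulture keeps the output independent of regional settings.
    Private ReadOnly Inv As CultureInfo = CultureInfo.InvariantCulture

    Sub Main()
        ' {0} and {1} refer, by position, to the arguments that follow.
        Console.WriteLine(String.Format(Inv, "{0} in hex is {1:X}", 255, 255)) ' 255 in hex is FF
        ' "E" is scientific notation; the 2 is the number of decimal places.
        Console.WriteLine(String.Format(Inv, "{0:E2}", 12345.678))  ' 1.23E+004
        ' "N" inserts group separators; "F" does not.
        Console.WriteLine(String.Format(Inv, "{0:N2}", 12345.678))  ' 12,345.68
        Console.WriteLine(String.Format(Inv, "{0:F1}", 12345.678))  ' 12345.7
    End Sub
End Module
```

Codes for numbers aren't case sensitive in what they select, but for "X" and "E" the case of the code controls the case of the letters in the output ("{0:x}" gives "ff").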

String.Format does more than let you insert characters. It also handles chores such as rounding: The currency code, for example, rounds to the number of decimal places your locale uses for currency (two, for dollars). It's locale-aware as well. The currency formatter displays the amount in dollars because I live in the United States, and my operating system is set to use dollars for currency. If you run the exact same code in another locale, the formatted output changes automatically to reflect the currency used there. Number separators, such as the thousands separator and the decimal point, also change based on locale.
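If you need a specific locale regardless of the user's settings, you can pass a CultureInfo to String.Format. This is a minimal sketch under that assumption; FormatAsCurrency is a hypothetical helper, and "en-US" and "de-DE" are just example culture names. The German result is left without an expected-output comment because its exact spacing depends on the .NET version and platform.

```vbnet
Imports System
Imports System.Globalization

Module CurrencyCultureDemo
    ' Hypothetical helper: formats a value as currency for a named culture
    ' instead of the culture the current thread happens to be using.
    Function FormatAsCurrency(value As Decimal, cultureName As String) As String
        Dim culture As CultureInfo = CultureInfo.GetCultureInfo(cultureName)
        Return String.Format(culture, "{0:C}", value)
    End Function

    Sub Main()
        Dim amount As Decimal = 1234.567D
        ' "C" rounds to the culture's default number of currency decimals.
        Console.WriteLine(FormatAsCurrency(amount, "en-US"))  ' $1,234.57
        ' Same value, German conventions: euro symbol, separators swapped.
        Console.WriteLine(FormatAsCurrency(amount, "de-DE"))
        ' A precision specifier overrides the default: "C0" drops the decimals.
        Console.WriteLine(String.Format(CultureInfo.GetCultureInfo("en-US"), "{0:C0}", amount)) ' $1,235
    End Sub
End Module
```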

VB .NET includes more codes for formatting dates than for formatting numbers. You can choose long or sortable dates, and various orders of month, day, and year. For example, try altering the previous example to format a date a couple of different ways:

Dim s As String
Dim d As Date = CDate(TextBox1.Text)
s = String.Format("{0:D}", d)  ' Long date
Label1.Text = s
s = String.Format("{0:s}", d)  ' Sortable date
Label2.Text = s

This code works like the previous snippet, except that it converts the contents of the textbox to a date and changes the codes in String.Format to "D" for long date and "s" for sortable date.

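Note that date codes are case sensitive. The lowercase "s" used in this snippet produces a sortable date, but an uppercase "S" isn't a standard date code at all, and using it throws a FormatException. Similarly, a lowercase "d" produces a short date rather than the long date you get from "D." Also, the action associated with a code can vary based on the data type you want to format. "D" as used in this snippet formats a date as a long date, but "D" used with an integral type such as Integer formats the number as decimal digits, padded with leading zeros to an optional precision (so "D5" turns 42 into 00042). The Decimal type doesn't support "D" at all.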
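This sketch makes those distinctions concrete. It uses a fixed date and CultureInfo.InvariantCulture (my choice, so the expected output doesn't depend on regional settings); with your own locale the "d" and "D" patterns will differ, but the case sensitivity works the same way.

```vbnet
Imports System
Imports System.Globalization

Module DateFormatDemo
    Sub Main()
        Dim inv As CultureInfo = CultureInfo.InvariantCulture
        Dim d As New DateTime(2007, 3, 9, 14, 30, 0)

        ' Date codes are case sensitive: "d" is short date, "D" is long date.
        Console.WriteLine(String.Format(inv, "{0:d}", d))  ' 03/09/2007
        Console.WriteLine(String.Format(inv, "{0:D}", d))  ' Friday, 09 March 2007
        Console.WriteLine(String.Format(inv, "{0:s}", d))  ' 2007-03-09T14:30:00

        ' With an integer, "D" means zero-padded decimal digits instead.
        Console.WriteLine(String.Format(inv, "{0:D5}", 42)) ' 00042

        ' "D" is defined only for integral types, so a Decimal throws.
        Try
            Console.WriteLine(String.Format(inv, "{0:D}", 42.5D))
        Catch ex As FormatException
            Console.WriteLine("FormatException: ""D"" needs an integral type")
        End Try
    End Sub
End Module
```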

Compare Old and New VB Strings
Suppose you have two values entered by a user, and you need to make sure that they are in alphabetical order. In Classic VB, you would use the StrComp function, which returns a value of -1, 0, or 1, depending on the relationship of the two strings:

Dim FirstString As String = _
	TextBox1.Text
Dim SecondString As String = _
	TextBox2.Text
Dim i As Integer
i = StrComp(FirstString, SecondString)
Select Case i
	Case -1 ' FirstString is lower
	Case 0  ' strings are equal
	Case 1  ' SecondString is lower
End Select

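You can still use the StrComp function in .NET versions of Visual Basic. However, I recommend the Compare method of the String class instead. String.Compare works much the same way as StrComp, with two caveats. First, StrComp under the default Option Compare Binary compares character codes, while String.Compare performs a culture-sensitive comparison, so the two can disagree when the strings differ in case (for example, "a" versus "B"). Second, String.Compare is documented to return a negative number, zero, or a positive number rather than exactly -1, 0, or 1, so it's safer to write Case Is < 0 and Case Is > 0 in the Select Case block. Otherwise, you need to change only the line that contains StrComp: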

i = String.Compare(FirstString, SecondString)

However, String.Compare offers more flexibility. It has a number of overloads, including a useful one that ignores case:

i = String.Compare(FirstString, SecondString, True)

This overload performs a case-insensitive comparison, but it returns the same values as the other overload.

If you begin to use String.Compare with any frequency, you'll probably notice the String.CompareTo method, which appears immediately below it in the IntelliSense dropdown list. CompareTo is an instance method rather than a shared method. I prefer it in most cases because I think it makes the code a bit more readable, though opinions vary on that. This is the syntax for CompareTo:

i = FirstString.CompareTo(SecondString)

CompareTo returns a negative number, zero, or a positive number, just as Compare does. However, it doesn't have an overload that ignores case during the comparison.
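Here's a minimal sketch of the alphabetical-order check from the start of this section, rewritten with String.Compare. InOrder is a hypothetical helper name. The sketch tests the sign of the result rather than exact values, and it contrasts the default culture-sensitive comparison with an ordinal one, which orders strings by character code the way StrComp does under Option Compare Binary. The culture-sensitive line has no expected-output comment because its result depends on the runtime's globalization settings.

```vbnet
Imports System

Module CompareDemo
    ' Hypothetical helper: True when first and second are already in
    ' alphabetical order, ignoring case.
    Function InOrder(first As String, second As String) As Boolean
        ' Test the sign, not an exact -1/1: Compare is only documented to
        ' return a negative number, zero, or a positive number.
        Return String.Compare(first, second, True) <= 0
    End Function

    Sub Main()
        Console.WriteLine(InOrder("apple", "Banana"))          ' True
        Console.WriteLine(String.Compare("abc", "ABC", True))  ' 0

        ' Under typical culture settings, this sorts "a" before "B"...
        Console.WriteLine(String.Compare("a", "B", StringComparison.CurrentCulture) < 0)
        ' ...but an ordinal comparison uses character codes, and "B" (66)
        ' is lower than "a" (97), just as with StrComp in binary mode.
        Console.WriteLine(String.Compare("a", "B", StringComparison.Ordinal) > 0) ' True

        ' CompareTo is the instance-method form of the culture-sensitive Compare.
        Dim word As String = "apple"
        Console.WriteLine(word.CompareTo("banana") < 0)        ' True
    End Sub
End Module
```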

Classic VB developers are familiar with the LSet and RSet functions to pad strings with blanks. Both are still available in the Visual Basic compatibility library, but I recommend you use their .NET replacements, PadRight and PadLeft.

The names can be a bit confusing at first. PadRight corresponds to LSet, because LSet inserts any necessary spaces on the right-hand side of the string. Similarly, PadLeft corresponds to RSet.

The main difference is that you can specify the padding character you want to use. For example, this code pads the string with number signs (hash marks):

Dim s As String = "abcdef"
Dim s2 As String
s2 = s.PadRight(10, "#"c)
' s2 now contains "abcdef####"

The argument for the pad character is of type Char; that's why the example includes a "c" after the quoted number sign. If you omit the pad character argument, .NET pads with spaces. Note that the first argument is the total length of the result, not the number of characters to add.
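One behavior to watch for when you migrate from LSet and RSet: the Pad methods never truncate. If the string is already longer than the width you ask for, PadRight and PadLeft return it unchanged, while LSet cuts it down to the requested length. This small sketch shows the difference, calling LSet from the compatibility library only for comparison.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module PadDemo
    Sub Main()
        Dim s As String = "abcdef"
        ' The first argument is the total width, not the count to add.
        Console.WriteLine("[" & s.PadRight(10, "#"c) & "]") ' [abcdef####]
        Console.WriteLine("[" & s.PadLeft(10) & "]")        ' [    abcdef]
        ' Asking for a width shorter than the string returns it unchanged.
        Console.WriteLine("[" & s.PadRight(3) & "]")        ' [abcdef]
        ' LSet from the compatibility library truncates instead.
        Console.WriteLine("[" & Strings.LSet(s, 3) & "]")   ' [abc]
    End Sub
End Module
```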

The inverse of padding is trimming. VB .NET gives you three trimming methods: TrimStart, TrimEnd, and Trim. TrimStart trims at the beginning of the string, TrimEnd trims at the end of the string, and Trim combines those operations and trims both ends.

These methods correspond to the classic VB functions LTrim, RTrim, and Trim, which are still available in the compatibility library. But, as is often the case, the .NET equivalents offer more functionality.

The old VB functions can trim only spaces, but the .NET equivalents incorporate a character array argument for you to specify any characters you want to trim. If you leave out that argument, the methods trim all whitespace by default.

For example, suppose you want to trim trailing punctuation and spaces from a string. You could use TrimEnd with those characters specified:

Dim s As String = "You bet!!!  "
Dim s2 As String
s2 = s.TrimEnd("!.? ".ToCharArray)
' s2 now contains "You bet"
s2 = s.TrimEnd()  'Default whitespace trimming
' s2 now contains "You bet!!!"
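The "all whitespace" default is a real behavioral difference from the old functions, not just extra flexibility: a tab or a trailing line break survives the compatibility library's Trim but not String.Trim. This sketch shows that, along with the fact that Trim takes its characters as a ParamArray, so you can list Char values directly instead of building an array.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module TrimDemo
    Sub Main()
        ' A tab in front; a space and a line break at the end.
        Dim s As String = vbTab & "abc " & vbCrLf
        ' String.Trim() with no arguments removes all of that whitespace.
        Console.WriteLine("[" & s.Trim() & "]")           ' [abc]
        ' The compatibility library's Trim removes only spaces, so the
        ' leading tab and trailing line break keep it from trimming anything.
        Console.WriteLine(Strings.Trim(s) = s)            ' True
        ' Characters to trim can be listed directly as Char arguments.
        Console.WriteLine("[" & "--abc**".Trim("-"c, "*"c) & "]") ' [abc]
    End Sub
End Module
```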

Searching Inside Strings
String.Contains is another nice addition to VB .NET's repertoire of String methods.

Classic VB developers are familiar with code that looks like this:

Dim s As String = "abcdef"
If InStr(s, "cd") <> 0 Then
	MsgBox("Substring cd is present")
End If

As is usually the case with the classic syntax, this code still works if you take advantage of the compatibility library. But I much prefer the .NET alternative, String.Contains, if I'm just checking to see whether a substring is present:

Dim s As String = "abcdef"
If s.Contains("cd") Then
	MsgBox("Substring cd is present")
End If

This code is cleaner, more understandable, and even easier to type than the Classic VB version.

There's one key difference: You could also use Classic VB's InStr function to find the position of a substring, but String.Contains returns only a Boolean indicating whether the substring is present. To find the position of a substring, you must use a different method of the String class, String.IndexOf.

String.IndexOf behaves a lot like the InStr function, with some extra functionality thrown in. You need to be aware of one critical difference, however: InStr is one-based (the first character is considered to be at position one), while String.IndexOf is zero-based, so the first character is at position zero. You can see the difference at work in this short snippet:

Dim s As String = "abcdef"
Dim n As Integer
n = InStr(s, "cd")  ' old form
' n is now 3
n = s.IndexOf("cd") ' new form
' n is now 2

This difference in position numbering shows up in a couple of other ways. If InStr doesn't find a substring, it returns 0. String.IndexOf can't do that because 0 means it found the substring in the first position. So, String.IndexOf returns -1 when it does not find the substring.

Position numbering also matters when you need to specify the starting position of the search. For example, this code compares InStr and String.IndexOf when each is told where to begin looking for the substring:

Dim s As String = "abcdef"
Dim n As Integer
n = InStr(3, s, "cd")  ' old form
' n is now 3
n = s.IndexOf("cd", 3) ' new form
' n is now -1 because 3 is past the
' beginning of the substring

It's potentially confusing to move back and forth between one-based and zero-based functions, so I recommend that you drop InStr completely and replace it with String.Contains and String.IndexOf. That also gives you access to the extra functionality in String.IndexOf. For example, you can pass a StringComparison value to choose an ordinal (binary) comparison or a culture-sensitive one, with or without case sensitivity.
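Here's a minimal sketch of both features: the StringComparison overload and the start-index overload. FindAll is a hypothetical helper that collects every zero-based position of a substring by restarting the search just past each match. It relies on IndexOf returning -1, not 0, when nothing more is found.

```vbnet
Imports System
Imports System.Collections.Generic

Module IndexOfDemo
    ' Hypothetical helper: every zero-based position of value in s.
    Function FindAll(s As String, value As String, comparison As StringComparison) As List(Of Integer)
        If String.IsNullOrEmpty(value) Then
            Throw New ArgumentException("value must not be empty")
        End If
        Dim positions As New List(Of Integer)
        Dim pos As Integer = s.IndexOf(value, 0, comparison)
        ' -1 means "not found"; 0 is a valid position.
        Do While pos <> -1
            positions.Add(pos)
            ' Resume the search one character past the last match.
            pos = s.IndexOf(value, pos + 1, comparison)
        Loop
        Return positions
    End Function

    Sub Main()
        Dim s As String = "abcdef"
        ' Ordinal is case sensitive; OrdinalIgnoreCase is not.
        Console.WriteLine(s.IndexOf("CD", StringComparison.Ordinal))           ' -1
        Console.WriteLine(s.IndexOf("CD", StringComparison.OrdinalIgnoreCase)) ' 2
        For Each p As Integer In FindAll("banana", "an", StringComparison.Ordinal)
            Console.WriteLine(p)  ' prints 1, then 3
        Next
    End Sub
End Module
```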

It will also feel more natural to use a related method that offers another capability. String.IndexOfAny allows you to find the first location of any character in a character array:

Dim s As String = "abcdef"
Dim n As Integer
n = s.IndexOfAny("ec".ToCharArray)
' n is now 2 because the first character
' found (c) is in zero-based position 2

Concatenation Revisited
In the companion article to this one, I discussed how to use String.Join to assemble a string out of an array of smaller strings, with delimiters between each of the smaller strings. But if you want to do simple concatenation with no delimiters, you can use String.Concat. This method allows you to submit an array of strings or objects and get a concatenated string based on the array:

Dim sSubString(3) As String
sSubString(0) = "Nashville"
sSubString(1) = ", "
sSubString(2) = "TN"
sSubString(3) = " 37215"
Dim sCityState As String
sCityState = String.Concat(sSubString)
' sCityState now contains
' "Nashville, TN 37215"

If the array is an object array, then the ToString method of each object furnishes the raw material for concatenation. This is a great shortcut for assembling a string from an array of strings or objects, because you don't have to write looping logic to go through the array.
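This short sketch shows the object-array form with mixed types. Each element's ToString supplies its text, and an element that is Nothing contributes an empty string rather than causing an exception.

```vbnet
Imports System

Module ConcatDemo
    Sub Main()
        ' A String, an Integer, a Boolean, and a Nothing element.
        Dim parts As Object() = New Object() {"Order ", 42, " shipped: ", True, Nothing}
        ' Each element is converted with ToString; Nothing becomes "".
        Console.WriteLine(String.Concat(parts))  ' Order 42 shipped: True
    End Sub
End Module
```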

Let's finish up with an aspect of strings in .NET that occasionally trips up traditional VB programmers. There is a difference in how strings are initialized between the two environments. In VB6, it's simple: Declaring a string in VB6 gives it a value immediately, that of an empty string. You could compare it against either the vbNullString constant or against a literal empty string (""), and it would match.

In most routine cases, .NET versions of Visual Basic act the same way. String.Empty, a stand-in for a zero-length string, takes over the role vbNullString used to play in these comparisons. (vbNullString still exists in VB .NET, but it's now equivalent to Nothing.) You can compare a new string that hasn't been initialized to either String.Empty or a zero-length string (""), and the comparison evaluates to True:

Dim s As String
If s = String.Empty Then 
	'The above condition evaluates to True
End If

But that similarity obscures some underlying differences. A .NET string variable that is declared but not initialized isn't really a zero-length string. Instead, it holds a null object reference, which means it evaluates to Nothing in VB .NET. You can see the difference by running this code:

Dim s1 As String
Dim s2 As String = String.Empty

If s1 Is Nothing Then
	Console.WriteLine("s1 is Nothing")
End If
If s2 Is Nothing Then
	Console.WriteLine("s2 is Nothing")
End If

The first If statement's conditional ("s1 Is Nothing") evaluates to True, but the second If statement's condition ("s2 Is Nothing") doesn't.

On the other hand, the difference disappears when a comparison is made to an empty string. Consider this code:

Dim s1 As String
Dim s2 As String = String.Empty

If s1 = String.Empty Then
	Console.WriteLine("s1 is Empty")
End If
If s2 = String.Empty Then
	Console.WriteLine("s2 is Empty")
End If

In this case, both conditionals evaluate to True. How can that be? How can s1 match both Nothing and String.Empty, when String.Empty itself is not Nothing?

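The answer is that the VB compiler is trying to keep you out of trouble. When you compare strings with the = operator, VB treats Nothing as equal to an empty string, so the conditional "s1 = String.Empty" behaves as if you had written "s1 Is Nothing OrElse s1.Length = 0". That's why the conditional is True for an uninitialized string.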
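That leniency belongs to VB's = operator, not to .NET as a whole. This sketch contrasts it with the String.Equals method, which doesn't treat Nothing and an empty string as equal, so code that mixes the two styles can reach different answers for the same variable.

```vbnet
Imports System

Module NothingVsEmptyDemo
    Sub Main()
        Dim s1 As String = Nothing      ' uninitialized: a null reference
        Dim s2 As String = String.Empty ' a real zero-length string

        ' VB's = operator treats Nothing and "" as equal.
        Console.WriteLine(s1 = String.Empty)               ' True
        ' String.Equals does not.
        Console.WriteLine(String.Equals(s1, String.Empty)) ' False
        Console.WriteLine(String.Equals(s2, String.Empty)) ' True
        ' Is Nothing tells the two cases apart.
        Console.WriteLine(s1 Is Nothing)                   ' True
        Console.WriteLine(s2 Is Nothing)                   ' False
    End Sub
End Module
```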

This works reasonably well because you rarely need to care if a string is uninitialized vs. simply empty. However, if you want to be more explicit in your checking, and get somewhat more readable code in the bargain, you can use a method of the String class called String.IsNullOrEmpty:

Dim s As String
Dim s2 As String = String.Empty
Dim b As Boolean
b = String.IsNullOrEmpty(s)
' b contains True
b = String.IsNullOrEmpty(s2)
' b still contains True

You can probably file this in the "you'll never need to know this" category, but the difference between an uninitialized string and an empty string can also be seen when using the ToString method:

Dim s1 As String
Dim s2 As String = String.Empty
Console.WriteLine(s2.ToString)
Console.WriteLine(s1.ToString)

If you run this code, the last line throws a NullReferenceException, because s1 doesn't refer to any String object whose ToString method could be called. However, you'll rarely need to call the ToString method of a String. A String is already a String datatype, so String.ToString is usually redundant.

This is my second article on strings in as many months, and there's still more to tell. I highly recommend that you take the time to try out the techniques described in this article, and see what .NET's native string features can do for your applications.
