.NET 2 the Max

Parse Text Files With Regular Expressions

Learn to parse fixed-length files and delimited text files, detect when a key combination is pressed, and change the style of the Web control that has the input focus.

Technology Toolbox: VB.NET, C#, ASP.NET

One of the great things about being a book and magazine writer and the founder of a Web site is that I can keep myself in touch with thousands of developers. And even when I don't receive e-mails, I can see which articles on our Web site developers visit most frequently (see the sidebar, "The 2TheMax Family of Sites"). It's surprising to see that so many developers spend so much time on a relatively small set of problems. It's another form of the famous 80/20 rule: Programmers spend 80 percent of their time solving the recurring 20 percent of all possible problems. With this new .NET 2 the Max column, we hope to help you deliver better applications, faster, by making the solutions to these recurring problems more widely known. —Francesco Balena

Parse Fixed-Length Fields in Text Files
XML has become the standard technology in information exchange, but many applications still use more primitive ways to import and export data. One such technique is based on text files containing fixed-width fields. Consider these text lines:

John  Smith   New York
Ann   Doe     Los Angeles

Each text line contains information about the first name (six characters), last name (eight characters), and city. The longest city name has 11 characters, but usually you can assume that the last field will take all the characters up to the end of the current line.

Building a program that reads individual fields isn't difficult at all. Your app simply reads a line, then uses the String.Substring method to extract individual fields. However, I want to illustrate a different approach, based on regular expressions. Consider this regular expression:

^(?<first>.{6})(?<last>.{8})(?<city>.+)$

The dot (.) represents "any character." Therefore, .{6} means "any 6 characters." The expression (?<first>.{6}) creates a group named "first" that corresponds to these initial six characters. Likewise, (?<last>.{8}) creates a group named "last" that corresponds to the next eight characters. Finally, (?<city>.+) creates a group for all the remaining characters on the line and names it "city." The ^ and $ characters represent the beginning and end of the line, respectively. You can easily write short VB and C# routines built on this regular expression to parse a file (see Listing 1). Download the code for parsing fixed-length fields in text files here.
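As a minimal sketch of how such a routine might look (this is not Listing 1; the class and method names are hypothetical), the following C# code applies the regular expression above to each line and trims the padding from every field:

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedWidthDemo
{
    // Same pattern as in the text: 6 characters, 8 characters, then the rest of the line.
    private static readonly Regex LineRegex = new Regex(
        @"^(?<first>.{6})(?<last>.{8})(?<city>.+)$");

    // Hypothetical helper: returns { first, last, city } with trailing padding
    // removed, or null when the line doesn't fit the layout.
    public static string[] ParseLine(string line)
    {
        Match m = LineRegex.Match(line);
        if (!m.Success) return null;
        return new string[]
        {
            m.Groups["first"].Value.TrimEnd(),
            m.Groups["last"].Value.TrimEnd(),
            m.Groups["city"].Value.TrimEnd()
        };
    }

    public static void Main()
    {
        string[] lines = { "John  Smith   New York", "Ann   Doe     Los Angeles" };
        foreach (string line in lines)
        {
            string[] fields = ParseLine(line);
            Console.WriteLine("{0}|{1}|{2}", fields[0], fields[1], fields[2]);
        }
    }
}
```

The named groups keep the extraction code independent of the field widths, so only the pattern string needs to change when the layout does.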

The beauty of this approach based on regular expressions is that it is unbelievably easy to adapt the code to different field widths and to work with delimited fields. For example, if the fixed-width fields are separated by semicolons, you simply modify the regular expression without touching the remaining code:

^(?<first>.{6});(?<last>.{8});(?<city>.+)$

Once you understand how regular expressions work, creating and maintaining your parser routines becomes child's play. —F.B.

Use Regular Expressions With Delimited Text Files
Let's assume you want to write a program to parse a common (albeit primitive, according to today's standards) exchange format: delimited text files. Each field is separated from the next by a comma, a semicolon, a tab, or another special character. To further complicate things, such files usually allow values embedded in single or double quotes. In this case, you can't use the Split method of the String type to do the parsing, because your result would be bogus if a quoted value happens to include the delimiter (as in "Doe, John").

Regular expressions are a real lifesaver in such cases. You can use the parsing code (see Listing 1) for these purposes, provided that you use a different regular expression that accounts for delimited fields. Let's start with the simplified assumption that there are no quoted strings in the file:

John , Smith, New York
Ann, Doe, Los Angeles

As you might have noticed, I threw in some extra white spaces to add interest to the discussion. These spaces should be ignored when parsing the text. You can use this regular expression to parse a comma-delimited series of values and ignore these extra spaces at the same time:

^\s*(?<first>.*?)\s*,\s*(?<last>.*?)\s*,\s*(?<city>.*?)\s*$

The \s* sequence means "zero or more white spaces," where a white space can be a space, a tab, or a new-line character. It is essential that these \s* sequences and the delimiter character (the comma, in this case) are placed outside the (?<name>...) construct, so that they aren't included in the named groups. Also, notice that you use the .*? sequence (which stands for "zero or more characters, as few as possible") to account for consecutive delimiters that mark empty fields. The lazy quantifier matters here: a greedy .* would swallow the spaces that precede a comma, so the first group would capture "John " with its trailing space.
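The sketch below shows one way to apply a comma-delimited pattern of this kind in C#. It uses lazy quantifiers (.*?) inside the named groups so that the white space around each comma stays outside the captured values. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimitedDemo
{
    // Lazy groups (.*?) let the surrounding \s* consume the extra white space.
    private static readonly Regex LineRegex = new Regex(
        @"^\s*(?<first>.*?)\s*,\s*(?<last>.*?)\s*,\s*(?<city>.*?)\s*$");

    // Hypothetical helper: returns { first, last, city }, or null when the
    // line doesn't contain the two expected commas.
    public static string[] ParseLine(string line)
    {
        Match m = LineRegex.Match(line);
        if (!m.Success) return null;
        return new string[]
        {
            m.Groups["first"].Value,
            m.Groups["last"].Value,
            m.Groups["city"].Value
        };
    }

    public static void Main()
    {
        string[] lines = { "John , Smith, New York", "Ann,, Los Angeles" };
        foreach (string line in lines)
        {
            string[] fields = ParseLine(line);
            Console.WriteLine("[{0}] [{1}] [{2}]", fields[0], fields[1], fields[2]);
        }
    }
}
```

The second sample line has two consecutive commas, which the pattern reports as an empty last-name field.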

Next, let's see how to parse quoted fields, like those found in this text file:

'John, P.' , "Smith" , "New York"
'Robert "Slim"', "" , "Los Angeles, CA"

Text fields can be surrounded by both single and double quotes, and they can contain commas and quote characters that don't work as delimiters. The regular expression that can parse these lines is quite complex, so I'll split it for your convenience:

^\s*(?<q1>("|'))(?<first>.*)\k<q1>\s*,
\s*(?<q2>("|'))(?<last>.*)\k<q2>\s*,
\s*(?<q3>("|'))(?<city>.*)\k<q3>\s*$

The (?<q1>("|')) subexpression matches either the single or the double leading quote delimiter and assigns this group the name "q1." The \k<q1> subexpression is a back reference to whatever the q1 group found; therefore, it matches whatever quote character was used at the beginning of the field. The q2 and q3 groups have the same role for the next two fields. Once again, you don't need to change any other statement in the parsing routine.
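To make the back references concrete, here is a minimal C# sketch that builds the three-part pattern by string concatenation and runs it against the two sample lines. The class and method names are hypothetical; note that a double quote inside a C# verbatim string is written as two double quotes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedFieldsDemo
{
    // The pattern from the text, one field per source line.
    private static readonly Regex LineRegex = new Regex(
        @"^\s*(?<q1>(""|'))(?<first>.*)\k<q1>\s*," +
        @"\s*(?<q2>(""|'))(?<last>.*)\k<q2>\s*," +
        @"\s*(?<q3>(""|'))(?<city>.*)\k<q3>\s*$");

    // Hypothetical helper: returns the three unquoted values, or null on no match.
    public static string[] ParseLine(string line)
    {
        Match m = LineRegex.Match(line);
        if (!m.Success) return null;
        return new string[]
        {
            m.Groups["first"].Value,
            m.Groups["last"].Value,
            m.Groups["city"].Value
        };
    }

    public static void Main()
    {
        string[] lines =
        {
            "'John, P.' , \"Smith\" , \"New York\"",
            "'Robert \"Slim\"', \"\" , \"Los Angeles, CA\""
        };
        foreach (string line in lines)
        {
            string[] fields = ParseLine(line);
            Console.WriteLine("[{0}] [{1}] [{2}]", fields[0], fields[1], fields[2]);
        }
    }
}
```

Because each closing quote must be the same character as its opening quote and must be followed by a comma (or the end of the line), commas and the other quote character inside a value are left alone.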

By the way, .NET 2.0 developers will be able to parse both fixed-width and delimited fields by means of a brand-new class named TextFieldParser in the System.Text.Parsing namespace. In spite of what the namespace name suggests, however, this class is defined in the Microsoft.VisualBasic.Dll library. Therefore, C# applications can't access it unless you add a reference to this DLL (something few C# programmers will do, I'm afraid). I've prepared a TextFieldParser class for you to play with that you can download from the .Net2TheMax site (see the sidebar, "Additional 2TheMax Downloads," for details). —F.B.
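For readers working with a released version of the framework, the class is exposed as Microsoft.VisualBasic.FileIO.TextFieldParser, still in the Microsoft.VisualBasic assembly. The sketch below is an assumption-laden illustration rather than the downloadable class mentioned above: the helper names are hypothetical, and the delimited example sticks to double-quoted values.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO; // requires a reference to Microsoft.VisualBasic

public static class TextFieldParserDemo
{
    // Hypothetical helper: splits one comma-delimited line, honoring double quotes.
    public static string[] ParseDelimitedLine(string line)
    {
        using (TextFieldParser parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }

    // Hypothetical helper: splits one fixed-width line (6 + 8 characters,
    // with -1 marking a variable-width last field).
    public static string[] ParseFixedWidthLine(string line)
    {
        using (TextFieldParser parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            parser.SetFieldWidths(6, 8, -1);
            return parser.ReadFields();
        }
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", ParseDelimitedLine("\"Doe, John\",Smith,New York")));
        Console.WriteLine(string.Join("|", ParseFixedWidthLine("John  Smith   New York")));
    }
}
```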

Detect Global Hotkeys
.NET developers often want to determine whether a given key combination is pressed, when their Windows Forms applications don't have the input focus. There are basically two ways to detect if a key is pressed in such cases, and both require a Windows API call.

In the simplest case, you can poll the keyboard using the GetAsyncKeyState API function, which you declare using this code:

' VB.NET
Private Declare Function _
   GetAsyncKeyState Lib "user32" _
   Alias "GetAsyncKeyState" ( _
   ByVal vKey As Keys) As Short

// C#
using System.Runtime.InteropServices;
// ...
[DllImport("user32")]
static extern short 
   GetAsyncKeyState(Keys vKey);

This method takes a 32-bit argument, but you can alias it to take a Keys value and save a conversion when you call it. Using the GetAsyncKeyState function is quite easy. For example, this code checks whether the end user is pressing the Ctrl-A key combination:

' VB.NET
If GetAsyncKeyState(Keys.A) < 0 And _
   GetAsyncKeyState(Keys.ControlKey) _
   < 0 Then
   ' Ctrl+A is being pressed
End If

// C#
if ( GetAsyncKeyState(Keys.A) < 0 && 
   GetAsyncKeyState(Keys.ControlKey) 
   < 0 )
{
   // Ctrl+A is being pressed
}
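The "< 0" comparison works because GetAsyncKeyState reports "key is currently down" in the most significant bit of its 16-bit result, and that bit is the sign bit of a short. The following sketch spells the test out without calling the API, so it runs anywhere; the class and method names are hypothetical and the sample values are made up for illustration.

```csharp
using System;

public static class KeyStateDemo
{
    // True when the most significant bit of a GetAsyncKeyState result is set,
    // which is equivalent to testing "state < 0" on a short.
    public static bool IsDown(short state)
    {
        return (state & 0x8000) != 0;
    }

    public static void Main()
    {
        // Made-up sample values, not real API output.
        short downNow = unchecked((short)0x8001);   // high bit set: key is down
        short notDown = 0x0001;                     // only the low bit set

        Console.WriteLine(IsDown(downNow));   // True
        Console.WriteLine(IsDown(notDown));   // False
    }
}
```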

Running this code in the Tick event of a Timer control with a sufficiently low value for the Interval property (for example, 200 milliseconds) lets you trap all the hotkeys you're interested in. Unfortunately, the shorter the interval, the more overhead this technique adds to your application. Besides, the documentation for GetAsyncKeyState states that this function can return 0 under Windows NT, 2000, and XP in two cases: when the current desktop isn't the active desktop, or when your application isn't the foreground program and the desktop settings prevent background applications from learning what keys the end user is pressing.

Use the RegisterHotKey API function to avoid the overhead by registering one or more global hotkeys:

' VB.NET
Declare Function RegisterHotKey Lib _
   "user32" (ByVal hwnd As IntPtr, _
   ByVal id As Integer, _
   ByVal fsModifiers As Integer, _
   ByVal vk As Keys) As Integer

// C#
[DllImport("user32")]
static extern int RegisterHotKey(
   IntPtr hwnd, int id, 
   int fsModifiers, Keys vk);

hwnd is the handle of the window that receives a WM_HOTKEY message when the end user presses the hotkey specified by the last two arguments. The id argument identifies the hotkey and should be different for each global hotkey registered in the system. Call the UnregisterHotKey API function to unregister the global hotkey when the application shuts down.

Register the hotkey when the main form in your application loads, trap the hotkey by intercepting the WM_HOTKEY message in the form's window procedure, and unregister the hotkey when the form closes (see Listing 2). Use the GlobalAddAtom API function to generate a unique id for each instance of the class, as Microsoft documentation recommends.
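The following sketch is not Listing 2. It only illustrates, under the Win32 documentation's definitions, the values you pass in fsModifiers and how a WM_HOTKEY message packs the modifiers and the virtual key into lParam. The class and helper names are hypothetical, and the code avoids Windows Forms types so that it stays self-contained.

```csharp
using System;

public static class HotKeyMessage
{
    // Values from the Win32 headers.
    public const int WM_HOTKEY = 0x0312;
    public const int MOD_ALT = 0x0001;
    public const int MOD_CONTROL = 0x0002;
    public const int MOD_SHIFT = 0x0004;
    public const int MOD_WIN = 0x0008;

    // WM_HOTKEY: the low-order word of lParam holds the modifier flags.
    public static int GetModifiers(IntPtr lParam)
    {
        return (int)(lParam.ToInt64() & 0xFFFF);
    }

    // WM_HOTKEY: the high-order word of lParam holds the virtual-key code.
    public static int GetVirtualKey(IntPtr lParam)
    {
        return (int)((lParam.ToInt64() >> 16) & 0xFFFF);
    }

    // Hypothetical helper used only to build a sample lParam for the demo.
    public static IntPtr MakeLParam(int modifiers, int virtualKey)
    {
        return new IntPtr((virtualKey << 16) | (modifiers & 0xFFFF));
    }

    public static void Main()
    {
        // Ctrl+A: MOD_CONTROL plus virtual key 0x41 (the value of Keys.A).
        IntPtr lParam = MakeLParam(MOD_CONTROL, 0x41);
        bool isCtrlA = GetModifiers(lParam) == MOD_CONTROL
                       && GetVirtualKey(lParam) == 0x41;
        Console.WriteLine(isCtrlA);
    }
}
```

In a form, a check like this would typically sit in an override of WndProc that runs when the message number equals WM_HOTKEY, with wParam carrying the id you passed to RegisterHotKey.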

A minor limitation of the code in Listing 2 is that it works only when called from inside a form class. In some circumstances, you might need to trap global hotkeys from inside non-visual classes, such as components. For this purpose, I've created a GlobalHotKey standalone class that you can instantiate from outside a form. This class exposes the HotKeyPressed event, so you simply need to use a WithEvents variable or set up an event handler explicitly for this event:

' VB.NET
Dim hk As New GlobalHotKey(Keys.A, _
   Keys.ControlKey)
AddHandler hk.HotKeyPressed, _
   AddressOf HotKeyHandler

// C#
GlobalHotKey hk = new GlobalHotKey( 
   Keys.A, Keys.ControlKey);
hk.HotKeyPressed += new 
   EventHandler(HotKeyHandler);

You can download the complete VB.NET and C# code of this class from the .Net2TheMax Web site (see the sidebar, "Additional 2TheMax Downloads"). One final note: These routines call unmanaged code, so you can't use them from inside .NET applications that aren't fully trusted—specifically, smart client Windows Forms applications that you launch through HTTP. —F.B.

Highlight the Active Textbox in Web Forms
When data-entry Web forms contain several textboxes, highlighting the textbox that has the input focus can improve the user's experience significantly. This technique is especially effective if your layout doesn't make the tab order sequence immediately clear. For example, users might be puzzled by multiple columns of textboxes and might wonder whether they're ordered horizontally or vertically. With a few lines of client-side JavaScript code, you can change the background and foreground colors of the active textbox easily, thus giving immediate feedback about the field that is receiving the user input.

DHTML makes it possible to change the HTML elements' style (font, colors, and position) by means of the control's style property and its subproperties. This HTML code renders a textbox control that handles the onfocus client-side event to change its background and foreground colors, and the onblur event to restore the original colors when the control loses the focus:

<input name="txtFirstName" type="text" 
   id="txtFirstName" onfocus= 
   "this.style.backgroundColor='Yellow';
   this.style.color = 'Blue';" 
   onblur="this.style.backgroundColor=
   'Window'; this.style.color='WindowText';"
/>

You can add highlighting support to all ASP.NET server-side controls dynamically, instead of hard-coding it manually. All controls that inherit from WebControl have an Attributes collection to which you can add one or more attributename=value pairs. These pairs are embedded at render time in the standard HTML code that the control generates. VB.NET and C# methods dynamically build a piece of JavaScript code that changes the background color and foreground color to the specified color values (see Listing 3). Using the SetInputControlColors method is trivial:

SetInputControlColors(txtFirstName, _
   SystemColors.Window, _
   SystemColors.WindowText, _
   Color.Yellow, Color.Blue)
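Listing 3 isn't reproduced here, but the core idea can be sketched as a small string builder: one call produces the script for onfocus, another the script for onblur. The class and method names below are hypothetical, and the colors are passed as plain strings to keep the example free of ASP.NET references.

```csharp
using System;

public static class FocusScriptDemo
{
    // Hypothetical helper: builds the JavaScript that sets the background and
    // foreground colors of the element that raises the event.
    public static string BuildColorScript(string backColor, string foreColor)
    {
        return string.Format(
            "this.style.backgroundColor='{0}';this.style.color='{1}';",
            backColor, foreColor);
    }

    public static void Main()
    {
        string onFocus = BuildColorScript("Yellow", "Blue");
        string onBlur = BuildColorScript("Window", "WindowText");

        // In an ASP.NET page, these strings would be attached to a control with
        //   ctrl.Attributes["onfocus"] = onFocus;
        //   ctrl.Attributes["onblur"] = onBlur;
        Console.WriteLine(onFocus);
        Console.WriteLine(onBlur);
    }
}
```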

Instead of calling SetInputControlColors manually for all the input controls on the form, you can use the SetAllInputControlsColors method to change the onfocus/onblur styles for all the TextBox, ListBox, and DropDownList controls on the form (see Listing 3). This method is recursive and also affects the controls nested in control containers. All you need to do now is put this code in the handler of the Page.Load event:

' VB.NET
SetAllInputControlsColors(Me, _
   SystemColors.Window, _
   SystemColors.WindowText, _
   Color.Yellow, Color.Blue)

// C#
SetAllInputControlsColors(this, 
   SystemColors.Window, 
   SystemColors.WindowText, 
   Color.Yellow, Color.Blue);

You can see the result in Internet Explorer (see Figure 1).

Using client-side JavaScript to change individual properties of each control isn't the only technique you can adopt to change the style of the active control. In fact, the approach just described works well only if the form contains a small number of fields. When the form has many controls, the amount of JavaScript generated for each control bloats the page's size and indirectly slows down its rendering. In such cases, you should define the normal and focus styles by using Cascading Style Sheet (CSS) classes in a separate stylesheet file. You then write a shorter piece of JavaScript code that sets the control's className property when the control gets or loses the focus. For instance, you might define this class in a CSS file:

.ActiveInputControl
{
   background-color: Red;
   color: Yellow;
   font-weight: bold;
}

You can then call the SetAllInputControlsClassName method (see Listing 4):

' VB.NET
SetAllInputControlsClassName(Me, "", _
   "ActiveInputControl")

// C#
SetAllInputControlsClassName(this, "", 
   "ActiveInputControl");

The resulting HTML for a single control looks like this:

<input name="txtFirstName" type="text" 
   id="txtFirstName" onfocus=
   "this.className = 'ActiveInputControl';"
   onblur="this.className = '';"
/>

Notice that the control has no specific style class when it doesn't have the focus, so it uses the default style. Not only is this technique faster when a form contains many fields, but it's also more easily maintainable, because you can change the focus style later simply by providing a different CSS, without recompiling the ASP.NET application. —M.B.
