DevDisasters

The Regex Code Review

Ding! A new e-mail landed in Jed's inbox, and in the inbox of every other developer on the floor. It was from his coworker Bob, and it read:

"Jed, the term-validation process of CAPBACS is a performance bottleneck that's costing our company thousands in lost sales every month. This change must go in ASAP. While I appreciate your feedback and input, I don't believe that your experience working on the Web places you in a position to critique my work. Thanks!"

Before a change could be promoted, the team had to send out the code for comments to a mailing list of fellow developers.

A Stackable Solution
Bob had sent out a single VB.NET function with an accompanying class definition. Its sole purpose in life was to see if at least one of the terms, passed as an array of strings, existed in the second parameter, a document body. Jed was positive that somewhere, a Reverse Polish Notation calculator was crying:

Public Function DocContainsAtLeastOneTerm(ByVal searchTerms() As String, ByVal searchText As String) As Boolean

  Dim paramTable As New StackType
  Dim searchTable As New StackType
  Dim truthTable As New StackType

  For i As Integer = 0 To searchTerms.Length - 1 Step 1
    paramTable.Push(searchTerms(i).ToLower())
  Next

  For i As Integer = 0 To searchText.Split(" ").Length - 1 Step 1
    searchTable.Push(searchText.Split(" ")(i).ToLower())
  Next

  For i As Integer = 1 To paramTable.stackHeight() - 1 Step 1
    If searchTable.Contains(paramTable.Pop()) Then
      truthTable.Push(True)
    Else
      truthTable.Push(False)
    End If
  Next

  If truthTable.stackHeight() > 0 And truthTable.Contains(True) Then
    Return True
  Else
    Return False
  End If
End Function

Jed had sent a private e-mail explaining that while Bob's implementation was viable and would surely work, there had to be a better way that didn't involve recreating built-in stack functionality.

Rather than try to clean up Bob's approach, Jed showed him just what an "inexperienced" Web developer could do:

Public Function DocContainsAtLeastOneTermNew(ByVal searchTerms() As String, ByVal searchText As String) As Boolean
  If searchTerms.Length = 0 Then
    Return False
  End If

  Dim regexString As String = "(" + searchTerms(0)

  If searchTerms.Length > 1 Then
    For i As Integer = 1 To searchTerms.Length - 1 Step 1
      regexString += "|" + searchTerms(i)
    Next
  End If

  regexString += ")"

  Dim rx As Object = New Regex(regexString)
  Dim matches As MatchCollection = rx.Matches(searchText)

  If matches.Count > 0 Then
    Return True
  Else
    Return False
  End If
End Function
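
For readers who want to experiment with the regex approach, here is a minimal sketch of a variation, not the code Jed actually sent. The module name TermMatchSketch and the function DocContainsAnyTermEscaped are hypothetical. It assumes every term is a non-empty string, escapes each term with Regex.Escape so that characters such as "(" are matched literally, and uses Regex.IsMatch, which only needs to find one hit. Like the regex version above, it matches substrings rather than whole space-separated words.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module TermMatchSketch

    ' Hypothetical helper, not from the article.
    ' Assumes every entry in searchTerms is a non-empty string.
    Public Function DocContainsAnyTermEscaped(ByVal searchTerms() As String, ByVal searchText As String) As Boolean
        If searchTerms Is Nothing OrElse searchTerms.Length = 0 Then
            Return False
        End If

        ' Regex.Escape turns metacharacters such as "(" or "+" into literals;
        ' String.Join then builds an alternation like "a|b|c".
        Dim escapedTerms() As String = searchTerms.Select(Function(t) Regex.Escape(t)).ToArray()
        Dim pattern As String = String.Join("|", escapedTerms)

        ' IgnoreCase stands in for the ToLower() calls in the stack-based version.
        Return Regex.IsMatch(searchText, pattern, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim terms() As String = {"c++", "(beta"}
        Console.WriteLine(DocContainsAnyTermEscaped(terms, "Notes on the (Beta release")) ' prints True
        Console.WriteLine(DocContainsAnyTermEscaped(terms, "Nothing relevant here"))      ' prints False
    End Sub

End Module
```

Without the escaping step, a term such as "(beta" would be an invalid pattern and the Regex constructor would throw an ArgumentException.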

Maybe Next Time
Positive his code would prevail, Jed sent a polite message back to Bob and the rest of his coworkers on the mailing list, suggesting that regular expressions might be a better approach. He attached his sample code along with details of the tests he ran and their results, then sat back, quite smug that his skills in applied one-upmanship had won the day.

In the end, Bob and Jed's group manager stepped in and declared that they'd be moving forward with Bob's version and saving Jed's for a "2.0" revision. The department couldn't simply discard Bob's effort. That would be a waste of time and money.

About the Author

Mark Bowytz is a contributor to the popular Web site The Daily WTF. He has more than a decade of IT experience and is currently a systems analyst for PPG Industries.

Reader Comments:

Mon, Feb 27, 2012 Richard

Dim rx >> As Object << = New Regex(regexString)

Great - let's throw some late-binding in there to really slow this puppy down!

Thu, Dec 15, 2011 zahra

please find *(code of process ron rabbin for visual basic

Wed, Jun 15, 2011 CK

Also, using the RegexOptions.Compiled optional parameter in the instantiation of the Regex object will dramatically improve runtime (at the cost of slower startup). As in: Dim rx As Object = New Regex(regexString, RegexOptions.Compiled)

Wed, Mar 30, 2011 Waltman

Nerd fight!!!

Tue, Mar 29, 2011 Igor Melbourne, Australia

Last 5 lines of a proposed solution could be substituted by a single line:
Return matches.Count>0

Wed, Mar 16, 2011

It is .net, why are the terms not in a container and using built in enumeration API?

Fri, Feb 18, 2011 LinqGeek

Hmmm. Here in the 21st century, can't LINQ knock this out with one line of code?

Wed, Feb 9, 2011 Steve Chicago

I agree with most of STDERR's comments; choosing between RegEx and looping would depend largely on the type of input expected. I differ on the longest-first comment. Perhaps my thinking cap is misaligned, but wouldn't shortest-first match before longest? [cat,catfish,catapult,catatonic,caterpillar]

Tue, Feb 8, 2011 stderr Ecuador

It's obvious even the most braindead regex engine would be faster for this job. However, I don't see why it would be more stable unless it filtered the search terms for regex metacharacters first. For example, having a single parenthesis or square bracket in a search term would make it blow up.
If the number of search terms could potentially be large, I'd have used the regex with manual escaping of metacharacters, otherwise loop with simple string searches (perhaps sorting search terms longest-first) and early termination on the first positive hit. Unless of course there was a library implementing an Aho-Corasick or Rabin-Karp algorithm in VB ...
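
The loop-with-early-termination idea in the comment above could be sketched roughly as follows. This is an illustration under assumptions, not code from the article or from the commenter: the names TermLoopSketch and DocContainsAnyTermLoop are hypothetical, matching is by case-insensitive substring, and the terms are tried in the order given, since ordering changes only which term is tested first and not the Boolean result.

```vbnet
Imports System

Module TermLoopSketch

    ' Hypothetical helper: plain substring search that stops at the first hit.
    Public Function DocContainsAnyTermLoop(ByVal searchTerms() As String, ByVal searchText As String) As Boolean
        If searchTerms Is Nothing OrElse searchText Is Nothing Then
            Return False
        End If

        For Each term As String In searchTerms
            ' Skip empty terms: IndexOf("") returns 0 and would count as a hit.
            If String.IsNullOrEmpty(term) Then Continue For

            ' No regex involved, so metacharacters in a term need no escaping.
            If searchText.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0 Then
                Return True ' early termination on the first positive hit
            End If
        Next

        Return False
    End Function

    Sub Main()
        Dim terms() As String = {"catfish", "[draft"}
        Console.WriteLine(DocContainsAnyTermLoop(terms, "Status: [DRAFT] pending review")) ' prints True
        Console.WriteLine(DocContainsAnyTermLoop(terms, "A cat sat on the mat"))           ' prints False
    End Sub

End Module
```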

Wed, Feb 2, 2011 Kathleen Richards, editor

We had to cut the original article for the print edition. Here is the section on speed: "Jed had no idea how fast the .NET Framework's regular expression engine was, but he wanted to check, just in case there was method to Bob’s madness. He built what he figured to be a couple ‘worst case scenario’ test cases and tried out both solutions. Jed's solution was not only magnitudes faster, but a whole lot more stable too!"

Wed, Feb 2, 2011 Deeks

I'd be interested to know what the performance improvements were using a Regex rather than Bob's implementation. Was it significantly quicker?
