DevDisasters

How To Solve the Impossible Data Problem

Andrew does integration for a living. As a result, weird client data comes with the territory, but one client's data in particular stands out as being truly unique.

After Andrew graduated from college, his dream job was one where he spent his days telecommuting, putting on pants only when he had to and working the 9-to-5 grind. It turned out that, after a few years, the lack of face-to-face interaction and the lack of hygiene (because Skype doesn't support the "smell-o-vision" protocol) finally took their toll. With the dream over, Andrew took a radically different career path as an on-site integration specialist, bouncing from site to site, often with visits home for weeks at a time in between, and he loved it.

Sure, he sometimes had to make good on the inflated promises made by the technical sales guys, often leading to extended stays at a client's site, but that part wasn't all that much of a challenge. Just extra work. What was challenging, though, was having to research so-called "enterprise" databases created by people ranging from seasoned experts all the way to the CEO's "tech-savvy nephew." Most often, the documentation ranged from incomplete to missing to flat-out wrong, leaving Andrew to put on his developer hat and find the source of whatever data he had to pull in.

Every system was unique and required its own approach, but one client stood out, specifically for how its system stored strings.

The true reason was a mystery. Some said it was for "security" reasons. Others said that it harkened back to methods employed in a simpler age of Token Rings and mini-computers.

Either way, before a string was stored on this client's system, it was first converted to its binary representation and then to a hexadecimal string. So, a string like "Hello" would be stored as "48656c6c6f0020202020202020" (the text, then a null terminator, then padding spaces).
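As a rough illustration of that storage format, the sketch below encodes a string the way the text describes: ASCII bytes, a null terminator, space padding, then hex text. It is written in Python purely for illustration. The function name `encode_field` and the 13-byte field width (chosen only so the output matches the "Hello" example) are assumptions, not details of the client's system.

```python
def encode_field(text: str, width: int = 13) -> str:
    """Encode text as hex: ASCII bytes, one null byte, then space padding.

    The width (in bytes) is an assumption picked to reproduce the
    article's "Hello" example; the real field width is not known.
    """
    raw = text.encode("ascii") + b"\x00"   # text followed by a null terminator
    if len(raw) > width:
        raise ValueError("text does not fit in the field")
    padded = raw.ljust(width, b" ")        # pad with spaces up to the field width
    return padded.hex()                    # two lowercase hex digits per byte


print(encode_field("Hello"))  # 48656c6c6f0020202020202020
```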

OK. Strange, but manageable. To work around this, Andrew wrote his own two-step conversion in C#. (Convert the hex text to a byte array; interpret the bytes as ASCII to form a string.) However, that left him wondering: How did the system itself go about reading this information? That was when he discovered the following function stored in SQL Server:

RETURNS varchar(250)
AS
BEGIN
  DECLARE @result varchar(250)
  DECLARE @n int
  set @n = 1
  select @result = ('') ;
  while @n <= 200
  begin
    select @result = @result + case substring(@text,@n,2)
      when '30' then '0' when '31' then '1' when '32' then '2' when '33' then '3' 
      when '34' then '4' when '35' then '5' when '36' then '6' when '37' then '7' 
      when '38' then '8' when '39' then '9' when '41' then 'A' when '42' then 'B' 
      when '43' then 'C' when '44' then 'D' when '45' then 'E' when '46' then 'F' 
      when '47' then 'G' when '48' then 'H' when '49' then 'I' when '4a' then 'J' 
      when '4b' then 'K' when '4c' then 'L' when '4d' then 'M' when '4e' then 'N' 
      when '4f' then 'O' when '50' then 'P' when '51' then 'Q' when '52' then 'R' 
      when '53' then 'S' when '54' then 'T' when '55' then 'U' when '56' then 'V' 
      when '57' then 'W' when '58' then 'X' when '59' then 'Y' when '5a' then 'Z' 
      when '61' then 'a' when '62' then 'b' when '63' then 'c' when '64' then 'd' 
      when '65' then 'e' when '66' then 'f' when '67' then 'g' when '68' then 'h' 
      when '69' then 'i' when '6a' then 'j' when '6b' then 'k' when '6c' then 'l' 
      when '6d' then 'm' when '6e' then 'n' when '6f' then 'o' when '70' then 'p' 
      when '71' then 'q' when '72' then 'r' when '73' then 's' when '74' then 't' 
      when '75' then 'u' when '76' then 'v' when '77' then 'w' when '78' then 'x' 
      when '79' then 'y' when '7a' then 'z' when '20' then ' ' when '4A' then 'J' 
      when '4B' then 'K' when '4C' then 'L' when '4D' then 'M' when '4E' then 'N' 
      when '4F' then 'O' when '5A' then 'Z' when '6A' then 'j' when '6B' then 'k' 
      when '6C' then 'l' when '6D' then 'm' when '6E' then 'n' when '6F' then 'o' 
      when '7A' then 'z' when '2B' then '+' when '2b' then '+' when '2d' then '-' 
      when '2D' then '-' when '5F' then '_' when '5f' then '_' when '21' then '!' 
      when '23' then '#' when '24' then '$' when '25' then '%' when '26' then '&' 
      when '28' then '(' when '29' then ')' when '2A' then '*' when '2C' then ',' 
      when '2E' then '.' when '2F' then '/' when '3A' then ':' when '3B' then ';' 
      when '3C' then '<' when '3D' then '=' when '3E' then '>' when '3F' then '?'
      when '40' then '@' when '5B' then '[' when '5C' then '\' when '5D' then ']' 
      when '5E' then '^' when '7B' then '{' when '7C' then '|' when '7D' then '}' 
      when '7E' then '~' when '60' then '`' 
    else ' '
    end
    set @n = @n + 2;
  end
return @result 
end

Not only is this function capped at 100 characters (two hex characters per byte, so the loop's limit of 200 covers 100 bytes), but someone also went through and manually mapped each hex value to its corresponding character. Andrew wanted to believe that the developer wrote a program to spit this chart out, but something told him that if they were smart enough to convert an integer to both its hex counterpart and its character counterpart (using loops), they probably wouldn't have felt the need to create this huge hardcoded lookup table in the first place.
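For contrast, the two-step conversion Andrew describes (hex text to bytes, then bytes to ASCII) needs only a few lines and no lookup table. The following is a Python sketch of that idea, not Andrew's actual C# code; the name `decode_field` and the choice to cut the value at the first null byte are assumptions.

```python
def decode_field(hex_text: str) -> str:
    """Decode a hex-encoded field back into a string.

    Step 1: turn the hex text into raw bytes.
    Step 2: interpret the bytes as ASCII.
    Cutting at the first null byte (and so dropping the padding after
    it) is an assumption about how the stored value should be read.
    """
    raw = bytes.fromhex(hex_text)          # accepts upper- and lowercase hex digits
    value, _, _ = raw.partition(b"\x00")   # keep only what precedes the terminator
    return value.decode("ascii")


print(decode_field("48656c6c6f0020202020202020"))  # Hello
```

Unlike the SQL function, this sketch is not tied to a fixed length and does not need separate entries for "4a" and "4A".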

After the system was migrated over, Andrew met with the client for a nice dinner celebrating another successful migration. It was at this gathering that the client thanked Andrew for his services. He had managed to accomplish something that a consultant, brought in from time to time to look over the system (at a princely hourly rate), had declared impossible without hiring an army of developers and, certainly, not without paying a hefty project management fee as well.

When he wrapped up and packed his bags for the next hotel near the next client's site, with its own quirky database, he did so with a stronger sense of satisfaction than usual. Not only had he made a positive impact for the client, but he had also done the work of an entire army.

About the Author

Mark Bowytz is a contributor to the popular Web site The Daily WTF. He has more than a decade of IT experience and is currently a systems analyst for PPG Industries.
