DevDisasters

A More Unique Identifier

"Oh for crying out loud," Jeremy heard his cubicle-neighbor Andy shout, followed by a string of not-so-family-friendly expletives. "It's yet another duplicate GUID!"

Jeremy was intrigued. "Duplicate" is perhaps the least likely problem for a Globally Unique Identifier. With more than 340 billion trillion quadrillion (and that's no typo) possible values, the probability of having two identical GUIDs is basically non-existent. The probability of having multiple duplicate GUIDs is smaller than winning the lottery twice. On the same day. For every lottery held in the world.
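The figure checks out. A GUID is 128 bits wide, which gives 2^128 (about 3.4 × 10^38) possible values, and 340 billion trillion quadrillion is 340 × 10^9 × 10^12 × 10^15 = 3.4 × 10^38. The minimal C# sketch below verifies that arithmetic with System.Numerics.BigInteger; the class name GuidSpace is hypothetical. Random (version 4) GUIDs reserve 6 of the 128 bits for version and variant markers, so only 2^122 (about 5.3 × 10^36) values are actually chosen at random.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for checking the size of the GUID space.
static class GuidSpace
{
    // Every distinct 128-bit value.
    public static BigInteger AllValues() => BigInteger.Pow(2, 128);

    // Version 4 GUIDs fix 4 version bits and 2 variant bits,
    // leaving 122 bits that are chosen at random.
    public static BigInteger RandomValues() => BigInteger.Pow(2, 122);

    // "340 billion trillion quadrillion" = 340 * 10^9 * 10^12 * 10^15.
    public static BigInteger ArticleFigure() =>
        340 * BigInteger.Pow(10, 9) * BigInteger.Pow(10, 12) * BigInteger.Pow(10, 15);
}
```

For example, GuidSpace.AllValues() evaluates to 340282366920938463463374607431768211456, slightly more than ArticleFigure(), which matches the article's "more than."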

"Duplicate GUIDs?" Jeremy stood up and asked over the cubicle wall. "How is that even possible?"

"Obviously it's bound to happen sooner or later," Andy responded. "I mean, we generate a lot of GUIDs. And I mean a lot. We really should have used a more unique identifier, like I had suggested earlier."

That last sentence, especially delivered with the told-you-so inflection, was the only clue Jeremy needed to know exactly what Andy was referring to. Months earlier, the development team was presented with a bit of a unique challenge.

Unique Requirement
An automated data collection and processing application they were building required that a "dataset ID" be returned for every dataset that was uploaded to the Web service. This "dataset ID" could then be used by the consuming application to check on the processing status, cancel the processing request and, once processing was completed, retrieve the "processed dataset ID."

Tell Us Your Tale

Each issue, Alex Papadimoulis, publisher of the popular Web site The Daily WTF, recounts first-person tales of software development gone terribly wrong. Have you experienced the darker side of development? We want to publish your story. E-mail your tale to Executive Editor Kathleen Richards at [email protected] and put "DevDisasters" as the subject line.

The tricky part in all this was that the processing application would never know how many IDs were issued or what IDs had been issued: It would somehow have to provide an ID that was always unique.

Given the globally unique requirement, the solution was obvious to Jeremy: Simply generate a GUID using the Windows API. Andy, on the other hand, had never used GUIDs before and didn't quite trust an algorithm to be smart enough to generate such an identifier. He didn't have a better idea, but was confident that, given enough time, he could cobble something together that utilized the computer's serial number, CPU footprint and a number of other factors.
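Jeremy's approach needs no coordination: every upload simply receives a fresh random GUID, so the service never has to consult the IDs it issued earlier. What follows is a minimal in-memory sketch of the dataset-ID workflow described above, using .NET's Guid.NewGuid. The type and member names (DatasetService, DatasetStatus, Upload, GetStatus, Cancel, Complete, GetProcessedId) are hypothetical, and the dictionary exists only to track status, not to guarantee uniqueness.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical processing states; the article does not name them.
public enum DatasetStatus { Processing, Cancelled, Completed }

// Hypothetical in-memory stand-in for the article's Web service (not thread-safe).
public sealed class DatasetService
{
    private sealed class Entry
    {
        public DatasetStatus Status = DatasetStatus.Processing;
        public Guid? ProcessedId;
    }

    // Tracks status only; uniqueness comes from Guid.NewGuid itself.
    private readonly Dictionary<Guid, Entry> _entries = new Dictionary<Guid, Entry>();

    // Returns a new dataset ID without checking previously issued IDs.
    public Guid Upload()
    {
        Guid id = Guid.NewGuid();
        _entries[id] = new Entry();
        return id;
    }

    public DatasetStatus GetStatus(Guid datasetId) => _entries[datasetId].Status;

    public void Cancel(Guid datasetId) => _entries[datasetId].Status = DatasetStatus.Cancelled;

    // Marks processing as finished and issues a separate, fresh GUID for the result.
    public Guid Complete(Guid datasetId)
    {
        Entry entry = _entries[datasetId];
        entry.ProcessedId = Guid.NewGuid();
        entry.Status = DatasetStatus.Completed;
        return entry.ProcessedId.Value;
    }

    // Null until processing has completed.
    public Guid? GetProcessedId(Guid datasetId) => _entries[datasetId].ProcessedId;
}
```

A real Web service would keep this state in a database rather than in memory, but the ID generation itself would stay a single call.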

"We're not generating that many GUIDs," Jeremy protested. "A thousand a day, tops. Statistically, we'd need to generate a hundred trillion every day for a mill-"

Andy cut him off. "Yeah, yeah, I remember your whole spiel. A billion, gazillion, fafillion, shabolubalu jillion zillion yillion ... Whatever. The fact is, we've got duplicates. It's causing all sorts of problems, and I'm going to have to spend all afternoon cleaning up the mess for just this one duplicate."
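Jeremy's spiel can be made concrete with the birthday approximation: among n random version 4 GUIDs (122 random bits each), the chance of at least one duplicate is roughly n^2 / (2 × 2^122). The sketch below assumes his figure of a thousand GUIDs a day and a hypothetical 100-year run; GuidCollision and ApproxCollisionProbability are illustrative names, not part of any library.

```csharp
using System;

// Hypothetical helper estimating GUID collision odds.
static class GuidCollision
{
    // Number of random bits in a version 4 GUID.
    const int RandomBits = 122;

    // Birthday approximation: P(at least one duplicate) ~ n^2 / (2 * 2^bits).
    // Only meaningful while the result is much smaller than 1. The "exact"
    // form 1 - exp(-n^2 / (2 * 2^bits)) would round to 0 in double precision here.
    public static double ApproxCollisionProbability(double count) =>
        count * count / (2.0 * Math.Pow(2, RandomBits));
}
```

ApproxCollisionProbability(1000.0 * 365 * 100), that is 36.5 million GUIDs, comes out at about 1.25 × 10^-22.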

Shifty Characters
Baffled, Jeremy decided to peek at the source code to see the problem. Perhaps it was a variable that was getting reused? Or maybe something in the cache?

After all of 10 minutes, Jeremy discovered the root of the problem:

// Swap two chars of dataset ID 
// to create processed ID
var dsID = dataSetGuid.ToString();
var pdsID = new StringBuilder();
pdsID.Append(dsID[1]); 
pdsID.Append(dsID[0]); 
pdsID.Append(dsID.Substring(2));
return new Guid(pdsID.ToString());

The code was checked in by Andy. In fairness, it will generate a new, unique GUID, provided that the first two characters of the GUID aren't the same.

Jeremy explained the problem to Andy, who was still working on cleaning up 664591c8-1985-4071-a4ab-ec87f1e9af1.

"Oh," Andy said, embarrassed. "I see. But what are the chances of that?"
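The chances are easy to estimate. In a random (version 4) GUID, the first two hex characters are each one of 16 values, so they match about one time in 16, and the GUID Andy was cleaning up, which starts with "66", is exactly such a case. The sketch below reproduces Andy's swap as SwapFirstTwo and adds FlipFirstByte, one possible alternative that is not from the article: it inverts every bit of the GUID's first byte, which can never return its input and is its own inverse. Both method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helpers for deriving a processed ID from a dataset ID.
static class ProcessedIds
{
    // Andy's approach: swap the first two characters of the GUID string.
    // If those characters are equal, the "new" GUID is the same GUID.
    public static Guid SwapFirstTwo(Guid dataSetGuid)
    {
        var dsID = dataSetGuid.ToString();
        var pdsID = new StringBuilder();
        pdsID.Append(dsID[1]);
        pdsID.Append(dsID[0]);
        pdsID.Append(dsID.Substring(2));
        return new Guid(pdsID.ToString());
    }

    // Alternative (not from the article): invert all bits of the first byte.
    // x ^ 0xFF never equals x, so the result always differs from the input,
    // and applying it twice gives back the original GUID.
    public static Guid FlipFirstByte(Guid dataSetGuid)
    {
        byte[] bytes = dataSetGuid.ToByteArray();
        bytes[0] ^= 0xFF;
        return new Guid(bytes);
    }
}
```

For example, with the made-up GUID 66000000-0000-4000-8000-000000000000, SwapFirstTwo returns the very same GUID, while FlipFirstByte returns 660000ff-0000-4000-8000-000000000000, because ToByteArray stores the first group in little-endian order. If the service already keeps a record per dataset, simply issuing the processed ID with Guid.NewGuid() would work as well.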

About the Author

Alex Papadimoulis lives in Berea, Ohio. The principal member of Inedo, LLC, he uses his 10 years of IT experience to bring custom software solutions to small- and mid-sized businesses and to help other software development organizations utilize best practices in their products. On the Internet, Alex can usually be found answering questions in various newsgroups and posting some rather interesting real-life examples of how not to program on his Web site TheDailyWTF.com. You can contact Alex directly via email at [email protected].
