Code Focused

The Fraternal Twins of Equals and GetHashCode

Lots of searching through lots of data means potential app performance degradation. Hash codes can speed things up.

The Items collection in a Windows Forms ComboBox control is eminently flexible, letting you store any object you desire for later selection and display. But I often find myself storing just the minimum of content in each element, typically a pre-formatted display string, and a numeric identifier from which I can access the underlying database entry or stored content, like this:

public class ItemWithID
{
  public string ItemText { get; set; }
  public long ItemID { get; set; }

  public ItemWithID(string theText, long theID)
  {
    this.ItemText = theText;
    this.ItemID = theID;
  }

  public override string ToString()
  {
    return this.ItemText;
  }
}

Adding a dropdown list item with an embedded identifier is then just a matter of adding the ItemWithID instance to the control's Items collection:

FoodCategory.Items.Add(new ItemWithID("Candy", 12L));

The ComboBox control's Items collection also sports a handy IndexOf method, used to locate an item already present in the list. Normally, you pass this method an instance of the desired object. But you can also pass it other values, and if you've overridden the object's Equals method, you can include logic that confirms whether members within a single instance match the queried value. For instance, this override accepts a long integer, and attempts to match it to the ItemWithID object's ItemID property:

public override bool Equals(object obj)
{
  if (obj is long)
    return (Convert.ToInt64(obj) == this.ItemID);
  else
    return base.Equals(obj);
}

Now you can find a matching object in the control by passing its identifier:

locationInCombo = FoodCategory.Items.IndexOf(12L);

This code works just fine, except for the error. Well, not an error, actually, but a noisy warning, one that appears in the Visual Studio Error List panel:

'ItemWithID' overrides Object.Equals(object o) but does not override Object.GetHashCode()

A hash code is a data value that represents some other data. Hashes are often used with password management, where a calculated and irreversible scramble (the hash) of a plain text password is kept in storage instead of the insecure original. The last digit of your credit card number is also a hash, one that helps validate the format of the other digits. Hashing algorithms have several important features that make them useful in a variety of computing situations:

  1. The same source data always produces the same hash. This is what enables hashed password comparisons and consistent credit card number verification.
  2. The hash is a simple data type that’s easy to compare and manipulate. This isn't a strict rule. You could have a hash that was more complex than the original. But normally, hashes are short strings or numbers.
  3. The set of all possible values in a hash is finite. The verification digit in your credit card is just a single digit, and allows for only 10 possible hashed values.

At first glance, there seems to be no relationship between a hash code and the comparison of one long integer to another within an Equals override. But hashes are important to the Equals method because of speed issues.

Let's say that you had added 1,000,000 items to your ComboBox control, with ID values ranging from zero to 9,999,999. Assuming that the numeric value you provided to each ItemWithID object is unique, the IndexOf method would need to call the Equals method on each and every element in the Items collection, stopping only when it found a match, or reached the end of the set without a match. If your lookups were somewhat random, you would expect to query about half the items (500,000) on average each time you used IndexOf.

That's a lot of comparisons. But what if you could lower the average number of comparisons down to, say, just 5,000 calls to Equals per IndexOf search? That's 100 times faster. Hashes enable that speed increase.

Hashes reduce the typical number of searches by breaking your data up into buckets, with each bucket represented by a hash code. Consider a hashing algorithm that converts an ID value (which, as you recall, ranges from 0 to 9,999,999) into a smaller representative number between 0 and 999:

hashedValue = (int)Math.Floor(originalID / 10000M);

Because each of the 1,000 buckets represent a set of 10,000 ID values, if you can identify the target bucket where the ID is stored, you only need to search that bucket instead of the entire set of 1,000,000 items. On average, you probably only need to search through half the bucket, which is just 5,000 comparisons in a full bucket.

The actual average number of comparisons varies based on the distribution of and repetition within your source data. Most of these so-called hashing tables don't guarantee that the target data will appear in the initial bucket. If the bucket is already full, subsequent values will flow into the next bucket. But at the very least, hashing provides the lookup with a reasonable starting point, and a good chance that the match will appear soon.

Basic hashing algorithms are easy to make. Adding up all source digits and dividing by the length might provide a useful hash. Better algorithms that are domain- and distribution-specific are more involved, but they have the same goals. Fortunately, for this article's example, the .NET Framework already has a hashing tool built right inside the long data type. Because the code is searching through long integers, deferring to that data type's built-in hash is sufficient:

public override int GetHashCode()
{
  return this.ItemID.GetHashCode();
}

You aren't required to supply a GetHashCode override in conjunction with each new Equals override. In fact, Visual Basic doesn't even provide the warning. But if your code will perform a lot of searches and comparisons through object data, you might want to consider a helpful and bucket-friendly hash.

About the Author

Tim Patrick has spent more than thirty years as a software architect and developer. His two most recent books on .NET development -- Start-to-Finish Visual C# 2015, and Start-to-Finish Visual Basic 2015 -- are available from http://owanipress.com. He blogs regularly at http://wellreadman.com.

comments powered by Disqus

Featured

  • Microsoft Revamps Fledgling AutoGen Framework for Agentic AI

    Only at v0.4, Microsoft's AutoGen framework for agentic AI -- the hottest new trend in AI development -- has already undergone a complete revamp, going to an asynchronous, event-driven architecture.

  • IDE Irony: Coding Errors Cause 'Critical' Vulnerability in Visual Studio

    In a larger-than-normal Patch Tuesday, Microsoft warned of a "critical" vulnerability in Visual Studio that should be fixed immediately if automatic patching isn't enabled, ironically caused by coding errors.

  • Building Blazor Applications

    A trio of Blazor experts will conduct a full-day workshop for devs to learn everything about the tech a a March developer conference in Las Vegas keynoted by Microsoft execs and featuring many Microsoft devs.

  • Gradient Boosting Regression Using C#

    Dr. James McCaffrey from Microsoft Research presents a complete end-to-end demonstration of the gradient boosting regression technique, where the goal is to predict a single numeric value. Compared to existing library implementations of gradient boosting regression, a from-scratch implementation allows much easier customization and integration with other .NET systems.

  • Microsoft Execs to Tackle AI and Cloud in Dev Conference Keynotes

    AI unsurprisingly is all over keynotes that Microsoft execs will helm to kick off the Visual Studio Live! developer conference in Las Vegas, March 10-14, which the company described as "a must-attend event."

Subscribe on YouTube