Code Focused

The Fraternal Twins of Equals and GetHashCode

Lots of searching through lots of data means potential app performance degradation. Hash codes can speed things up.

The Items collection in a Windows Forms ComboBox control is eminently flexible, letting you store any object you desire for later selection and display. But I often find myself storing just the minimum of content in each element, typically a pre-formatted display string, and a numeric identifier from which I can access the underlying database entry or stored content, like this:

public class ItemWithID
{
  public string ItemText { get; set; }
  public long ItemID { get; set; }

  public ItemWithID(string theText, long theID)
  {
    this.ItemText = theText;
    this.ItemID = theID;
  }

  public override string ToString()
  {
    return this.ItemText;
  }
}

Adding a dropdown list item with an embedded identifier is then just a matter of adding the ItemWithID instance to the control's Items collection:

FoodCategory.Items.Add(new ItemWithID("Candy", 12L));

The ComboBox control's Items collection also sports a handy IndexOf method, used to locate an item already present in the list. Normally, you pass this method an instance of the desired object. But you can also pass it other values, and if you've overridden the object's Equals method, you can include logic that confirms whether members within a single instance match the queried value. For instance, this override accepts a long integer, and attempts to match it to the ItemWithID object's ItemID property:

public override bool Equals(object obj)
{
  if (obj is long)
    return (Convert.ToInt64(obj) == this.ItemID);
  else
    return base.Equals(obj);
}

Now you can find a matching object in the control by passing its identifier:

locationInCombo = FoodCategory.Items.IndexOf(12L);

This code works just fine, except for the error. Well, not an error, actually, but a noisy warning, one that appears in the Visual Studio Error List panel:

'ItemWithID' overrides Object.Equals(object o) but does not override Object.GetHashCode()

A hash code is a data value that represents some other data. Hashes are often used with password management, where a calculated and irreversible scramble (the hash) of a plain text password is kept in storage instead of the insecure original. The last digit of your credit card number is also a hash, one that helps validate the format of the other digits. Hashing algorithms have several important features that make them useful in a variety of computing situations:

  1. The same source data always produces the same hash. This is what enables hashed password comparisons and consistent credit card number verification.
  2. The hash is a simple data type that’s easy to compare and manipulate. This isn't a strict rule. You could have a hash that was more complex than the original. But normally, hashes are short strings or numbers.
  3. The set of all possible values in a hash is finite. The verification digit in your credit card is just a single digit, and allows for only 10 possible hashed values.

At first glance, there seems to be no relationship between a hash code and the comparison of one long integer to another within an Equals override. But hashes are important to the Equals method because of speed issues.

Let's say that you had added 1,000,000 items to your ComboBox control, with ID values ranging from zero to 9,999,999. Assuming that the numeric value you provided to each ItemWithID object is unique, the IndexOf method would need to call the Equals method on each and every element in the Items collection, stopping only when it found a match, or reached the end of the set without a match. If your lookups were somewhat random, you would expect to query about half the items (500,000) on average each time you used IndexOf.

That's a lot of comparisons. But what if you could lower the average number of comparisons down to, say, just 5,000 calls to Equals per IndexOf search? That's 100 times faster. Hashes enable that speed increase.

Hashes reduce the typical number of searches by breaking your data up into buckets, with each bucket represented by a hash code. Consider a hashing algorithm that converts an ID value (which, as you recall, ranges from 0 to 9,999,999) into a smaller representative number between 0 and 999:

hashedValue = (int)Math.Floor(originalID / 10000M);

Because each of the 1,000 buckets represent a set of 10,000 ID values, if you can identify the target bucket where the ID is stored, you only need to search that bucket instead of the entire set of 1,000,000 items. On average, you probably only need to search through half the bucket, which is just 5,000 comparisons in a full bucket.

The actual average number of comparisons varies based on the distribution of and repetition within your source data. Most of these so-called hashing tables don't guarantee that the target data will appear in the initial bucket. If the bucket is already full, subsequent values will flow into the next bucket. But at the very least, hashing provides the lookup with a reasonable starting point, and a good chance that the match will appear soon.

Basic hashing algorithms are easy to make. Adding up all source digits and dividing by the length might provide a useful hash. Better algorithms that are domain- and distribution-specific are more involved, but they have the same goals. Fortunately, for this article's example, the .NET Framework already has a hashing tool built right inside the long data type. Because the code is searching through long integers, deferring to that data type's built-in hash is sufficient:

public override int GetHashCode()
{
  return this.ItemID.GetHashCode();
}

You aren't required to supply a GetHashCode override in conjunction with each new Equals override. In fact, Visual Basic doesn't even provide the warning. But if your code will perform a lot of searches and comparisons through object data, you might want to consider a helpful and bucket-friendly hash.

About the Author

Tim Patrick has spent more than thirty years as a software architect and developer. His two most recent books on .NET development -- Start-to-Finish Visual C# 2015, and Start-to-Finish Visual Basic 2015 -- are available from http://owanipress.com. He blogs regularly at http://wellreadman.com.

comments powered by Disqus

Featured

  • AI for GitHub Collaboration? Maybe Not So Much

    No doubt GitHub Copilot has been a boon for developers, but AI might not be the best tool for collaboration, according to developers weighing in on a recent social media post from the GitHub team.

  • Visual Studio 2022 Getting VS Code 'Command Palette' Equivalent

    As any Visual Studio Code user knows, the editor's command palette is a powerful tool for getting things done quickly, without having to navigate through menus and dialogs. Now, we learn how an equivalent is coming for Microsoft's flagship Visual Studio IDE, invoked by the same familiar Ctrl+Shift+P keyboard shortcut.

  • .NET 9 Preview 3: 'I've Been Waiting 9 Years for This API!'

    Microsoft's third preview of .NET 9 sees a lot of minor tweaks and fixes with no earth-shaking new functionality, but little things can be important to individual developers.

  • Data Anomaly Detection Using a Neural Autoencoder with C#

    Dr. James McCaffrey of Microsoft Research tackles the process of examining a set of source data to find data items that are different in some way from the majority of the source items.

  • What's New for Python, Java in Visual Studio Code

    Microsoft announced March 2024 updates to its Python and Java extensions for Visual Studio Code, the open source-based, cross-platform code editor that has repeatedly been named the No. 1 tool in major development surveys.

Subscribe on YouTube