C# Corner
C# Object Equality in .NET
Explore ways to override the default concepts of object equality, based on reference and value types, in the Microsoft .NET Framework.
The subtle nuances of object equality in the Microsoft .NET Framework can make the results of equality comparisons in C# confusing at times. How can you ensure that objects and object references behave properly within an application -- and within the .NET Framework?
When we talk about "equality" in an object-oriented language like C#, we need to make sure we're talking about the same thing. Types in .NET have a default concept of equality depending on whether they're a reference type or a value type.
Every type ultimately derives from System.Object, which defines a virtual Equals method. For reference types, the default implementation does a simple reference check to see whether the two references point to the same object in memory. If they do, it returns true; otherwise, it returns false.
Value types derive from System.ValueType. This type overrides the Equals method because equality of value types is based on whether the two objects you're comparing contain the same data. How does it determine this? In the general case it uses reflection to walk the fields, because ValueType has no way of knowing the fields of your concrete type until runtime. (The runtime can take a faster bitwise shortcut for some simple structs, but you can't count on it.) If you're thinking, "performance hit!" you're right.
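The two defaults are easy to see side by side. The following is a minimal sketch; RefPoint and ValPoint are hypothetical types made up for this illustration and don't appear elsewhere in the article.

```csharp
using System;

// Hypothetical types used only for this illustration.
class RefPoint { public int X; }    // reference type
struct ValPoint { public int X; }   // value type

static class DefaultEqualityDemo
{
    static void Main()
    {
        var a = new RefPoint { X = 1 };
        var b = new RefPoint { X = 1 };

        // Object.Equals: reference check, so two distinct objects are never equal.
        Console.WriteLine(a.Equals(b));   // False
        Console.WriteLine(a.Equals(a));   // True

        var c = new ValPoint { X = 1 };
        var d = new ValPoint { X = 1 };

        // ValueType.Equals: compares the data the two values contain.
        Console.WriteLine(c.Equals(d));   // True
    }
}
```

Neither type overrides anything here, so the output reflects only the framework's default behavior.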
Redefining Equality
The Equals method in System.Object is virtual, so you can redefine the concept of "equality" for your own types. Why would you want to do this? Suppose you had a database application and you wanted equality to mean "the same record in the database." You could simply check the ID property on two objects:
if (currentCustomer.ID == suspendedCustomer.ID)
{
    // ...
}
This works fine, but redefining equality can make the code more readable by allowing consumers to compare Customer objects:
if (currentCustomer.Equals(suspendedCustomer))
{
    // ...
}
If you decide to redefine equality for one of your types, you need to make sure your concept of equality adheres to four rules before you start coding:
- Reflexive: x.Equals(x) must always return true.
- Symmetric: The order of comparison doesn't matter, so the result of x.Equals(y) must be the same as y.Equals(x).
- Transitive: If x.Equals(y) and y.Equals(z) both return true, then x.Equals(z) must return true.
- Consistent: Assuming x and y aren't reassigned, repeated calls to x.Equals(y) must always return the same value.
Following these four rules will ensure your type plays well with the .NET Framework.
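The symmetric rule is the one that tends to break by accident. Here is a sketch of a violation, using a hypothetical CaseInsensitiveName class (not part of the Customer example) whose Equals tries to be helpful by also accepting plain strings:

```csharp
using System;

// Hypothetical type that breaks the symmetric rule.
class CaseInsensitiveName
{
    private readonly string value;

    public CaseInsensitiveName(string value)
    {
        this.value = value ?? string.Empty;
    }

    public override bool Equals(object obj)
    {
        // The mistake: treating a plain string as a possible equal.
        var text = obj as string;
        if (text != null)
        {
            return string.Equals(value, text, StringComparison.OrdinalIgnoreCase);
        }
        var other = obj as CaseInsensitiveName;
        return other != null
            && string.Equals(value, other.value, StringComparison.OrdinalIgnoreCase);
    }

    public override int GetHashCode()
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(value);
    }
}

static class SymmetryDemo
{
    static void Main()
    {
        var x = new CaseInsensitiveName("Bob");
        Console.WriteLine(x.Equals("bob"));   // True
        Console.WriteLine("bob".Equals(x));   // False: String knows nothing about our type
    }
}
```

Because String.Equals can't be taught about CaseInsensitiveName, the only way to restore symmetry in this sketch is to stop accepting strings in the override.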
Let's redefine Equals for our Customer type:
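public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }
    public override bool Equals(object obj)
    {
        // Nothing is equal to null.
        if (obj == null)
        {
            return false;
        }
        // Same object in memory: equal by definition.
        if (Object.ReferenceEquals(this, obj))
        {
            return true;
        }
        // Objects of different types can never be equal.
        if (this.GetType() != obj.GetType())
        {
            return false;
        }
        // Same type: equal when the database IDs match.
        Customer other = (Customer) obj;
        return this.ID == other.ID;
    }
}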
First, let's check and see if we're comparing against null. Because we're inside an instance method, "this" is not going to be null, so we only need to check the other side of the comparison (obj).
The next step is a simple -- and quick -- reference check. The Object class exposes a static ReferenceEquals method that indicates if the two objects point to the same item in memory. If they're the same object, there's no need to go any further and we can return true.
Next, we do a type check to make sure we're comparing like objects. If the objects aren't of the same type, there's no way they could be equal.
Finally, we get to our customized logic of checking the ID property. If the two IDs match, it represents the same object in the database and satisfies the equality check for our Customer object.
The validation and checking before we even get to the comparison code is necessary because an Equals override should never throw an exception. The objects are either equal or not equal -- there's nothing else that can make logical sense.
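To see the guard clauses at work, here is a small sketch that feeds the override a null, an object of another type and customers with matching and non-matching IDs. The Customer class repeats the listing above; the GetHashCode override is an addition made only so the sketch compiles without a warning.

```csharp
using System;

public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        if (obj == null) return false;
        if (Object.ReferenceEquals(this, obj)) return true;
        if (this.GetType() != obj.GetType()) return false;
        Customer other = (Customer) obj;
        return this.ID == other.ID;
    }

    // Added for this sketch only: keeps equal customers' hash codes equal.
    public override int GetHashCode() { return this.ID; }
}

static class CustomerEqualsDemo
{
    static void Main()
    {
        var current = new Customer { ID = 5, Name = "Bob" };
        var suspended = new Customer { ID = 5, Name = "Robert" };
        var someoneElse = new Customer { ID = 6, Name = "Bob" };

        Console.WriteLine(current.Equals(suspended));    // True: same ID
        Console.WriteLine(current.Equals(someoneElse));  // False: different ID
        Console.WriteLine(current.Equals(null));         // False, no exception
        Console.WriteLine(current.Equals("5"));          // False: different type
    }
}
```

None of the four calls throws, which is the point of doing the checks before the cast.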
What about the == operator? Should we overload it to use our new Equals method? For reference types, the answer is usually no. Most .NET code assumes that == on a reference type follows reference semantics (immutable types such as String are the notable exception). That changes with value types, however.
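The practical consequence is that == and Equals can disagree for a reference type that overrides Equals. A brief sketch, using a condensed version of the Customer class (the GetHashCode override is again added only to keep the compiler quiet):

```csharp
using System;

public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }

    // Condensed form of the Equals override shown earlier.
    public override bool Equals(object obj)
    {
        if (obj == null || this.GetType() != obj.GetType()) return false;
        return this.ID == ((Customer) obj).ID;
    }

    public override int GetHashCode() { return this.ID; }
}

static class OperatorVersusEqualsDemo
{
    static void Main()
    {
        var a = new Customer { ID = 5, Name = "Bob" };
        var b = new Customer { ID = 5, Name = "Bob" };
        var alias = a;

        Console.WriteLine(a.Equals(b));   // True: our definition of equality
        Console.WriteLine(a == b);        // False: == is still a reference check
        Console.WriteLine(a == alias);    // True: same object in memory
    }
}
```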
Redefining Equality for Value Types
The default implementation of System.ValueType.Equals uses reflection. You should always override Equals on value types and provide a custom implementation because reflection can be slow.
Let's look at a simple Point3d example:
public struct Point3d
{
    public int X { get; set; }
    public int Y { get; set; }
    public int Z { get; set; }
}
Comparing two points for equality is as simple as making sure their X, Y and Z values are the same. However, the default implementation will use reflection to loop over the struct's fields and read their values -- and after that it will do a comparison. Let's override Equals and provide our own, reflection-free implementation:
public struct Point3d
{
    public int X { get; set; }
    public int Y { get; set; }
    public int Z { get; set; }
    public override bool Equals(object obj)
    {
        if (!(obj is Point3d))
        {
            return false;
        }
        Point3d other = (Point3d) obj;
        return this.X == other.X && this.Y == other.Y
            && this.Z == other.Z;
    }
}
Notice how this implementation is much simpler. A Point3d itself can never be null, and the "is" check rejects a null argument along with any object of another type. This makes the logic simple: If the object is not a Point3d, return false. Otherwise, we cast it and do our comparisons of the X, Y and Z values.
With value types, you should also overload the == operator. Unlike Equals, structs get no default == at all: an expression such as p1 == p2 won't compile for Point3d until you define the operator yourself, and C# requires you to define != alongside it. Use the same logic as the Equals method, and callers get a fast, reflection-free comparison whichever syntax they prefer.
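A minimal sketch of what the operators could look like for Point3d, reusing the comparison logic from Equals. C# requires == and != to be defined as a pair; the GetHashCode override here is only a placeholder so the sketch compiles cleanly.

```csharp
using System;

public struct Point3d
{
    public int X { get; set; }
    public int Y { get; set; }
    public int Z { get; set; }

    public override bool Equals(object obj)
    {
        if (!(obj is Point3d))
        {
            return false;
        }
        return this == (Point3d) obj;
    }

    // Placeholder for this sketch; equal points produce equal hash codes.
    public override int GetHashCode() { return X ^ Y ^ Z; }

    public static bool operator ==(Point3d left, Point3d right)
    {
        return left.X == right.X && left.Y == right.Y && left.Z == right.Z;
    }

    // != must be defined whenever == is.
    public static bool operator !=(Point3d left, Point3d right)
    {
        return !(left == right);
    }
}

static class PointOperatorDemo
{
    static void Main()
    {
        var p1 = new Point3d { X = 1, Y = 2, Z = 3 };
        var p2 = new Point3d { X = 1, Y = 2, Z = 3 };
        var p3 = new Point3d { X = 1, Y = 2, Z = 4 };

        Console.WriteLine(p1 == p2);   // True
        Console.WriteLine(p1 != p3);   // True
    }
}
```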
Boxing and Type Safety
As you probably noticed, the signature for Equals accepts an object. This means two things:
- For reference types, we lose compile-time type safety: any object can be passed in, so we have to check and cast at runtime.
- For value types, we pay the cost of boxing and unboxing.
Microsoft realized this, and when the company introduced generics in the .NET Framework 2.0, it gave us a new interface for equality comparison that was type-safe -- IEquatable<T>:
public interface IEquatable<T>
{
    bool Equals(T other);
}
Now we can use this interface to get type safety for our reference types and remove the boxing and unboxing of our value types. Let's revisit the Point3d example and implement IEquatable<T>:
public struct Point3d : IEquatable<Point3d>
{
    public int X { get; set; }
    public int Y { get; set; }
    public int Z { get; set; }
    public override bool Equals(object obj)
    {
        if (!(obj is Point3d))
        {
            return false;
        }
        Point3d other = (Point3d) obj;
        return this.Equals(other);
    }
    public bool Equals(Point3d other)
    {
        return this.X == other.X && this.Y == other.Y
            && this.Z == other.Z;
    }
}
We still need to support the old Equals(object obj) signature for older clients. But new clients can use the IEquatable<Point3d> implementation to avoid the boxing.
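To check which overload a given call actually reaches, one option is to count the calls. The sketch below adds two static counters to Point3d; they are hypothetical instrumentation for this illustration only and have no place in production code.

```csharp
using System;
using System.Collections.Generic;

public struct Point3d : IEquatable<Point3d>
{
    // Hypothetical counters, present only to show which overload runs.
    public static int ObjectCalls;
    public static int TypedCalls;

    public int X { get; set; }
    public int Y { get; set; }
    public int Z { get; set; }

    public override bool Equals(object obj)
    {
        ObjectCalls++;
        return obj is Point3d && this.Equals((Point3d) obj);
    }

    public bool Equals(Point3d other)
    {
        TypedCalls++;
        return this.X == other.X && this.Y == other.Y && this.Z == other.Z;
    }

    public override int GetHashCode() { return X ^ Y ^ Z; }
}

static class OverloadDemo
{
    static void Main()
    {
        var p = new Point3d { X = 1, Y = 2, Z = 3 };
        var q = new Point3d { X = 1, Y = 2, Z = 3 };

        p.Equals(q);             // binds to Equals(Point3d): no boxing
        Console.WriteLine(Point3d.ObjectCalls);   // 0

        p.Equals((object) q);    // boxes q and goes through Equals(object)
        Console.WriteLine(Point3d.ObjectCalls);   // 1

        // Generic collections use EqualityComparer<T>.Default, which
        // prefers IEquatable<T> when the type implements it.
        var points = new List<Point3d> { p };
        Console.WriteLine(points.Contains(q));    // True
        Console.WriteLine(Point3d.ObjectCalls);   // still 1
    }
}
```

The last line is the payoff: generic collections such as List<T> find the type-safe overload on their own, so implementing the interface benefits code you didn't write as well.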
Equality Summary
Based on everything covered so far, we can make some general statements about Equals and the == operator and how they relate to reference types and value types.
You should override Object.Equals:
- Whenever you create a value type. The default implementation uses reflection and is slow.
- Whenever you have a reference type for which reference equality (the same object in memory) doesn't make sense (think of String.Equals).
- When overriding Object.Equals, make sure your comparison code never throws an exception.
- When overriding Object.Equals, always implement IEquatable<T>.
You should overload operator ==:
- Whenever you create a value type. Structs have no default == operator, so without one consumers can't compare them with == at all. Remember to define != as well.
- Almost never with reference types. Most .NET code assumes reference types follow reference semantics with the == operator.
What Is Object.GetHashCode?
If you've typed in any of the sample code and tried to compile it, you'll notice a warning whenever you override Equals: warning CS0659: '<your object>' overrides Object.Equals(object o) but does not override Object.GetHashCode().
The hash code corresponds to the value of an object. If you're customizing the equality comparison of an object (its "value" used for comparisons), then you must also customize the hash code. But overriding GetHashCode is not trivial.
The hash code returned by GetHashCode is used in hashing algorithms and data structures like HashSet<T> and Dictionary<K,V>. The implementation of GetHashCode is assumed to be quick and dependent on at least one of the properties of the object.
The default implementation of GetHashCode for reference types is based on the object's identity, not its contents. The runtime associates a hash value with each object, and that value stays the same for the lifetime of the object no matter how its data changes. The exact value is an implementation detail and isn't guaranteed to be unique, but two distinct objects will almost always get different hash codes.
Now that you know how the default implementation works, you probably see a problem with the Customer object that we created earlier, which based equality on the ID property. If we have two Customer objects with the same ID (meaning that, for our purposes, they're the same), they'll almost certainly have different hash codes because they're different objects in memory. That means code like this will fail:
static void Main(string[] args)
{
    var customer1 = new Customer {ID = 5, Name = "Bob"};
    var customer2 = new Customer {ID = 5, Name = "Bob"};
    var customers = new HashSet<Customer>();
    customers.Add(customer1);
    var foundID5 = customers.Contains(customer2);
}
Remember, the assumption is that GetHashCode is quick. Therefore, for performance reasons, HashSet.Contains (and most other hash-based functions) will first check the hash codes. If the hash codes don't match, then the objects don't match and there's no need to run the more complicated Equals method. Only if the hash codes match will an "authoritative" check be done using the Equals method.
The previous code example will set the variable "foundID5" to false -- even though the Customer with ID of 5 is in the HashSet. This is all because we didn't override GetHashCode to return something meaningful based on our definition of equality (the database ID).
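The hash-first behavior can be observed directly. The sketch below uses a hypothetical Probe class whose hash code is passed in through the constructor, which makes the result deterministic, and whose Equals counts how often it's called. Probe deliberately breaks the rules for GetHashCode; it exists only to show the lookup order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration: every Probe claims to equal every
// other Probe, but the hash code is whatever the caller supplies.
class Probe
{
    public static int EqualsCalls;
    private readonly int hash;

    public Probe(int hash) { this.hash = hash; }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        return obj is Probe;
    }

    public override int GetHashCode() { return hash; }
}

static class HashFirstDemo
{
    static void Main()
    {
        var set = new HashSet<Probe>();
        set.Add(new Probe(1));

        // Hash codes differ, so Equals is never consulted.
        Console.WriteLine(set.Contains(new Probe(2)));   // False
        Console.WriteLine(Probe.EqualsCalls);            // 0

        // Hash codes match, so Equals makes the final decision.
        Console.WriteLine(set.Contains(new Probe(1)));   // True
        Console.WriteLine(Probe.EqualsCalls);            // 1
    }
}
```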
Overriding GetHashCode
There are three rules to keep in mind when doing your own GetHashCode implementation:
- Objects that are defined to be "equal" should produce the same hash code. If you override Equals to mean something other than reference equality, your "equal" objects should return the same hash code.
- The value of GetHashCode shouldn't change. If you get a hash code for a particular object, change some of the object's data and then retrieve the hash code again, it should stay the same.
- The hash codes you produce should be evenly distributed across all possible values for your object.
The first rule is obvious and easy to handle. The second rule is important. Going back to our Customer class, we'll add a GetHashCode override that meets the first rule:
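public class Customer : IEquatable<Customer>
{
    // Properties and Equals(object) as before
    // IEquatable<Customer> also requires a type-safe overload
    public bool Equals(Customer other)
    {
        return this.Equals((object) other);
    }
    public override int GetHashCode()
    {
        return this.ID;
    }
}
Because our equality is based on the IDs matching, we can simply use the ID property as our hash code. Now run the following code and you'll notice something interesting: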
var customer = new Customer {ID = 4, Name = "Patrick"};
var set = new HashSet<Customer>();
set.Add(customer);
customer.ID = 7;
var found = set.Contains(customer);
The variable "found" is false. We violated the second rule and made the hash code dependent on data that can change. This breaks the way hash code-based lookups work: the set filed the object under the hash code for ID 4, but the lookup now searches under the hash code for ID 7. In practice, the data used to generate the hash code should be immutable. If you need to change that data, provide a way for the user to create a new instance of your class with the updated data -- either a new constructor overload or some other factory method. The user would first remove the old object from the HashSet (or from whatever hash-based collection holds it), create a new instance with the updated data and add the new instance.
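One way the remove, re-create and add sequence could look is sketched below. The Customer class here is an immutable variation on the one in this article, and the WithID factory method is a hypothetical name chosen for the illustration.

```csharp
using System;
using System.Collections.Generic;

public sealed class Customer : IEquatable<Customer>
{
    public Customer(int id, string name)
    {
        ID = id;
        Name = name;
    }

    // No public setters: the data behind the hash code can't change.
    public int ID { get; private set; }
    public string Name { get; private set; }

    // Hypothetical factory method: returns a copy with a different ID.
    public Customer WithID(int newID)
    {
        return new Customer(newID, Name);
    }

    public bool Equals(Customer other)
    {
        return other != null && ID == other.ID;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Customer);
    }

    public override int GetHashCode() { return ID; }
}

static class ImmutableKeyDemo
{
    static void Main()
    {
        var customer = new Customer(4, "Patrick");
        var set = new HashSet<Customer>();
        set.Add(customer);

        // Remove the old instance, then add a new one with the updated ID.
        set.Remove(customer);
        var updated = customer.WithID(7);
        set.Add(updated);

        Console.WriteLine(set.Contains(updated));    // True
        Console.WriteLine(set.Contains(customer));   // False: ID 4 is gone
    }
}
```

Sealing the class lets this version skip the GetType comparison; an unsealed class would still need it.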
The third rule is the hardest to do and beyond the scope of this article. Many things have been written on hash code algorithms. My advice would be to come up with an algorithm that meets rules one and two and then keep an eye out for performance. Wait until you see your hash-based lookups becoming a performance bottleneck, and then take the time to research a new algorithm.
A popular method I've used in the past with some success is to XOR the hash codes of the immutable data together to create a hash code. Let's revisit the Point3d struct we created earlier. In this version, we're not only overriding GetHashCode (as is required because we're overriding Equals), but we're also updating the struct to follow the second rule. The x, y and z coordinates are read-only because they're used to produce the hash code:
public struct Point3d : IEquatable<Point3d>
{
    private int x;
    private int y;
    private int z;
    public Point3d(int x, int y, int z)
    {
        this.x = x;
        this.y = y;
        this.z = z;
    }
    public int X
    {
        get { return x; }
    }
    public int Y
    {
        get { return y; }
    }
    public int Z
    {
        get { return z; }
    }
    public override bool Equals(object obj)
    {
        if (!(obj is Point3d))
        {
            return false;
        }
        Point3d other = (Point3d)obj;
        return this.Equals(other);
    }
    public bool Equals(Point3d other)
    {
        return this.X == other.X && this.Y == other.Y && this.Z == other.Z;
    }
    public override int GetHashCode()
    {
        return this.X.GetHashCode() ^ this.Y.GetHashCode() ^ this.Z.GetHashCode();
    }
}
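This implementation satisfies the first and second rules. It's a reasonable start on the third as well, although XOR ignores order, so points whose coordinates are permutations of each other, such as (1, 2, 3) and (3, 2, 1), produce the same hash code.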
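If collisions between reordered coordinates ever turn out to matter, an order-sensitive combination is a common alternative. The sketch below compares the XOR approach with a multiply-and-add scheme; the helper names and the constants 17 and 31 are illustrative choices, not something prescribed by the framework. On runtimes that provide it, the built-in System.HashCode.Combine method is another option.

```csharp
using System;

static class HashCombineDemo
{
    // The XOR approach used by Point3d above.
    static int XorHash(int x, int y, int z)
    {
        return x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode();
    }

    // Hypothetical alternative: each step depends on the position of the
    // value, so swapping coordinates changes the result.
    static int OrderedHash(int x, int y, int z)
    {
        unchecked   // let the arithmetic wrap instead of throwing on overflow
        {
            int hash = 17;
            hash = hash * 31 + x;
            hash = hash * 31 + y;
            hash = hash * 31 + z;
            return hash;
        }
    }

    static void Main()
    {
        // XOR can't tell (1, 2, 3) from (3, 2, 1).
        Console.WriteLine(XorHash(1, 2, 3) == XorHash(3, 2, 1));           // True

        // The ordered scheme keeps them apart.
        Console.WriteLine(OrderedHash(1, 2, 3) == OrderedHash(3, 2, 1));   // False
    }
}
```

Both schemes still satisfy the first two rules; the difference only shows up in how well the codes spread out.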
We covered a lot of ground in this article. I hope you've learned something about how .NET treats equality comparisons, how that differs between reference types and value types, and how to implement your own concept of equality. If you have any questions about this topic or an idea for a future topic, don't hesitate to contact me.
About the Author
Patrick Steele is a senior .NET developer with Billhighway in Troy, Mich. A recognized expert on the Microsoft .NET Framework, he’s a former Microsoft MVP award winner and a presenter at conferences and user group meetings.