Maximize Comparison Accuracy
Learn how to minimize inherent inaccuracies with Singles and Doubles when performing comparisons; learn how to use extension methods, a .NET 3.5 feature, when targeting .NET 2.0 apps; and learn why you should avoid using names that are already used in a type you are extending.
TECHNOLOGY TOOLBOX: VB.NET
- By Bill McCarthy
One of the most commonly used and least understood intrinsic types is the Double Structure. This last week alone I've seen multiple questions about Doubles and accuracy in both mailing lists and newsgroups. A Double is a floating point number stored as eight bytes (64 bits). The range of numbers you can hold in a Double is vast: from approximately negative 1.79E308 to positive 1.79E308. But to do this all within a meager 64 bits, a Double is limited to about 15 significant figures. This limit means only the first 15 digits are represented accurately.
Because of the significant figure limitation, mathematical operations on Doubles can result in values that are reported as not equal to each other, when they should be:
Dim x = 5 / 7
Dim y = (150 / 7) / 30
Dim equal = (x = y) 'False
One of the recommendations I saw people make to solve this problem was to see whether the difference was greater than the smallest positive number that can be represented by a Double: Epsilon, which is 4.94065645841247E-324. Another approach was to round the number to a given number of decimal places. Both sound plausible as solutions, but neither approach takes into account the vast range of numbers a Double can hold. The real solution lies in understanding the significant figure limitation.
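To see why neither suggestion holds up across the range of a Double, here is a minimal sketch. The module and function names (EpsilonAndRoundingDemo, EqualByEpsilon, EqualByRounding) are made up for illustration, and the test values are my own rather than taken from the discussions mentioned above.

```vbnet
Module EpsilonAndRoundingDemo
    ' Hypothetical helper: treats values as equal only when they differ
    ' by less than Double.Epsilon.
    Function EqualByEpsilon(ByVal a As Double, ByVal b As Double) As Boolean
        Return Math.Abs(a - b) < Double.Epsilon
    End Function

    ' Hypothetical helper: treats values as equal when they match after
    ' rounding to a fixed number of decimal places.
    Function EqualByRounding(ByVal a As Double, ByVal b As Double, _
                             ByVal places As Integer) As Boolean
        Return Math.Round(a, places) = Math.Round(b, places)
    End Function

    Sub Main()
        ' 0.1 + 0.2 and 0.3 differ by roughly 5.5E-17, which dwarfs Epsilon,
        ' so the Epsilon test still reports them as different.
        Console.WriteLine(EqualByEpsilon(0.1 + 0.2, 0.3)) ' False

        ' 2E-20 is twice 1E-20, yet both round to zero at 10 decimal places,
        ' so the rounding test reports them as equal.
        Console.WriteLine(EqualByRounding(1.0E-20, 2.0E-20, 10)) ' True
    End Sub
End Module
```

The Epsilon test is too strict for everyday magnitudes, and fixed rounding is too loose for very small ones; both ignore the order of magnitude of the values being compared.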
If a number's accuracy is limited to 15 significant figures, it means a huge number such as 1E100 can have errors of the magnitude of 1E85. That's still a really large number, but relatively speaking it isn't significant. The distance from the Earth to the Moon is approximately 400,000 km, which is 4E5 km in scientific notation. A Double could describe that distance accurately to well within the thickness of a strand of hair: 15 significant figures on 4E5 km works out to a fraction of a micrometer. For financial calculations in dollars and cents, 15 significant figures allows for accuracy in values up to about 10,000,000,000,000 dollars, which should be enough even for Societe Generale to play on the stock market with.
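The scale of that error can be checked directly. The sketch below uses a hypothetical helper, AbsorbedByRounding, to show that adding 1E80 to 1E100 changes nothing, because 1E80 sits below the digits a Double can track at that magnitude.

```vbnet
Module MagnitudeDemo
    ' Hypothetical helper: True when adding small leaves big unchanged,
    ' meaning small is below the precision available at big's magnitude.
    Function AbsorbedByRounding(ByVal big As Double, _
                                ByVal small As Double) As Boolean
        Return big + small = big
    End Function

    Sub Main()
        ' 1E80 is 20 orders of magnitude below 1E100, so it is lost.
        Console.WriteLine(AbsorbedByRounding(1.0E+100, 1.0E+80)) ' True

        ' 1E90 is only 10 orders of magnitude below, so it registers.
        Console.WriteLine(AbsorbedByRounding(1.0E+100, 1.0E+90)) ' False
    End Sub
End Module
```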
To compare two Doubles, you need to look at their order of magnitude, and then see if the absolute difference between the values is less than what can be accurately represented. This is such a common task that I decided to write an extension method for this purpose:
<Extension()> _
Function Approximates( _
ByVal left As Double, _
ByVal right As Double, _
Optional ByVal sigfigs _
As Int32 = 15) As Boolean
' Exact matches need no tolerance test
If left = right Then Return True
' Use the constant for the default case
Dim sigValue = If(sigfigs = 15, _
1E15, 10 ^ (sigfigs - 1))
Return Math.Abs(left - right) < _
Math.Abs(left / sigValue)
End Function
In the Approximates function, the first statement checks to see whether the values are exactly equal and, if so, returns immediately. This early exit reduces overhead because the equality test is a single, cheap comparison of the two values on the stack. The next step is to decide what order of magnitude a significant value is. You could use logarithms and round the logarithm values, but that has a large overhead. Instead, a close approximation can be obtained by dividing one of the values by 1E15 and comparing that with the difference of the two values. To give the function greater flexibility, I made the number of significant figures an optional parameter. Rather than calculate 10 ^ (sigfigs - 1) each time the method is called, the function uses the constant 1E15 when called with the default number of significant figures.
Add the Approximates function to a Module in your applications whenever you're dealing with Double comparisons and use it to replace your equality test. Specifically, you should replace If x = y with If x.Approximates(y). Also be aware that the less-than and greater-than comparison operators can return false positives. You might want to consider rewriting statements such as If x < y Then to something like If x < y AndAlso Not x.Approximates(y) Then.
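As a rough illustration of those replacements, the sketch below repeats the Approximates function inside a module so that it compiles on its own, then exercises it. The module names (DoubleExtensions, ApproximatesDemo) and the sample values are assumptions chosen for the example.

```vbnet
Imports System.Runtime.CompilerServices

Module DoubleExtensions
    ' The article's Approximates logic, repeated so this sketch stands alone.
    <Extension()> _
    Public Function Approximates(ByVal left As Double, _
                                 ByVal right As Double, _
                                 Optional ByVal sigfigs As Int32 = 15) As Boolean
        If left = right Then Return True
        Dim sigValue As Double = If(sigfigs = 15, 1.0E+15, 10 ^ (sigfigs - 1))
        Return Math.Abs(left - right) < Math.Abs(left / sigValue)
    End Function
End Module

Module ApproximatesDemo
    Sub Main()
        Dim x As Double = 0.3
        Dim y As Double = 0.1 + 0.2

        ' The plain operators treat the rounding noise as a real difference.
        Console.WriteLine(x = y) ' False
        Console.WriteLine(x < y) ' True

        ' The tolerance-based versions do not.
        Console.WriteLine(x.Approximates(y)) ' True
        Console.WriteLine(x < y AndAlso Not x.Approximates(y)) ' False

        ' A looser comparison using the optional sigfigs parameter.
        Dim one As Double = 1.0
        Console.WriteLine(one.Approximates(1.001, 3)) ' True
        Console.WriteLine(one.Approximates(1.02, 3)) ' False
    End Sub
End Module
```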
The Single data type is similar to the Double data type, but is only accurate to seven significant figures. The Decimal data type is a high precision 128-bit value type. Internally, it stores the value as a 96-bit integer using three 32-bit integers, and stores the sign and the location of the decimal point in the remaining 32-bit slot. As such, a Decimal value allows up to 28 significant figures. The downside is that a Decimal is larger than a Double, and mathematical operations on it are slower. If you're doing a lot of calculations, you will need to weigh performance and storage costs against the accuracy. If you don't require the accuracy of the Decimal type, stick with Double.
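The differences between the three types show up with very small examples. The following sketch uses values I picked for illustration; the module name NumericTypesDemo is hypothetical.

```vbnet
Module NumericTypesDemo
    Sub Main()
        ' Single: 16,777,217 needs eight significant figures, so it is
        ' stored as 16,777,216 and the two values compare as equal.
        Dim s1 As Single = 16777216.0F
        Dim s2 As Single = 16777217.0F
        Console.WriteLine(s1 = s2) ' True

        ' Double: binary rounding makes 0.1 + 0.2 differ from 0.3.
        Dim d As Double = 0.1 + 0.2
        Console.WriteLine(d = 0.3) ' False

        ' Decimal: the same sum is exact because the digits are stored
        ' as a scaled integer.
        Dim m As Decimal = 0.1D + 0.2D
        Console.WriteLine(m = 0.3D) ' True
    End Sub
End Module
```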
As an aside, Dates in VB6 are OLE Dates and are stored as Doubles representing the number of days from midnight, 30 December 1899. The fractional part of the Double represents the time of day: twelve o'clock (midday) is 0.5. Because the value is a Double, calculations involving dates suffer the same accuracy issues Doubles do. In .NET, the DateTime Structure uses a 64-bit integer, with the lower 62 bits storing the number of 100-nanosecond intervals (Ticks) since the theoretical date January 1, 0001. This gives the DateTime Structure about 18 significant figures, allowing accuracies 1,000 times greater than VB6's OLE Date. Despite this increased accuracy, the two share similar ranges: the OLE Date range runs from Jan 1, 0100 to 31 Dec 9999, and the DateTime range runs from Jan 1, 0001 to 31 Dec 9999. The DateTime structure also provides an often crucial bit of information that's stored in the upper 2 bits of its 64-bit value: whether the value is a local time, a UTC time, or unspecified.
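The relationship between the two date representations can be explored with the framework's own conversion methods. This is a small sketch with a hypothetical module name (DateDemo); the dates are chosen only to match the description above.

```vbnet
Module DateDemo
    Sub Main()
        ' An OLE Date of 0.5 is midday on 30 December 1899.
        Dim midday As DateTime = DateTime.FromOADate(0.5)
        Console.WriteLine(midday = New DateTime(1899, 12, 30, 12, 0, 0)) ' True

        ' Two whole days after the OLE Date origin.
        Console.WriteLine(New DateTime(1900, 1, 1).ToOADate()) ' 2

        ' Ticks are 100-nanosecond intervals counted from January 1, 0001.
        Console.WriteLine(TimeSpan.TicksPerSecond) ' 10000000
        Console.WriteLine(New DateTime(1, 1, 1).Ticks) ' 0

        ' The Kind property exposes the extra two bits of information.
        Dim utcStamp As DateTime = DateTime.SpecifyKind(midday, DateTimeKind.Utc)
        Console.WriteLine(midday.Kind) ' Unspecified
        Console.WriteLine(utcStamp.Kind) ' Utc
    End Sub
End Module
```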
Using Extensions in .NET 2.0
The rules for creating an extension method are simple: In a Module, you define a Sub or Function with the first parameter as the type being extended, and you add the System.Runtime.CompilerServices.Extension attribute to the method. The rest is just IDE and compiler wizardry that makes the method appear as if it's part of the type being extended. For the most part, this means you could theoretically use Extension methods in an application that targets .NET 2.0, because Visual Studio 2008 allows you to target the 2.0, 3.0, and 3.5 versions of the .NET Framework. The problem is that the Extension attribute is part of the 3.5 libraries. But there is a workaround.
To trick the VB 2008 compiler into letting you use Extension methods, just add the following code into the project where you've defined your extensions:
Public Class ExtensionAttribute
You'll also need to ensure that you don't have a root namespace specified in the project properties; unfortunately, VB still doesn't provide a way to have a default root namespace in a project and escape out of the default when defining an explicit namespace. If your project does use a default namespace, and you decide to remove it, then you might find you'll need to add namespace declarations to your other code files.
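For reference, a complete declaration along these lines might look like the sketch below. It assumes the attribute class has to sit in the System.Runtime.CompilerServices namespace, which is why the project's root namespace must be empty, and the AttributeUsage targets mirror those of the real 3.5 attribute. Treat it as an illustration for projects that target .NET 2.0 only; in a project that references the 3.5 libraries it would clash with the real attribute.

```vbnet
' Only for projects targeting .NET 2.0 with an empty root namespace.
Namespace System.Runtime.CompilerServices
    ' Stand-in for the attribute that normally ships with the 3.5 libraries.
    <AttributeUsage(AttributeTargets.Method Or AttributeTargets.Class Or AttributeTargets.Assembly)> _
    Public NotInheritable Class ExtensionAttribute
        Inherits Attribute
    End Class
End Namespace
```

With this class in the project, methods marked with the Extension attribute in a Module should compile against the 2.0 runtime.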
Name Extension Methods Carefully
When it comes to extension methods and method resolution, the rules the compiler uses are slightly different than they are when dealing only with overload resolution. Last year, I wrote about Extension methods and their method resolution (On VB, "Beautify Your Code with Extensions," May 2007), but the rules then were about to change based in part on my feedback. As I wrote the article and looked at the rules at that time, it became clear to me that they were too restrictive: originally, one of the rules dictated that if a type has a method with the same name as the extension, the extension syntax couldn't be used.
I conveyed my concerns to the VB team, eagerly awaiting their decision. Unfortunately, my article deadline was itself extended too far, and somewhere in the mix-up, my original findings were published instead of the expected modified behavior. Adding to that, it turns out what I was expecting wasn't quite what we got. Instead, a compromise was struck. I was expecting the rule to be the same as overload resolution, and it is, for the most part, with one exception: If the type has a method with the correct number of parameters, and there's a widening conversion to those parameters, then the type's method is bound in preference to any extension method, even if the extension method is a more derived or exact match. This means that you can't use an extension method named Equals that takes one or two parameters, even though you can define one:
<Extension()> _
Public Function Equals( _
ByVal left As Customer, _
ByVal right As Customer, _
ByVal checkNamesOnly As Boolean) _
As Boolean
' Comparison logic goes here
End Function
Because Customer inherits from Object, it has a function (Equals) that takes two parameters, each as Object. VB will bind to the method defined in Customer or its base classes, rather than any extension method, even when the extension method is a more specific match.
Unfortunately, the change in how extensions are bound seems to have come in a bit too late for the VB team to provide a complete IDE experience with them. Considering the extension method can't be called at all, I would expect a warning on the Extension attribute part of the declaration, but there isn't any. What's worse: When you write "If customer1.Equals(customer2, True)", IntelliSense displays your extension method, and code completion will even help you write it (see Figure 1). As soon as you complete the code, depending on your compiler warning settings, you'll get a warning about accessing a Shared method through an instance. This is because the compiler binds to the Shared Equals(Object, Object) method defined in System.Object.
In the future, I hope the IDE will be more instructive as to what can and can't be used as an Extension method. In the meantime, try to avoid using names that are already used in the type you are extending.
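One way to follow that advice is to give the extension a name that Object doesn't define. In the sketch below, the Customer class, its Name and Id fields, and the method name Matches are all hypothetical stand-ins for illustration.

```vbnet
Imports System.Runtime.CompilerServices

' Hypothetical Customer type used only for this example.
Public Class Customer
    Public Name As String
    Public Id As Integer
End Class

Module CustomerExtensions
    ' Named Matches rather than Equals, so no method on Customer or
    ' Object competes with it during binding.
    <Extension()> _
    Public Function Matches(ByVal left As Customer, _
                            ByVal right As Customer, _
                            ByVal checkNamesOnly As Boolean) As Boolean
        If left Is Nothing OrElse right Is Nothing Then Return left Is right
        If checkNamesOnly Then Return left.Name = right.Name
        Return left.Name = right.Name AndAlso left.Id = right.Id
    End Function
End Module

Module NamingDemo
    Sub Main()
        Dim customer1 As New Customer()
        customer1.Name = "Ann"
        customer1.Id = 1

        Dim customer2 As New Customer()
        customer2.Name = "Ann"
        customer2.Id = 2

        ' Same name, different Id.
        Console.WriteLine(customer1.Matches(customer2, True)) ' True
        Console.WriteLine(customer1.Matches(customer2, False)) ' False
    End Sub
End Module
```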
Bill McCarthy is an independent consultant based in Australia and is one of the foremost .NET language experts specializing in Visual Basic. He has been a Microsoft MVP for VB for the last nine years and sat in on internal development reviews with the Visual Basic team for the last five years where he helped to steer the language’s future direction. These days he writes his thoughts about language direction on his blog at http://msmvps.com/bill.