Investigating Exceptions Causing Web Site Havoc

A stepped approach using traditional tools in Visual Studio and third-party solutions to troubleshoot and debug Web application issues.

Sometimes you may have a partially functional application where certain parts of the application are working fine, but other parts are not behaving as expected. Issues in the development environment aren't too hard to troubleshoot, but if it’s happening in a production environment affecting users and clients, things can get tricky, especially when traditional ways of troubleshooting don’t provide enough useful information. This article will walk you through one possible way to get to the bottom of the issue.

In order to set the stage, let me first introduce a simplified version of an online store and the issue facing it. The application doesn’t have all the bells and whistles of a typical real-life eCommerce Web site, but it certainly has all the necessary ingredients to describe a real-life problem. It presents a typical use case where users can choose products, review orders and then check out to finish the transaction. As shown in Figure 1, a dropdown lists all the available products. Users can add items by clicking on the "Add Item to Cart" button. Once a user is done placing items in the cart, she can finish by clicking the Checkout button.

[Click on image for larger view.] Figure 1. Simple Online eCommerce System

As a user adds more items, the Item Count increments accordingly (see Figure 2).

[Click on image for larger view.] Figure 2. Items Added in the Cart

A user can view all the items in the cart by clicking on "Show Cart Items" button (see Figure 3).

[Click on image for larger view.] Figure 3. View Items in the Cart

So, what happens -- sometimes – is a user clicks on the Checkout button and gets the error message shown in Figure 4. Not all users face this problem. Some of them can check out their items and finish a transaction without encountering any error. This makes the issue a little trickier to troubleshoot, as the problem isn’t always reproducible, even in the production environment. For the users who do encounter this error, other parts of the application still seem to be working normally. For example, after getting this error, if a user adds another item in the cart, the Item Count is incremented appropriately, as shown in Figure 5.

[Click on image for larger view.] Figure 4. Error Message Upon Checkout
[Click on image for larger view.] Figure 5. Adding Item Still Works, Even After Getting Error at Checkout
[Click on image for larger view.] Figure 6. Show Cart Items Function Also Working Properly After Getting Error During Checkout

If a user clicks on the Show Cart Items button now, that functionality is also working as expected.

Bottom line is that the Checkout function is broken for some unknown conditions. This obviously is causing loss of revenue for the company, which can never be a good thing for any business. These types of issues always need to be resolved as soon as possible, and that can exert some extra pressure on the developers responsible for maintaining the application.

The first logical action to troubleshoot is to look at the application log files, entries in the Event Viewer or any other type of logging that the target application may have been using. In this case, the following error is reported in the application log file:

ERROR 2017-01-25 17:12:35,970 38919ms CheckoutForm  btnCheckout_Click  - System.NullReferenceException: Object reference not set to an instance of an object.
   at OrderSystem.CheckoutForm.CalculateTotalPrice(Int32 customerId, Order order, ShipmentInfo shipmentInfo) in C:\ ApplicationThrowingException\ProductOrderSystem\ProductOrderSystem\CheckoutForm.aspx.cs:line 78
   at OrderSystem.CheckoutForm.btnCheckout_Click(Object sender, EventArgs e) in C:\ApplicationThrowingException\ProductOrderSystem\ProductOrderSystem\CheckoutForm.aspx.cs:line 51

This log entry clearly indicates that a null object reference condition has been encountered. A NullReferenceException isn’t too complicated to understand. According to the MSDN documentation page, this exception is thrown during an attempt to deference a null object reference. It's such a common error that developers encounter causing so many questions to be raised, that Stack Overflow has a separate tag for NullReferenceException. There are also many recommendations on how to resolve it (you can see one example on Stack Overflow).

There’s no question that developers can do a better job checking for a valid object before invoking any operation on it, and the framework can also do a much better job reporting exactly which object it’s complaining about. In fact, it's a popular suggestion on the UserVoice site. Teams at Microsoft are listening to this feedback and looking to provide new capabilities (like a new Exception Helper) that can help determine which object is null. Unfortunately, that’s not very helpful in this situation and you have to rely on conventional approaches to determine which object is null. Because logs are pointing to the CalculateTotalPrice method, let’s take a look at the code for it, as shown in Figure 7.

[Click on image for larger view.] Figure 7. Different Points in CalculateTotalPrice Can Cause NullReferenceException

A quick review shows that it’s full of traps that can cause a NullReferenceException. Here’s a summary of each of potential trap marked with arrows/number in the code in Figure 7. Each of these should have a check for object not null before invoking any operation:

  1. The Products property of Order instance should be checked for null object.
  2. Price property of Product instance should be checked for null.
  3. Multiple object instances/properties that aren’t checked against Null on this line.
  4. Duty property of the object returned by ImportDutyCalculator.GetImportDuty method.
  5. Rate property of the object returned by ShipmentRateCalculator.GetShipmentRates method.

As shown in Figure 8, the CalculateTotalPrice method is invoked from the Checkout button-click event handler. This method is wrapped in a try-catch block and the catch block has the message that’s observed in the UI. As noticed in the problem description earlier, when an error is encountered during a call to CalculateTotalPrice, the checkout process cannot complete, causing revenue loss for the business.

[Click on image for larger view.] Figure 8. Call to CalculateTotalPrice Is Wrapped in Try-Catch Block

The information logged in the log file combined with code review gave a few possibilities about the cause of the error, but you’re still not able to pinpoint the root cause. The traditional approach to get to the bottom of this issue is likely to add more logging in the CalculateTotalPrice method; however, this certainly is not the optimal approach, especially not for enterprise environments where making code changes and deploying them to production datacenters is not always possible. Let’s look for some other approaches.

IntelliTrace is a powerful Visual Studio feature that could be used to record and play (forward and backward) a sequence of certain types of events happening during the lifetime of an application. (If you’re new to IntelliTrace, check out this series of articles.) IntelliTrace logs can also be collected from a production application using IntelliTrace Standalone Collector. The link to that app also provides instructions on how to collect logs for a Web application that includes executing some Windows PowerShell scripts. Alternatively, you can also use IntelliTrace Data Collector to collect these logs. It's an open source tool that I’ve authored and is available for download at GitHub.

The output of this log-collection process will be an iTrace file that can be opened with Visual Studio 2015 Enterprise Edition. Figure 9 shows the collected iTrace file with a few key areas marked in red rectangles, but what’s important to notice is that it shows a NullReferenceException with the same stack trace that appeared in the application log file.

[Click on image for larger view.] Figure 9. IntelliTrace Logs in Visual Studio

In order to analyze IntelliTrace logs, it’s important to set up the symbol/PDF file appropriately. Assuming all of that has been done, clicking the "Debug Newest Exception in Group" button will open the source code and point to the last recorded event, as shown in Figure 10. The code line highlighted here is item No. 3, which was analyzed in Figure 7; it's there that you see quite a few possibilities that can cause the NullReferenceException. The Autos windows shows the same exception in Figure 10.

[Click on image for larger view.] Figure 10. Log Shows Code Causing NullReferenceException

Let’s step back and try to analyze different objects in this line of code that could be Null. This code is basically part of a foreach loop block iterating over various Products in the Order instance. This line also accesses a field in the Customer object passed as a method argument. Following is the list of all the objects that should have been checked for a null reference. See if you can extract any useful information from this IntelliTrace log and determine the specific object causing all of this trouble:

  1. Product class has a Detail property instance, as shown in Figure 11. This property is accessed in that line of code without any check for null object.
    [Click on image for larger view.] Figure 11. Product Class's Detail Property Should Have Been Checked for a Null Object
  2. ProductDetail class has InventoryInfo property, as shown in Figure 12. This property is accessed in that line of code without being checked for a null object.
    [Click on image for larger view.] Figure 12. ProductDetail.InventoryInfo Should Be Checked for Null Before Accessing It
  3. InventoryInfo class has an AvailableAtLocations property, as shown in Figure 13. Even though in the constructor this object has been initialized, it’s a public property, so there’s nothing stopping anyone from setting it as null in code.
    [Click on image for larger view.] Figure 13. AvailableAtLocations Should Have Been Checked for a Null Object
  4. Customer class has an Address property, as shown in Figure 14. This property is accessed in the line of code without being checked for null object.
    [Click on image for larger view.] Figure 14. Address Property Should Have Been Checked for a Null Object
  5. Address class has a CityName property, as shown in Figure 15. This property is of string type, but because it’s a public property, it shouldn’t be accessed with a check for null object.
    [Click on image for larger view.] Figure 15. CityName Should Have Been Checked for a Null Object

Looking back at Figure 10, the Autos window in the bottom half shows values of different objects passed into the CalculateTotalPrice method. However, the values shown for order.Products and customer.Address property are {unknown}. The reason is because IntelliTrace collects only the values of primitive data types and points to first-level child objects without their values as part of recording function call information. The bottom line here is that IntelliTrace directed you to a specific line of code causing the exception, but not the specific object that is null in the code.

Process Dump File
The next approach you can try is to turn to your old pal, Dump file. (If you’re new to the dump file-based approach, an article I wrote for is a good starting point.) Use it to collect a dump file of the affected process and then analyze that dump file to find the root issue.

As described earlier in Figure 8, the CalculateTotalPrice method is wrapped in a try-catch block, which basically means that there can be no unhandled exceptions caused due to this error. However, a first-chance exception still gets thrown. Let’s use Procdump with the -f and -e 1 switches to capture a dump file. The -e 1 switch is used to capture a dump file on first-chance exception, whereas, -f switch could be used to filter for a specific one. As shown in Figure 16, you’re filtering any exception with name NULL here. After running Procdump, just reproduce the problem and that should provide a dump file for further investigation.

[Click on image for larger view.] Figure 16. Configuring Sysinternal Procdump To Capture a Process Dump File

Once the dump file is created, open it in Windbg, as shown in Figure 17; the dump indicates a NullReferenceException. Specifically, the dump file shows that thread id 21 is active, which means that this could be the thread that caused the exception. Now you can further investigate by loading the Son of Strike (SOS) extension dll. This is needed because Windbg is a native debugger and it needs an extension DLL to analyze managed code.

[Click on image for larger view.] Figure 17. Process Dump File Opened in Windbg

Next, execute the !PrintException command to get any possible details about the exception on the active thread. No surprise here: It shows that the CalculateTotalPrice method is causing the exception. The ClrStack command can be used to display the managed stack trace of the current thread, as shown in Figure 18.

[Click on image for larger view.] Figure 18. Output of !PrintException and !ClrStack

All this is reassuring to what has already been found from the log file/IntelliTrace, but nothing new has been discovered from the dump file, yet. Let’s go back and just try to focus on the line of code that IntelliTrace logs identified is causing NullReferenceException. Refresh your memory about the concepts of Stacks that refers to a collection of elements managed in a Last-In-First-Out (LIFO) manner: If the dump file is captured at the right time, a possible object causing NullReferenceException should be near the top of its stack. You can use the DumpStackObjects command to list all managed objects on the active thread. The output of this command is shown in Figure 19.

[Click on image for larger view.] Figure 19. Output of !DumpStackObjects

You can see a Product object in this list of objects on thread stack. Run the DumpObj command against the address of this Product to see what it contains. As shown in Figure 20, this instance of Product object is a USB drive. Take the address of the ProductDetail instance further and dump its contents, too, which shows that its InventoryInfo is set to null, as shown in red rectangle in Figure 20.

[Click on image for larger view.] Figure 20. Dumping Contents of Product and ProductDetail Instances

This basically tells the products that a user added a USB drive to her cart and for some reason its InventoryInfo object isn’t properly initialized.

What You've Learned
This sort of concludes the investigation, as it is known exactly what object is null. From code reviews, it’s clear how to solve the issue by adding better error handling, especially for checking any object for null before accessing its values.

Now, in a real-life application, the data about the product will likely come from a database and that’s where you can also see why this USB product doesn’t have any InventoryInfo. For this sample application, I simply hardcoded all of the product-related data, as shown in Figure 21. Here, you can see that for the products Laptop and Desktop in blue rectangles, the InventoryInfo objects are initialized appropriately. However, for the USB Drive product marked in a green rectangle, no InventoryInfo object has been initialized, and that's what caused it to be null.

[Click on image for larger view.] Figure 21. Code Responsible for Initializing Products for Sample Web App

I hope you find this article helpful. Any questions or feedback, please leave in the Comments section at the end of the article. Happy debugging!

comments powered by Disqus


Subscribe on YouTube