Practical .NET

High-Performance ASP.NET Caching

Crafting a caching strategy is critical to building effective Web apps. It's only possible when you know what options are available and how to integrate them.

There are two separate places where your application can be slowed to a crawl: in the jump from the browser to the Web server, and in the jump from the Web server to the database server (see Figure 1). While there are numerous programming tricks you can use to speed up your application, you'll get the best results by reducing those trips through intelligent caching of the application's data.


Figure 1. A typical Web application looks like the one at the top: multiple time-wasting trips to the Web server and, from there, to the database server. Smarter, more scalable applications cache data at the client and the Web server farm as part of an integrated caching strategy.

In addition, as all Web developers know, managing state (keeping track of what the user has done in previous requests in the same session) also involves caching key pieces of data for later reference. Reducing the data cached at the server reduces the demands placed on the resource shared by all users -- the Web server. Fewer requests made to your servers and less memory used on the Web server means your application supports more users while processing their requests faster.

Caching isn't the only way to meet those two goals, of course: You can also scale your application out to a server farm. However, if you've used caching to boost performance on a single server, you don't want to do a major rewrite when moving to a farm.

Throwing everything into the Session object and using a Session server isn't the answer to any of these problems. There's a wide variety of tools for caching data, each solving a particular problem. If you're not taking advantage of all of them -- and understanding when they're appropriate -- your site won't be performing as it should. Using those tools means adopting some new, standards-based technology at the client, effectively integrating the tools ASP.NET gives you, and exploiting some new Microsoft technology.

Client-Side Caching
To reduce trips to the server, you can store data at the client where your JavaScript code can access it. AJAX-based solutions access Web services that eliminate the need for postback and rebuilding the whole page -- but even with AJAX you're still making that jump to the Web server, and the Web service you're calling is probably jumping over to the database server. Caching ensures that once you retrieve the data, you keep it at the client to eliminate later trips.

While leveraging client-side data is dependent on your willingness to write client-side code, the benefits are huge. Client-side data has no server-side footprint taking up memory between page requests. And by providing access to data that would otherwise require a trip to the server, client-side data reduces both trips to the server (which the user will appreciate) and processing on the server (which you'll appreciate).

Ideally, the first time a user hits your site, you'd download to the client all the data it would ever need. Your only trips back to the server would be to perform updates. In real life, though, data is volatile, and over time data stored on the client will grow stale -- any caching strategy needs a more sophisticated solution. While you can use hidden fields to store data on a page-by-page basis, a more powerful set of tools for caching data at the client is the W3C Web Storage standard.

Web Storage -- Not
This standard was originally part of the HTML5 set of specifications, but has since been split off. Unlike many HTML5 specifications, Web Storage is widely implemented (available in Internet Explorer 8, Firefox 2.x, Safari 4, Google Chrome 5, and Opera 10.50 -- basically, any browser version released after 2010). Web Storage has to be the worst name ever given to a technology, because the one thing it doesn't do is store data on the Web -- Microsoft, for instance, refers to the technology by its alternate name: DOM storage.

Web/DOM Storage provides two mechanisms for storing data at the client using JavaScript code running in the browser. One mechanism (local storage) supports long-term storage at the client that survives the browser being shut down. Local storage ensures that pages can only access data saved from pages in the same domain.

The other mechanism (session storage) automatically takes care of disposing your data when the tab in a window is closed. It also ensures that data retrieved in two different tabs is kept separate (something that cookies won't do, for instance). For both mechanisms, the data format is associative -- name/value pairs -- which means you won't be doing any data analysis with these tools. There's a separate specification for a more powerful mechanism (Indexed Database API) and a de facto standard called Web SQL Database that also provides more flexible storage, but only Web Storage can be used reliably.

Your caching strategy will need to deal with three kinds of client-side data:

  • Data used across multiple pages by a single user (data you'd normally stuff in the Session object on the server).
  • Data personal to the user. This data is updated only by the user. That data can be updated on the client when the user changes it, and stuffed into hidden fields to be returned to the server when needed for server-side processing. You'll need to keep a copy of that data on the server, however (to support the user as they move from one computer to another or for data analysis).
  • "Semi-volatile" data that doesn't change very frequently: everything from lists of countries to display in drop-down lists to your company's products list.

Using the Storage Classes
As an example of how to use client-side data for temporary storage, assume a user's attempting to buy some product on your site. You can store all that transaction's information in the Session object where it's inaccessible to client-side code, will take up space in memory between page requests, and will hang around for 20 minutes after the user finishes with your site. Or you could store the data in session storage and, on a page-by-page basis, embed it in hidden fields to get it back to the server when needed for server-side processing.

Fortunately, adding new data to session storage is easy: simply set a property with a name of your own on the sessionStorage object. The property name will be used as the key for the value stored in the property. Transferring data entered in a textbox to session storage can be as simple as the following combination of HTML and JavaScript. This code stores the value of the textbox in session storage under the key "cName" when the focus shifts from the textbox:

<asp:TextBox ID="CustName" runat="server" onblur="sessionStorage.cName=this.value;" />

Retrieving the cached data and putting it in some other control looks like this:

if (sessionStorage.cName != null)
{
  window.document.getElementById("HiddenNameProcessed").value =
    sessionStorage.cName;
}

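This code would successfully retrieve data that was stored by code executing in any page from the same origin (the same scheme, host name and port, such as http://www.phvis.com) running in the same window/tab -- in other words, from any page in the user's session. Pages served from a different host name (phvis.com rather than www.phvis.com, for instance) get their own, separate storage.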
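When several cached values have to travel back to the server, the retrieval above can be generalized. The following is only a minimal sketch: fillHiddenFields and its mapping parameter are hypothetical names, not part of the original sample, and it uses the standard getItem method rather than property access.

```javascript
// Copy cached values into form fields so they are posted back to the server.
// storage:  sessionStorage (or any object with a getItem method)
// getField: function returning the field for an id, e.g. a wrapper
//           around document.getElementById
// mapping:  { fieldId: storageKey } pairs (hypothetical names)
function fillHiddenFields(storage, getField, mapping) {
  var copied = 0;
  Object.keys(mapping).forEach(function (fieldId) {
    var value = storage.getItem(mapping[fieldId]);
    var field = getField(fieldId);
    // getItem returns null when nothing is stored under the key
    if (value !== null && field) {
      field.value = value;
      copied++;
    }
  });
  return copied; // number of fields that were filled
}
```

In a page, the call might look like fillHiddenFields(sessionStorage, function (id) { return document.getElementById(id); }, { HiddenNameProcessed: "cName" }), run just before the form is submitted.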

Handling user data and semi-volatile data is only slightly more complex. I'll assume that you'll deliver your client-side data through a set of Web services called by JavaScript code in the client. Making that assumption, once the page is ready, I check to see if the needed data is stored locally by checking some arbitrary property that I'll eventually create on the localStorage object:

var store = localStorage;
if (store.lastRetrieved != null)
{

My next step is to determine if the local data is out of date; if it is, I'll retrieve the latest version of the data. To do that, I call a Web service that returns the necessary data. When calling that service, in addition to passing any parameters the service requires, I also pass some value that indicates the state of the data on the client so the Web service can determine if the client's data is stale (of course, if I haven't retrieved the data before, then this value will be null). The simplest value might just be the last retrieved date, but an alternative is to use the timestamp value from a database row. Using a date also lets the client skip the request entirely if it has updated local storage within, for instance, the last hour.
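The date-based shortcut described above can be sketched as a small helper. This is an illustration under stated assumptions, not the article's code: shouldRequestData is a hypothetical name, and it assumes lastRetrieved is stored as an ISO date string.

```javascript
// Decide whether to ask the server for fresh data.
// storage:  localStorage (or any object with a getItem method)
// maxAgeMs: how long cached data is trusted, in milliseconds
// now:      the current time in milliseconds, e.g. Date.now()
// Assumption: lastRetrieved holds an ISO date string.
function shouldRequestData(storage, maxAgeMs, now) {
  var stamp = storage.getItem("lastRetrieved");
  if (stamp === null) {
    return true;                 // nothing has been cached yet
  }
  var retrieved = Date.parse(stamp);
  if (isNaN(retrieved)) {
    return true;                 // unreadable value: treat the data as stale
  }
  return (now - retrieved) > maxAgeMs;
}
```

For the one-hour window mentioned above, the call would be shouldRequestData(localStorage, 60 * 60 * 1000, Date.now()).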

The following code uses the jQuery getJSON function to call the Web service in a RESTful kind of way. You could, however, just as easily use ScriptManager's support for calling Web services. The querystring for this request includes the user's ID and the value indicating the state of the data on the client:

$.getJSON(
  "http://myserver.com/UserServices/GetUser?uId=" + $("#username").val() +
    "&lastRetrieve=" + localStorage.lastRetrieved,
  function (returnValue)
  {

Once I retrieve the data, I update local storage with properties on the returned object that contain the user's data and the value representing the state of the data:

store.data = returnValue.data;
store.lastRetrieved = returnValue.lastRetrieved;
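To show how the fragments above might fit together, here is a minimal sketch of the whole check-then-update flow. It rests on assumptions that go beyond the article: loadUserData and fetchUser are hypothetical names, the service is assumed to omit the data property when the client's copy is still current, and the data is serialized with JSON.stringify so that objects survive being stored.

```javascript
// storage:   localStorage (or a compatible object)
// fetchUser: function (userId, lastRetrieved, callback) that calls the Web
//            service, e.g. a wrapper around $.getJSON (hypothetical)
// done:      called with the user's data once it is available
function loadUserData(storage, fetchUser, userId, done) {
  var lastRetrieved = storage.getItem("lastRetrieved"); // null on first visit
  fetchUser(userId, lastRetrieved, function (returnValue) {
    if (returnValue && returnValue.data) {
      // The service sent newer data: replace the local copy
      storage.setItem("data", JSON.stringify(returnValue.data));
      storage.setItem("lastRetrieved", returnValue.lastRetrieved);
    }
    // Otherwise the local copy is assumed to be current
    var cached = storage.getItem("data");
    done(cached === null ? null : JSON.parse(cached));
  });
}
```

Passing the service call in as a function keeps the caching logic independent of whether jQuery or ScriptManager makes the request.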

There are limits on the amount of data you're allowed to store per domain, but they're very forgiving: 10MB in Internet Explorer and 5MB in most other browsers. In Internet Explorer, both local and session storage also have a non-standard remainingSpace property that returns the amount of space still available for use.
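When a write would exceed the limit, setItem throws an exception, so code that caches large amounts of data should be prepared for the write to fail. A minimal sketch (trySetItem is a hypothetical helper name):

```javascript
// Try to store a value; report failure instead of letting the exception
// escape when the browser refuses the write (for example, quota exceeded).
function trySetItem(storage, key, value) {
  try {
    storage.setItem(key, value);
    return true;
  } catch (e) {
    return false; // the caller can fall back to fetching from the server
  }
}
```

Because cached data can always be retrieved again from the server, treating a failed write as "not cached" is usually enough.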

On both storage objects, you can also use the length property to retrieve the number of items in storage, the key method to retrieve the key for a value by position, the removeItem method to remove an item from storage, and the clear method to clear out everything in the storage for your domain. The getItem and setItem methods allow you to retrieve or set values when passed a key instead of hardcoding the key names as property names. This code, for instance, retrieves all the items in session storage:

var key, i;
for (i = 0; i < sessionStorage.length; i++)
{
  key = sessionStorage.key(i);
  window.alert(sessionStorage.getItem(key));
}

As with cookies, everything must be stored as a string value.
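The usual way around the strings-only restriction is to serialize objects to JSON on the way in and parse them on the way out. A minimal sketch, with putObject and getObject as hypothetical helper names:

```javascript
// Store any JSON-serializable value as a string.
function putObject(storage, key, obj) {
  storage.setItem(key, JSON.stringify(obj));
}

// Read the value back; returns null when the key is not present.
function getObject(storage, key) {
  var text = storage.getItem(key);
  return text === null ? null : JSON.parse(text);
}
```

With these helpers, putObject(sessionStorage, "order", { id: 7, items: ["a", "b"] }) followed by getObject(sessionStorage, "order") gives back an object with the same properties rather than the string "[object Object]".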

Custom Server-Side Caching
All the client-side programming in the world isn't going to completely eliminate trips to the server, however. When a Web service is called or a page is posted back to the server, processing will be faster if you can retrieve the data from memory in the Web server instead of making a trip to the database.

For any data that's shared among users, caching data helps ensure that you have only one copy of the data soaking up memory, instead of a copy for each user. Even for data kept on a per-user basis, using a combination of the Cache and the Session objects can reduce demands on your Web server's memory. In addition, because the Cache is designed as a general-purpose tool that has to handle every possible case, you can improve on it with business-specific code.

Returning to my example of a customer making a purchase: In the Session object, you could store the sales order the customer's building. The problem is that the data in the Session object sticks around for 20 minutes after the user leaves your site, regardless of how great the demand currently being placed on your server.

Storing data in the Cache offers at least one advantage: When things get tight, ASP.NET will start shedding items from the Cache to recover memory. The disadvantage is that when you go back to get that sales order from the Cache, it may be gone. The solution here is to save your data in the Cache, but when ASP.NET removes the item from the Cache, save it to a database table. Later, when the application requests the data, your code first checks the Cache; if it doesn't find the data there, it then retrieves the data from the database and deletes it. That doesn't mean you can do without the Session object, but all you need it for anymore is to store the keys you'll use to retrieve items from the Cache.

Because most Web servers are sized to support their peak demand periods, most of the time your Web server has memory to spare. This strategy lets you use that available memory. When you do enter your peak periods, the data is removed from the Cache to your second choice storage: the database. You'll have to make a trip to the database when you need the data, which is too bad. However, in the absence of the Cache, you'd have to store and retrieve your data from the database anyway -- you're no worse off at the worst of times and much better off most of the time. Because the Cache automatically sheds the data that hasn't been used for the longest time and because you will only put data back in the Cache when an application needs it, you're guaranteed that the only data in the Cache is what's currently needed.
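The strategy itself doesn't depend on ASP.NET. As a language-neutral illustration, written in JavaScript only to match the client-side samples, here's a minimal sketch of a cache that spills its least recently used item to a backing store and restores it on demand. SpillingCache and the backing store's save and take methods are hypothetical names, and a fixed capacity stands in for the memory pressure that drives the real ASP.NET Cache.

```javascript
// An in-memory cache holding at most `capacity` items. When it is full, the
// least recently used item is handed to the backing store (standing in for
// "save to the database") before being dropped from memory.
function SpillingCache(capacity, backingStore) {
  this.capacity = capacity;
  this.backing = backingStore;   // object with save(key, value) and take(key)
  this.items = new Map();        // a Map iterates in insertion order
}

SpillingCache.prototype.set = function (key, value) {
  this.items.delete(key);        // re-inserting moves the key to the newest end
  this.items.set(key, value);
  if (this.items.size > this.capacity) {
    var oldestKey = this.items.keys().next().value;
    this.backing.save(oldestKey, this.items.get(oldestKey));
    this.items.delete(oldestKey);
  }
};

SpillingCache.prototype.get = function (key) {
  if (this.items.has(key)) {
    var value = this.items.get(key);
    this.set(key, value);        // mark the item as recently used
    return value;
  }
  // Not in memory: retrieve and delete from the backing store
  var restored = this.backing.take(key); // undefined when not found
  if (restored !== undefined) {
    this.set(key, restored);     // put it back in memory
  }
  return restored;
};
```

As in the strategy described above, an item lives in exactly one place at a time: in memory while it's being used, in the backing store after it has been shed.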

There are a lot of ways to implement this strategy, but the easiest way is to create your own Cache class to centralize the code. For consistency with the ASP.NET 4.5 customizable output caching provider, your Cache provider should implement four methods: Add, Get, Remove, Set.

To avoid having to instantiate your new Cache object, declare all the methods as Shared. In the Add method, remove any existing version of the object currently in your cache and add the object passed to the method to the ASP.NET Cache. While adding the object to the Cache, also reference a method to be called whenever the item is removed from the Cache. For that you need to create a CacheItemRemovedCallback object, passing the address of the method:

Imports System.Web.Caching

Public Class PHVCacheProvider

  Public Shared Function Add(key As String, entry As Object, utcExpiry As Date) As Object
    'Discard any existing version of the object
    Remove(key)
    'Have ASP.NET call SaveToDbCallback when the item leaves the Cache
    Dim SaveToDatabase = New CacheItemRemovedCallback(AddressOf SaveToDbCallback)
    'Cache.Add returns Nothing when it stores a new item,
    'so return the entry itself rather than Cache.Add's result
    System.Web.HttpContext.Current.Cache.Add(
      key, entry, Nothing, utcExpiry, Cache.NoSlidingExpiration,
      CacheItemPriority.Normal, SaveToDatabase)
    Return entry
  End Function

The Set method uses the Add method to update the cache:

  Public Shared Sub [Set](key As String, entry As Object, utcExpiry As Date)
    Add(key, entry, utcExpiry)
  End Sub

The Get method attempts to retrieve the requested object from the ASP.NET Cache and, if it doesn't find the object there, attempts to find it in the database. If the method finds the object in the database, it uses the Add method to put the object back in the Cache:

  Public Shared Function [Get](key As String) As Object
    Dim entry As Object
    entry = System.Web.HttpContext.Current.Cache(key)
    If entry Is Nothing Then
      'retrieve and delete from database, create an instance of entry
      If entry IsNot Nothing Then
        'Found in the database: put it back in the Cache with no fixed expiry
        Add(key, entry, System.Web.Caching.Cache.NoAbsoluteExpiration)
      End If
    End If
    Return entry
  End Function

It's here that you'll need to add business-specific code. While it may be tempting to create a universal table that can hold any cached object, you'll find your applications easier to manage and debug if you store the values of properties in specific columns, rather than serializing them into some BLOB-like column.

The Remove method attempts to find the object in the Cache and, if it finds it, removes it. If the method doesn't find the object, the method removes the object from the database:

  Public Shared Sub Remove(key As String)
    Dim entry As Object
    entry = System.Web.HttpContext.Current.Cache(key)
    If entry Is Nothing Then
      'remove from database 
    Else
      System.Web.HttpContext.Current.Cache.Remove(key)
    End If
  End Sub

Finally, here's the SaveToDbCallback method referenced earlier, which ASP.NET will call whenever an object is removed from the Cache. The method shouldn't interfere with the process of removing items from the Cache if they've expired or been deleted by the user, so the code checks the reason the object's being removed before taking any action. If the reason passed to the method indicates that the object is being removed because it's underused (rather than expired or deleted by the Remove method), the method saves the object to the database:

  'Matches the CacheItemRemovedCallback delegate, which returns no value
  Shared Sub SaveToDbCallback(key As String,
                              value As Object,
                              reason As CacheItemRemovedReason)
    'Only save items that ASP.NET evicted to recover memory
    If reason = CacheItemRemovedReason.Underused Then
      '...save to database
    End If
  End Sub
End Class

You'll also need a scheduled job to delete abandoned objects from the table (once per day is probably enough). In ASP.NET 4.5, with a little bit of extra work (having the class inherit from OutputCacheProvider, replacing the Shared keyword with Overrides, and adding some tags to the web.config file), you can use your custom provider with output caching in your .aspx pages.
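The web.config tags in question register the provider under the caching section. The fragment below is a sketch only: the provider name PHVCache, the namespace MyNamespace and the assembly name MyWebApp are placeholders to be replaced with your own values.

```xml
<system.web>
  <caching>
    <!-- defaultProvider makes every output-cached page use the custom provider -->
    <outputCache defaultProvider="PHVCache">
      <providers>
        <!-- type is "Namespace.ClassName, AssemblyName" (placeholder values) -->
        <add name="PHVCache"
             type="MyNamespace.PHVCacheProvider, MyWebApp" />
      </providers>
    </outputCache>
  </caching>
</system.web>
```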

Cross-Server Caching
As powerful as the Cache is, it's not a perfect solution. The major problem is that the Cache resides in the memory of a single server. If you've scaled out to a server farm and aren't pinning users to a particular server, users can load data into the Cache on one server; but on their next request, they find their request being processed on another server that doesn't have that object in the Cache.

That's not the end of the world for data that's shared among users -- eventually all the servers in the farm will get their own copy of the shared data. However, the user-specific data that drives the design of my custom cache won't work effectively in this environment. Besides, replicating data on each server seems inefficient. Ideally, cached data should be shared across all servers in a farm.

Microsoft provides this support with Windows Server AppFabric (not to be confused with Windows Azure AppFabric), which provides caching support for Web applications running under the Microsoft .NET Framework 4 and the .NET Framework 3.5 SP1. Web caching is handled by the AppFabric Caching Service, which started life as a project called "Velocity." You can download and install Windows Server AppFabric from the MSDN site. For earlier versions of the Framework, the free memcached tool is a good alternative.

Windows Server AppFabric and Windows PowerShell
Configuring AppFabric with Windows PowerShell is key to success. You need to establish one or more computers as Cache Hosts, which run a Windows service to support caching (having multiple Cache Hosts provides protection against one of the hosts failing). Configuration, among other things, ensures that Cache Hosts can find each other and that clients can find the hosts.

After configuring, the first step in using Windows Server AppFabric from your pages is to create a DataCacheFactory object. Microsoft recommends creating only a single DataCacheFactory in any page and storing it in a variable accessible throughout the application (apparently, there's a lot of overhead associated with creating a DataCacheFactory). This code should, therefore, go outside any method:

Dim dcf As DataCacheFactory = New DataCacheFactory()

Once you've created a DataCacheFactory, you can retrieve a DataCache object by calling the factory's GetDefaultCache method:

Dim dc As DataCache = dcf.GetDefaultCache

Now you're ready to use the DataCache. This code creates a User object and then stores the object in the DataCache under the key "self" concatenated with the value in the userid variable used to create the object, like so:

Dim usr As User 
usr = New User(userid)
dc.Put("self" & userid, usr)

To retrieve the object, use the DataCache's Get method, passing the key the object was stored under and casting the returned value. This example retrieves the User object stored in the previous code. It could be used from any server in the farm sharing a Cache Host with the server that stored the original object:

Dim usr As User = CType(dc.Get("self" & userid), User)

As you can see, if you've written your application to use the ASP.NET standard Cache object, you'll need to rewrite it to use Windows Server AppFabric if you move the application to a server farm. For any new application, consider using Windows Server AppFabric even if the application isn't going to be installed, initially, in a farm -- it could save you a rewrite later. Of course, if you've created your own custom Cache class, migrating to Windows Server AppFabric is simpler: you just need to rewrite your custom Cache class.

A complete caching solution involves integrating multiple components -- caching data at the client where it can be accessed by client-side code, effectively melding the tools available on the server with business-specific code, and supporting the possibility of eventually scaling out to a server farm. Using only the Session, Cache, and ViewState objects or failing to integrate them into an overall strategy will limit your application's ability to scale to support more users. You should be planning your data caching with the same care that you devote to the design of the rest of your application.
