Simplify Your Applications with Asynchronous Processes
With the right tools, creating an asynchronous application can give you not only a more responsive application that makes better use of your multi-core computer but also a simpler one. Really, asynchronous applications should be your default choice.
I used to tell people that if you wanted to create an application with bugs that would never be tracked down, you should just create a multithreaded application. But Microsoft .NET Framework 4 introduced some new tools that not only make multithreaded/asynchronous programming easier to do but actually turn asynchronous programming into a tool for simplifying your application.
Before reading further, be warned: This column is a tease. Here, I'm just going to outline the structure for (and the problems in) creating an asynchronous application; I'll get into the implementation details in my next column. Still, I'm going to claim that, with the right structure, asynchronous processing should be your default choice for creating applications.
A Simple Asynchronous Structure
A typical application can often be broken down into two activities: retrieving information and doing something with it. Separating these two activities into two asynchronous processes can give you some real benefits. To begin with, because each process does only one thing, the code for each process can be simpler. Ideally, you end up with two loosely coupled processes that can be enhanced independently of each other.
In addition, with asynchronous processes, the two components don't have to work at the same pace: If one activity takes longer than the other, the slower process can be allowed to fall behind without slowing down the other process. The only caveat, of course, is that the faster process will eventually have to slow down or stop so that the slower process can catch up. Overall, though, you should end up with a more flexible and responsive application because the input process can run as fast as stuff can be thrown at it.
And, of course, if you have a multi-core processor, asynchronous processing creates the very real possibility that the .NET Framework and Windows will conspire to distribute your processes over several cores, in which case your application will run faster.
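My next column's implementation will use .NET's BlockingCollection, but the shape of this design is easy to see in any language. Here is a minimal sketch in Python, assuming the standard library's queue.Queue as a rough stand-in for BlockingCollection: one thread retrieves items, another processes them, and the two share nothing but the queue. The function names, the _DONE end-of-input marker, the 10 ms delay, and the capacity of 3 are all illustrative assumptions. Because the queue is bounded, put() blocks once the slower consumer falls three items behind, which is exactly the "faster process has to slow down" caveat.

```python
import queue
import threading
import time

_DONE = object()  # hypothetical end-of-input marker

def retrieve_items(out_q, items):
    """Producer: gets the information and hands it off; knows nothing about processing."""
    for item in items:
        out_q.put(item)   # blocks while the queue is full, so a fast producer waits here
    out_q.put(_DONE)      # tell the consumer nothing else is coming

def process_items(in_q, results):
    """Consumer: does something with each item; knows nothing about retrieval."""
    while True:
        item = in_q.get()  # blocks while the queue is empty
        if item is _DONE:
            break
        time.sleep(0.01)   # simulate a consumer that's slower than the producer
        results.append(item * 10)

def run(items, capacity=3):
    q = queue.Queue(maxsize=capacity)  # bounded: at most `capacity` items waiting
    results = []
    producer = threading.Thread(target=retrieve_items, args=(q, items))
    consumer = threading.Thread(target=process_items, args=(q, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results

print(run(range(5)))  # [0, 10, 20, 30, 40]
```

Neither function knows how the other works, so either one can be rewritten (or sped up) independently, as long as both keep agreeing on what goes into the queue.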
To make this work, though, the two processes must have some reliable way to share information without adding so much complexity or overhead that you throw away the benefits of separating them. The first thing you need to worry about when sharing data is data integrity/consistency.
Data Integrity Problems
For instance, in your application it might be that DataItem1 and DataItem2 are related to each other and need to be processed together (for example, when my wife married me, her marital status and her last name changed together). If you have two asynchronous processes, it's possible that the process creating the data (let's call it the "producer") will first update DataItem1 and then update DataItem2. At the same time, it's possible that the process using the data (let's call it the "consumer") will read the data in the reverse order, getting the "old" value of DataItem2 before the producer updates it and the "new" value of DataItem1 after the producer updates it. Simply reading the two pieces of data in the reverse of the order in which the producer updates them can cause the consumer to process my wife's information with her new marital status and her old name. Data integrity/consistency is lost.
One way to avoid this is to stop changing multiple pieces of data and instead use a single object that carries all the data. With that design, you can add or remove the object, with all of its data, as a unit. Implementing that design for my "updating my wife" problem, I would have a Person object that represents my wife. Now, when she marries me, I remove her old object and add her new object: effectively, I update her last name and marital status at the same time.
However, I've just moved the problem around; I haven't actually solved it. After all, the producer has to remove my wife's "pre-marriage" object before adding her "post-marriage" object. If the consumer looks for her between those two operations, the consumer will conclude that my wife doesn't exist at all.
It gets worse: If, as is often the case, I'm processing a collection of Person objects, the consumer won't see any individual object in an inconsistent state but might see the whole collection in an inconsistent state: one that's missing my wife. Adding my wife's "post-marriage" object before removing her "pre-marriage" object just creates a different problem: Now I have two "my wife" objects in the collection, and only one of them is right.
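To make the timing concrete, here is a hypothetical Python sketch that writes out one bad interleaving step by step instead of using real threads, so the outcome is deterministic. The SharedData class and the "OldName"/"NewName" values are placeholders of my own, with marital_status standing in for DataItem1 and last_name for DataItem2.

```python
from dataclasses import dataclass

@dataclass
class SharedData:
    marital_status: str  # plays the role of DataItem1
    last_name: str       # plays the role of DataItem2

def torn_read_demo():
    shared = SharedData(marital_status="single", last_name="OldName")

    # One possible interleaving of producer and consumer, written out in order:
    consumer_name = shared.last_name         # consumer reads DataItem2 first (old value)
    shared.marital_status = "married"        # producer updates DataItem1
    shared.last_name = "NewName"             # producer updates DataItem2
    consumer_status = shared.marital_status  # consumer reads DataItem1 (new value)

    return consumer_status, consumer_name    # a pair that never existed in the data

print(torn_read_demo())  # ('married', 'OldName')
```

With real threads, this interleaving happens only occasionally, which is what makes the resulting bug so hard to track down.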
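In Python, a frozen dataclass is one way to get this "replace the whole object" behavior; the sketch below assumes a hypothetical Person with just an ID, a last name, and a marital status. Because the object can't be modified after it's created, anyone holding a reference sees either the complete old version or the complete new one, never a mix.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Person:
    """Hypothetical record: all the data that must change together lives in one object."""
    person_id: int
    last_name: str
    marital_status: str

def marry(p: Person, new_last_name: str) -> Person:
    # Build a complete new object; the original is never modified.
    return replace(p, last_name=new_last_name, marital_status="married")

before = Person(1, "OldName", "single")
after = marry(before, "NewName")
# before.last_name = "X"  # would raise dataclasses.FrozenInstanceError
print(before)  # Person(person_id=1, last_name='OldName', marital_status='single')
print(after)   # Person(person_id=1, last_name='NewName', marital_status='married')
```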
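Both orderings can be simulated by recording what a consumer could see between the producer's two steps. This is a hypothetical sketch that uses a plain Python list as the collection; the snapshot functions exist only to expose that in-between window.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    person_id: int
    last_name: str
    marital_status: str

def snapshots_remove_then_add(people, old, new):
    """Return what a consumer could see after each of the producer's two steps."""
    seen = []
    people.remove(old)
    seen.append(list(people))  # in-between window: she's missing entirely
    people.append(new)
    seen.append(list(people))
    return seen

def snapshots_add_then_remove(people, old, new):
    seen = []
    people.append(new)
    seen.append(list(people))  # in-between window: two versions of her
    people.remove(old)
    seen.append(list(people))
    return seen

wife_old = Person(1, "OldName", "single")
wife_new = Person(1, "NewName", "married")
friend = Person(2, "Other", "single")

for strategy in (snapshots_remove_then_add, snapshots_add_then_remove):
    window = strategy([wife_old, friend], wife_old, wife_new)[0]
    copies = sum(1 for p in window if p.person_id == 1)
    print(strategy.__name__, "->", copies, "copies of person 1")
# Prints:
# snapshots_remove_then_add -> 0 copies of person 1
# snapshots_add_then_remove -> 2 copies of person 1
```

Either way, the final state is correct; the damage happens only if the consumer looks during that window.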
What can I say: It's a problem.
One solution is to have the producer apply a lock before updating the multiple data items or collection: The producer will release the lock after the update is complete. Similarly, the consumer will apply a lock before reading the data items or collection and release the lock when done. When either the consumer or the producer finds that a lock already exists (presumably applied by the other process), the process that finds the lock waits until the other process releases its lock before continuing.
The first problem this solution creates is performance: A process that's waiting for the other process to release its lock does nothing useful. As the number of processes sharing data increases, so does the number of processes doing nothing. To keep this wasted time to a minimum, you want to make sure that you apply only the locks you need and that they're in place for the least amount of time.
But, quite frankly, I think that the performance problem is the least of your worries: Implementing this kind of locking strategy is really hard to do well. While it's often easy to throw a lock around the update code in the producer, it's harder to ensure that all of the consumer's code honors those locks. So you have to apply every lock you need while, at the same time, applying as few locks as possible.
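Here's a minimal sketch of that locking approach in Python, assuming threading.Lock as the lock: one lock guards both fields, the producer holds it only while writing, and the consumer holds it only long enough to copy both values before doing any slow work. The SharedData class, the two functions, and the stress test are hypothetical illustrations, not the code from my next column.

```python
import threading
from dataclasses import dataclass

@dataclass
class SharedData:
    marital_status: str
    last_name: str

shared = SharedData("single", "OldName")
_lock = threading.Lock()  # one lock guards BOTH fields

def producer_update(status, name):
    with _lock:                 # held only while writing
        shared.marital_status = status
        shared.last_name = name

def consumer_read():
    with _lock:                 # held only long enough to copy both fields...
        snapshot = (shared.marital_status, shared.last_name)
    return snapshot             # ...slow processing happens after the lock is released

def stress(iterations=10_000):
    """Flip between two consistent states while reading; return any mixed pairs seen."""
    states = [("single", "OldName"), ("married", "NewName")]
    mixed = []

    def writer():
        for i in range(iterations):
            producer_update(*states[i % 2])

    t = threading.Thread(target=writer)
    t.start()
    for _ in range(iterations):
        snap = consumer_read()
        if snap not in states:
            mixed.append(snap)
    t.join()
    return mixed

print(len(stress()))  # 0
```

The stress test finds no mixed pairs only because every read goes through consumer_read(); a single piece of consumer code that reads shared.last_name directly, without the lock, reopens the hole.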
Like I said: It's a problem.
And, if you're really unlucky, your locks will create a deadlock, where the consumer is waiting on the producer to release one lock while the producer is waiting on the consumer to release some other lock.
Fortunately, as of .NET Framework 4, there are several tools that take care of these problems for you. The easiest of these tools to use and understand is the BlockingCollection (in the System.Collections.Concurrent namespace). The BlockingCollection provides a simple way for two asynchronous processes to share data reliably while maintaining data integrity/consistency and keeping lock time to a minimum. Using the BlockingCollection makes creating an asynchronous producer/consumer application easy.
I will admit that the BlockingCollection isn't a perfect solution: If you're using a predefined object rather than crafting your own custom lock-based solution, you're probably not going to get maximum performance. In one Microsoft article, the authors point out that BlockingCollection can actually give you worse performance than an equivalent, well-implemented use of synchronized locks, at least when processing volumes are low.
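One common defense (a standard technique, not something specific to .NET or to this column) is to make every process acquire its locks in the same global order. In this hypothetical Python sketch, the locked() helper sorts whatever locks it's given into one agreed-upon order, so a producer and a consumer that name the locks in opposite orders still can't deadlock. The lock names and the LOCK_ORDER table are assumptions for illustration.

```python
import threading
from contextlib import ExitStack, contextmanager

lock_a = threading.Lock()  # hypothetical: guards one group of data items
lock_b = threading.Lock()  # hypothetical: guards another group
LOCK_ORDER = {id(lock_a): 0, id(lock_b): 1}  # the one order every thread must follow

@contextmanager
def locked(*locks):
    """Acquire the given locks in the global order; release them all on exit."""
    with ExitStack() as stack:
        for lk in sorted(locks, key=lambda l: LOCK_ORDER[id(l)]):
            stack.enter_context(lk)
        yield

def run(iterations=10_000):
    counter = {"n": 0}

    def producer():
        for _ in range(iterations):
            with locked(lock_a, lock_b):
                counter["n"] += 1

    def consumer():
        for _ in range(iterations):
            with locked(lock_b, lock_a):  # "wrong" order here, but acquired in the global order
                counter["n"] += 1

    threads = [threading.Thread(target=producer, daemon=True),
               threading.Thread(target=consumer, daemon=True)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    deadlocked = any(t.is_alive() for t in threads)
    return counter["n"], deadlocked

print(run())  # (20000, False)
```

If producer and consumer each took the locks in the order they listed them (a then b versus b then a), each could end up holding one lock while waiting forever for the other.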
The first piece of good news is that the performance hit is very small. More importantly, as the number of operations per second increases, the BlockingCollection gives better performance than a locking solution. This is, of course, what you want: something that gets better as your load increases and resources become more constrained.
I should also point out that this comparison between BlockingCollection and a custom solution only makes sense if you're confident that you can implement an efficient locking scheme correctly. For me, at any rate, that's unlikely (and I have the multithreaded applications to prove it). In addition, the BlockingCollection gives you functionality that you'd probably never get around to writing in your own custom solution: BlockingCollection allows you to set bounds on your collection (preventing producers from adding items faster than consumers can use them), supports letting processes know when there are no more results to process, and handles canceling asynchronous processes.
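To show what those three features involve, here's a hypothetical Python sketch of a small queue class that provides a bound, a "no more items are coming" signal, and cancellation, built on threading.Condition. The class and method names are my own and only loosely echo BlockingCollection's BoundedCapacity, CompleteAdding, and CancellationToken support; this isn't the .NET API, just an illustration of the plumbing BlockingCollection saves you from writing.

```python
import threading
from collections import deque

class OperationCancelled(Exception):
    """Raised by a blocked add() or consume() after cancel() is called."""

class ClosableQueue:
    """Hypothetical sketch (not the .NET API): a size bound, a
    'no more items are coming' signal, and cancellation."""

    def __init__(self, bound):
        self._items = deque()
        self._bound = bound
        self._adding_complete = False
        self._cancelled = False
        self._cond = threading.Condition()  # one lock plus wait/notify for all state

    def add(self, item):
        with self._cond:
            # Bound: a fast producer waits here until a consumer makes room.
            while (len(self._items) >= self._bound
                   and not self._adding_complete and not self._cancelled):
                self._cond.wait()
            if self._cancelled:
                raise OperationCancelled()
            if self._adding_complete:
                raise ValueError("complete_adding() has already been called")
            self._items.append(item)
            self._cond.notify_all()

    def complete_adding(self):
        with self._cond:
            self._adding_complete = True
            self._cond.notify_all()  # wake consumers so they can notice the end

    def cancel(self):
        with self._cond:
            self._cancelled = True
            self._cond.notify_all()  # wake everyone who is blocked

    def consume(self):
        """Yield items until adding is complete and the queue has drained."""
        while True:
            with self._cond:
                while not self._items and not self._adding_complete and not self._cancelled:
                    self._cond.wait()
                if self._cancelled:
                    raise OperationCancelled()
                if not self._items:      # completed and drained: no more results
                    return
                item = self._items.popleft()
                self._cond.notify_all()  # a waiting producer may now have room
            yield item                   # hand the item over outside the lock

def demo_completion():
    q = ClosableQueue(bound=2)
    seen = []
    consumer = threading.Thread(target=lambda: seen.extend(q.consume()))
    consumer.start()
    for i in range(5):
        q.add(i)          # blocks whenever two items are already waiting
    q.complete_adding()   # tells the consumer no more items are coming
    consumer.join()
    return seen

def demo_cancellation():
    q = ClosableQueue(bound=2)
    outcome = []

    def consumer():
        try:
            for _ in q.consume():
                pass
        except OperationCancelled:
            outcome.append("cancelled")

    t = threading.Thread(target=consumer)
    t.start()
    q.cancel()            # the consumer is (or soon will be) waiting on an empty queue
    t.join()
    return outcome

print(demo_completion())    # [0, 1, 2, 3, 4]
print(demo_cancellation())  # ['cancelled']
```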
As I said, this column is a tease: In my next column I'll start working through how you can use the BlockingCollection to create a reliable asynchronous application with multiple processes that share information. You'll want to come back for that.
About the Author
Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.