Practical .NET

Share Information Among Asynchronous Processes Sans Locks

If you're creating an asynchronous application (and you should be) you'll be glad to know that .NET offers ways to share data that don't require you to lock up your application.

In an earlier column, I discussed how creating an application as a set of cooperating, asynchronous processes can actually simplify your application while giving you all the benefits of asynchronous processing (a more responsive application, related processes that deal well with different processing speeds, plus exploiting multi-core processors). The trick is to divide your application into producers (processes that gather data) and consumers (processes that do something with that data) joined together by some asynchronous pipeline.

In a later column, I demonstrated how to use BlockingCollections to pass data from the producer to the consumer and to pass error messages from the consumer back to the producer.

In the case study in the earlier article, my producer was a UI that gathered data from the user; the consumer was the process that took the user's data and performed whatever updates were required. Using the BlockingCollection saved you from having to craft a sophisticated set of locks that would prevent the two processes from stepping on each other's toes.

But you don't have to use the BlockingCollection: You can also use any one of the ConcurrentQueue, ConcurrentStack, or ConcurrentBag collections as the connecting tool that joins your producers to your consumers. In fact, if you're using the BlockingCollection, you're actually using the ConcurrentQueue because BlockingCollection is, by default, a wrapper for ConcurrentQueue. Like the BlockingCollection, all of these Concurrent* collections make it easy to implement the producer/consumer pattern and do it with the minimum of locking. And, if you don't need the features of the BlockingCollection, bypassing the BlockingCollection to use one of the Concurrent* collections should give you better performance.

BlockingCollection vs. Concurrent* Collections
For me, the major difference between BlockingCollection and the Concurrent* collections is in the methods that remove items from the collection: In the BlockingCollection, the Take method (and TryTake, when you give it a timeout) blocks if there isn't anything on the queue to process; the equivalent methods in the Concurrent* collections merely return False and carry on to the next line of code. I find that the blocking behavior makes the BlockingCollection easier to program than the Concurrent* collections' "just keep going" approach. In addition, the BlockingCollection offers additional functionality that makes it easier to coordinate producers and consumers -- something the more bare-bones Concurrent* collections do not.
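
To make the contrast concrete, here is a minimal console sketch (the module and variable names are mine, not from the case study). It assumes you pass a timeout to TryTake to get the waiting behavior; the parameterless Take waits indefinitely.

```vbnet
Imports System
Imports System.Collections.Concurrent

Module TakeVersusTryDequeue
    Sub Main()
        Dim item As String = Nothing

        ' Concurrent* style: an empty queue returns False immediately.
        Dim queue As New ConcurrentQueue(Of String)
        Console.WriteLine(queue.TryDequeue(item))   ' False

        Using bc As New BlockingCollection(Of String)
            ' BlockingCollection style: wait up to 250 ms for an item, then give up.
            Console.WriteLine(bc.TryTake(item, TimeSpan.FromMilliseconds(250)))   ' False

            ' Take returns right away here only because an item is already waiting.
            bc.Add("Peter")
            Console.WriteLine(bc.Take())   ' Peter
        End Using
    End Sub
End Module
```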

There are performance considerations that make the BlockingCollection attractive to me (though I've never measured these). Assuming your consumer code is continuously checking for new items in the collection, you'll probably be retrieving items from the collection inside a loop. With the Concurrent* collections, if the queue is frequently empty, then you'll spend a lot of time swinging through the queue and not finding anything. In the same scenario, however, the BlockingCollection will spend its time idling, waiting for something to show up on the queue. As a result, I suspect the BlockingCollection uses fewer resources than the Concurrent* collections when there isn't much to do. I like that also.

On the other hand, many developers will prefer the Concurrent* collections because they never block. If two requests try to pull items from the Concurrent* collections simultaneously, both requests are satisfied. If there are two or more items in the collection, the first request gets the first item from the collection and the second request gets the next item. If there aren't enough items in the collection to satisfy all of the requests, the second request gets nothing. Either way, no exception is thrown and, critically, there is no pause in execution. Your application can be more responsive with the Concurrent* collections than with the BlockingCollection.

In addition, the processes involving the Concurrent* collections are easier to cancel: Each time through the loop you can check some class-level variable to determine if your code should exit the loop. With the BlockingCollection, because you might be idling on a Take (or on a TryTake with a timeout), you have to work with cancellation tokens to ensure you can break out of the call when you want to end processing (I showed how to use the cancellation tokens in my follow-up column on the BlockingCollection).
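
As a rough illustration of the token-based approach (a sketch, not the code from the follow-up column), the consumer below waits in Take and is released when the token is cancelled. The module name and the 500 ms pause are assumptions made for the demo.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Threading
Imports System.Threading.Tasks

Module CancelBlockedConsumer
    Sub Main()
        Using bc As New BlockingCollection(Of String), cts As New CancellationTokenSource
            Dim consumer As Task = Task.Run(
                Sub()
                    Try
                        Do
                            ' Waits here until an item arrives or the token is cancelled.
                            Dim name As String = bc.Take(cts.Token)
                            Console.WriteLine("Processing " & name)
                        Loop
                    Catch ex As OperationCanceledException
                        ' Take throws this when the token is cancelled during the wait.
                        Console.WriteLine("Consumer cancelled")
                    End Try
                End Sub)

            bc.Add("Peter")
            Thread.Sleep(500)   ' hypothetical pause so the consumer can drain the collection
            cts.Cancel()        ' releases the consumer from its blocked Take
            consumer.Wait()
        End Using
    End Sub
End Module
```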

One more vote in favor of the BlockingCollection: Because the BlockingCollection is just a wrapper around one of the Concurrent* collections, using the BlockingCollection means never having to say you're sorry. With the BlockingCollection, if it turns out that you've made the wrong choice in picking a Concurrent* collection to wrap, you can just change the Concurrent* collection you passed to your BlockingCollection when you create it -- the rest of the code in your application, which works with the BlockingCollection, won't have to change at all.
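
A small sketch of that swap, using integers rather than the case study's Person objects: the only line that differs between the two collections is the constructor call.

```vbnet
Imports System
Imports System.Collections.Concurrent

Module SwapUnderlyingCollection
    Sub Main()
        ' Default constructor: a ConcurrentQueue underneath, so items come out FIFO.
        Dim fifo As New BlockingCollection(Of Integer)
        ' Pass a ConcurrentStack to get LIFO ordering instead.
        Dim lifo As New BlockingCollection(Of Integer)(New ConcurrentStack(Of Integer))

        For i As Integer = 1 To 3
            fifo.Add(i)
            lifo.Add(i)
        Next

        ' The consuming code is identical for both collections.
        Console.WriteLine(fifo.Take())   ' 1
        Console.WriteLine(lifo.Take())   ' 3
    End Sub
End Module
```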

Queue and Stack
As far as integrating ConcurrentQueue or ConcurrentStack into your application goes, the two collections work almost identically. Both collections, for example, have an IsEmpty property that returns False as long as there's something in the collection to process. The difference between the two is in the order in which you retrieve the items that are added to the collection. The ConcurrentQueue is a FIFO structure: The first item you add is the first item you retrieve (it's just like the line waiting for a teller at the bank); the ConcurrentStack is a LIFO structure: The last item you add is the first item you retrieve (it's like piling clothes into a laundry basket).

To reflect this difference, the add and retrieve methods on the two classes have different names. With ConcurrentQueue, you add items to the collection with the Enqueue method and remove items with the TryDequeue method; with ConcurrentStack you add items with the Push method and remove items with the TryPop method.

With both the TryDequeue and TryPop methods, you must pass a variable to hold the item found in the collection. When called, the TryDequeue and TryPop methods actually do three things: First, they check to see if there's an item in the collection; second, if there is an item, the methods populate their parameter with the item and remove the item from the collection (if no item is found, the parameter is set to the type's default value, which is null/Nothing for reference types); finally, the methods return True if they found an item and False if they did not.
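
The following sketch shows both methods side by side, using plain strings in place of the case study's Person objects (the names are made up for the demo).

```vbnet
Imports System
Imports System.Collections.Concurrent

Module QueueVersusStack
    Sub Main()
        Dim queue As New ConcurrentQueue(Of String)
        Dim stack As New ConcurrentStack(Of String)
        For Each name As String In {"Ann", "Bob", "Cal"}
            queue.Enqueue(name)
            stack.Push(name)
        Next

        Dim found As String = Nothing
        If queue.TryDequeue(found) Then Console.WriteLine(found)   ' Ann (first in, first out)
        If stack.TryPop(found) Then Console.WriteLine(found)       ' Cal (last in, first out)

        ' On an empty collection the method returns False and resets the variable.
        Dim emptyQueue As New ConcurrentQueue(Of String)
        Console.WriteLine(emptyQueue.TryDequeue(found))   ' False
        Console.WriteLine(found Is Nothing)               ' True
    End Sub
End Module
```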

As an example, I'll rewrite my case study using a ConcurrentQueue. Because I want to use my Concurrent* collection in two different processes, I declare it at the class level. I also declare a variable I can use to cancel the consumer loop:

Private cqPeople As New ConcurrentQueue(Of Person)
Private cancel As Boolean

The producer process that adds items to this ConcurrentQueue would have code like this:

Dim pers As Person
pers = New Person
pers.FirstName = Me.FirstName.Text

cqPeople.Enqueue(pers)

The consumer process would use the TryDequeue method to retrieve items like this:

Dim pers As Person = Nothing
Do While Not cancel
  If cqPeople.TryDequeue(pers) Then
    '...code to process Person object in pers...
  End If
Loop
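
Pulled together into something you can run, the producer and consumer might look like the sketch below. Several pieces are my assumptions rather than part of the case study: the stand-in Person class, the console output, the Volatile reads and writes on the cancel flag (so the consumer thread reliably sees the change), and the short sleep that keeps the loop from spinning when the queue is empty.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Threading
Imports System.Threading.Tasks

Public Class Person   ' hypothetical stand-in for the case study's Person class
    Public Property FirstName As String
End Class

Module QueueCaseStudy
    Private cqPeople As New ConcurrentQueue(Of Person)
    Private cancel As Boolean

    Sub Main()
        Dim consumer As Task = Task.Run(Sub() ConsumePeople())

        ' Producer: in the case study this data comes from the UI.
        For Each name As String In {"Ann", "Bob"}
            cqPeople.Enqueue(New Person With {.FirstName = name})
        Next

        Thread.Sleep(200)              ' hypothetical pause to let the consumer catch up
        Volatile.Write(cancel, True)   ' signal the consumer loop to exit
        consumer.Wait()
    End Sub

    Private Sub ConsumePeople()
        Dim pers As Person = Nothing
        Do While Not Volatile.Read(cancel)
            If cqPeople.TryDequeue(pers) Then
                Console.WriteLine("Updating " & pers.FirstName)
            Else
                Thread.Sleep(10)       ' assumption: back off briefly instead of spinning
            End If
        Loop
    End Sub
End Module
```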

If you want to retrieve an item without removing it from the collection, you can use TryPeek (which both classes have) instead of TryDequeue or TryPop.
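
A quick sketch of TryPeek on a queue (sample strings are made up); the count is unchanged after the call because nothing is removed.

```vbnet
Imports System
Imports System.Collections.Concurrent

Module PeekDemo
    Sub Main()
        Dim queue As New ConcurrentQueue(Of String)({"Ann", "Bob"})

        Dim head As String = Nothing
        If queue.TryPeek(head) Then
            Console.WriteLine(head)        ' Ann
        End If
        Console.WriteLine(queue.Count)     ' 2 (the item is still in the queue)
    End Sub
End Module
```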

With both the ConcurrentQueue and the ConcurrentStack you can pass some other collection when creating the ConcurrentQueue or ConcurrentStack in order to populate your Concurrent* collection. This example loads an existing List of Person objects into a ConcurrentStack:

Private csPeople As ConcurrentStack(Of Person)
Private Sub StartConsumer(lstPeople As List(Of Person))
  csPeople = New ConcurrentStack(Of Person)(lstPeople)
  '...code to start the consumer...
End Sub

The ConcurrentStack does have two methods that ConcurrentQueue does not: PushRange and TryPopRange. These methods allow you to add (or remove) multiple items at a time and will run faster than repetitively calling Push or TryPop. Unfortunately, the methods only work with arrays. Here's what a PushRange call looks like:

Private csPeople As New ConcurrentStack(Of Person)
Private Sub StartConsumer(arPeople() As Person)
  csPeople.PushRange(arPeople)
  '...code to start the consumer...
End Sub
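
For the removal side, a TryPopRange call might look like this sketch, which uses integers instead of Person objects. The method fills the array you pass, starting with the item on top of the stack, and returns the number of items it actually popped.

```vbnet
Imports System
Imports System.Collections.Concurrent

Module PopRangeDemo
    Sub Main()
        Dim stack As New ConcurrentStack(Of Integer)
        stack.PushRange({1, 2, 3, 4, 5})   ' 5 ends up on top of the stack

        ' The array's length sets the maximum number of items popped in one call.
        Dim buffer(2) As Integer           ' three slots: indexes 0 to 2
        Dim popped As Integer = stack.TryPopRange(buffer)

        Console.WriteLine(popped)                    ' 3
        Console.WriteLine(String.Join(",", buffer))  ' 5,4,3
        Console.WriteLine(stack.Count)               ' 2
    End Sub
End Module
```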

Bag
If you're looking for maximum responsiveness, however, the tool you want to use is the ConcurrentBag. Unlike ConcurrentQueue or ConcurrentStack, ConcurrentBag doesn't guarantee that your objects will be returned in any particular order. Because ConcurrentBag doesn't worry about order, it can run faster than either ConcurrentQueue or ConcurrentStack, particularly when the same thread both adds and removes items.

However, to use ConcurrentBag you really have to not care at all about the order in which you process items. In my BlockingCollection scenario, I had a UI-based front-end process that was adding items to the collection and a back-end process that was removing items from the collection to perform updates. Because the items in the queue will all, eventually, be processed and I don't care what order the items are processed in, I could use a ConcurrentBag.

But, while I may not care about the order, my users might. In my case study, I had a feedback loop that reported errors found during updating back to the UI. If I used ConcurrentBag to hold the updates, it's possible that the user might update items 1 and 2 (in that order) but my back-end process might update the items in the reverse order. If my back-end processing found problems in both updates, it's hard to tell how my users might react if item 2 had its problems flagged before item 1 had its problems flagged.

The ConcurrentBag object, like BlockingCollection, has an Add method for adding items to the collection and a TryTake method to retrieve items and remove them from the collection. The TryTake method returns False if no items are found but, unlike Take on the BlockingCollection, the ConcurrentBag's method never blocks. Like the ConcurrentQueue and ConcurrentStack collections, ConcurrentBag also has a TryPeek method to retrieve an item without removing it. Also like ConcurrentStack and ConcurrentQueue, you can pass a collection (like a List) to the ConcurrentBag when creating it to populate the ConcurrentBag.
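
Here is a minimal sketch of those ConcurrentBag members, again with made-up strings rather than Person objects. Note that the loop makes no assumption about the order in which the names come back.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic

Module BagDemo
    Sub Main()
        ' Populate the bag from an existing List, then add one more item.
        Dim bag As New ConcurrentBag(Of String)(New List(Of String) From {"Ann", "Bob"})
        bag.Add("Cal")

        Dim found As String = Nothing
        Do While bag.TryTake(found)
            Console.WriteLine(found)   ' all three names appear, in no guaranteed order
        Loop

        ' Once the bag is empty, TryTake returns False without waiting.
        Console.WriteLine(bag.TryTake(found))   ' False
        Console.WriteLine(bag.IsEmpty)          ' True
    End Sub
End Module
```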

And now you have a problem: You've got four classes (BlockingCollection, ConcurrentQueue, ConcurrentStack and ConcurrentBag) all of which look very much alike. This, of course, is an embarrassment of riches. There are worse problems to have when building applications but, armed with this column, you should be able to make the right choice.

About the Author

Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.
