C# Corner
Using LINQ to Express Intent
Use LINQ outside of databases to make your code easier to read and maintain.
How many times have you been handed some code to maintain and spent more than a few minutes reading over a certain method? You stare at the code and start working out in your head what the code is supposed to be doing. You may be able to glean some information from the method name or the variable names, but without being the author of the original code, it often takes more time than you'd like.
In this article, we'll look at how to use LINQ to make the intent of your code more obvious. This will lead to cleaner, more concise code that will be easier to maintain by you and the other members of your team -- even if you're a team of one.
The MSDN Library defines LINQ as: "A query syntax that defines a set of query operators that allow traversal, filter and projection operations to be expressed in a direct, declarative way in any .NET-based programming language."
When LINQ first came out, many code examples focused on two areas where LINQ fit well: LINQ to SQL and LINQ to XML. LINQ to SQL allows you to express SQL queries much more naturally in your code. Instead of resorting to raw SQL or Stored Procedures, you write object-oriented queries complete with IntelliSense and statement completion -- much nicer than raw SQL. LINQ to XML provides similar benefits for the processing of XML documents. Instead of quirky XPath queries, you can write more-readable queries that express the particular part of the XML document you're interested in.
What a lot of developers don't realize is that LINQ is available on just about any object that's enumerable. Lists, arrays, collections -- anything that implements IEnumerable<T> or IQueryable<T> can be used as the source of a LINQ query. (Older, non-generic IEnumerable collections need a call to Cast<T>() or OfType<T>() first.) Let's see how we can take advantage of that.
Element Selection
We've all written code that needs to iterate over a collection and select only certain items. Perhaps you only want items from a list of integers that are divisible by three:
private IEnumerable<int> Sample1(int[] numbers)
{
    var answer = new List<int>();
    foreach (int n in numbers)
    {
        if (n % 3 == 0)
        {
            answer.Add(n);
        }
    }
    return answer;
}
This is easy code and we've all written code like it before. But there's so much "extra" code to support this method, including code to initialize a storage location for the results, iterate over the collection and build the results.
All we want is a list of numbers divisible by three. Let's use LINQ to help us express exactly what we want and only what we want:
private IEnumerable<int> Sample1_LINQ(int[] numbers)
{
    return numbers.Where(n => n % 3 == 0);
}
This is much nicer. We reduced five lines of code, ignoring braces, down to a single line of code that explicitly expresses the intent of our code: a sequence of numbers in which each number is divisible by three.
Taking Just What You Need
How many times have you needed to grab just the first 10 elements of some list? Like our previous example, we'll start with a list of integers:
private IEnumerable<int> Sample2(int[] numbers)
{
    var answer = new List<int>();
    for (int i = 0; i < 10; i++)
    {
        answer.Add(numbers[i]);
    }
    return answer;
}
Again, we've got to write a lot of code just to grab the first 10 elements. And this code can throw an IndexOutOfRangeException. What if the "numbers" collection only contained eight elements? To really make this work, we need an additional check on the length of our source list:
private IEnumerable<int> Sample2(int[] numbers)
{
    var answer = new List<int>();
    for (int i = 0; i < 10 && i < numbers.Length; i++)
    {
        answer.Add(numbers[i]);
    }
    return answer;
}
But with LINQ, we can simplify the code down to a single line:
private IEnumerable<int> Sample2_LINQ(int[] numbers)
{
    return numbers.Take(10);
}
The LINQ method Take handles all of the mundane stuff and lets us concentrate solely on what we want to accomplish. It will also stop when it reaches either the end of the list or the number of items you asked for -- whichever comes first.
What's really nice is when you need a combination of the two samples we've shown. Suppose you need the first 10 elements of a list that are divisible by three. There are a couple of ways we could go about this. First, we could take the first 10 items of our previous method call:
return Sample1_LINQ(numbers).Take(10);
Or, if we wanted the query in a single method without the reliance on the other methods, we're still in good shape with clean, readable code:
return numbers.Where(n => n%3 == 0).Take(10);
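As a minimal, self-contained sketch of that combined query, the following wraps it in a method that can be called with any sequence. The class name FirstMultiplesDemo, the method name and the count parameter are made up for illustration. Because Where and Take are evaluated lazily, the sketch calls ToList() to run the query and capture the results.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names are illustrative only.
public static class FirstMultiplesDemo
{
    // Returns up to 'count' multiples of three, in their original order.
    public static List<int> FirstMultiplesOfThree(IEnumerable<int> numbers, int count)
    {
        return numbers
            .Where(n => n % 3 == 0) // keep only multiples of three
            .Take(count)            // stop after 'count' matches, or sooner if the source runs out
            .ToList();              // materialize the lazy query
    }
}
```

Called with Enumerable.Range(1, 100) and a count of 10, this should yield 3, 6, 9 and so on up to 30. Called with a short array such as { 3, 4, 6 }, it returns just 3 and 6 rather than throwing.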
Skip to the Good Part
Similar to Take, we can use LINQ to skip over a certain number of elements in our list. How many times have you written code that needs to use the first element as a sort of "parent" and then iterate through the rest of the elements and treat them as a "child"?
private void Sample5(int[] numbers)
{
    var parent = numbers[0];
    // do something with parent
    for (int i = 1; i < numbers.Length; i++)
    {
        var child = numbers[i];
        // do something with child
    }
}
The thing I really dislike about this code is the requirement of using integer indexing. We can't use foreach because that loops through the entire sequence and we've already used the first element as our "parent." LINQ's Skip method is perfect for something like this:
private void Sample5_LINQ(int[] numbers)
{
    var parent = numbers[0];
    // do something with parent
    foreach (int child in numbers.Skip(1))
    {
        // do something with child
    }
}
Not only is this code cleaner and more readable, it also removes the index bookkeeping, so there's no loop counter or bound to get wrong. If the numbers sequence contains only a single element, the Skip(1) call simply returns an empty sequence and the foreach loop does nothing. Both versions still assume there's at least one element, though: numbers[0] throws an IndexOutOfRangeException on an empty array.
When used together, Take and Skip provide an easy way to instantly present your data in a paged fashion. Given a list of elements, a page size and a zero-based page number, you can get your page of data with a single line of code:
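The edge cases of Skip(1) can be checked with a small sketch. The class and method names below are hypothetical; the method simply returns everything after the first element as a list so the result is easy to inspect.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names are illustrative only.
public static class ParentChildDemo
{
    // Returns every element after the first one (the "children").
    public static List<int> ChildrenOf(int[] numbers)
    {
        // Skip(1) never throws for a short source: with zero or one
        // element it just produces an empty sequence.
        return numbers.Skip(1).ToList();
    }
}
```

For { 1, 2, 3 } the result is 2 and 3; for a one-element array, and for an empty array, the result is an empty list.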
private IEnumerable<int> GetPage(IEnumerable<int> elements,
    int pageSize, int pageNumber)
{
    return elements.Skip(pageNumber * pageSize)
                   .Take(pageSize);
}
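A short usage sketch may help here. It assumes, as the multiplication in GetPage implies, that pageNumber is zero-based; the PagingDemo class is made up for illustration and returns a list so the page contents are easy to inspect.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names are illustrative only.
public static class PagingDemo
{
    // Same query as GetPage, materialized into a list.
    // Assumption: pageNumber is zero-based, so page 0 is the first page.
    public static List<int> GetPage(IEnumerable<int> elements, int pageSize, int pageNumber)
    {
        return elements
            .Skip(pageNumber * pageSize) // jump past the earlier pages
            .Take(pageSize)              // take at most one page of items
            .ToList();
    }
}
```

With 25 elements and a page size of 10, page 0 holds the first 10 elements, page 2 holds the remaining 5, and page 3 comes back empty instead of throwing.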
Take and Skip both have an alternative version that lets you specify a Boolean condition instead of a count. Whereas Take and Skip accept an integer value indicating the number of elements, TakeWhile and SkipWhile accept a function that is tested against each element in turn: TakeWhile returns elements until the function first returns false, and SkipWhile discards elements until the function first returns false and then returns everything that remains.
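Here is a minimal sketch of TakeWhile and SkipWhile side by side. The class name, method names and the limit parameter are hypothetical. The point to notice is that both methods only care about the leading run of elements, which makes them different from Where.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names are illustrative only.
public static class WhileDemo
{
    // Returns the leading elements that are below 'limit',
    // stopping at the first element that is not.
    public static List<int> LeadingBelow(IEnumerable<int> numbers, int limit)
    {
        return numbers.TakeWhile(n => n < limit).ToList();
    }

    // Drops the leading elements that are below 'limit' and
    // returns everything from the first element that is not.
    public static List<int> AfterLeadingBelow(IEnumerable<int> numbers, int limit)
    {
        return numbers.SkipWhile(n => n < limit).ToList();
    }
}
```

For the input { 1, 2, 5, 1, 2 } and a limit of 3, LeadingBelow returns 1 and 2, while AfterLeadingBelow returns 5, 1 and 2. The trailing 1 and 2 are not filtered out, because the condition is no longer consulted once it has returned false.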
The Real First Element
Grabbing the first element in a list is hardly rocket science, but there's still extra code that needs to be written -- and maintained and debugged. For example, let's grab the first child in a collection:
IList<Child> kids = GetAllKids();
var first = kids[0];
But let's not forget there's a possibility that the list may be empty:
IList<Child> kids = GetAllKids();
Child first = null;
if (kids.Count > 0)
{
    first = kids[0];
}
Again, we have to add more code in order to protect ourselves from issues like an empty list. It keeps our code from expressing exactly what we want. LINQ handles this nicely with the "FirstOrDefault" method. It does the same thing the code above does, but without the hassle:
IList<Child> kids = GetAllKids();
Child first = kids.FirstOrDefault();
This does exactly what we want: it finds the first child in the list. If the list is empty, it returns the default value of the element type. Because "kids" is a list of a reference type, the return value would be null. For a value type, it would be that type's default: 0 for an integer, false for a Boolean and so on.
If you know for certain that the list contains at least one element, there's simply the First method. This grabs the first item in the list, but will throw an exception if the list is empty.
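The difference between the two methods on an empty sequence can be sketched as follows. The FirstDemo class and its FirstThrows helper are hypothetical; the helper just reports whether First() threw, which on an empty sequence is an InvalidOperationException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names are illustrative only.
public static class FirstDemo
{
    // Returns true if calling First() on the sequence throws
    // InvalidOperationException, which it does for an empty sequence.
    public static bool FirstThrows<T>(IEnumerable<T> source)
    {
        try
        {
            source.First();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

FirstThrows(new int[0]) returns true, while new int[0].FirstOrDefault() quietly returns 0 and new string[0].FirstOrDefault() returns null.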
What's nice is that overloads are provided on both First and FirstOrDefault that accept a condition. You can pass in a Boolean function, which is used to check each element of the list. The first element that causes the function to return true will be returned from the method. For example, let's find the first teenager in the list:
var teenager = kids.FirstOrDefault(k => k.Age > 12);
Without LINQ, we'd be writing a foreach, checking each item and maintaining a variable to hold our answer. This time, we let the framework handle all of that. Like First and FirstOrDefault, LINQ also gives us Last and LastOrDefault methods. They're an easy way to grab the very last element in a sequence. And just like FirstOrDefault handles an empty list and returns the default type if the list is empty, LastOrDefault will do the same thing.
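As a sketch of the predicate overloads, the following pairs FirstOrDefault with LastOrDefault. The Child class here is a stand-in with only the Name and Age properties the article's examples use, and the TeenagerDemo class is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for the article's Child type; only Name and Age are assumed.
public class Child
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// Hypothetical demo class; the names are illustrative only.
public static class TeenagerDemo
{
    // First child older than 12, or null if there is none.
    public static Child FirstTeenager(IEnumerable<Child> kids)
    {
        return kids.FirstOrDefault(k => k.Age > 12);
    }

    // Last child older than 12, or null if there is none.
    public static Child LastTeenager(IEnumerable<Child> kids)
    {
        return kids.LastOrDefault(k => k.Age > 12);
    }
}
```

Given the children Ann (9), Bob (13), Cy (11) and Dee (15) in that order, FirstTeenager returns Bob and LastTeenager returns Dee. If nobody in the list is older than 12, both return null.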
Group Data Together
Aggregating data is used often in reporting scenarios. It's so common, LINQ has the ability to traverse a list and build up "groups" of items based on whatever criteria you need.
Let's go back to our example of a list of kids. Suppose we need to group the kids based on their age. All we need to do is let LINQ know what we want each child element grouped by:
IList<Child> kids = GetAllKids();
var result = kids.GroupBy(c => c.Age);
So what's the result? LINQ will give us another sequence, of which each element is a "grouping" object that contains the key -- in our case, age -- along with all of the Child elements that matched the age key. This grouping object implements the interface IGrouping<TKey, TElement>. In our case, because we grouped the Child objects by age -- an integer -- each grouping object would be of type IGrouping<int, Child>. It's easier to see it with some code and diagrams. (You can find code illustrating this sequence here).
We can enumerate through each grouping object and display the grouping's key as well as enumerate each element in the grouping:
foreach (IGrouping<int, Child> group in result)
{
    Console.WriteLine("Age: {0}", group.Key);
    foreach (var child in group)
    {
        Console.WriteLine("\t{0}", child.Name);
    }
}
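To make the shape of the result concrete, here is a self-contained sketch that builds one line of text per group instead of writing to the console. The Child class is a stand-in with only Name and Age, and the GroupingDemo class and its method are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for the article's Child type; only Name and Age are assumed.
public class Child
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// Hypothetical demo class; the names are illustrative only.
public static class GroupingDemo
{
    // Produces one line per age group, such as "Age 9: Ann, Cy".
    public static List<string> DescribeByAge(IEnumerable<Child> kids)
    {
        var lines = new List<string>();
        foreach (var group in kids.GroupBy(c => c.Age))
        {
            // group.Key is the shared age; enumerating the group yields its children.
            string names = string.Join(", ", group.Select(c => c.Name));
            lines.Add(string.Format("Age {0}: {1}", group.Key, names));
        }
        return lines;
    }
}
```

For Ann (9), Bob (13) and Cy (9), this yields "Age 9: Ann, Cy" followed by "Age 13: Bob". Groups come back in the order their keys first appear in the source, and elements keep their source order within each group.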
In some cases, you may not want the actual list element as the object to be grouped. Suppose we simply want to know the number of times each letter appears as the first letter of a child's name? This is easy because there's a GroupBy overload that allows you to pick the key, as well as what element is placed into the grouping:
IList<Child> kids = GetAllKids();
var result = kids.GroupBy(
    c => c.Name.FirstOrDefault(),
    e => e.Name.FirstOrDefault());
Notice we're using LINQ's FirstOrDefault method to grab the first letter. This way we don't need to worry about empty names (a null Name, however, would still throw an exception). The default for char is a null character ('\0'), so we'll also get a count of how many children have a blank name, if they exist. Let's display the results of our new grouping:
foreach (IGrouping<char, char> group in result)
{
    Console.WriteLine(
        "Letter: {0} appears {1} time(s) as the first letter.",
        group.Key, group.Count());
}
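If null names are a possibility, one option is to substitute an empty string before taking the first character. The sketch below is a simplified variation, not the article's code: it works on plain name strings rather than Child objects, groups on the key only, and the FirstLetterDemo class is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names are illustrative only.
public static class FirstLetterDemo
{
    // Counts how often each character appears as the first letter of a name.
    // Null and empty names are both counted under the null character '\0'.
    public static Dictionary<char, int> CountFirstLetters(IEnumerable<string> names)
    {
        var counts = new Dictionary<char, int>();
        var groups = names.GroupBy(n => (n ?? string.Empty).FirstOrDefault());
        foreach (var group in groups)
        {
            counts[group.Key] = group.Count();
        }
        return counts;
    }
}
```

For the names "Ann", "Abe", "Bob", an empty string and null, the counts are 2 for 'A', 1 for 'B' and 2 for '\0'.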
Also, in these grouping examples, I've been defining my iterator variable as the specific type, such as IGrouping<char, char>, to help you understand the type of data a particular GroupBy will return. In practice, I'll usually define my iterator variable as simply "var" and let the compiler infer the details.
Build Dictionaries with Ease
LINQ provides some nice utility methods for building dictionaries. As with other code examples, LINQ removes the extra plumbing and allows you to concentrate on the intent of your code. Let's take our list of children and create a dictionary keyed on their Social Security number (SSN) the way we're used to doing it:
IList<Child> kids = GetAllKids();
IDictionary<string, Child> dictionary =
    new Dictionary<string, Child>();
foreach (Child c in kids)
{
    dictionary.Add(c.SSN, c);
}
Now let's have LINQ do most of the work for us. In the previous code sample, we're looping through a sequence and grabbing a specific property of each element as the key to the dictionary. LINQ lets us simplify the creation of a dictionary by simply telling it how to select the key. It'll do the rest of the work:
var dictionary = kids.ToDictionary(k => k.SSN);
There's also an overload that allows you to specify functions for selecting both the key and the element itself. As an example, let's assume each child element contains a "Dad" property, which exposes the father's information -- including the father's SSN. As long as no two children in the list share a father (like Dictionary.Add, ToDictionary throws an ArgumentException on a duplicate key), we could easily build up a dictionary of all of the dads:
var dads = kids.ToDictionary(k => k.Dad.SSN, e => e.Dad);
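One thing to watch for: ToDictionary throws an ArgumentException when two elements produce the same key, which is what happens here if siblings share a father. A possible way around that, sketched below under the assumption that a Dad is identified by his SSN, is to group the dads by SSN first and keep one per group. The Dad and Child classes are minimal stand-ins and DadsDemo is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-ins for the article's types; only the members used here are assumed.
public class Dad
{
    public string SSN { get; set; }
    public string Name { get; set; }
}

public class Child
{
    public string SSN { get; set; }
    public Dad Dad { get; set; }
}

// Hypothetical demo class; the names are illustrative only.
public static class DadsDemo
{
    // Builds a dictionary of dads keyed on SSN, tolerating siblings.
    public static Dictionary<string, Dad> DistinctDads(IEnumerable<Child> kids)
    {
        return kids
            .Select(k => k.Dad)                        // project each child to its dad
            .GroupBy(d => d.SSN)                       // one group per distinct SSN
            .ToDictionary(g => g.Key, g => g.First()); // keep one dad per group
    }
}
```

With three children and two fathers, DistinctDads returns a dictionary with two entries, whereas the direct ToDictionary call would throw.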
So what about building Lists? Many of our first few LINQ examples returned an IEnumerable<int>. When integrating with legacy code, we're much more likely to encounter code that wants a List<T> or an IList<T>. In cases like that you can convert any IEnumerable<T> into a List<T> by using the ToList() method -- which is also a great way to gradually introduce LINQ into existing code.
Order in the Code
Ordering data is easy if your data store is a database -- just provide an "ORDER BY" clause in your SQL query and you're all set. With a list of objects in memory, it's not always as easy. If you've got a complex object, you could write your own IComparer<T> to handle the sorting. If you have a lot of sorting options, you may have to deal with a number of IComparer<T> classes.
LINQ provides an OrderBy method. Like the ToDictionary method, it accepts a function, which tells it what piece of data to order the list on. If we look back at our list of kids, ordering them by their age is easy:
var byAge = kids.OrderBy(k => k.Age);
The LINQ approach is very clean and easy to understand. We don't need to loop through the collection ourselves and perform a sort on the age property. Suppose requirements change and we now want the kids ordered by the age of their dad. We can do that:
var byDadAge = kids.OrderBy(k => k.Dad.Age);
What if you have more than one field to sort on? LINQ provides a "ThenBy" method, which can be chained onto the OrderBy results to provide a second level of sorting. You can add additional ThenBy calls if you need to sort by more than two fields. So, for example, sorting our list by SSN and then by the child's age would be accomplished with:
var sorted = kids.OrderBy(k => k.SSN).ThenBy(k => k.Age);
Finally, if you need to sort in descending order, LINQ provides OrderByDescending and ThenByDescending methods.
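As a sketch of mixing sort directions, the following orders children from oldest to youngest and breaks ties alphabetically by name. The Child class is a stand-in with only Name and Age, and SortingDemo is a hypothetical name. Passing StringComparer.Ordinal to ThenBy is a choice made here so the tie-break doesn't depend on the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the article's Child type; only Name and Age are assumed.
public class Child
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// Hypothetical demo class; the names are illustrative only.
public static class SortingDemo
{
    // Returns names ordered by age, oldest first, then by name ascending.
    public static List<string> NamesOldestFirst(IEnumerable<Child> kids)
    {
        return kids
            .OrderByDescending(k => k.Age)                // primary sort: age, high to low
            .ThenBy(k => k.Name, StringComparer.Ordinal)  // tie-break: name, A to Z
            .Select(k => k.Name)
            .ToList();
    }
}
```

For Cy (9), Bob (13), Ann (9) and Dee (13), the result is Bob, Dee, Ann, Cy.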
Much of this article has shown how LINQ can help us traverse, filter and project a sequence of items. The benefit you get is cleaner code, which is easier to maintain and read. The intention of the code is clear from the beginning. This article illustrates just the basics of what you can do with LINQ. I urge you to check out the documentation for the System.Linq.Enumerable class, which defines the extension methods for IEnumerable<T>, to see everything you can do with LINQ.