In-Depth

Creating LINQ-Enabled Frameworks

Harness the power of query expressions to develop powerful frameworks.

Many developers use LINQ without fully realizing the power at their fingertips. The features of the language enable queries of in-memory collections or external data sources, and Microsoft has extended those queries to events and other observable sources via Reactive Extensions (Rx). With the exception of query expressions, many LINQ-supporting language features of C# 3.0 are now commonplace.

Query expressions are the Domain-Specific Language (DSL) used for list comprehension in C# and other .NET languages. They provide a SQL-like language to query sequences, but usage varies. Many developers prefer to build method chains describing data transformation from the source sequence to the expected output. Others prefer query expressions. I find it best to use whatever is most readable for the given situation.

The query expression DSL is an elegant way to manipulate sequences, but you might also use it to empower your frameworks and applications.

Breaking the Mold
Nearly every framework has in-memory sequences, and the standard library for their transformation is LINQ to Objects, imported from the System.Linq namespace. Other types, such as sequences with a remote data source, have libraries to handle them. However, query expressions aren't limited to sequences. The code in Listing 1 shows a hypotenuse function created from a query expression.

Listing 1. Hypotenuse function created from a query expression.

Func<int, int> square = x => x * x;

var hypotenuse = from h in square
                 from w in square
                 select Math.Sqrt(h + w);

Assert.AreEqual(5, hypotenuse(3, 4));

Sequences implement the IEnumerable<T> interface in the Microsoft .NET Framework, but the query expression in Listing 1 operates on the square function defined immediately above it. The delegate type, Func<T, TResult> (the type extended in Listing 2), doesn't implement the IEnumerable<T> interface.

This makes it clear that query expressions don't depend on a particular class or interface. Instead, the C# and Visual Basic compilers use a form of duck typing: a query compiles as long as methods with the right names and signatures can be found, and extension methods allow any type to have the appropriate methods. The code in Listing 2 shows a SelectMany extension method on Func<T, TResult>.

Listing 2. SelectMany extension method on Func<T, TResult>.

public static Func<T, T2, TResult3> SelectMany<T, T2, TResult, TResult2, TResult3>(
  this Func<T, TResult> left,
  Func<TResult, Func<T2, TResult2>> right,
  Func<TResult, TResult2, TResult3> selector)
{
  return (a, b) =>
  {
    var memo = left(a);
    return selector(memo, right(memo)(b));
  };
}

Both wrapped functions and methods with five generic arguments are rare. Put them together, and the code in Listing 2 appears to be a mess of angle brackets. Despite the complex signature, this method simply returns a new function by combining its three parameters.
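
To see why the compiler accepts Listing 1, it can help to write out the method call that the query expression turns into. The sketch below repeats the SelectMany method from Listing 2 and builds the hypotenuse function both ways. The FuncExtensions and Demo class names, and the method names on Demo, are assumptions added for illustration.

```csharp
using System;

public static class FuncExtensions
{
  // Same shape as Listing 2: combines two unary functions and a selector
  // into one binary function.
  public static Func<T, T2, TResult3> SelectMany<T, T2, TResult, TResult2, TResult3>(
    this Func<T, TResult> left,
    Func<TResult, Func<T2, TResult2>> right,
    Func<TResult, TResult2, TResult3> selector)
  {
    return (a, b) =>
    {
      var memo = left(a);
      return selector(memo, right(memo)(b));
    };
  }
}

public static class Demo
{
  public static readonly Func<int, int> Square = x => x * x;

  // Query expression form, as in Listing 1.
  public static Func<int, int, double> FromQuery()
  {
    return from h in Square
           from w in Square
           select Math.Sqrt(h + w);
  }

  // Roughly the call the compiler generates for the query above.
  public static Func<int, int, double> FromMethodCall()
  {
    return Square.SelectMany(h => Square, (h, w) => Math.Sqrt(h + w));
  }

  public static void Main()
  {
    Console.WriteLine(FromQuery()(3, 4));      // 5
    Console.WriteLine(FromMethodCall()(3, 4)); // 5
  }
}
```

The second from clause becomes the lambda h => Square, and the select clause becomes the two-argument selector. Nothing in the translation refers to IEnumerable<T>.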

Ubiquitous Language
Aside from language features and query expressions, LINQ is primarily an API consisting of methods known as query operators. A few of these operators map to language keywords, which is what makes query expressions possible. Most .NET developers already know these operators by name, so a framework that uses synonyms for them becomes harder to discover.

I'll illustrate this with a non-LINQ example. Suppose your framework uses the prefix Get when retrieving data: GetById, GetByDate and so on. Imagine that a new developer on your team doesn't like this, and uses the prefix Fetch instead. After wasting time looking for either the Get method or the Fetch method, you finally discover that someone else decided to use Retrieve. This lack of convention wastes time and causes frustration. In contrast, frameworks adhering to a convention become intuitive to use.

Choosing a Manner of Execution
There are two primary execution styles when using query operators: immediate and deferred.

Methods using immediate execution return a result, and those using deferred execution compose a structure for future processing. The composed structure varies depending on implementation. LINQ to Objects builds a data-transformation pipeline, LINQ libraries based on IQueryable<T> build expression trees, and Listing 2 builds a composed function.

Which style you choose depends on the structure and goals of the system. Generally, methods returning the same type use deferred execution and those returning a different type use immediate execution. This may not always be the best approach, so use your discretion.
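
The rule of thumb above can be observed in LINQ to Objects itself. In this minimal sketch (the ExecutionDemo class and its Counts method are hypothetical names), Where returns the same kind of type as its source and is deferred, while Count returns an int and runs immediately.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExecutionDemo
{
  // Returns the count observed before and after the source list changes.
  public static int[] Counts()
  {
    var numbers = new List<int> { 1, 2, 3 };

    // Deferred: Where only describes the work to be done.
    IEnumerable<int> evens = numbers.Where(n => n % 2 == 0);

    // Immediate: Count has to run the query now to produce an int.
    int before = evens.Count();

    numbers.Add(4);

    // The deferred query runs again and sees the element added afterward.
    int after = evens.Count();

    return new[] { before, after };
  }

  public static void Main()
  {
    Console.WriteLine(string.Join(", ", Counts())); // 1, 2
  }
}
```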

There are various styles of deferred execution to consider. Two commonly used deferred-execution strategies in LINQ to Objects are streaming and non-streaming. You should usually choose to stream during execution. Non-streaming is necessary for operations requiring all source data.
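
The difference between streaming and non-streaming operators shows up when only the first result is requested. The sketch below uses a hypothetical Source iterator that logs every element it hands out; the class and method names are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
  // Hypothetical source that records each element it produces.
  private static IEnumerable<int> Source(List<string> log)
  {
    foreach (var n in new[] { 3, 1, 2 })
    {
      log.Add("read " + n);
      yield return n;
    }
  }

  // Select streams: taking the first result reads only one source element.
  public static List<string> FirstOfSelect()
  {
    var log = new List<string>();
    var first = Source(log).Select(n => n * 10).First();
    log.Add("first " + first);
    return log;
  }

  // OrderBy does not stream: it must read every element before yielding one.
  public static List<string> FirstOfOrderBy()
  {
    var log = new List<string>();
    var first = Source(log).OrderBy(n => n).First();
    log.Add("first " + first);
    return log;
  }

  public static void Main()
  {
    Console.WriteLine(string.Join(", ", FirstOfSelect()));
    // read 3, first 30
    Console.WriteLine(string.Join(", ", FirstOfOrderBy()));
    // read 3, read 1, read 2, first 1
  }
}
```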

Another particularly useful strategy is a form of caching called memoization. As the method outputs its data, it stores it. If another method call occurs, the data is readily available. This provides better performance at the expense of memory consumption.
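
One possible form of memoization, sketched for the Func<T, TResult> delegates used elsewhere in this article, is a wrapper that remembers the result for each argument. The Memoize method below is a hypothetical helper, not part of LINQ; it isn't thread-safe, and its cache grows for as long as the returned function is alive.

```csharp
using System;
using System.Collections.Generic;

public static class MemoizeExtensions
{
  // Wraps a function so each distinct argument is computed once and later
  // calls with the same argument are answered from the dictionary.
  public static Func<T, TResult> Memoize<T, TResult>(this Func<T, TResult> func)
  {
    var cache = new Dictionary<T, TResult>();
    return arg =>
    {
      TResult result;
      if (!cache.TryGetValue(arg, out result))
      {
        result = func(arg);
        cache[arg] = result;
      }
      return result;
    };
  }
}

public static class MemoizeDemo
{
  // Counts how many times the wrapped function actually runs.
  public static int Calls;

  public static void Main()
  {
    Func<int, int> slowSquare = x => { Calls++; return x * x; };
    var fastSquare = slowSquare.Memoize();

    Console.WriteLine(fastSquare(4)); // 16
    Console.WriteLine(fastSquare(4)); // 16, served from the cache
    Console.WriteLine(Calls);         // 1
  }
}
```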

Deciding Mutability
Standard implementations of LINQ use immutable constructs. When operating on mutable types, these libraries generate an immutable type when executing a deferred operation. Other developers will expect this to be the case, and you should usually deliver what your fellow developers expect. However, some constructs will not lend themselves well to this restriction. Strive for an immutable implementation, but don't be afraid to break convention if immutability results in poor performance.

Getting Started
The code required to enable the query expression in Listing 1 is unusually complex due to the number of generic types involved in combining three functions. Fortunately, the most common operator is much simpler, as shown in Listing 3.

Listing 3. A simple prime function composed from two functions.

Func<int, int> square = x => x * x;

var simplePrime = from x in square
                  select x + x % 2 + 1;

Assert.AreEqual(11, simplePrime(3));
Assert.AreEqual(17, simplePrime(4));

In a two-line query expression (as shown in Listing 3) the compiler uses the Select operator. It also uses this operator when the query requires a selector and the preceding operator did not provide one. The Select operator implementation for Func<T, TResult> best demonstrates the selector:

public static Func<T, TResult2> Select<T, TResult, TResult2>(
  this Func<T, TResult> func,
  Func<TResult, TResult2> selector)
{
  return a => selector(func(a));
}

Query expression keywords require a matching method signature. The "from" keyword in Listing 3 specifies that the query operates on square, a variable of type Func<int, int>. I use an extension method to attach the Select method to any Func with two generic parameters. The next parameter in the signature is the selector, a function that transforms one type to another. My implementation returns a new function wrapping the selector around the original.

The square function is the func argument, represented by T -> TResult. The expression following the "select" keyword is the selector argument, represented by TResult -> TResult2. The Select operator combines them to produce a new function: T -> TResult -> TResult2, simplified to T -> TResult2.
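
The composition described above can also be written as ordinary method calls. This sketch repeats the Select operator for Func<T, TResult> and chains it twice; the FuncSelectExtensions and SelectDemo class names and the Describe function are assumptions added for illustration.

```csharp
using System;

public static class FuncSelectExtensions
{
  // Same shape as the Select operator for Func<T, TResult> in this article.
  public static Func<T, TResult2> Select<T, TResult, TResult2>(
    this Func<T, TResult> func,
    Func<TResult, TResult2> selector)
  {
    return a => selector(func(a));
  }
}

public static class SelectDemo
{
  public static readonly Func<int, int> Square = x => x * x;

  // Method form of the query in Listing 3: int -> int -> int.
  public static readonly Func<int, int> SimplePrime =
    Square.Select(x => x + x % 2 + 1);

  // A further Select wraps another function around the result:
  // int -> int -> string, simplified to int -> string.
  public static readonly Func<int, string> Describe =
    SimplePrime.Select(p => "candidate: " + p);

  public static void Main()
  {
    Console.WriteLine(SimplePrime(3)); // 11
    Console.WriteLine(Describe(4));    // candidate: 17
  }
}
```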

You can implement a query operator directly on a type, as shown in Listing 4; extension methods are only required when this isn't possible. Because the Parser<T> class implements Select itself, the only argument is the selector. I designed the class for immutability, and I chose deferred execution with function composition. This class represents an object-oriented version of a Select operator, specialized for string parsing.

Listing 4. Parser<T> with the Select method.

public class Parser<T>
{
  private readonly Func<string, T> map;

  public Parser(Func<string, T> map)
  {
    this.map = map;
  }

  public T Parse(string value)
  {
    return map(value);
  }

  public Parser<U> Select<U>(Func<T, U> selector)
  {
    return new Parser<U>(s => selector(map(s)));
  }
}

Querying Multiple Objects
More than one from keyword in a query requires a SelectMany method:

public Parser<V> SelectMany<U, V>(Parser<U> other, Func<T, U, V> selector)
{
  return new Parser<V>(s => selector(this.map(s), other.map(s)));
}

public Parser<V> SelectMany<U, V>(Func<T, Parser<U>> other, Func<T, U, V> selector)
{
  return new Parser<V>(s => 
  {
    var memo = this.map(s);
    return selector(memo, other(memo).map(s));
  });
}

You may find it odd that I overloaded the SelectMany method in this code. The second version accepts a function wrapping the other parser; it unwraps that parser by calling the function with the first result, and then combines the two results just as the first version does. I do this because I'm more likely to use the first method outside of query expressions, but the second method is required for query expressions. You may need to use the variable from the first from clause in the second clause, and LINQ provides this capability by wrapping the second expression in a unary function that accepts the variable as its argument:

var left = new Parser<int>(s => int.Parse(s.Split('+', '-')[0]));
var right = new Parser<int>(s => int.Parse(s.Split('+', '-')[1]));
var op = new Parser<Func<int, int, int>>(
  s => (a, b) => s.Contains('-') ? a - b : a + b);

var parser = from l in left
             from o in op
             from r in right
             select o(l, r);

Assert.AreEqual(10, parser.Parse("3+7"));
Assert.AreEqual(-2, parser.Parse("2-4"));

This test shows how query expressions combine individual parsers to create a more substantial parser. Because the parsers are immutable, you can recombine them for other scenarios.
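
As a sketch of that recombination, the example below reuses one pair of operand parsers in two different queries. It assumes a trimmed-down Parser<T> with only the SelectMany overload that query expressions need, and hypothetical input shaped like "3*7"; the ParserDemo class and its field names are illustrative.

```csharp
using System;

public class Parser<T>
{
  private readonly Func<string, T> map;

  public Parser(Func<string, T> map) { this.map = map; }

  public T Parse(string value) { return map(value); }

  // The overload used by query expressions with two from clauses.
  public Parser<V> SelectMany<U, V>(Func<T, Parser<U>> other, Func<T, U, V> selector)
  {
    return new Parser<V>(s =>
    {
      var memo = this.map(s);
      return selector(memo, other(memo).Parse(s));
    });
  }
}

public static class ParserDemo
{
  // Hypothetical operand parsers for input such as "3*7".
  static readonly Parser<int> Left =
    new Parser<int>(s => int.Parse(s.Split('*')[0]));
  static readonly Parser<int> Right =
    new Parser<int>(s => int.Parse(s.Split('*')[1]));

  // Two different parsers built from the same immutable parts.
  public static readonly Parser<int> Product =
    from l in Left
    from r in Right
    select l * r;

  public static readonly Parser<string> Swapped =
    from l in Left
    from r in Right
    select r + "*" + l;

  public static void Main()
  {
    Console.WriteLine(Product.Parse("3*7")); // 21
    Console.WriteLine(Swapped.Parse("3*7")); // 7*3
  }
}
```

Neither query changes Left or Right, so both composed parsers can be used side by side.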

Asynchronous Execution
The async and await keywords are new to C# 5.0, and they greatly simplify the problem of asynchronous execution. These keywords operate on methods, lambda expressions or anonymous methods that return Task, Task<T> or void. They're compatible with query operators, making it easy to create asynchronous queries.

Using asynchronous query operators is a matter of returning a Task and marking the method with the async keyword. The method will run synchronously unless an await keyword is present. This code works directly on Task<T>, enabling await:

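public static async Task<TResult> SelectMany<T, T2, TResult>(
  this Task<T> left,
  Func<T, Task<T2>> right,
  Func<T, T2, TResult> selector)
{
  // Pass the first result to the second clause so its range variable
  // keeps the type T throughout the query.
  var memo = await left;
  return selector(memo, await right(memo));
}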

If you don't have a Task instance, and you want a long-running process to run asynchronously, use Task.Run:

public async Task<int> GetPageViewsAsync(string url)
{
  return await Task.Run(() => GetPageViews(url));
}

The ability to write asynchronous queries as shown in the following code is a very recent development, and I find it both clean and declarative:

var totalViews = await from a in GetPageViewsAsync(blog)
                       from b in GetPageViewsAsync(videos)
                       select a + b;
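
For a version of this query that can be run on its own, the sketch below pairs a SelectMany operator for Task<T> with a stand-in for GetPageViewsAsync. Several parts are assumptions for illustration: the TaskQueryExtensions and AsyncDemo names, a right parameter typed as Func<T, Task<T2>> so the first result is passed along, and a fake lookup that simply returns the length of the URL string.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskQueryExtensions
{
  // Assumed variant of SelectMany for Task<T>: the second task is
  // produced from the first task's result.
  public static async Task<TResult> SelectMany<T, T2, TResult>(
    this Task<T> left,
    Func<T, Task<T2>> right,
    Func<T, T2, TResult> selector)
  {
    var first = await left;
    return selector(first, await right(first));
  }
}

public static class AsyncDemo
{
  // Hypothetical stand-in for a slow lookup; returns the URL's length.
  static async Task<int> GetPageViewsAsync(string url)
  {
    await Task.Delay(10);
    return url.Length;
  }

  public static async Task<int> TotalViewsAsync()
  {
    return await (from a in GetPageViewsAsync("blog")
                  from b in GetPageViewsAsync("videos")
                  select a + b);
  }

  public static void Main()
  {
    Console.WriteLine(TotalViewsAsync().GetAwaiter().GetResult()); // 10
  }
}
```

In this sketch the second lookup starts only after the first one completes, because the second from clause runs inside the right function. If the two calls should overlap, start both tasks before the query and refer to the task variables in the from clauses.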

A Cautionary Tale
Not every problem is a perfect fit for query operators, and a solution that forces them onto an ill-fitting problem is difficult to decipher. If query operators or expressions make a code base difficult to manage or hard to read, find another solution.

I once had an idea to make LINQ to Specifications. The specification pattern separates business rules from business objects. A specification holds a condition that tests whether a business object passed as an argument satisfies it. You can combine specifications using the And and Or methods, or you can negate a specification using the Not operator. Eric Evans originally described this pattern in Domain-Driven Design: Tackling Complexity in the Heart of Software (Addison-Wesley Professional, 2003). Listing 5 shows a basic implementation of this pattern using combinators.

Listing 5. A combinator implementation of the specification pattern.

public class Spec<T>
{
  private readonly Func<T, bool> predicate;

  public Spec(Func<T, bool> predicate)
  {
    this.predicate = predicate;
  }

  public bool IsSatisfiedBy(T item)
  {
    return predicate(item);
  }

  /* Operators */

  public Spec<T> And(Func<T, bool> predicate)
  {
    return new Spec<T>(t => this.predicate(t) && predicate(t));
  }

  public Spec<T> Or(Func<T, bool> predicate)
  {
    return new Spec<T>(t => this.predicate(t) || predicate(t));
  }

  public Spec<T> Not()
  {
    return new Spec<T>(t => !this.predicate(t));
  }
}

The most obvious LINQ operator to add is Where. In this context, Where implies an extra condition that must be tested. It's a synonym for the And operator, and is shown in Listing 6.

Listing 6. The Where method of Spec<T>.

public Spec<T> Where(Func<T, bool> predicate)
{
  return this.And(predicate);
}

The following code is a test to ensure the where clause of query expressions is active (I prefer to think of these as specification expressions due to the context):

var highEarnerSpec = new Spec<Salesperson>(s => s.AnnualRevenue >= 1000000);

var techHighEarnerSpec = from x in highEarnerSpec
                         where x.Clients.Any(c => c.Category == Category.Technology)
                         select x;

Assert.True(techHighEarnerSpec.IsSatisfiedBy(salesperson));

I thought Select would be easy to create. I assumed Spec<T> would transform to Spec<U>, and the selector would provide the logic for T -> U. Unfortunately, a specification carries a predicate rather than a value. The Select method actually needs to provide the logic to go from U -> T. Inverting the projection makes the signature invalid for query expressions, as the compiler can't infer the type. To compensate, I specified the generic parameter for the method call:

var intSpec = new Spec<int>(i => i > 1);
var strSpec = intSpec.Select<string>(s => int.Parse(s));
Assert.True(strSpec.IsSatisfiedBy("3"));
Assert.False(strSpec.IsSatisfiedBy("0"));
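
The article doesn't show the method behind this test, so the following is only a sketch of what an inverted Select could look like. The signature is an assumption: the selector runs from the new type U back to T, and the new specification converts its argument before testing it with the original predicate.

```csharp
using System;

public class Spec<T>
{
  private readonly Func<T, bool> predicate;

  public Spec(Func<T, bool> predicate) { this.predicate = predicate; }

  public bool IsSatisfiedBy(T item) { return predicate(item); }

  // Hypothetical signature: U appears only as the selector's input, so the
  // compiler can't infer it from an implicitly typed lambda.
  public Spec<U> Select<U>(Func<U, T> selector)
  {
    return new Spec<U>(u => predicate(selector(u)));
  }
}

public static class SpecDemo
{
  public static readonly Spec<int> IntSpec = new Spec<int>(i => i > 1);

  // The generic argument is named explicitly, as in the test above.
  public static readonly Spec<string> StrSpec =
    IntSpec.Select<string>(s => int.Parse(s));

  public static void Main()
  {
    Console.WriteLine(StrSpec.IsSatisfiedBy("3")); // True
    Console.WriteLine(StrSpec.IsSatisfiedBy("0")); // False
  }
}
```

A select clause in a query expression supplies a function from the range variable's type to a new type, which is the opposite direction from this selector. That mismatch is why the call is written as a method here rather than as a query.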

This is a necessary operation, but it's not Select. I decided instead to apply both Select and SelectMany to the specification's Boolean result, which made the logical operators more expressive in the process, as shown in Listing 7.

Listing 7. Effects of Select and SelectMany applied to a result.

public Spec<T> And(Func<T, bool> predicate)
{
  return from t in this
         from u in predicate
         select t && u;
}

public Spec<T> Or(Func<T, bool> predicate)
{
  return from t in this
         from u in predicate
         select t || u;
}

public Spec<T> Not()
{
  return from t in this 
         select !t;
}

public Spec<T> Select(Func<bool, bool> selector)
{
  return new Spec<T>(t => selector(this.predicate(t)));
}

public Spec<T> SelectMany(Func<bool, Spec<T>> other, Func<bool, bool, bool> selector)
{
  return new Spec<T>(t => 
  {
    var memo = this.predicate(t);
    return selector(memo, other(memo).predicate(t));
  });
}

public Spec<T> SelectMany(Func<bool, Func<T, bool>> other, Func<bool, bool, bool> selector)
{
  return new Spec<T>(t => 
  {
    var memo = this.predicate(t);
    return selector(memo, other(memo)(t));
  });
}

Everything seemed to be coming together, but I actually created a huge problem, as shown in Listing 8.

Listing 8. A confusing variable in a query.

var spec = new Spec<int>(i => true);

var result = from a in spec
             where a != 0
             select !a;

The behavior of the individual query operators makes sense, but together the query operators create a confusing situation. In Listing 8, someone might assume the variable "a" has an operator or type overload. Surprisingly, the variable "a" is actually an Int32 on the second line and a Boolean on the third line.

Instead of representing a value or reference, query expression variables are placeholders for arguments passed into query operators. In complex queries, the compiler generates classes to hold the values assigned to query expression variables. An instance of the generated class is passed as the argument, and each variable in the expression is replaced by a property on that class.
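
The substitution is easier to follow with an ordinary LINQ to Objects query. In the sketch below (the TranslationDemo class and its sample arrays are assumptions for illustration), the second method spells out approximately what the compiler produces for the first, using an anonymous type in place of the compiler-generated class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TranslationDemo
{
  static readonly int[] Xs = { 1, 2, 3 };
  static readonly int[] Ys = { 2, 3 };

  public static List<int> FromQuery()
  {
    var query = from a in Xs
                from b in Ys
                where a < b
                select a + b;
    return query.ToList();
  }

  // Approximate translation: "a" and "b" become properties of an object
  // that is passed from one operator to the next.
  public static List<int> FromMethodCalls()
  {
    return Xs
      .SelectMany(a => Ys, (a, b) => new { a, b })
      .Where(pair => pair.a < pair.b)
      .Select(pair => pair.a + pair.b)
      .ToList();
  }

  public static void Main()
  {
    Console.WriteLine(string.Join(", ", FromQuery()));       // 3, 4, 5
    Console.WriteLine(string.Join(", ", FromMethodCalls())); // 3, 4, 5
  }
}
```

Each clause receives whatever the previous operator hands to its lambda, so the type of a query variable is decided by the operator signatures rather than by the variable's name.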

This may seem complicated, but it works well when using the correct types. Fixing a specific type where a query operator signature would normally use a generic argument may result in unexpected behavior. The variable "a" in Listing 8 is an int in the "where" clause because spec is of type Spec<int>, and Listing 6 uses the generic type for the predicate argument. If you apply a specific type for the predicate parameter, variable "a" will then become that type. The Select method in Listing 7 does this by specifying the selector's argument as a bool.

Scratching the Surface
In this article, I showed how to implement query operators to enable query expressions on any type. I also showed how to use combinators to transform LINQ-empowered classes and how to make asynchronous query expressions.

This article contains a lot of information, but it barely scratches the surface of framework development and LINQ. To see more operators, visit the MSDN Library page. Just remember to replace references to IEnumerable<T> with the class that you're making LINQ-enabled.

About the Author

Chris Eargle is a C# MVP and Telerik evangelist with more than a decade of experience designing and developing enterprise applications. He runs the Columbia, S.C., .NET User Group and the Columbia Enterprise Developers Guild. Eargle is a frequent guest of conferences and community events promoting best practices and new technologies. Follow him on Twitter at twitter.com/KodefuGuru.
