C# Corner

Treat Code as Data

Take advantage of new features in C# 3.0 that let you treat code as data -- and save time over more traditional, imperative approaches to programming.

Technology Toolbox: C#

LINQ and C# 3.0 will force you to learn some new idioms in your everyday programming tasks. The idiom that has the most far-reaching consequences is learning to treat code as data. Every time you build a query expression, you're treating code as data. You're passing around bits of code or logic as parameters to a method. The methods in the LINQ libraries don't return data, but delegates that can create the data when you need them. This might feel strange, but it's not as far-fetched as it sounds. From the outside, it shouldn't matter whether a data value is cached or is computed from first principles. For example, this bit of code shouldn't seem scary:

var num = Math.Sin(Math.PI / 4.0);

Do you know whether Math.Sin computes the sine of the angle when you call it? In many libraries, numeric methods like these are implemented using a large lookup table. The method simply returns the value in the lookup table, or makes a linear interpolation of the two nearest values if the request angle isn't in the lookup table.

From your perspective as the one initiating a call to this method, it doesn't matter. The contract of the method is to return a value corresponding to the input parameter. How it happens isn't important.

There's one key point to consider here: I said that how the calculation happens isn't important. That's true -- unless the calculation depends on some side effect. Sine doesn't depend on any side effects, so it works no matter what. Other methods aren't pure functions. For example, this method depends on information beyond the current parameters:

public static decimal CurrentTemperature(int zipCode)

Calling this method at different times with the same input gives different answers. Temperature varies over the course of a day. Substituting the answer (a number that won't change) for a function (some way to find the current answer) doesn't work.

There are also quite a few gray areas, where the answer to whether or not you can substitute a function for data or vice versa turns out to be: "It depends."

Methods as Parameters Are familiar
You've worked with methods as parameters before. The List.RemoveAll() method uses a predicate to determine what items to remove from a list. This predicate is a pure function; it depends only on its input:

numbers.RemoveAll((n) => n > 20);

You can also use the ForEach method to print a list of numbers:

numbers.ForEach((n) => Console.WriteLine(n));

However, this bit of code is much more complicated, and it has dependencies related to how the internal algorithm is implemented. For example, this code removes all numbers from a list of integers where the number is greater than its index in the list:

numbers.RemoveAll((n) => n >
   numbers.IndexOf(n));

This isn't a pure function because the output depends on something other than the input. Namely, it depends on the current program's state. Does RemoveAll() remove each element as it's processed? That would change the current index of the items. Or, does it perform all the tests and then perform a bulk remove? In which order does it examine the list? First to last? Or last to first? The results of this code will depend on the answers to these questions. (For the record, RemoveAll performs all of the tests, and then removes all of the items. Knowing that doesn't make this code any more excusable, however.)

There are quite a few new techniques and concepts that you use when you begin to think of your code as data. You'll be using lambda expressions, deferred execution, closures, higher order functions, and function composition. And, unlike switching to a pure functional language, you'll likely be mixing your current object-oriented style of programming with this new functional approach, where functions are data. Yes, it's a steep learning curve, but the ends are worth the effort.

It's possible to implement every one of the techniques just mentioned in C# 2.0, but you can do so much more easily in C# 3.0 because the syntax is so much cleaner, so I'll show you how to implement these techniques using C# 3.0's syntax.

A lambda expression is nothing more than a simplified way to express a method. (In the formal definition, lambda expressions shouldn't have any side effects, but C# doesn't enforce this rule.) Consider this statement from earlier in the article:

numbers.ForEach((n) => Console.WriteLine(n));

This is nothing more than a concise way of saying:

numbers.ForEach(delegate(int n )
{
	Console.WriteLine(n);
});

Using the lambda syntax, the compiler infers the type of the parameter (an integer) and the type of the return (void in this case). There's nothing too earth-shattering here, but you must keep the key point in mind: You're passing a function (in the form of a delegate) to the ForEach method. Essentially, the parameter is describing the algorithm. That's a fundamental change in terms of how you think about your code.

Deferred execution changes your thinking about code in some important ways (see Listing 1). Now consider the output from a test that runs the code in Listing 1:

2/19/2008 2:18:14 PM
2/19/2008 2:18:23 PM
2/19/2008 2:18:32 PM
2/19/2008 2:18:41 PM
2/19/2008 2:18:50 PM
Do it again
2/19/2008 2:19:08 PM
2/19/2008 2:19:17 PM
2/19/2008 2:19:26 PM
2/19/2008 2:19:35 PM
2/19/2008 2:19:44 PM

I chose to use the DateTime.Now property to generate the sequence because it gives you a clear picture of when operations happen. You can see that there's a nine-second delay between generating the next sequence item. Also, when you examine the sequence again, you get a totally different sequence of times. The sequence is an algorithm that can create values, but the sequence isn't the values themselves. Again, you're now treating code as data. The sequence of values doesn't exist until you ask for it. Even after you ask for it, the variable sequence still doesn't contain values. If you examine it again, you see a new sequence of values.

Closures Introduce Bound Variables
One more bit of dry computer science, and then we can move onto the more interesting ramifications of treating code as data. Assume you alter Listing 1 to create different behavior (see Listing 2). Now, examine its output:

2/19/2008 2:34:27 PM
2/19/2008 2:34:27 PM
2/19/2008 2:34:27 PM
2/19/2008 2:34:27 PM
2/19/2008 2:34:27 PM
Do it again
2/19/2008 2:35:21 PM
2/19/2008 2:35:21 PM
2/19/2008 2:35:21 PM
2/19/2008 2:35:21 PM
2/19/2008 2:35:21 PM

What changed? Well, the compiler created a closure containing Current as a bound variable. A closure is a way to inject local variables (or parameters) into the body of the lambda expression. Those local variables are referred to as "bound variables." The closure contains both the local variables and lambda expressions. The code is implemented in such a way that changes to the bound variable outside of the lambda expression are reflected inside the lambda expression, and vice versa. In this piece of code, you see that the generator returns a sequence containing five copies of the current time. Later, you modify the value of the bound variable (current), outside the lambda. The next time you enumerate the sequence, you get five copies of the newer version of the variable.

Putting This to Work
All of this is wonderful, but why should you care? Using this kind of algorithm can help you create snippets of code to reuse later. Think about how many times you've written code like this:

var currentCustmers = 
	From c in customerList
	Where c.Orders.Count > 0
	Select c;

Because that variable contains code, not just data, you're actually creating a bit of logic that gives you the current customer list when requested, rather than when the logic executed originally. Instead of copying that code everywhere, you need only access that code when you need it.

Another advantage is that you can work with sequences that are far too large to examine or process on your local machine. You can chain these sequence operators together. When you do that, you're not making new copies of data. You're manipulating the algorithm and the functions, and that new set of functions provides a new answer when you examine it.

You can see this at work by converting an ancient numeric algorithm from imperative to declarative. You can find full source for this conversion in the online code, but I'll highlight the key points in this article's inline code. Hero of Alexandria's algorithm for finding square roots lets you find the square root of any number S, by starting with a guess G (S-1 works fine). The next guess is computed using the formula ((S / G + G) / 2). For example, to find the square root of 2, you start with 1 as the guess. The next guess is 1.5 ((2 / 1 + 1) / 2). The next guess is 1.416 ((2 / 1.5 + 1.5) / 2). After enough iterations, the answer converges on the square root.

You begin with a classic C# imperative implementation of Hero's algorithm (see Listing 3). Next, you make a set of changes and re-implement this algorithm to make it more declarative, or functional (see Listing 4).

It's a twist, so look at this revised listing carefully. Begin with HeroRootFunc, which defines a function that creates a sequence of guesses. It returns the last number in the sequence. The method contains two anonymous methods that define how to generate the next number, and when to stop. This expression defines how to generate the next number:

(g) => ((square / g + g) / 2)

This expression defines when to terminate the sequence:

(c, n) => Math.Abs(c - n) > epsilon)

The query expression returns the entire sequence. The Last() extension method returns the last value in the sequence, which is the best answer.

The GenerateSequence() method generates the sequence while the test method returns true. It creates the sequence by evaluating each of the functions used as arguments. These methods generate the sequence and perform the tests. However, they only generate the sequence when someone asks for the final number.

Look again at the implementation of the functional version of Hero's algorithm. The sequence function generates an infinite sequence. This algorithm would run out of memory if it were imperative. No matter how you do it, you can't fit an infinite number of elements in memory. It would also take an infinite amount of time. And yet, this works, because the functions defined as parameters are evaluated only when requested. Also, the GenerateSequence() method can be used for other purposes.

Not every problem is best solved using functional approaches, but many problems can be solved more succinctly and more clearly by rethinking parameters and return types. Instead of sending all the data, you can send along a function that can generate the data you need. Sometimes that can give you the answer while requiring much less work on your part.

About the Author

Bill Wagner, author of Effective C#, has been a commercial software developer for the past 20 years. He is a Microsoft Regional Director and a Visual C# MVP. His interests include the C# language, the .NET Framework and software design. Reach Bill at [email protected].

comments powered by Disqus

Featured

Subscribe on YouTube