Ask Kathleen
Capture Variables with Closures
Learn to pass anonymous types outside the method in which they're created; take advantage of closures when working with lambda expressions; drill down on overloading; initialize static fields properly; and see where KeyedCollections improve performance.
Technologies mentioned in this article include VB.NET and C#.
Q I've heard the term "closure" mentioned in relation to lambda expressions. What is a closure?
A Closures are fragments of code--special delegates--that capture variables in the scope where the closure is defined. This means a closure can contain a local variable and access that variable even where the variable is not otherwise in scope. In .NET, code fragments are delegates, and the in-line delegates--such as lambda expressions and C#'s anonymous delegates--are closures.
A few examples can help you understand how this works. This snippet is a little contrived, but the lambda expression is passed out of the scope of the current method and takes the variable i with it:
Private Sub Test1()
    Dim i = 1
    ' The lambda captures the variable i itself
    Dim lambda1 = _
        Function(val As Int32) val + i
    i = UseAsDelegate( _
        10, lambda1)
    Console.WriteLine(i)  ' Displays 11
End Sub
Private Function UseAsDelegate( _
    ByVal val As Int32, ByVal del _
    As Func(Of Int32, Int32)) As Int32
    Return del(val)
End Function
Lambda expressions become either delegates or expression trees depending on the type of the variable they're assigned to. Lambdas are inferred to be delegates by default in VB (C# doesn't allow lambda inference), so the variable lambda1 is a delegate. Func(Of Int32, Int32) is one of the new delegate types created in .NET 3.5 to make it more convenient to use lambdas. The last type parameter defines the return value, and all other type parameters define the types of the delegate's parameters.
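As a rough C# illustration of the same points, the sketch below assigns one lambda to a Func delegate and another to an expression tree. The variable names are made up for this example; note how the last type argument of Func is the return type.

```csharp
using System;
using System.Linq.Expressions;

class Program
{
    static void Main()
    {
        // Two Int32 parameters; the last type argument (string) is the return type.
        Func<int, int, string> describe = (a, b) => (a + b).ToString();

        // The same lambda syntax becomes an expression tree when the
        // target type is Expression<TDelegate>.
        Expression<Func<int, int>> tree = x => x + 1;

        Console.WriteLine(describe(2, 3));      // 5
        Console.WriteLine(tree.Body.NodeType);  // Add
        Console.WriteLine(tree.Compile()(41));  // 42
    }
}
```

C# needs the explicit Func or Expression type on the left side because, unlike VB, it doesn't infer a delegate type for the lambda.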
It's important to recognize that the variable itself is contained in the lambda; its value at the time the lambda is created (two in the next snippet) is not copied into the closure delegate. You can see this if you change the variable value:
Private Sub Test2()
    Dim i = 2
    Dim lambda1 = Function(val As Int32) val + i
    i = UseAsDelegate(10, lambda1)
    Console.WriteLine(i)  ' Displays 12
    i = UseAsDelegate(100, lambda1)
    Console.WriteLine(i)  ' Displays 112
End Sub
After the first call, the output value is 12. If the value two had been copied into the closure, the second call would produce 102. However, the variable itself is captured in the closure, so the lambda sees the updated value of 12, and the output after the second call is 112.
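For C# readers, a minimal equivalent of the Test2 snippet might look like this. It calls the delegate directly instead of going through UseAsDelegate, which doesn't change the result.

```csharp
using System;

class Program
{
    static void Main()
    {
        int i = 2;

        // The lambda captures the variable i, not the value 2.
        Func<int, int> addI = val => val + i;

        i = addI(10);
        Console.WriteLine(i);   // 12

        // i is now 12, and the closure sees the updated value.
        i = addI(100);
        Console.WriteLine(i);   // 112
    }
}
```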
One way you can take advantage of closures is to have delegates interact in ways that aren't supported by the method using the delegate.
For example, you can write a sort that cancels if a particular condition fails (see Listing 1) to give a new feature to the List.Sort method.
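The following C# sketch shows one way such a cancelable sort could work; it is not the code from Listing 1. The comparison lambda captures a local flag, sets it when it hits a condition that the sort should not continue past (a negative value here, chosen purely for illustration), and throws to stop the sort. The caller reads the captured flag afterward instead of relying on the exact exception type that Sort surfaces.

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        var values = new List<int> { 5, -1, 3, 8 };

        // Captured by the lambda; List<T>.Sort knows nothing about it.
        bool cancelled = false;

        Comparison<int> compare = (x, y) =>
        {
            if (x < 0 || y < 0)
            {
                cancelled = true;
                throw new OperationCanceledException("Negative value found");
            }
            return x.CompareTo(y);
        };

        try
        {
            values.Sort(compare);
        }
        catch (Exception)
        {
            // Sort may wrap the exception thrown by the comparison,
            // so this sketch checks the captured flag instead.
            if (!cancelled) throw;
        }

        Console.WriteLine(cancelled ? "Sort cancelled" : "Sort completed");
    }
}
```

The list can be left partially sorted after a cancellation, so sort a copy if the original order matters.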
Q Can you pass anonymous types outside the method in which they're created?
A The scope of an anonymous type is the method in which it's created. Code outside the method doesn't understand the structure of the anonymous type. However, like all types in .NET, instances of anonymous types can be cast implicitly to System.Object, so you can pass them between methods as objects.
This wouldn't be useful if you couldn't cast them back to a meaningful type. Recreating a meaningful type takes a few generic tricks, and the code where you recreate the type has to know the target structure. This sets up a dependency that can be hard to maintain, so I'd restrict using these conversion tricks to code that's already tightly coupled. Most of the time, it's better to create simple named types that can be passed around the system with normal semantics and without performing expensive casts.
Anonymous types are reused throughout the assembly based on the signature made up of their property names and types. Thus, if you have a type with a string Name property and a date property named Date used in two different places in the assembly, the same anonymous type is used. You can take advantage of this to recreate a known anonymous type within the same assembly (see Listing 2).
The DoSomething method receives an array of System.Object that contains instances of the anonymous type. The DoSomething method creates a dummy instance with matching FirstName and LastName properties to define the type within the Convert method. The Convert method uses type parameter inference to establish the type from the dummy parameter which matches the expected anonymous type signature. The type inference system determines the type you're working with, and after the instance is cast to this type it is fully functional as the original anonymous type. Checking for a successful conversion is important because the conversion will fail if the LINQ statement changes and the dummy instance in DoSomething isn't updated.
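A stripped-down C# sketch of the same trick follows. The helper name CastByExample and the sample data are assumptions made for this illustration; they are not taken from Listing 2.

```csharp
using System;

class Program
{
    // Hypothetical helper: T is inferred from the example argument,
    // which is never used for anything except its type.
    static T CastByExample<T>(object value, T example) where T : class
    {
        return value as T;   // null if the shapes don't match
    }

    static object CreatePerson()
    {
        // The anonymous type leaves this method typed as System.Object.
        return new { FirstName = "Ada", LastName = "Lovelace" };
    }

    static void Main()
    {
        object boxed = CreatePerson();

        // Same property names, types, and order, so the compiler
        // reuses the same anonymous type within this assembly.
        var person = CastByExample(boxed, new { FirstName = "", LastName = "" });

        if (person == null)
        {
            Console.WriteLine("Conversion failed");
        }
        else
        {
            Console.WriteLine(person.FirstName + " " + person.LastName);
        }
    }
}
```

If the example instance drifts from the original (say, a property is renamed), the cast yields null, which is why the check matters.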
Q I added a generic method overload to my project to replace an earlier method that took "Object" as a parameter. A different overload of the same method is no longer being called. What's going on here?
A Overloading lets you call the same method with different parameters. .NET resolves to the correct implementation based on the concept of a close match. For example, if you have overloads for Int32, Int64, and System.Object, and you call the method passing an Int16, .NET executes the overload for Int32 because it's the closest match.
For classes, the concept of closest match is the nearest base class. For example, if C derives from B and B derives from A, and a method has overloads for A and B, passing C causes the overload for B to be called because it's the closer ancestor. This has been basic overloads behavior since .NET 1.0.
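A small C# sketch of both closest-match cases, using made-up Describe overloads and the A, B, and C hierarchy described above:

```csharp
using System;

class A { }
class B : A { }
class C : B { }

class Program
{
    static string Describe(int value) { return "Int32"; }
    static string Describe(long value) { return "Int64"; }
    static string Describe(object value) { return "Object"; }
    static string Describe(A value) { return "A"; }
    static string Describe(B value) { return "B"; }

    static void Main()
    {
        short small = 5;

        // Int16 widens to Int32 more closely than to Int64 or Object.
        Console.WriteLine(Describe(small));     // Int32

        // B is the nearest ancestor of C that has an overload.
        Console.WriteLine(Describe(new C()));   // B
    }
}
```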
If you have overloads for A, B, and System.Object, and you pass C, .NET uses the overload for B. Creating a generic overload throws an extra spin into the mix whether or not you remove the Object overload:
public static void MyMethod<T>(T param)
{
    Console.WriteLine("In generic overload");
}
When you pass C, .NET can construct a perfect match, MyMethod<C>, so it calls the generic overload rather than the overload for B. .NET uses the generic overload unless a non-generic overload is a perfect match, which is why the method called changed when you passed C (or any other type derived from B) to the method.
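The C# sketch below reproduces the effect. The overloads return strings instead of writing to the console so the result of each call is easy to see; the class names follow the A, B, and C hierarchy used earlier.

```csharp
using System;

class A { }
class B : A { }
class C : B { }

class Program
{
    static string MyMethod(A param) { return "A overload"; }
    static string MyMethod(B param) { return "B overload"; }
    static string MyMethod<T>(T param) { return "generic overload"; }

    static void Main()
    {
        // Exact non-generic match: it wins over MyMethod<B>.
        Console.WriteLine(MyMethod(new B()));   // B overload

        // MyMethod<C> is an exact match; the B overload needs a conversion.
        Console.WriteLine(MyMethod(new C()));   // generic overload

        // Resolution uses the compile-time type, so a C held in a
        // variable typed as B still reaches the B overload.
        B asB = new C();
        Console.WriteLine(MyMethod(asB));       // B overload
    }
}
```

If you want derived types to keep reaching the B overload, cast the argument to B at the call site, as the last call suggests.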
Q I'm getting an FxCop error: "Initialize reference type static fields inline." Can you explain what the problem is and how to fix it? I'm using C#.
A The same rules for initializing static fields apply to both Visual Basic and C#, so this answer covers both languages. The rule you encountered removes a performance hit due to wasteful static constructor behavior. You can almost always avoid using a static constructor.
There are five different ways to supply initial values to static fields (see Listing 3). The const approach shown for the First and Sixth fields in Listing 3 has the least impact on performance because the literal value is included directly in the assembly as metadata. Assigning a literal as shown in the Second and Seventh fields is slightly slower and has no added benefit. If you're using FxCop 1.35, it will flag the literals and request you use a constant.
You're currently calling a static constructor, similar to the Fourth and Ninth fields, because you're setting values that you must calculate at runtime. This is the reason for FxCop's complaint. There are subtle differences between calling a function to set a static field and setting the field in a static constructor. Static constructors are generally expensive because code is added during JIT to ensure the static constructor is called before the class is used in any way.
While generally preferable, in-line declarations such as the Third and Eighth fields have two issues: They might be called when they're not needed, and they might not be called before constructors on the class or static methods are called. In-line declarations can be called when no instance of the class is instantiated and no static method of the class is used. This is a problem if setting the initial field value is expensive. Inline assignments are guaranteed to occur before fields are used, but not guaranteed before instance constructors or static methods are called. This can be a problem in the rare situation where you need to set some other type of global state such as deleting a previously created file. It's also a bit inconsistent with good development practices to set state in a function whose apparent purpose is getting the value of a static field. So in rare situations, static constructors might be appropriate.
In your case, you create a new list, and filling it might be expensive. So it makes sense to avoid that step unless you use the list. Assuming you're relatively disciplined and have a naming strategy to help you recognize backing fields that shouldn't be used directly, the approach used in the Fifth field is probably the best approach. This just-in-time creation of the list gives you precise control over when the list is created. You can either wait until you need the list, or retrieve it early if there is a better time to construct it.
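As a sketch of the just-in-time approach (not the code from Listing 3), the C# class below pairs a const, an in-line initializer, and a lazily created list behind a property. The names Lookup, Names, and BuildNames are invented for this example, and the underscore prefix stands in for whatever naming strategy you use to mark backing fields.

```csharp
using System;
using System.Collections.Generic;

static class Lookup
{
    // Compile-time constant: the literal is stored in metadata.
    public const int MaxEntries = 500;

    // In-line initializer: runs at some point before the field is first used.
    public static readonly DateTime LoadedAt = DateTime.Now;

    // Backing field for just-in-time creation; use the Names property instead.
    private static List<string> _names;

    public static List<string> Names
    {
        get
        {
            if (_names == null)
            {
                _names = BuildNames();   // created only when first requested
            }
            return _names;
        }
    }

    private static List<string> BuildNames()
    {
        Console.WriteLine("Building list");
        return new List<string> { "Alpha", "Beta" };
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(Lookup.MaxEntries);    // no list is built here
        Console.WriteLine(Lookup.Names.Count);   // list is built on first access
        Console.WriteLine(Lookup.Names.Count);   // reuses the existing list
    }
}
```

This simple null check is not thread-safe; if several threads can reach the property at once, add a lock around the creation step.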
Q I used to have a column in the Exceptions dialog of Visual Studio titled "User Unhandled." But, it's disappeared and now I only have one that says "Thrown." Do you know how I can get the "User Unhandled" column back? I'm using Visual Studio 2005.
A You're using the exception dialog that resides under the Debug menu. This dialog lets you control what happens when certain types of exceptions occur during debugging, and it's also one of the fastest references to the exception hierarchy.
During debugging, the User Unhandled and Thrown columns of this dialog control when the debugger breaks. However, the User Unhandled column doesn't make sense when you turn off the "Just My Code" option so Visual Studio removes it. Turn on "Just My Code" in Tools/Options/Debugging/General, and the column will reappear.
Q I'm trying to figure out whether to use a generic List, generic Dictionary, or implement my own class based on a generic IList. I have a large number of collections that will be used frequently at runtime. Each item in the collections contains a GUID, which I'll be searching on. The majority of these collections have only one or two items. Perhaps 10 percent to 15 percent of them contain more than 100 to 500 items. I'm worried about performance, but I'm also worried that I won't be able to change from List to Dictionary later if I make the wrong decision because a lot of code will call these methods. Do you have any suggestions?
A I'd suggest creating your own collections derived from .NET Framework classes. System.Collections.ObjectModel.KeyedCollection is probably your best base class. It provides both sequential and lookup access and holds list items with each item containing its own key. You derive your class from KeyedCollection and retrieve the key by overriding the GetKeyForItem method:
public class CustomerCollection :
    KeyedCollection<Guid, Customer>
{
    // Tells the base class where each item keeps its key
    protected override Guid GetKeyForItem(
        Customer item)
    {
        return item.CustomerId;
    }
}
For your scenario, a major advantage of the KeyedCollection is the ability to control whether it creates an internal dictionary. You can set a threshold value, probably around 10 to 20, and collections with fewer items will not use a dictionary. Instead, the collection iterates to find specific keys. For small collections, this is faster than maintaining the dictionary.
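You set the threshold through the protected KeyedCollection constructor that takes a comparer and a dictionaryCreationThreshold. The sketch below uses a threshold of 16 as an arbitrary starting point; the Customer class and the HasLookupDictionary property are assumptions added so the effect is visible.

```csharp
using System;
using System.Collections.ObjectModel;

class Customer
{
    public Guid CustomerId { get; set; }
    public string Name { get; set; }
}

class CustomerCollection : KeyedCollection<Guid, Customer>
{
    // null selects the default comparer; 16 is the number of items the
    // collection can hold before it builds its internal dictionary.
    public CustomerCollection() : base(null, 16) { }

    protected override Guid GetKeyForItem(Customer item)
    {
        return item.CustomerId;
    }

    // Hypothetical helper for this sketch: the protected Dictionary
    // property is null until the lookup dictionary is created.
    public bool HasLookupDictionary
    {
        get { return this.Dictionary != null; }
    }
}

class Program
{
    static void Main()
    {
        var customers = new CustomerCollection();
        var first = new Customer { CustomerId = Guid.NewGuid(), Name = "First" };
        customers.Add(first);
        customers.Add(new Customer { CustomerId = Guid.NewGuid(), Name = "Second" });

        Console.WriteLine(customers.HasLookupDictionary);   // False
        Console.WriteLine(customers[first.CustomerId].Name); // First

        for (int n = 0; n < 48; n++)
        {
            customers.Add(new Customer { CustomerId = Guid.NewGuid(), Name = "Bulk" });
        }

        Console.WriteLine(customers.HasLookupDictionary);   // True
    }
}
```

Callers use the same indexer either way, so you can tune the threshold later without touching the code that consumes the collection.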
There are a couple of caveats to keep in mind. Internally, KeyedCollection is a collection and a dictionary, which means that items exist in two places. For reference types, only the reference is duplicated. A reference is pointer-sized, which is small and of no concern from a memory perspective unless you're using enormous collections. However, value types are copied in full and can use much more space, so this isn't a good approach for very large value types.
KeyedCollection has one ugly wart: It inherits Contains<itemType> from Collection, but it also overloads it with Contains<keyType> for keyed access. That's problematic because your fellow programmers will probably expect a ContainsKey method that more closely parallels the Dictionary class. If the key and item are the same type, which can occur with string items, the keyed access wins unless you cast to Collection. Whew! And on top of that, help documents the Contains method incorrectly.
You can't remove a method defined in a base class, but you can hide it partially. All you need to do is add a ContainsKey method and mark the overloaded Contains method as obsolete:
// EditorBrowsable requires: using System.ComponentModel;
public bool ContainsKey(Guid id)
{
    return base.Contains(id);
}
[Obsolete("Use ContainsKey instead")]
[EditorBrowsable(EditorBrowsableState.Never)]
public new bool Contains(Guid id)
{
    throw new NotImplementedException(
        "Use ContainsKey instead");
}
The new modifier is similar to the Overloads modifier in VB and generates the hidebysig modifier in IL. It indicates that the Contains method in your class replaces the base class method that searches by key. The EditorBrowsable attribute should hide the method from IntelliSense, but C# has more trouble picking up this attribute than VB, so you're likely to still see the method. Because of the Obsolete attribute, the compiler will generate a warning if the programmer inadvertently uses the Contains method. You don't currently have any code written against your class, so you can also throw an exception. This approach discourages the use of the base class method, although a determined programmer can still cast to the base class and access its method.
Thanks to Alex James for reporting on the anonymous type trick he learned from Wes Dyer, which I was happy to turn into VB code and explain a bit further.
About the Author
Kathleen is a consultant, author, trainer and speaker. She’s been a Microsoft MVP for 10 years and is an active member of the INETA Speaker’s Bureau where she receives high marks for her talks. She wrote "Code Generation in Microsoft .NET" (Apress) and often speaks at industry conferences and local user groups around the U.S. Kathleen is the founder and principal of GenDotNet and continues to research code generation and metadata as well as leveraging new technologies springing forth in .NET 3.5. Her passion is helping programmers be smarter in how they develop and consume the range of new technologies, but at the end of the day, she’s a coder writing applications just like you. Reach her at [email protected].