In-Depth

Improved Combinations with the BigInteger Data Type

If your app requires the creation and manipulation of combinations of objects, the BigInteger structure in the Microsoft .NET Framework can offer huge advantages to combination functions.

The BigInteger data type, introduced as part of the System.Numerics namespace in the Microsoft .NET Framework 4, enables big improvements in standard mathematical combination functions, which are fundamentally important in software engineering.

A mathematical combination is a subset of items in which order does not matter. Take five animals, {ant, bat, cow, dog, elk}. One combination element of the five animals when taken two at a time is {ant, bat} and another is {cow, elk}. When taken three at a time, one combination is {ant, cow, dog}. Notice that combinations depend on the total number of items in the parent set (usually denoted by n) and the number of items in the subset (usually denoted by k).

The screenshot in Figure 1 gives you an overview of where I'm headed in this article. The C# demo console application begins by displaying the int.MaxValue, which is more than 2 billion. Mathematical combinations often involve huge numbers that are much larger than int.MaxValue, and this is where the BigInteger type is useful. The next part of the demo output displays the number of ways to select 10 items from a set of 200 items, which is 22,451,004,309,013,280 -- much larger than int.MaxValue.


[Click on image for larger view.]
Figure 1. A C# console application with an int.MaxValue of more than 2 billion uses the BigInteger type to enable larger mathematical combinations.

The demo program then uses a successor method to display all 10 ways to select three items from a group of five items. Here the five items are just the numbers 0 through 4. The first combination element is {0,1,2} and the last element is {2,3,4}. Notice that an element like {0,2,1} is not listed; it's considered the same as {0,1,2} because, by definition, order doesn't matter for a combination.

The demo program indicates that the underlying code has the ability to generate the successor to a given combination element. The code uses what is called lexicographical order. This means that if the parent items are the numbers 0..n-1, then if each combination element is interpreted as a single value, elements are listed from smallest to largest.

The next part of the demo program computes and displays the 123,456,789,012,345th element of 200 items taken 10 at a time. This is an extremely difficult problem, but one that has a beautiful solution using the BigInteger type and something I call the combinadic of a number.

The final part of the demo output shows the seventh combination element of five names and suggests that, although it's useful to be able to apply mathematical combinations to integers, in application programming you often want to work with combinations of strings.

In this article, I'll present powerful C# code that you can integrate into your software apps, or use directly as standalone utilities. It's C# code, but you should have little trouble refactoring the code to Visual Basic .NET.

Before I dive into the code, let me caution you that mathematical combinations are often confused with mathematical permutations. A permutation of n items is all possible rearrangements of the items. For example, if you have three items {0,1,2}, then there are six permutation elements: {0,1,2}, {0,2,1}, {1,0,2}, {1,2,0}, {2,0,1} and {2,1,0}. Permutations are just as important as combinations, and warrant a separate article.

Overall Program Structure
The overall structure of the program that generated the screenshot in Figure 1 is presented in Listing 1. I launched Visual Studio 2010 and created a C# console application program. In the Solution Explorer window, I renamed file Program.cs to the more descriptive CombinatoricsProgram.cs, which automatically renamed class Program. Next, I removed the unneeded template-generated using statements at the top of the program. Then I added a reference to the System.Numerics namespace, which is not accessible to a C# program by default. The BigInteger data type is defined in System.Numerics. I added a using statement that referenced the namespace, so I wouldn't have to fully qualify instances of BigInteger.

The program houses a single public class named Combination. Depending on your application scenario, you may want to create a separate class library to house the Combination class. The Combination class has private members n (the parent size), k (the subset size) and data (to hold combination element items). I don't need properties on any of these members because all combination functionality is exposed through public methods.

The Combination class has a single constructor that accepts parameters n and k. The Successor method accepts no arguments and returns a combination, which is the lexicographical successor to the current combination object. The Choose method is declared static because it isn't tied to a specific instance of a combination object. The Element method accepts a parameter m, which is a lexicographical index, and returns the combination object, which has that index for the current values of n and k. The LargestV method is a helper used by the Element method. The ApplyTo method accepts an array of strings, and returns the subset that corresponds to the combination element.

Choose Method
An important combination function computes the number of possible combination elements for n and k. This function is often called the Choose function, and sometimes abbreviated as C in mathematical literature. Before the arrival of the BigInteger data type, there was no satisfactory way to compute Choose in the .NET Framework. The BigInteger data type can represent arbitrarily large integer values.

A surprising number of hideously bad examples of how to write a Choose function appear on the Web. The mathematical definition of Choose(n,k) is n! / (k!)(n-k)!, where the exclamation point means factorial. The factorial of a number v is v * (v-1) * (v-2) * . . . * 1. For example, 5! = 5 * 4 * 3 * 2 * 1 = 120. So Choose(5,3) could be computed as 5! / (3!)(5-3)! = 120 / (6)(2) = 10. That definition for Choose is extremely inefficient, however. It uses the factorial function, which is tricky to code properly.

Without going into lengthy detail, the computation of Choose can be dramatically simplified. In general Choose(n, k) = Choose(n, n-k). For example, Choose(100,97) = Choose(100,3). Smaller numbers are much better when computing Choose. Also, there's no need to compute full factorials. Instead, it's possible to compute running products. For example, it turns out that Choose(100,3) = (100)(99)(98) / (3)(2)(1) = 161,700. Putting those simplifications together with the BigInteger type leads to the Choose implementation in Listing 2.

Notice the input parameter value check if (n < k) in Choose. In some situations, you may want to treat this condition as an error. However, allowing this condition (and returning a value of 0 when it occurs) is needed by the Element method, which uses Choose as a helper.

The statement ans = (ans * delta + i)) / i accumulates the running product. Because ans is type BigInteger, it will not overflow, so using the C# checked block isn't necessary.

The first few lines in the Main method of the demo program show how to call Choose:

static void Main(string[] args)
{
  Console.WriteLine("\nBegin combinations with C# BigInteger demo\n");
  BigInteger bi = Combination.Choose(200, 10);
  Console.WriteLine("MaxInt is " + int.MaxValue.ToString("#,###"));
  Console.Write("The number of ways to Choose from n=200 items k=10");
  Console.WriteLine(" at a time is \n" + bi.ToString("#,###") + "\n");
  Console.Write("Number of ways to Choose from 5 items 3 at a time is: ");
  Console.WriteLine(Combination.Choose(5, 3) + "\n");
. . .

Notice that Choose is called as a static method of the Combination class.

Combination Constructor and ToString Method
The Combination constructor is very simple:

public Combination(int n, int k)
{
  if (n < 0 || k < 0) // normally n >= k
    throw new Exception("Negative param in Combination ctor");
  this.n = n;
  this.k = k;
  this.data = new int[k];
  for (int i = 0; i < k; ++i)
    this.data[i] = i;
}

After transferring input argument values n and k, the data array is allocated. The values assigned to data represent the first combination element in lexicographical order. Combination c = new Combination(5,3) creates the Combination of {0,1,2}.

The ToString method is also simple:

public override string ToString()
{
  string s = "^ ";
  for (int i = 0; i < this.k; ++i)
    s += this.data[i].ToString() + " ";
  s += "^";
  return s;
}

Here, each combination representation begins and ends with the "^" character to distinguish it from the representation of an ordinary numeric array. I use the carat character because it starts with the letter "c" and it reminds me that what I see is a combination (which also starts with the letter "c"). I use string concatenation because it's a bit cleaner -- though slightly less efficient -- than using the StringBuilder class.

Successor Method
The Successor method returns the next combination element relative to the current element. The code for the Successor method is short but rather tricky, as shown in Listing 3.

One design issue in Successor is deciding how to deal with the Successor to the last combination element. If you refer to Figure 1, notice that the last element for a combination with n=5 and k=3 is {2,3,4}. In general, the last element is the only element that has value n-k in the [0] position of the data array. Here I decide to return null. Alternatives are to return the first combination element or throw an Exception. Another possibility is to define a LastElement method.

Realistically, there are few scenarios in which you'll need to modify the code logic of the Successor method. The while loop uses an index i that starts at the right end of the current combination data array. Index i moves to the left until it reaches the key value in data. The key value is incremented, then the for loop creates an increasing sequence for every value in data to the right of i.

The following lines of code in the Main method demonstrate how to use the Combination constructor, and the ToString and Successor methods:

Console.Write("All combinations of 5 items 3 at a time ");
Console.WriteLine("in lexicographical order are: \n");
Combination c = new Combination(5, 3);
int i = 0;
while (c != null)
{
  Console.WriteLine(i + ": " + c.ToString());
  c = c.Successor();
  ++i;
}

mth Lexicographical Element Method

Consider the problem of generating the mth lexicographical element of a combination. Wait, let me explain! Suppose n=5 and k=3; then, if m=7, the mth element is {1,2,4}, as shown in Figure 1. At first, this seems like an easy problem. Just start at the first element and iterate m times, calling the Successor method. However, as you've seen, the number of combination elements can be huge. I ran some timing tests to estimate how long it would take to determine the 123,456,789,012,345th element of a combination with n=200 and k=10 using a naive iterative approach. On my standard desktop PC, I came up with approximately 370 million seconds, which is roughly 12 years. You probably don't want to wait that long, so a different approach is needed.

Some time ago, I discovered a clever math construct called the combinadic that can be used to directly compute the mth element of a combination. My original code implementation didn't use the BigInteger type because it wasn't available at the time. The code for an improved Element method that computes the mth lexicographical element of a combination is presented in Listing 4.

Using this implementation of Element, I was able to compute the 123,456,789,012,345th element of a combination with n=200 and k=10 in less than 1 second, as shown in Figure 1.

The Main method of the program that called Element and produced the screenshot in Figure 1 is the following:

c = new Combination(200, 10);
BigInteger m = BigInteger.Parse("123456789012345");
Combination e = c.Element(m);
Console.Write("\nThe 123,456,789,012,345-th combination element ");
Console.WriteLine("of C(200,10) is: \n");
Console.WriteLine(e.ToString());

The implementation of the Element method is tricky, but fortunately, you rarely have to modify the code. A detailed explanation of how Element works is available in my MSDN library article.

Notice that Element calls method Choose and also calls a private helper method named LargestV:

private static int LargestV(int a, int b, BigInteger x)
{
  int v = a - 1;
  while (Choose(v, b) > x)
    --v;
  return v;
}

The LargestV helper method also calls Choose. My earlier versions of the Element method couldn't use the improved Choose method because the BigInteger type did not exist at the time.

Applying Combinations to Strings
In many application development scenarios, you'll want to apply combinations to collections of strings. Because a mathematical combination consists of a subset of integers between 0 and n-1, mapping combinations to arrays -- which are also indexed starting at 0 -- is easy.

Take a look at method ApplyTo:

public string[] ApplyTo(string[] a)
{
  if (a.Length != this.n)
    throw new Exception("Bad array size in ApplyTo");

  string[] result = new string[this.k];

  for (int i = 0; i < result.Length; ++i)
    result[i] = a[this.data[i]];

  return result;
}

The method accepts an array of strings and returns a subset array of strings that represents the current combination element. The idea is best explained by seeing how ApplyTo was called in the program that produced the output shown in Figure 1:

string[] people = new string[] { "Abe", "Bob", "Cal", "Don", "Eve" };
Console.WriteLine("\nIf five people are:");
foreach (string s in people)
  Console.Write(s + " ");
Console.WriteLine("\nThe 7th subset of the 5 people when taken 3 at a time is:\n");
c = new Combination(5, 3);
c = c.Element(7);
string[] subset = c.ApplyTo(people);
foreach (string s in subset)
  Console.Write(s + " ");
Console.WriteLine("");
Console.WriteLine("\nEnd demo\n");

The demo code sets up an array of five people: "Abe," "Bob," Cal," Don" and "Eve," indexed as {0,1,2,3,4}, and echoes their values to the screen. Then the demo creates a combination with n=5 and k=3, and uses the Element method to determine the seventh combination element, which is {1,2,4}. The ApplyTo method generates the seventh element by extracting the corresponding string values from the input array, giving "Bob," "Cal" and "Eve."

The code presented here should allow you to efficiently handle most application programming scenarios that require the program¬matic creation and manipulation of combinations of objects. The code implementations of the Choose and Element methods are particularly efficient, in part because of the welcome addition of the BigInteger data type in the System.Numerics namespace. Happy combinatorializing!

comments powered by Disqus

Featured

Subscribe on YouTube