The Data Science Lab

AdaBoost Regression Using C#

Dr. James McCaffrey from Microsoft Research presents a complete end-to-end demonstration of the AdaBoost.R2 algorithm for regression problems (where the goal is to predict a single numeric value). The implementation follows the original source research paper closely, so you can use it as a guide for customization for specific scenarios.

The original AdaBoost ("adaptive boosting") algorithm is a binary classification technique (predicting a variable that has two possible values, such as the sex of a person). The AdaBoost.R2 ("AdaBoost for regression, version 2") modifies the AdaBoost classification technique for use with regression problems (predicting a single numeric value, such as the age of a person).

This article presents a complete demo of AdaBoost.R2 using the C# language. AdaBoost.R2 regression sequentially creates an ensemble (collection) of simple decision trees, where each tree is a bit better at prediction than the previous tree.

For a given input x, the predicted y value is the weighted median of the predictions of the collection of decision trees. Other tree ensemble regression techniques include random forest regression (and a variant called bagging regression), and gradient boosting regression. Compared to other regression techniques, AdaBoost regression is not used as often (at least based on my experience), but for datasets where AdaBoost regression works well, it can often have very high predictive accuracy.

Figure 1: AdaBoost.R2 Regression in Action
[Click on image for larger view.] Figure 1: AdaBoost.R2 Regression in Action

A good way to see where this article is headed is to take a look at the screenshot in Figure 1. The demo program begins by loading synthetic training and test data into memory. The data looks like:

-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
. . .

The first five values on each line are the x predictors. The last value on each line is the target y variable to predict. The demo creates an AdaBoost regression, evaluates the model accuracy on the training and test data, and then uses the model to predict the target y value for x = [-0.1660, 0.4406, -0.9998, -0.3953, -0.7065]. The first part of the demo output shows how an AdaBoost regression model is created:

Loading synthetic train (200) and test (40) data
Setting maxLearners = 100
Setting tree maxDepth = 5
Setting tree minSamples = 2
Training model
Done
Created 100 learners

The second part of the demo output shows the model evaluation, and using the model to make a prediction:

Accuracy train (within 0.15): 0.9100
Accuracy test (within 0.15): 0.7000
Predicting for x = 
 -0.1660  0.4406 -0.9998 -0.3953 -0.7065
Predicted y = 0.4892

The three parameters that control the behavior of the demo AdaBoost model are maxLearners, maxDepth, and minSamples. The maxLearners parameter specifies how many decision trees should be created. The variable name maxLearners is used instead of numLearners, because the AdaBoost regression algorithm may not succeed in creating the requested number of decision trees.

Additionally, the name maxLearners is used instead of maxTrees, because AdaBoost regression can use an ensemble of other regression models, such as k-nearest neighbor models or linear regression models. That said, by far the most common learners used in AdaBoost regression are decision trees, so much so, that without an explicit qualification, AdaBoost regression implies decision trees are being used as the learners.

The maxDepth parameter specifies the number of levels of each decision tree. If maxDepth = 1, the decision tree has three nodes: a root node, a left child and a right child. If maxDepth = 2, the tree has seven nodes. In general, if maxDepth = n, the resulting tree has 2^(n+1) - 1 nodes. The decision trees used for the demo run shown in Figure 1 have maxDepth = 5 so each of the 100 trees has 2^6 - 1 = 64 - 1 = 63 nodes.

The minSamples parameter specifies the fewest number of associated data items in a tree node necessary to allow the node to be split into a left and right child. The demo trees use minSamples = 2, which is the fewest possible because if a node has only one associated item, it can't be split any further. The values of maxLearners, maxDepth, and minSamples must all be determined by trial and error.

This article assumes you have intermediate or better programming skill but doesn't assume you know anything about AdaBoost regression. The demo is implemented using C# but you should be able to refactor the demo code to another C-family language if you wish. All normal error checking has been removed to keep the main ideas as clear as possible.

The source code for the demo program is too long to be presented in its entirety in this article. The complete code and data are available in the accompanying file download, and they're also available online.

The Demo Data
The demo data is synthetic. It was generated by a 5-10-1 neural network with random weights and bias values. The idea here is that the synthetic data does have an underlying, but complex, structure which can be predicted.

All of the predictor values are between -1 and +1. There are 200 training data items and 40 test items. When using decision trees as the AdaBoost regression learners, it's not necessary to normalize the training data predictor values because no distance between data items is computed. However, it's not a bad idea to normalize the predictors just in case you want to send the data to other regression algorithms that require normalization (for example, k-nearest neighbor regression).

AdaBoost regression is most often used with data that has strictly numeric predictor variables. It is possible to use AdaBoost regression with mixed categorical and numeric data, by using ordinal encoding on the categorical data. In theory, ordinal encoding shouldn't work well. For example, if you have a predictor variable color with possible encoded values red = 0, blue = 1, green = 2, red will always be less-than-or-equal to any other color value in the decision tree construction process. However, in practice, ordinal encoding for AdaBoost regression often works well.

Understanding AdaBoost Regression
There is very little solid information about the AdaBoost.R2 regression algorithm available online. Many of the resources I discovered had significant errors, and so I used the original research paper as my implementation guide. See "Improving Regressors Using Boosting Techniques" (1997), by H. Drucker. That paper is available online in PDF format in several locations. The relevant page with the AdaBoost.R2 algorithm is shown in Figure 2.

Figure 2: The Original Paper for the AdaBoost.R2 Regression Algorithm
[Click on image for larger view.] Figure 2: The Original Paper for the AdaBoost.R2 Regression Algorithm

By the way, the author never called his algorithm "AdaBoost.R2." He wrote only that his algorithm is "a modification of Adaboost.R." I wasn't able to track down who first coined the term AdaBoost.R2. The earlier AdaBoost.R algorithm was a suggestion in the original 1995 AdaBoost binary classification research paper by Y. Freund and R. Schapire, "A desicion-theoretic (sic) generalization of on-line learning and an application to boosting."

The key idea of AdaBoost regression is to construct a sequence of decision trees where each decision tree is a bit better than the previous decision tree. Briefly, this is achieved by creating a new set of training data for each new decision tree, where the new set of training data has difficult-to-predict data items.

A weight value is assigned to each training data item, where the weight values sum to 1. Initially all training item weights are the same. For the demo data, because there are 200 items, each initial data item weight is 1/200 = 0.05.

Each training data item is fed to the current decision tree and an average loss/error, Lbar in Figure 2, for the current tree is computed. The average loss for the current tree is used to compute a beta value for the tree. A small value for beta means high confidence in the current tree, and vice versa.

The weights associated with each training data item are updated using the beta value for the current tree and the loss/error for each data item. At this point, a new set of training data items is probabilistically created from the source data items, where data items with high error have a greater probability of being selected than data items with small error.

The new training data items are fed to the next decision tree. Because the new training dataset has high-error items, the new tree will concentrate on learning to predict those difficult-to-predict items. Clever!

Each new training dataset is constructed probabilistically, and so there is a chance that a newly created decision tree is not better than the previous decision tree. In this case, the newly created decision tree is not added to the collection of trees. This is why the actual number of trees created may be less than the maxLearners parameter.

Selecting the Rows for a New Dataset
Each decision tree in the ensemble is trained on a different dataset. The first decision tree is trained on the N items of the source training data. Each data item is assigned a loss/error value which is used to compute a weight for each item. A large weight value means the data has often been incorrectly predicted by previous decision trees.

A new dataset is constructed by randomly selecting N data items with replacement from the source training data, where the probability that a row/item is selected is proportional to its weight. The demo program implements a helper SelectIdxs() function that uses a mini-algorithm called roulette wheel selection to pick the N rows. Because selection is made with replacement, the new training dataset will likely have duplicates, which is what is needed to train the next decision tree on difficult-to-train data items.

Making Predictions Using Weighted Median
To make a prediction, an input item x is fed to each of the decision trees. Each tree computes a predicted value. You could use the average of the predictions as the final predicted value, but because different trees have different beta (confidence) values, the approach used by AdaBoost.R2 regression is to compute the weighted median of the predictions, weighted by the log of the inverse of beta values. This gives greater emphasis to the predictions made by decision trees with high confidence.

Computing the weighted median of a set of values and weights is a bit tricky. Suppose you have a set of five values (20, 12, 10, 16, 13) and their associated weights (0.35, 0.05, 0.15, 0.25, 0.20) that sum to 1. You can sort the values and the weights in parallel, based on the values, giving:

values:    10,   12,   13,   16,   20
weights: 0.15, 0.05, 0.20, 0.25, 0.35

The standard unweighted median is the middle value, 13. To find the weighted median, you advance through the sorted weights, accumulating their sum until you equal or exceed 0.5 -- 0.15, 0.20, 0.40, 0.65 -- and return the corresponding value, which is 16.

It's not necessary to explicitly sort the values and the weights. It's more efficient to just find the sort order of the weights -- [1], [2], [4], [3], [0] -- and iterate through the weights and values using those indices. Some programming languages, such as Python, have a built-in argsort() function but C# does not, and so the demo program implements an ArgSort() helper function.

The version of weighted median used in the demo program does not interpolate between the two middle values when the number of values is even. This uses the weighted median algorithm design shown in step 8 in Figure 2.

The Demo Program
I used Visual Studio 2022 (Community Free Edition) for the demo program. I created a new C# console application and checked the "Place solution and project in the same directory" option. I specified .NET version 8.0. I named the project AdaBoostTreeRegression. I checked the "Do not use top-level statements" option to avoid the program entry point shortcut syntax.

The demo has no significant .NET dependencies and any relatively recent version of Visual Studio with .NET (Core) or the older .NET Framework will work fine. You can also use the Visual Studio Code program if you like.

After the template code loaded into the editor, I right-clicked on file Program.cs in the Solution Explorer window and renamed the file to the slightly more descriptive AdaBoostRegressionProgram.cs. I allowed Visual Studio to automatically rename class Program.

The overall program structure is presented in Listing 1. All the control logic is in the Main() method in the Program class. The Program class also holds helper functions to load data from file into memory and display data. All of the AdaBoost regression functionality is in an AdaBoostRegressor class. All of the decision tree regression functionality is in a separate DecisionTreeRegressor class. The AdaBoostRegressor class exposes a constructor and three methods: Train(), Predict(), Accuracy().

Listing 1: Overall Program Structure

using System;
using System.IO;
using System.Collections.Generic;

namespace AdaBoostTreeRegression
{
  internal class AdaBoostRegressionProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("Begin AdaBoost with tree " +
        "regression from scratch C# ");

      // 1. load data from file into memory
      // 2. create and train AdaBoost regression model
      // 3. evaluate model
      // 4. use model to make a prediction

      Console.WriteLine("End demo ");
      Console.ReadLine();
    }

    // helper functions for Main

    static double[][] MatLoad(string fn, int[] usecols,
      char sep, string comment) { . . }

    static double[] MatToVec(double[][] mat) { . . }

    public static void VecShow(double[] vec, int dec,
      int wid) { . . }
  } // class Program

  // ========================================================

  public class AdaBoostRegressor
  {
    public int maxLearners;
    public int maxDepth;
    public int minSamples;
    private Random rnd;
    public List<DecisionTreeRegressor> learners;
    public List<double> betas;

    public AdaBoostRegressor(int maxLearners,
      int maxDepth, int minSamples, int seed) { . . }

    public void Train(double[][] trainX,
      double[] trainY) { . . }

    public double Predict(double[] x) { . . }

    public double Accuracy(double[][] dataX,
      double[] dataY, double pctClose)


    private static double WeightedMedian(double[] values,
      double[] weights) { . . }

    private static int[] ArgSort(double[] values) { . . }

    private static double[][] MatMake(int nRows,
      int nCols) { . . }

    private int SelectIdx(double[] probs) { . . }

    private int[] SelectIdxs(double[] probs, int n) { . . }

    private static double[][] ExtractRows(double[][] mat,
      int[] idxs) { . . }

    private static double[] ExtractVals(double[] vec,
      int[] idxs) { . . }
  }

  // ========================================================

  public class DecisionTreeRegressor
  {
    public DecisionTreeRegressor(int maxDepth,
      int minSamples) { . . }
    public void Train(double[][] dataX,
      double[] dataY) { . . }
    public double Predict(double[] x) { . . }
    public double Accuracy(double[][] dataX,
      double[] dataY,  double pctClose) { . . }
    public void Show() { . . }
    public void ShowNode(int nodeID) { . . }
    // private helper methods here
  }

  // ========================================================
} // ns

The demo starts by loading the 200-item training data into memory:

string trainFile =
  "..\\..\\..\\Data\\synthetic_train_200.txt";
int[] colsX = new int[] { 0, 1, 2, 3, 4 };
double[][] trainX =
  MatLoad(trainFile, colsX, ',', "#");
double[] trainY =
  MatToVec(MatLoad(trainFile,
  new int { 5 }, ',', "#"));

The training X data is stored into an array-of-arrays style matrix of type double. The data is assumed to be in a directory named Data, which is located in the project root directory. The arguments to the MatLoad() function mean load columns 0, 1, 2, 3, 4 where the data is comma-delimited, and lines beginning with "#" are comments to be ignored. The training y data in column [5] is loaded into a matrix and then converted to a one-dimensional vector using the MatToVec() helper function.

The first three training items are displayed like so:

Console.WriteLine("First three train X: ");
for (int i = 0; i < 3; ++i)
  VecShow(trainX[i], 4, 8);
Console.WriteLine("First three train y: ");
for (int i = 0; i < 3; ++i)
  Console.WriteLine(trainY[i].ToString("F4").PadLeft(8));

In a non-demo scenario, you might want to display all the training data to make sure it was correctly loaded into memory. The 40-item test data is loaded into memory using the same pattern that was used to load the training data:

string testFile =
  "..\\..\\..\\Data\\synthetic_test_40.txt";
double[][] testX =
  MatLoad(testFile, colsX, ',', "#");
double[] testY =
  MatToVec(MatLoad(testFile,
  new int[] { 5 }, ',', "#"));

The AdaBoost regression model is trained using these statements:

int maxLearners = 100;
int maxDepth = 6; int minSamples = 2; int seed = 0;
AdaBoostRegressor model = new AdaBoostRegressor(maxLearners,
  maxDepth, minSamples, seed);
model.Train(trainX, trainY);
Console.WriteLine("Created " + model.learners.Count() +
 " learners ");

Because each new training dataset is created probabilistically, the AdaBoostRegressor class has a random number generator and its seed value is set so that results are reproducible. After the model has been trained, it's a good idea to check how many decision trees were actually created. If that count is significantly less than the requested maxLearners, that means the decision trees were not improving and so you likely need to increase the tree maxDepth value or decrease the minSamples value.

Next, the demo evaluates model accuracy:

double accTrain = model.Accuracy(trainX, trainY, 0.15);
Console.WriteLine("Accuracy train (within 0.15) = " +
  accTrain.ToString("F4"));
double accTest = model.Accuracy(testX, testY, 0.15);
Console.WriteLine("Accuracy test (within 0.15) = " +
  accTest.ToString("F4"));

The Accuracy() method scores a prediction as correct if the predicted y value is within 15% of the true target y value. There are several other ways to evaluate a trained regression model, including root mean squared error, coefficient of determination, and so on. Using the Accuracy() method as a template, other evaluation metrics are easy to implement.

The demo concludes by using the trained decision tree to make a prediction:

double[] x = trainX[0];
Console.WriteLine("Predicting for x = ");
VecShow(x, 4, 8);
double yPred = model.Predict(x);
Console.WriteLine("Predicted y = " + yPred.ToString("F4"));

The x input is the first training data item. The predicted y value is 0.4892 which is quite close to the actual y value of 0.4840.

Wrapping Up
The AdaBoost.R2 regression algorithm has not received as much research attention as other tree-based regression ensemble techniques such as random forest regression, and gradient boosting regression. That said, there are several interesting ideas to explore in AdaBoost regression. Instead of using a weighted median to make a prediction, you could use a weighted average. Instead of using the linear loss/error function implemented in the demo, the original paper shown describes two other possibilities in step 4 in Figure 2.

The original AdaBoost.R2 paper mentions that the algorithm can be used with any regression learner, not just decision tree regressors. To the best of my knowledge, there are no solid research results that explore using k-nearest neighbor regressors, neural network regressors, kernel ridge regression regressors, or any other common regression techniques. It's quite possible that an alternative to decision tree regressor learners might produce a better AdaBoost regression model.

comments powered by Disqus

Featured

  • Building Blazor Applications

    A trio of Blazor experts will conduct a full-day workshop for devs to learn everything about the tech a a March developer conference in Las Vegas keynoted by Microsoft execs and featuring many Microsoft devs.

  • Gradient Boosting Regression Using C#

    Dr. James McCaffrey from Microsoft Research presents a complete end-to-end demonstration of the gradient boosting regression technique, where the goal is to predict a single numeric value. Compared to existing library implementations of gradient boosting regression, a from-scratch implementation allows much easier customization and integration with other .NET systems.

  • Microsoft Execs to Tackle AI and Cloud in Dev Conference Keynotes

    AI unsurprisingly is all over keynotes that Microsoft execs will helm to kick off the Visual Studio Live! developer conference in Las Vegas, March 10-14, which the company described as "a must-attend event."

  • Copilot Agentic AI Dev Environment Opens Up to All

    Microsoft removed waitlist restrictions for some of its most advanced GenAI tech, Copilot Workspace, recently made available as a technical preview.

  • Microsoft's New Cloudy Java: The Modern Web App Pattern and 'Strangler Fig'

    Having introduced its Modern Web Application (MWA) pattern for Java developers late last year, Microsoft is now promoting the cloud-centric concept further by highlighting the benefits of the "Strangler Fig" approach.

Subscribe on YouTube