The Data Science Lab

Neural Network Quantile Regression Using C#

Dr. James McCaffrey from Microsoft Research presents a complete end-to-end demonstration of neural network quantile regression. The goal of a quantile regression problem is to predict a single numeric value with an assurance such as, "The predicted y value is 0.6789 and there's roughly a 90% chance the prediction will be greater than or equal to the true y value."

The goal of a machine learning regression problem is to predict a single numeric value. Quantile regression is a variation where you are concerned with under-prediction or over-prediction. I'll phrase the rest of this article in terms of scenarios where you mostly care about under-prediction.

Suppose you want to predict how much liquor to order for an important charity fund-raising event. There's a large negative consequence if your prediction is too low (angry people who don't get a drink and don't contribute) but a minor consequence if your prediction is too high (you just hold the excess liquor until the next party). In such a scenario, you could make a 90th percentile regression prediction, which means 90% of your predictions will be correct or slightly high (and therefore only 10% of your predictions would be too low). You can also loosely phrase this as, "There's a 90% chance my prediction will meet the demand."

Most machine learning regression techniques over-predict about 50% of the time and under-predict about 50% of the time. Put another way, most regression techniques are implicitly 50th percentile quantile techniques.

Classical quantile regression techniques exist, but they usually do not work very well. They are variations of linear regression (and therefore not very powerful), and they are difficult to train because they require linear programming (and are therefore quite complex). However, a relatively new form of quantile regression is neural network quantile regression -- a variation of neural network regression. By using a custom loss function that penalizes low predictions more than high predictions, you can coerce the network to make predictions that hit a specified quantile, such as the 90th percentile.

This article presents a complete demo of neural network quantile regression using the C# language. To the best of my knowledge, there are no existing code libraries that directly implement neural network quantile regression. Implementing neural quantile regression from scratch using C# allows you to customize your prediction system and easily integrate your prediction system with existing .NET systems.

Figure 1: Neural Network Quantile Regression in Action

A good way to see where this article is headed is to take a look at the screenshot in Figure 1. The demo program begins by loading a set of synthetic data into memory. The data looks like:

-0.1660,  0.4406, -0.9998, -0.3953, -0.7065, -0.8153,  0.7022
-0.2065,  0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.5666
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192, -0.6038,  0.8186
 0.7528,  0.7892, -0.8299, -0.9219, -0.6603,  0.7563,  0.3687
. . .

The first six numbers on each line are predictor values, and the last number is the target value to predict. There are 200 training items and 40 test items. The synthetic data was programmatically generated using a 6-10-1 neural network with random weight and bias values. The point is that the data has an underlying structure that is complex, but can be predicted.

After displaying the first few lines of training data, the demo creates the quantile regressor and prepares training:

Creating 6-100-1 0.90 quantile neural network
Done
Setting train parameters
maxEpochs = 180
lrnRate = 0.020
qRate = 0.010
batSize = 10

For the neural network architecture, the number of input nodes, 6, and the number of output nodes, 1, are determined by the data. The number of hidden nodes, 100 in the demo, must be determined by trial and error.

The four training parameters, maxEpochs, lrnRate, qRate, and batSize, must be determined by trial and error. An epoch is one complete pass through the training data items. The learn rate controls how quickly the neural network weight and bias values change. The quantile rate controls how fast the current effective quantile loss changes, which prevents the training process from stalling out. The batch size specifies how many training items are processed together to estimate the gradients used to update weight and bias values.

The demo displays progress messages during training:

Starting training
epoch:    0  MSE = 0.0293  acc = 0.3200  curr quantile = 0.6100
epoch:   18  MSE = 0.0296  acc = 0.4100  curr quantile = 0.7850
epoch:   36  MSE = 0.0040  acc = 0.7000  curr quantile = 0.8200
epoch:   54  MSE = 0.0031  acc = 0.7550  curr quantile = 0.8300
epoch:   72  MSE = 0.0034  acc = 0.7450  curr quantile = 0.8550
epoch:   90  MSE = 0.0034  acc = 0.7500  curr quantile = 0.8500
epoch:  108  MSE = 0.0039  acc = 0.7200  curr quantile = 0.9050
epoch:  126  MSE = 0.0034  acc = 0.7450  curr quantile = 0.8500
epoch:  144  MSE = 0.0038  acc = 0.7250  curr quantile = 0.8900
epoch:  162  MSE = 0.0037  acc = 0.7350  curr quantile = 0.8850
Done

The goal during training is to reach the specified quantile (0.90 for the demo) while simultaneously getting good prediction accuracy (where a predicted output is within 15% of the target y in the training data) and a low mean squared error.

The demo concludes by evaluating the trained quantile regression model:

Evaluating final quantile regression model
Accuracy (15%) on train data = 0.7100
Accuracy (15%) on test data  = 0.7000
Computed quantile training = 0.90
Computed quantile test = 0.75

Predicting for x =
 -0.1660  0.4406 -0.9998 -0.3953 -0.7065 -0.8153
Predicted y = 0.7351

In general, the model accuracy should be similar for training and test data. The computed quantile on the training and test data should be close to the specified quantile (0.90 for the demo). In this example, the computed quantile on the training data is 0.90, which means that 90% of the predictions are greater than or equal to the true target y values, or equivalently, 10% of the target y values are under-predicted. The quantile regression percentile is only 0.75 on the test data because the training and test datasets are a bit too small for a neural network.

The demo uses the trained model to predict the y value for the first training item x = (-0.1660, 0.4406, -0.9998, -0.3953, -0.7065, -0.8153). The predicted y = 0.7351 is relatively close to the true target y = 0.7022.

This article assumes you have intermediate or better programming skill but doesn't assume you know anything about neural network quantile regression. The demo is implemented using C#. The code is complex, but you should be able to refactor it to another C-family programming language if you wish.

The source code for the demo program is too long to be presented in its entirety in this article. The complete code and data are available in the accompanying file download, and they're also available online.

Understanding Quantile Regression
Suppose you set a quantile value of 0.80 for a regression system. This means you are concerned with under-predictions. You want the system to predict y values as closely as possible, but overall, you want just 20% of the predictions to be less than the true target values.

A naive approach is to start by creating a standard regression model. About half of the predictions will be less than the target values, and about half will be greater. You can then modify the model's prediction method by adding some constant to the normal predicted values so that 80% of the predictions are greater than the associated target values. This naive quantile regression approach is simple and sometimes works reasonably well, but a better approach is to create a different model.
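The constant for the naive approach can be computed directly from the training residuals: sort the (target minus predicted) differences and pick the one at the desired empirical quantile. The following is a minimal sketch of that idea; it is not part of the demo program, and the NaiveQuantileDemo class and NaiveOffset() method names are mine, used only for illustration:

```csharp
using System;

public static class NaiveQuantileDemo
{
  // returns the constant to add to standard predictions so that
  // roughly (q * 100) percent of adjusted predictions are
  // greater than or equal to the target values
  public static double NaiveOffset(double[] targets,
    double[] preds, double q)
  {
    int n = targets.Length;
    double[] residuals = new double[n];
    for (int i = 0; i < n; ++i)
      residuals[i] = targets[i] - preds[i];
    Array.Sort(residuals);  // ascending
    // index of the q-th empirical quantile (a simple variant)
    int idx = (int)Math.Ceiling(q * n) - 1;
    if (idx < 0) idx = 0;
    return residuals[idx];
  }
}
```

For example, with targets (1, 2, 3, 4, 5), constant predictions of 0, and q = 0.80, the computed offset is 4.0; adding it makes 4 of the 5 adjusted predictions greater than or equal to their targets.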

Figure 2: Naive Modify Predict Function vs. Using a Quantile Loss Function

The ideas are illustrated by the graph in Figure 2. The hypothetical graph shows a linear prediction system, but the ideas are the same for a neural network. There are 10 training data points. A standard linear regression model, the solid red prediction line, under-predicts 5 of the data points and over-predicts 5 of the data points.

The naive quantile approach, the dashed green prediction line, adds a constant 0.35 to the standard linear predictions. This increment is just enough so that 8 of the 10 data points fall below the line, which means only 2 of the data points are under-predicted.

However, the naive modified prediction line now does not fit the data as well as a different prediction line, the dashed purple line. You can maintain the 80th percentile requirement and get a better prediction model by using a special quantile loss function during training that penalizes under-predictions more than over-predictions.

There are many ways to define a quantile loss function. The demo system uses a quantile loss implicitly defined by:

error = target_y - predicted_y
loss = Max((q - 1.0) * error, q * error)

For example, suppose the specified desired quantile is q = 0.80, and a target y value is 0.45. If the predicted y is 0.35 (an unwanted under-prediction of 0.10), the loss is:

error = 0.45 - 0.35 = 0.10
loss = Max(-0.20 * 0.10, 0.80 * 0.10)
     = 0.08

If the predicted y is 0.55 (a not-so-bad over-prediction of 0.10), the loss is:

error = 0.45 - 0.55 = -0.10
loss = Max(-0.20 * -0.10, 0.80 * -0.10)
     = 0.02

And so the 0.10 under-prediction is penalized more than the 0.10 over-prediction.
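Translated into code, the quantile loss is a one-liner. The following is a standalone sketch for clarity; the QuantileLossDemo class and QuantileLoss() method names are mine and are not part of the demo program:

```csharp
using System;

public static class QuantileLossDemo
{
  // pinball / quantile loss for a single prediction
  public static double QuantileLoss(double targetY,
    double predY, double q)
  {
    double error = targetY - predY;
    // an under-prediction (error > 0) is weighted by q,
    // an over-prediction (error < 0) by (1 - q)
    return Math.Max((q - 1.0) * error, q * error);
  }
}
```

With q = 0.80, an under-prediction of 0.10 produces a loss of 0.08 and an over-prediction of 0.10 produces a loss of 0.02, matching the worked examples above.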

Using the Quantile Loss Function
When training a neural network, the loss function is not used directly. Instead, the calculus derivative of the loss function is used to compute weight gradients, which are then used to update the network weight and bias values so that predictions get better. The key code in the demo program that computes the derivative of the loss function is:

double lossDerivative;
if (y >= this.oNodes[k])  // derivative technically doesn't exist at ==
  lossDerivative = this.quantScale * this.quantile;
else
  lossDerivative = 1.0 - this.quantile;

The this.quantile variable is the specified quantile such as 0.90 or 0.80 and it does not change during training. However, if the quantile value is used directly, training often stalls due to a phenomenon called the vanishing gradient.

To deal with the vanishing gradient issue, the demo code uses a dynamic this.quantScale factor. The quantScale factor is adjusted at the end of each training epoch:

if (currQuantile < this.quantile)
  this.quantScale += qRate;
else if (currQuantile > this.quantile)
  this.quantScale -= qRate;

The quantScale value is incremented or decremented by a constant qRate value that is a training parameter. If you refer back to the screenshot in Figure 1, you can see the demo program sets qRate = 0.01.

All of this is based on quite complicated theory. However, except in rare scenarios, you won't have to modify this part of the source code. Most of the challenge of performing neural network quantile regression is finding good values for the four key hyperparameters: the number of hidden nodes, the learning rate, the quantile rate, and the max epochs. The batch size parameter usually has less influence than the other parameters on model accuracy.

The Demo Program
I used Visual Studio 2022 (Community Free Edition) for the demo program. I created a new C# console application and checked the "Place solution and project in the same directory" option. I specified .NET version 8.0. I named the project NeuralNetworkQuantileRegressionQuantileLoss. I checked the "Do not use top-level statements" option to avoid the weird program entry point shortcut syntax.

The demo has no significant .NET dependencies and any relatively recent version of Visual Studio with .NET (Core) or the older .NET Framework will work fine. You can also use the Visual Studio Code program if you like.

After the template code loaded into the editor, I right-clicked on file Program.cs in the Solution Explorer window and renamed the file to the more descriptive NeuralQuantileRegressionProgram.cs. I allowed Visual Studio to automatically rename class Program.

The overall program structure is presented in Listing 1. All the control logic is in the Main() method in the Program class. The Program class holds three functions that are used to load training and test data into memory: MatLoad(), MatToVec(), VecShow().

The neural network quantile regression functionality is in a NeuralNetwork class. The NeuralNetwork class exposes a constructor and five primary public methods: Train(), Predict(), Error(), Accuracy(), ComputedQuantile(). The class also exposes four public methods that are used to save and load the weights of a trained model: GetWeights(), SetWeights(), SaveWeights(), LoadWeights().

The NeuralNetwork class code has eight private methods that are helpers used by the public methods: MatCreate(), InitWeights(), HyperTan(), Identity(), ZeroOutGrads(), AccumGrads(), UpdateWeights(), Shuffle().

So, yes, neural networks are complicated. However, you won't need to modify the source code other than to insert Console.WriteLine() statements into the five primary methods to display informational messages to make the regression model more interpretable.

Listing 1: Overall Program Structure

using System;
using System.IO;
using System.Collections.Generic;

namespace NeuralNetworkQuantileRegressionQuantileLoss
{
  internal class NeuralQuantileRegressionProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("Begin neural network" +
        " quantile regression with quantile loss ");

      // 1. load data
      // 2. create model
      // 3. train model
      // 4. evaluate model
      // 5. use model

      Console.WriteLine("End demo ");
      Console.ReadLine();
    }

    // helpers for Main()

    static double[][] MatLoad(string fn, int[] usecols,
      char sep, string comment) { . . }

    static double[] MatToVec(double[][] mat) { . . }

    static void VecShow(double[] vec, int dec,
      int wid) { . . }
  }

  // --------------------------------------------------------

  public class NeuralNetwork
  {
    private int ni; // number input nodes
    private int nh;
    private int no;

    private double[] iNodes;
    private double[][] ihWeights; // input-hidden
    private double[] hBiases;
    private double[] hNodes;
    private double[][] hoWeights; // hidden-output
    private double[] oBiases;
    private double[] oNodes;  // single val as array

    private double[][] ihGrads;
    private double[] hbGrads;
    private double[][] hoGrads;
    private double[] obGrads;
    public double quantile;  // desired model quantile
    public double quantScale = 1.0;  // initial grad scale
    private Random rnd;

    public NeuralNetwork(int numIn, int numHid,
      int numOut, double quantile, int seed) { . . } 

    private static double[][] MatCreate(int rows,
      int cols) { . . }
    private void InitWeights() { . . }
    public void SetWeights(double[] wts) { . . }
    public double[] GetWeights() { . . }

    public double Predict(double[] x) { . . }

    private static double HyperTan(double x) { . . }
    private static double Identity(double x) { . . }

    private void ZeroOutGrads() { . . }
    private void AccumGrads(double y) { . . }
    private void UpdateWeights(double lrnRate) { . . }
    private void Shuffle(int[] sequence) { . . }

    public void Train(double[][] trainX, double[] trainY,
      double lrnRate, double qRate, int batSize,
      int maxEpochs) { . . }

    public double Error(double[][] trainX,
      double[] trainY) { . . }

    public double Accuracy(double[][] dataX,
      double[] dataY, double pctClose) { . . }

    public double ComputedQuantile(double[][] dataX,
      double[] dataY) { . . }

    public void SaveWeights(string fn) { . . }
    public void LoadWeights(string fn) { . . }
  } // NeuralNetwork class
} // ns

The demo program starts by loading the 200-item training data into memory:

string trainFile = "..\\..\\..\\Data\\synthetic_train_200.txt";
int[] colsX = new int[] { 0, 1, 2, 3, 4, 5 };
double[][] trainX = MatLoad(trainFile, colsX, ',', "#");
double[] trainY = MatToVec(MatLoad(trainFile,
  new int[] { 6 }, ',', "#"));

The training X data is stored into an array-of-arrays style matrix of type double. The data is assumed to be in a directory named Data, which is located in the project root directory. The arguments to the MatLoad() function mean load zero-based columns 0, 1, 2, 3, 4, 5 where the data is comma-delimited, and lines beginning with "#" are comments to be ignored. The training target y data in column [6] is loaded into a matrix and then converted to a one-dimensional vector using the MatToVec() helper function.
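The body of the MatLoad() function isn't shown in the article. A minimal sketch of comparable loading logic (my own simplified version, with none of the error handling you'd want in production code) might look like this:

```csharp
using System;
using System.IO;
using System.Collections.Generic;

public static class LoaderDemo
{
  // load the specified zero-based columns of a delimited text
  // file into an array-of-arrays matrix, skipping comment lines
  public static double[][] MatLoad(string fn, int[] usecols,
    char sep, string comment)
  {
    List<double[]> result = new List<double[]>();
    foreach (string line in File.ReadLines(fn))
    {
      if (line.StartsWith(comment) ||
          line.Trim().Length == 0) continue;
      string[] tokens = line.Split(sep);
      double[] row = new double[usecols.Length];
      for (int j = 0; j < usecols.Length; ++j)
        row[j] = double.Parse(tokens[usecols[j]],
          System.Globalization.CultureInfo.InvariantCulture);
      result.Add(row);
    }
    return result.ToArray();
  }
}
```

Parsing with CultureInfo.InvariantCulture avoids surprises on machines where the decimal separator is a comma.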

The 40-item test data is loaded into memory using the same pattern that was used to load the training data:

string testFile = "..\\..\\..\\Data\\synthetic_test_40.txt";
double[][] testX = MatLoad(testFile, colsX, ',', "#");
double[] testY = MatToVec(MatLoad(testFile,
  new int[] { 6 }, ',', "#"));

The first three training items are displayed using 4 decimals with 8 columns width:

Console.WriteLine("First three train X: ");
for (int i = 0; i < 3; ++i)
  VecShow(trainX[i], 4, 8);

Console.WriteLine("First three train y: ");
for (int i = 0; i < 3; ++i)
  Console.WriteLine(trainY[i].ToString("F4").PadLeft(8));

In a non-demo scenario, you might want to display all the training data to make sure it was correctly loaded into memory.

Creating and Training the Neural Network Quantile Regression Model
The neural network quantile regression model is created like so:

Console.WriteLine("Creating 6-100-1 0.90 quantile network ");
double quantile = 0.90;
NeuralNetwork nn = new NeuralNetwork(6, 100, 1, 
  quantile, seed: 0);
Console.WriteLine("Done ");

The seed parameter controls an internal random number generator that is used to initialize neural network weight and bias values to small random values. The random number generator is also used to scramble the order in which data items are processed during training.

The quantile regression model is trained using these statements:

int maxEpochs = 180;
double lrnRate = 0.02;
double qRate = 0.01;
int batSize = 10;
Console.WriteLine("Starting training ");
nn.Train(trainX, trainY, lrnRate, qRate, batSize, maxEpochs);
Console.WriteLine("Done ");

The values of the four training parameters must be determined by trial and error. In standard neural network regression, you watch the error value during training to see if it is decreasing. But in neural network quantile regression, you must also watch the current quantile value to make sure you approach the desired quantile percentile value.

Evaluating and Using the Neural Network Quantile Regression Model
The demo program evaluates the trained model accuracy using these statements:

Console.WriteLine("Evaluating quantile regression model ");
double trainAcc = nn.Accuracy(trainX, trainY, 0.15);
Console.WriteLine("Accuracy (15%) on train data = " +
  trainAcc.ToString("F4"));

double testAcc = nn.Accuracy(testX, testY, 0.15);
Console.WriteLine("Accuracy (15%) on test data  = " +
  testAcc.ToString("F4"));

A predicted y value is scored as correct if it is within 15% of the true target value. A reasonable closeness percentage to use will vary from problem to problem.
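The within-percentage scoring logic is straightforward. The following is my own simplified sketch that operates on precomputed predicted values, rather than calling Predict() internally as the demo's Accuracy() method does:

```csharp
using System;

public static class AccuracyDemo
{
  // fraction of predictions within pctClose of the target
  public static double Accuracy(double[] predY,
    double[] targetY, double pctClose)
  {
    int numCorrect = 0;
    for (int i = 0; i < targetY.Length; ++i)
      if (Math.Abs(predY[i] - targetY[i]) <
          Math.Abs(pctClose * targetY[i]))  // Abs handles negative targets
        ++numCorrect;
    return (numCorrect * 1.0) / targetY.Length;
  }
}
```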

The demo program computes and displays the quantile percentile of the trained model like so:

double qTrain = nn.ComputedQuantile(trainX, trainY);
Console.WriteLine("Computed quantile training = " +
  qTrain.ToString("F2"));
double qTest = nn.ComputedQuantile(testX, testY);
Console.WriteLine("Computed quantile test = " +
  qTest.ToString("F2"));

The computed quantile is just the percentage of predicted y values that are greater than or equal to the associated target y values. In general, the computed quantile values should be roughly the same for both training and test data.
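The computed quantile metric is easy to express in code. This is my own simplified sketch, operating on precomputed predictions rather than on the raw data matrix as the demo's ComputedQuantile() method does:

```csharp
using System;

public static class QuantileMetricDemo
{
  // fraction of predictions greater than or equal to targets
  public static double ComputedQuantile(double[] predY,
    double[] targetY)
  {
    int count = 0;
    for (int i = 0; i < targetY.Length; ++i)
      if (predY[i] >= targetY[i])
        ++count;
    return (count * 1.0) / targetY.Length;
  }
}
```

A perfectly trained 0.90 quantile model would return a value close to 0.90 here.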

The demo program concludes by using the trained neural network quantile regression model to make a prediction:

double[] x = trainX[0];
Console.WriteLine("Predicting for x = ");
VecShow(x, 4, 8);
double predY = nn.Predict(x);
Console.WriteLine("Predicted y = " + predY.ToString("F4"));

The input x is the first training item, (-0.1660, 0.4406, -0.9998, -0.3953, -0.7065, -0.8153). The predicted y value is 0.7351, which is reasonably close (within 5%) to the true target y value of 0.7022.

Wrapping Up
The demo program illustrates a scenario where you want to avoid under-predicting by using a high (greater than 0.50) percentile value of 0.90. This loosely leads to predictions such as, "I predict that y = 0.6789 and there's roughly a 90% chance that the prediction will meet demand."

In some scenarios, you might want to generate two predictions, with one high percentile (say 0.90) and one low percentile (say 0.10). This leads to predictions such as, "I predict that there is roughly an 80% chance that y will be between 0.4567 and 0.6789." This quantile regression two-predicted-values approach is somewhat similar to a classical statistics confidence interval prediction, but the underlying math assumptions are completely different and so the two techniques shouldn't be directly compared.
