The Data Science Lab
Winnow Classification Using C#
Dr. James McCaffrey from Microsoft Research presents a complete end-to-end demonstration of the Winnow classification technique. Winnow classification is used for a very specific scenario where the target variable to predict is binary and all the predictor variables are also binary.
The Winnow machine learning classification algorithm is designed for a very specific scenario: the target variable to predict is binary and all the predictor variables are binary too. This article presents an end-to-end demonstration of binary classification using the Winnow algorithm.
A good way to see where this article is headed is to take a look at the screenshot in Figure 1. The demo program begins by loading a 200-item set of training data and a 32-item set of test data into memory. The first four training predictor values and the corresponding first four target political party values are:
First four train X voting:
0 1 1 0 1 1 0 0 0 0 0 0 1 1 1 1
0 1 0 1 1 1 0 0 0 0 0 1 1 1 0 1
1 1 1 0 0 0 1 1 1 0 1 0 0 0 1 1
1 1 1 0 0 0 1 1 1 0 0 0 0 0 1 1
First four labels (0=Democrat, 1=Republican) :
0 1 0 0
The predictor values represent how a member of the U.S. Congress voted on 16 different bills in 1984, where 0 means a no vote and 1 means a yes vote. The target label to predict is the political party of the representative, where 0 is Democrat and 1 is Republican.
The demo creates and trains a Winnow prediction model:
Starting training
epoch = 0 accuracy = 0.9550
epoch = 1 accuracy = 0.9800
epoch = 2 accuracy = 0.9800
epoch = 3 accuracy = 0.9800
Done
The demo then displays the weights of the trained model and the accuracy of the model on the training and test data:
Model weights:
0.50 0.00 0.00 8.00 1.00 0.03 0.50 0.00
0.03 0.25 0.00 4.00 0.02 0.50 1.00 0.25
Accuracy on train = 0.9800
Accuracy on test = 0.9062
As you'll see shortly, there is one weight per predictor variable. The model correctly predicts the training data with 98.00% accuracy (196 out of 200 correct) and test data with 90.62% accuracy (29 out of 32 correct). The demo computes and displays more granular accuracy metrics for the test data in the form of a confusion table:
Confusion matrix test data
actual 0 | 16 3 | 0.8421
actual 1 | 0 13 | 1.0000
----------
predicted: 0 1
The demo concludes by predicting the political party of a House member who votes in an artificial pattern:
Predicting for voting:
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Predicted party: 1
(0=Democrat, 1=Republican)
This article assumes you have intermediate or better programming skill but doesn't assume you know anything about the Winnow binary classification algorithm. The demo is implemented using C# but you should be able to refactor the demo code to another C-family language if you wish. All normal error checking has been removed to keep the main ideas as clear as possible.
The source code for the demo program is a bit too long to be presented in its entirety in this article. The complete code and data are available in the accompanying file download, and they're also available online.
The Demo Data
The source data is the Congressional Voting Records Dataset from archive.ics.uci.edu/dataset/105/congressional+voting+records. The source data has 435 rows and looks like:
republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
. . .
Each row represents one of the 435 members of the U.S. House of Representatives in the year 1984. The first column is the political party, and the next 16 values are votes on 16 different bills where "y" is a yes vote, "n" is a no vote, and "?" is a missing value (abstain or not present). Of the 435 rows, 203 have one or more missing values, leaving 232 valid rows.
The 232 valid data items were preprocessed by encoding Democrat = 0, Republican = 1, yes = 1, no = 0. The encoded data looks like:
0,0,1,1,0,1,1,0,0,0,0,0,0,1,1,1,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1
0,1,1,1,0,0,0,1,1,1,0,1,0,0,0,1,1
0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1
. . .
The first 200 of the encoded data items were saved for training and the remaining 32 items were saved for testing. In a non-demo scenario it's better to randomly split the data into training and test files.
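One simple way to do a random split is to shuffle the row indices with a Fisher-Yates shuffle and send the first 200 rows to the training file. The following is just a sketch of the idea and is not part of the demo program:

```csharp
using System;

// sketch: randomly split 232 rows into 200 train / 32 test
Random rnd = new Random(0);  // seed for reproducibility
int n = 232;
int[] indices = new int[n];
for (int i = 0; i < n; ++i) indices[i] = i;
for (int i = 0; i < n - 1; ++i)  // Fisher-Yates shuffle
{
  int r = rnd.Next(i, n);  // random position in [i, n)
  int tmp = indices[r]; indices[r] = indices[i]; indices[i] = tmp;
}
// rows indices[0] .. indices[199] go to the training file,
// rows indices[200] .. indices[231] go to the test file
```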
Understanding How the Winnow Algorithm Works
Each of the 16 predictor variables has an associated weight. To make a prediction for an input vector x, the sum of the products of the x values and the associated weights is computed. If that sum is greater than a threshold value, the prediction is class 1, otherwise the prediction is class 0. The threshold value is usually set to dim / 2 where dim is the number of predictor variables. Because the demo data has 16 predictors, the threshold was set to 16 / 2 = 8.0.
For example, suppose there are just five predictors and the input x = (1, 0, 1, 0, 1) and the five model weights are (0.50, 2.00, 4.00, 0.25, 1.00) and the threshold is set to dim / 2 = 2.5. The sum of the products is (1 * 0.50) + (0 * 2.00) + (1 * 4.00) + (0 * 0.25) + (1 * 1.00) = 5.50 which exceeds the threshold, so the prediction is class 1.
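In code, the prediction rule is just a weighted sum compared against the threshold. Here is a stand-alone sketch using the five-predictor example above; it's a simplified version of the idea, not the demo's exact Predict() method:

```csharp
using System;

// simplified Winnow prediction: weighted sum of inputs vs. threshold
int Predict(int[] x, double[] wts, double threshold)
{
  double sum = 0.0;
  for (int i = 0; i < x.Length; ++i)
    sum += x[i] * wts[i];
  return (sum > threshold) ? 1 : 0;
}

int[] x = new int[] { 1, 0, 1, 0, 1 };
double[] wts = new double[] { 0.50, 2.00, 4.00, 0.25, 1.00 };
double threshold = 5 / 2.0;  // dim / 2 = 2.5
Console.WriteLine(Predict(x, wts, threshold));  // sum = 5.50, prints 1
```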
Training the Winnow model is the process of finding the weight values using a so-called alpha value that is almost always equal to 2.0. In pseudo-code:
loop max_epochs times
  loop each training item x in scrambled order
    fetch actual Y from training data
    compute predicted Y using x and curr wts
    if predicted Y is less than actual Y
      multiply all relevant weights by alpha (2.0)
    else if predicted Y is greater than actual Y
      divide all relevant weights by alpha (2.0)
    else if predicted Y equals actual Y
      do nothing
  end-loop
end-loop
A relevant weight is one associated with an input variable of 1 (not a 0). The weight values are initialized to 1.0 and increase quickly (1, 2, 4, 8, . . ) or decrease quickly (1, 0.50, 0.25, 0.125 . .) so the max_epochs for training is usually small -- typically about log2(dim).
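The pseudo-code translates to C# along the following lines. This is a simplified stand-alone sketch using the article's usual settings (weights initialized to 1.0, threshold = dim / 2); the demo's actual Train() method also processes the training items in scrambled order on each epoch:

```csharp
using System;

// simplified Winnow training (no shuffling, for clarity)
double[] Train(int[][] trainX, int[] trainY, int maxEpochs, double alpha)
{
  int dim = trainX[0].Length;
  double threshold = dim / 2.0;
  double[] wts = new double[dim];
  for (int j = 0; j < dim; ++j) wts[j] = 1.0;  // init weights to 1.0

  for (int epoch = 0; epoch < maxEpochs; ++epoch)
  {
    for (int i = 0; i < trainX.Length; ++i)
    {
      double sum = 0.0;
      for (int j = 0; j < dim; ++j)
        sum += trainX[i][j] * wts[j];
      int predY = (sum > threshold) ? 1 : 0;

      if (predY < trainY[i])       // predicted 0, actual 1: promote
      {
        for (int j = 0; j < dim; ++j)
          if (trainX[i][j] == 1) wts[j] *= alpha;  // relevant wts only
      }
      else if (predY > trainY[i])  // predicted 1, actual 0: demote
      {
        for (int j = 0; j < dim; ++j)
          if (trainX[i][j] == 1) wts[j] /= alpha;
      }
      // correct prediction: do nothing
    }
  }
  return wts;
}
```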
Many of the resulting trained model weights are often very small, close to zero, and so the associated predictor variables have been effectively winnowed away -- one of the meanings of the word "winnow" is "to remove." For the demo data, notice that the model weights suggest that how a House Representative voted on bill #4 (weight = 8.00) and bill #12 (weight = 4.00) almost completely determines their political party.
The Demo Program
I used Visual Studio 2022 (Community Free Edition) for the demo program. I created a new C# console application and checked the "Place solution and project in the same directory" option. I specified .NET version 8.0. I named the project WinnowVoting. I checked the "Do not use top-level statements" option to avoid the program entry point shortcut syntax.
The demo has no significant .NET dependencies and any relatively recent version of Visual Studio with .NET (Core) or the older .NET Framework will work fine. You can also use the Visual Studio Code program if you like.
After the template code loaded into the editor, I right-clicked on file Program.cs in the Solution Explorer window and renamed the file to the slightly more descriptive WinnowVotingProgram.cs. I allowed Visual Studio to automatically rename class Program.
The overall program structure is presented in Listing 1. All the control logic is in the Main() method. All of the Winnow classification functionality is in a Winnow class. A Utils class holds helper functions for Main() to load data from file to memory, and functions to display vectors and matrices.
Listing 1: Overall Program Structure
using System;
using System.IO;
using System.Collections.Generic;
namespace WinnowVoting
{
internal class WinnowVotingProgram
{
static void Main(string[] args)
{
Console.WriteLine("Begin Winnow classification" +
" on Congressional Voting Dataset ");
// 1. load data
// 2. create and train model
// 3. evaluate model
// 4. use model
Console.WriteLine("End Winnow demo ");
Console.ReadLine();
} // Main
} // class WinnowVotingProgram
// --------------------------------------------------------
public class Winnow
{
public double[] wts;
public double alpha;
public double threshold;
public Random rnd;
public Winnow(double alpha = 2.0, int seed = 0) { . . }
public void Train(int[][] trainX, int[] trainY,
int maxEpochs) { . . }
public int Predict(int[] dataX) { . . }
private void Shuffle(int[] arr) { . . }
public double Accuracy(int[][] dataX, int[] dataY) { . . }
public int[][] ConfusionMatrix(int[][] dataX,
int[] dataY) { . . }
public void ShowConfusion(int[][] cm) { . . }
}
// --------------------------------------------------------
public class Utils
{
public static int[][] MatLoad(string fn,
int[] usecols, char sep, string comment) { . . }
public static int[] MatToIntVec(int[][] m) { . . }
public static void MatShow(double[][] m,
int dec, int wid) { . . }
public static void VecShow(double[] vec, int dec,
int wid) { . . }
public static void VecShow(int[] vec, int wid) { . . }
}
// --------------------------------------------------------
} // ns
The demo starts by loading the 200-item training data into memory:
Console.WriteLine("Loading train (200) and test (32) data ");
string trainFile = "..\\..\\..\\Data\\votes_train.txt";
int[][] trainX = Utils.MatLoad(trainFile,
new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16 }, ',', "#");
int[] trainY =
Utils.MatToIntVec(Utils.MatLoad(trainFile,
new int[] { 0 }, ',', "#"));
The demo assumes that the data is stored in separate training and test files in a directory named Data which is located in the project root directory. The training data predictor values in zero-based columns [1] to [16] are read into memory as an array-of-arrays style matrix of type int using the program-defined MatLoad() helper function. The arguments to MatLoad() are the 0-based indices of the columns to load, the delimiter character, and a string that indicates a comment line. The training data target class labels in column [0] are read into memory as a matrix and then converted to an integer vector using the MatToIntVec() helper function.
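The MatLoad() implementation isn't shown in the article, but a minimal sketch that's consistent with the int[][] signature in Listing 1 might look like this. The demo's actual helper may differ in details such as error handling:

```csharp
using System;
using System.IO;
using System.Collections.Generic;

// sketch: load selected columns of a delimited text file as int[][]
int[][] MatLoad(string fn, int[] usecols, char sep, string comment)
{
  List<int[]> rows = new List<int[]>();
  foreach (string line in File.ReadAllLines(fn))
  {
    if (line.Trim() == "" || line.StartsWith(comment))
      continue;  // skip blank lines and comment lines
    string[] tokens = line.Split(sep);
    int[] row = new int[usecols.Length];
    for (int j = 0; j < usecols.Length; ++j)
      row[j] = int.Parse(tokens[usecols[j]]);  // only requested columns
    rows.Add(row);
  }
  return rows.ToArray();
}
```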
The 32-item test data is loaded in the same way:
string testFile = "..\\..\\..\\Data\\votes_test.txt";
int[][] testX = Utils.MatLoad(testFile,
new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16 }, ',', "#");
int[] testY =
Utils.MatToIntVec(Utils.MatLoad(testFile,
new int[] { 0 }, ',', "#"));
Console.WriteLine("Done ");
The demo displays the first four input x vectors and the associated target y values as a sanity check:
Console.WriteLine("First four train X voting: ");
for (int i = 0; i < 4; ++i)
Utils.VecShow(trainX[i], 2);
Console.WriteLine("First four labels (0=Democrat, 1=Republican) : ");
for (int i = 0; i < 4; ++i)
Console.Write(trainY[i] + " ");
Console.WriteLine("");
In a non-demo scenario, you might want to display all the data to make sure it has been correctly loaded into memory.
Creating and Training the Winnow Model
The Winnow binary classification model is created and trained like so:
double alpha = 2.0;
Console.WriteLine("Creating Winnow classifier with alpha = " +
alpha.ToString("F1"));
Winnow wnn = new Winnow(alpha, seed: 0);
Console.WriteLine("Starting training ");
wnn.Train(trainX, trainY, 4);
Console.WriteLine("Done ");
As explained above, the alpha value controls how quickly model weights increase and decrease. A value of 2.0 is almost always used for alpha. If you use a value greater than 2.0, the model weights will increase and decrease even more quickly. Note that you can't use a value of 1.0 or less for alpha -- a value of exactly 1.0 would leave the weights unchanged, and a value less than 1.0 would update the weights in the wrong direction.
The model is trained using 4 epochs, which is the log2 of the number of predictor variables. The idea is that if a weight value is initialized to 1.0 and is doubled on each of 4 epochs, the weight becomes 2^4 = 16.0. Because the threshold is usually set to dim / 2 = 16 / 2 = 8.0, a weight of 16.0 guarantees that a 1 value in the associated predictor variable would overwhelm all other predictor variables and force a prediction of class 1, so additional training epochs accomplish little.
Evaluating and Using the Trained Model
The accuracy of the trained model is computed and displayed like so:
double accTrain = wnn.Accuracy(trainX, trainY);
Console.WriteLine("\nAccuracy on train = " + accTrain.ToString("F4"));
double accTest = wnn.Accuracy(testX, testY);
Console.WriteLine("Accuracy on test = " + accTest.ToString("F4"));
The accuracy is just the number of correct predictions divided by the total number of predictions made. For most binary classification models, it's standard practice to compute separate accuracy metrics for each of the two possible classes in the form of a confusion matrix:
Console.WriteLine("Confusion matrix test data ");
int[][] cmTest = wnn.ConfusionMatrix(testX, testY);
wnn.ShowConfusion(cmTest);
The demo uses two different functions, one to compute the confusion matrix/table and one to display it, because exactly how you want to display confusion data can vary from problem to problem.
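A sketch of how the confusion counts might be computed is shown below, where row index is the actual class and column index is the predicted class, matching the display above. The prediction is passed in as a delegate here so the sketch is self-contained; the demo's actual ConfusionMatrix() method calls the model's own Predict():

```csharp
using System;

// sketch: 2x2 confusion counts, rows = actual class, cols = predicted
int[][] ConfusionMatrix(int[][] dataX, int[] dataY,
  Func<int[], int> predict)
{
  int[][] cm = new int[2][];
  cm[0] = new int[2];
  cm[1] = new int[2];
  for (int i = 0; i < dataX.Length; ++i)
  {
    int predY = predict(dataX[i]);
    ++cm[dataY[i]][predY];  // increment cm[actual][predicted]
  }
  return cm;
}
```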
The demo concludes by using the Winnow model to make a prediction for a new, previously unseen data item:
int[] x = new int[] { 0, 1, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1 };
Console.WriteLine("Predicting for voting: ");
Utils.VecShow(x, 2); // width 2
int predY = wnn.Predict(x);
Console.WriteLine("Predicted party: " + predY);
Console.WriteLine("(0=Democrat, 1=Republican) ");
In some scenarios you might want to display the sum of products of weights times predictor values, which is computed by the Predict() method. For example, for the input above, the sum of products is 13.04 which is well above the threshold of 8.0. This gives you a crude measure of the confidence of the prediction. Sums close to the threshold have relatively low confidence.
The demo program does not implement a Save() method to save the trained model weights to a text file. This would be useful if some other system would be using the trained model and didn't have access to the training data.
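If needed, a Save() method could be as simple as writing one weight per line to a text file, with a matching Load() to read the weights back. The one-weight-per-line file format here is an assumption for illustration, not part of the demo:

```csharp
using System;
using System.IO;
using System.Globalization;

// sketch: persist trained model weights, one per line
void SaveWeights(string fn, double[] wts)
{
  using (StreamWriter sw = new StreamWriter(fn))
    foreach (double w in wts)
      sw.WriteLine(w.ToString("F4", CultureInfo.InvariantCulture));
}

double[] LoadWeights(string fn)
{
  string[] lines = File.ReadAllLines(fn);
  double[] wts = new double[lines.Length];
  for (int i = 0; i < lines.Length; ++i)
    wts[i] = double.Parse(lines[i], CultureInfo.InvariantCulture);
  return wts;
}
```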
Wrapping Up
The Winnow binary classification technique is not used very often. It was published in a 1988 research paper during the early years of modern machine learning. Because the Winnow technique uses a linear prediction function, it works best for relatively simple problem scenarios where the two classes are linearly separable. Winnow classification is useful when you suspect that some of the predictor variables are not important; if so, Winnow classification can reveal those predictors because their associated weights will be close to zero.