The Data Science Lab

Multi-Class Classification Accuracy by Class Using PyTorch

Dr. James McCaffrey of Microsoft Research: When multi-class data is skewed toward one or more classes, it's very important to analyze accuracy by class.

A multi-class classification problem is one where the goal is to predict a discrete value where there are three or more possibilities. For example, you might want to predict the political leaning (conservative, moderate, liberal) of a person based on their sex, age, state where they live and income.

A naive approach for evaluating the effectiveness of a trained multi-class model is to compute the accuracy of the model on all test data. But suppose your data is skewed toward a particular class. For example, if most of the data items are class moderate (say, 900 out of 1,000) and only a few are class conservative (say, 40 out of 1,000) and class liberal (60 out of 1,000), then a model that predicts class moderate for any input will score 90 percent accuracy.

This article shows how to compute classification accuracy class-by-class. A good way to see where this article is headed is to take a look at the screenshot of a demo program in Figure 1. The demo begins by loading a 200-item set of training data and a 40-item set of test data.

Figure 1: Multi-Class Accuracy by Class
[Click on image for larger view.] Figure 1: Multi-Class Accuracy by Class

After training, the overall classification accuracy of the model on the test data is 75.00 percent. The demo then computes accuracy for each of the three classes: conservative (54.55 percent), moderate (92.86 percent) and liberal (73.33 percent). Interpreting accuracy by class is a bit subjective, but the demo results are reasonable.

To run the demo program, you must have Python and PyTorch installed on your machine. The demo programs were developed on Windows 10/11 using the Anaconda 2020.02 64-bit distribution (which contains Python 3.7.6) and PyTorch version 1.12.1 for CPU installed via pip. You can find detailed step-by-step installation instructions for this configuration in my blog posts here and here.

The complete demo program source code can be found here and is also attached to the article as a downloadable compressed file.

The Data
The format of the training and test data looks like:

 1   0.24   1   0   0   0.2950   2
-1   0.39   0   0   1   0.5120   1
 1   0.63   0   1   0   0.7580   0
-1   0.36   1   0   0   0.4450   1
 1   0.27   0   1   0   0.2860   2
. . .

Each line of tab-delimited data represents a person. The fields are sex (male = -1, female = +1), age (divided by 100), state (Michigan = 100, Nebraska = 010, Oklahoma = 001), income (divided by $100,000) and political type (conservative = 0, moderate = 1, liberal = 2).

The complete training and test data can be found here.

The demo defines a PeopleDataset class in Listing 1. An instance of a PeopleDataset object can be passed to a DataLoader for training, or be used directly to compute classification accuracy.

Listing 1: PeopleDataset Class Definition

class PeopleDataset(T.utils.data.Dataset):
  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(0,7),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_x = all_xy[:,0:6]   # cols [0,6) = [0,5]
    tmp_y = all_xy[:,6]     # 1-D

    self.x_data = T.tensor(tmp_x, 
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # 1-D

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx] 
    return preds, trgts  # as a Tuple

The Demo Program
The structure of the demo program is presented in Listing 2. I prefer to indent my Python programs with two spaces rather than the more common four spaces. The backslash character is used for line continuation in Python.

Listing 2: Overall Demo Program Structure

# people_politics.py
# predict politics type from sex, age, state, income
# PyTorch 1.12.1-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11 

# compute accuracy by class

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

# -----------------------------------------------------------

class PeopleDataset(T.utils.data.Dataset): . . . 

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, 10)  # 6-(10-10)-3
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 3)

    T.nn.init.xavier_uniform_(self.hid1.weight)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = T.log_softmax(self.oupt(z), dim=1)  # NLLLoss() 
    return z

# -----------------------------------------------------------

def accuracy(model, ds):
  # assumes model.eval()
  # item-by-item version
  n_correct = 0; n_wrong = 0
  for i in range(len(ds)):
    X = ds[i][0].reshape(1,-1)  # make it a batch
    Y = ds[i][1].reshape(1)  # 0 1 or 2, 1D
    with T.no_grad():
      oupt = model(X)  # logits form

    big_idx = T.argmax(oupt)  # 0 or 1 or 2
    if big_idx == Y:
      n_correct += 1
    else:
      n_wrong += 1

  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

# -----------------------------------------------------------

def do_acc(model, dataset, n_classes):
  X = dataset[0:len(dataset)][0]  # all X values
  Y = dataset[0:len(dataset)][1]  # all Y values
  with T.no_grad():
    oupt = model(X)  #  [40,3]  all logits

  for c in range(n_classes):
    idxs = np.where(Y==c)  # indices where Y is c
    logits_c = oupt[idxs]  # logits corresponding to Y == c
    arg_maxs_c = T.argmax(logits_c, dim=1)  # predicted class
    num_correct = T.sum(arg_maxs_c == c)
    acc_c = num_correct.item() / len(arg_maxs_c)
    print("%0.4f " % acc_c)

# -----------------------------------------------------------

def main():
  # 0. get started
  print("Begin People predict politics type ")
  T.manual_seed(1)
  np.random.seed(1)
  
  # 1. create DataLoader objects
  print("Creating People Datasets ")

  train_file = ".\\Data\\people_train.txt"
  train_ds = PeopleDataset(train_file)  # 200 rows

  test_file = ".\\Data\\people_test.txt"
  test_ds = PeopleDataset(test_file)    # 40 rows

  bat_size = 10
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

# -----------------------------------------------------------

  # 2. create network
  print("Creating 6-(10-10)-3 neural network ")
  net = Net().to(device)
  net.train()

# -----------------------------------------------------------

  # 3. train model
  max_epochs = 1000
  ep_log_interval = 100
  lrn_rate = 0.01

  loss_func = T.nn.NLLLoss()  # assumes log_softmax()
  optimizer = T.optim.SGD(net.parameters(), lr=lrn_rate)

  print("bat_size = %3d " % bat_size)
  print("loss = " + str(loss_func))
  print("optimizer = SGD")
  print("max_epochs = %3d " % max_epochs)
  print("lrn_rate = %0.3f " % lrn_rate)

  print("Starting training")
  for epoch in range(0, max_epochs):
    epoch_loss = 0  # for one full epoch

    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # inputs
      Y = batch[1]  # correct class/label/politics

      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()

    if epoch % ep_log_interval == 0:
      print("epoch = %5d  |  loss = %10.4f" % \
        (epoch, epoch_loss))
  print("Training done ")

# -----------------------------------------------------------

  # 4. evaluate model accuracy
  print("Computing overall model accuracy")
  net.eval()
  acc_train = accuracy(net, train_ds)  # item-by-item
  print("Accuracy on training data = %0.4f" % acc_train)
  acc_test = accuracy(net, test_ds) 
  print("Accuracy on test data = %0.4f" % acc_test)

  print("Accuracy on test by class (fast technique): ")
  do_acc(net, test_ds, 3)

  # 5. make a prediction
  print("Predicting for M  30  oklahoma  $50,000: ")
  X = np.array([[-1, 0.30,  0,0,1,  0.5000]],
    dtype=np.float32)
  X = T.tensor(X, dtype=T.float32).to(device) 

  with T.no_grad():
    logits = net(X)  # do not sum to 1.0
  probs = T.exp(logits)  # sum to 1.0
  probs = probs.numpy()  # numpy vector prints better
  np.set_printoptions(precision=4, suppress=True)
  print(probs)

  # 6. save model (state_dict approach)
  print("Saving trained model state")
  # fn = ".\\Models\\people_model.pt"
  # T.save(net.state_dict(), fn)

  print("End People predict politics demo")

if __name__ == "__main__":
  main()  main()

All the control logic is in a program-defined main() function. The program begins with some preparation statements:

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

def main():
  # 0. get started
  print("Begin People predict politics type ")
  T.manual_seed(1)
  np.random.seed(1)
. . .

After loading data into memory, the multi-class classifier is created and training is prepared. Training is accomplished using standard techniques.

Evaluating Accuracy by Class
The key do_acc() function that defines accuracy by class is presented in Listing 3. The function accepts a Dataset object which holds either training or test data.

Listing 3: Accuracy by Class Function

def do_acc(model, dataset, n_classes):
  X = dataset[0:len(dataset)][0]  # all X values
  Y = dataset[0:len(dataset)][1]  # all Y values
  with T.no_grad():
    oupt = model(X)  #  [40,3]  all logits

  for c in range(n_classes):
    idxs = np.where(Y==c)  # indices where Y is c
    logits_c = oupt[idxs]  # logits corresponding to Y == c
    arg_maxs_c = T.argmax(logits_c, dim=1)  # predicted class
    num_correct = T.sum(arg_maxs_c == c)
    acc_c = num_correct.item() / len(arg_maxs_c)
    print("%0.4f " % acc_c)

First, all the inputs are stored into X and all the target outputs are stored into Y. The outputs are in the form of logits where the location of the largest output corresponds to the class (0, 1, 2). The output of the do_acc() function is a print() statement that summarizes the classification for each class. In a non-demo scenario, you might want to return the acc_c matrix that holds the results.

Wrapping Up
When multi-class data is skewed toward one or more classes, it's very important to analyze accuracy by class. This means that before starting work on a multi-class prediction system, it's almost always a good idea to analyze the training data to determine if there is any data skewed by class.

About the Author

Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Azure and Bing. James can be reached at [email protected].

comments powered by Disqus

Featured

  • Creating Reactive Applications in .NET

    In modern applications, data is being retrieved in asynchronous, real-time streams, as traditional pull requests where the clients asks for data from the server are becoming a thing of the past.

  • AI for GitHub Collaboration? Maybe Not So Much

    No doubt GitHub Copilot has been a boon for developers, but AI might not be the best tool for collaboration, according to developers weighing in on a recent social media post from the GitHub team.

  • Visual Studio 2022 Getting VS Code 'Command Palette' Equivalent

    As any Visual Studio Code user knows, the editor's command palette is a powerful tool for getting things done quickly, without having to navigate through menus and dialogs. Now, we learn how an equivalent is coming for Microsoft's flagship Visual Studio IDE, invoked by the same familiar Ctrl+Shift+P keyboard shortcut.

  • .NET 9 Preview 3: 'I've Been Waiting 9 Years for This API!'

    Microsoft's third preview of .NET 9 sees a lot of minor tweaks and fixes with no earth-shaking new functionality, but little things can be important to individual developers.

  • Data Anomaly Detection Using a Neural Autoencoder with C#

    Dr. James McCaffrey of Microsoft Research tackles the process of examining a set of source data to find data items that are different in some way from the majority of the source items.

Subscribe on YouTube