The Data Science Lab


Sentiment Analysis Using a PyTorch EmbeddingBag Layer

Dr. James McCaffrey of Microsoft Research uses a full movie review example to explain the natural language processing (NLP) problem of sentiment analysis, used to predict whether some text is positive (class 1) or negative (class 0).

Logistic Regression Using PyTorch with L-BFGS

Dr. James McCaffrey of Microsoft Research demonstrates applying the L-BFGS optimization algorithm to the ML logistic regression technique for binary classification -- predicting one of two possible discrete values.

Generating Synthetic Data Using a Generative Adversarial Network (GAN) with PyTorch

Dr. James McCaffrey of Microsoft Research explains a generative adversarial network, a deep neural system that can be used to generate synthetic data for machine learning scenarios, such as generating synthetic males for a dataset that has many females but few males.

Positive and Unlabeled Learning (PUL) Using PyTorch

Dr. James McCaffrey of Microsoft Research provides a code-driven tutorial on PUL problems, which often occur with security or medical data in cases like training a machine learning model to predict if a hospital patient has a disease or not.

Generating Synthetic Data Using a Variational Autoencoder with PyTorch

Generating synthetic data is useful when you have imbalanced training data for a particular class, for example, generating synthetic females in a dataset of employees that has many males but few females.

Autoencoder Anomaly Detection Using PyTorch

Dr. James McCaffrey of Microsoft Research provides full code and step-by-step examples of anomaly detection, used to find items in a dataset that are different from the majority for tasks like detecting credit card fraud.

How To: Create a Streaming Data Loader for PyTorch

When training data won't fit into machine memory, a streaming data loader using an internal memory buffer can help. Dr. James McCaffrey of Microsoft Research shows how.

Neural Regression Using PyTorch: Model Accuracy

Dr. James McCaffrey of Microsoft Research explains how to evaluate, save and use a trained regression model, used to predict a single numeric value such as the annual revenue of a new restaurant based on variables such as menu prices, number of tables, location and so on.

Neural Regression Using PyTorch: Training

The goal of a regression problem is to predict a single numeric value, for example, predicting the annual revenue of a new restaurant based on variables such as menu prices, number of tables, location and so on.

Neural Regression Using PyTorch: Defining a Network

Dr. James McCaffrey of Microsoft Research presents the second of four machine learning articles that detail a complete end-to-end production-quality example of neural regression using PyTorch.

Neural Regression Using PyTorch: Preparing Data

Dr. James McCaffrey of Microsoft Research presents the first in a series of four machine learning articles that detail a complete end-to-end production-quality example of neural regression using PyTorch.

Multi-Class Classification Using PyTorch: Model Accuracy

Dr. James McCaffrey of Microsoft Research continues his four-part series on multi-class classification, designed to predict a value that can be one of three or more possible discrete values, by explaining model accuracy.

Multi-Class Classification Using PyTorch: Training

Dr. James McCaffrey of Microsoft Research continues his four-part series on multi-class classification, designed to predict a value that can be one of three or more possible discrete values, by explaining neural network training.

Multi-Class Classification Using PyTorch: Defining a Network

Dr. James McCaffrey of Microsoft Research explains how to define a network in installment No. 2 of his four-part series that will present a complete end-to-end production-quality example of multi-class classification using a PyTorch neural network.

Multi-Class Classification Using PyTorch: Preparing Data

Dr. James McCaffrey of Microsoft Research kicks off a four-part series on multi-class classification, designed to predict a value that can be one of three or more possible discrete values.

Binary Classification Using PyTorch: Model Accuracy

In the final article of a four-part series on binary classification using PyTorch, Dr. James McCaffrey of Microsoft Research shows how to evaluate the accuracy of a trained model, save a model to file, and use a model to make predictions.

Binary Classification Using PyTorch: Training

Dr. James McCaffrey of Microsoft Research continues his examination of creating a PyTorch neural network binary classifier through six steps, here addressing step No. 4: training the network.

Binary Classification Using PyTorch: Defining a Network

Dr. James McCaffrey of Microsoft Research tackles how to define a network in the second of a series of four articles that present a complete end-to-end production-quality example of binary classification using a PyTorch neural network, including a full Python code sample and data files.

Binary Classification Using PyTorch: Preparing Data

Dr. James McCaffrey of Microsoft Research kicks off a series of four articles that present a complete end-to-end production-quality example of binary classification using a PyTorch neural network, including a full Python code sample and data files.

How to Create and Use a PyTorch DataLoader

Dr. James McCaffrey of Microsoft Research provides a full code sample and screenshots to explain how to create and use PyTorch Dataset and DataLoader objects, used to serve up training or test data in order to train a PyTorch neural network.

Data Prep for Machine Learning: Splitting

Dr. James McCaffrey of Microsoft Research explains how to programmatically split a file of data into a training file and a test file, for use in a machine learning neural network for scenarios like predicting voting behavior from a file containing data about people such as sex, age, income and so on.

Data Prep for Machine Learning: Encoding

Dr. James McCaffrey of Microsoft Research uses a full code program and screenshots to explain how to programmatically encode categorical data for use with a machine learning prediction model such as a neural network classification or regression system.

Data Prep for Machine Learning: Normalization

Dr. James McCaffrey of Microsoft Research uses a full code sample and screenshots to show how to programmatically normalize numeric data for use in a machine learning system such as a deep neural network classifier or clustering algorithm.

Data Prep for Machine Learning: Outliers

After previously detailing how to examine data files and how to identify and deal with missing data, Dr. James McCaffrey of Microsoft Research now uses a full code sample and step-by-step directions to deal with outlier data.

Data Prep for Machine Learning: Missing Data

Turning his attention to the extremely time-consuming task of machine learning data preparation, Dr. James McCaffrey of Microsoft Research explains how to examine data files and how to identify and deal with missing data.