The Data Science Lab


Binary Classification Using New PyTorch Best Practices, Part 2: Training, Accuracy, Predictions

Dr. James McCaffrey of Microsoft Research explains how to train a network, compute its accuracy, use it to make predictions and save it for use by other programs.

Binary Classification Using PyTorch, Part 1: New Best Practices

Because machine learning with deep neural techniques has advanced quickly, our resident data scientist updates binary classification techniques and best practices based on experience over the past two years.

Multi-Class Classification Using New PyTorch Best Practices, Part 2: Training, Accuracy, Predictions

Following new best practices, Dr. James McCaffrey of Microsoft Research revisits multi-class classification for when the variable to predict has three or more possible values.

Multi-Class Classification Using PyTorch, Part 1: New Best Practices

Dr. James McCaffrey of Microsoft Research updates previous tutorials with new, cutting-edge deep neural machine learning techniques.

ANOVA Using C#

One use case for the analysis of variance (ANOVA) statistics technique is asking whether student performance is the same in three classrooms taught by the same teacher but with different textbooks, says Dr. James McCaffrey of Microsoft Research.

The LogBeta and LogGamma Functions Using C#

With no built-in functions for classical statistics analyses in the .NET library, Dr. James McCaffrey of Microsoft Research explains how to roll your own from scratch.

Lightweight Mathematical Combinations Using C#

After previously discussing permutations, Dr. James McCaffrey of Microsoft Research uses step-by-step examples and full code presentations to explore combinations.

Lightweight Mathematical Permutations Using C#

Get ready to use the BigInteger data type as Dr. James McCaffrey of Microsoft Research demonstrates zero-based mathematical permutations with C#.

Runs Testing Using C# Simulation

Dr. James McCaffrey of Microsoft Research walks through a complete program, step by step, to explain runs testing, a machine learning technique that indicates whether patterns are random.

Probit Regression Using C#

Dr. James McCaffrey of Microsoft Research explains the classical machine learning technique typically used for binary classification -- predicting an outcome that can only be one of two discrete values.

Weighted k-NN Classification Using C#

Dr. James McCaffrey of Microsoft Research explains the machine learning technique, which can be used to predict a person's happiness score from their income and education, for example.

Naive Bayes Classification Using C#

Dr. James McCaffrey of Microsoft Research presents a full step-by-step example with all code to predict a person's optimism score from their occupation, eye color and country.

CIFAR-10 Image Classification Using PyTorch

CIFAR-10 problems analyze crude 32 x 32 color images to predict which of 10 classes each image belongs to. Here, Dr. James McCaffrey of Microsoft Research shows how to create a PyTorch image classification system for the CIFAR-10 dataset.

Preparing CIFAR Image Data for PyTorch

CIFAR-10 problems analyze crude 32 x 32 color images to predict which of 10 classes each image belongs to. Here, Dr. James McCaffrey of Microsoft Research explains how to get the raw source CIFAR-10 data, convert it from binary to text and save it as a text file that can be used to train a PyTorch neural network classifier.

Sentiment Classification of IMDB Movie Review Data Using a PyTorch LSTM Network

Dr. James McCaffrey of Microsoft Research demonstrates how to create a prediction system for IMDB data using an LSTM network. The demo can serve as a guide for creating a classification system for most types of text data.

Preparing IMDB Movie Review Data for NLP Experiments

Dr. James McCaffrey of Microsoft Research shows how to get the raw source IMDB data, read the movie reviews into memory, parse and tokenize the reviews, create a vocabulary dictionary and convert the reviews to a numeric form.

Convolutional Neural Networks for MNIST Data Using PyTorch

Dr. James McCaffrey of Microsoft Research details the "Hello World" of image classification: a convolutional neural network (CNN) applied to the MNIST digits dataset.

Preparing MNIST Image Data Text Files

Dr. James McCaffrey of Microsoft Research demonstrates how to fetch and prepare MNIST data for image recognition machine learning problems.

Quantum-Inspired Annealing Using C# or Python

Dr. James McCaffrey of Microsoft Research explains a new idea that slightly modifies standard simulated annealing by borrowing ideas from quantum mechanics.

Chi-Square Test Using C#

A chi-square (also called chi-squared) test is a classical statistics technique that can be used to determine if observed-count data matches expected-count data.

How to Compute Transformer Architecture Model Accuracy

Dr. James McCaffrey of Microsoft Research uses the Hugging Face library to simplify the implementation of NLP systems using Transformer Architecture (TA) models.

Simulated Annealing Optimization Using C# or Python

Dr. James McCaffrey of Microsoft Research shows how to implement simulated annealing for the Traveling Salesman Problem (find the best ordering of a set of discrete items).

How to Fine-Tune a Transformer Architecture NLP Model

The goal is sentiment analysis -- accept the text of a movie review (such as, "This movie was a great waste of my time.") and output class 0 (negative review) or class 1 (positive review).

How to Create a Transformer Architecture Model for Natural Language Processing

The goal is to create a model that accepts a sequence of words such as "The man ran through the {blank} door" and then predicts the most likely words to fill in the blank.

Anomaly Detection Using Principal Component Analysis (PCA)

The main advantage of using PCA for anomaly detection, compared to alternative techniques such as a neural autoencoder, is simplicity -- assuming you have a function that computes eigenvalues and eigenvectors.
