The Data Science Lab


DBSCAN Clustering and Anomaly Detection Using C#

Dr. James McCaffrey from Microsoft Research presents a complete end-to-end demonstration of data clustering and anomaly detection using the DBSCAN (Density Based Spatial Clustering of Applications with Noise) algorithm. Compared to other anomaly detection systems based on data clustering, DBSCAN can find significantly different types of anomalies.

Winnow Classification Using C#

Dr. James McCaffrey from Microsoft Research presents a complete end-to-end demonstration of the Winnow classification technique. Winnow classification is used for a very specific scenario where the target variable to predict is binary and all the predictor variables are also binary.

Implementing k-NN Classification Using C#

Dr. James McCaffrey of Microsoft Research presents a full demo of k-nearest neighbors classification on mixed numeric and categorical data. Compared to other classification techniques, k-NN is easy to implement, supports numeric and categorical predictor variables, and is highly interpretable.

Logistic Regression with Batch SGD Training and Weight Decay Using C#

Dr. James McCaffrey from Microsoft Research presents a complete end-to-end program that explains how to perform binary classification (predicting a variable with two possible discrete values) using logistic regression, where the prediction model is trained using batch stochastic gradient descent with weight decay.

AdaBoost Binary Classification Using C#

Dr. James McCaffrey from Microsoft Research presents a C# program that illustrates using the AdaBoost algorithm to perform binary classification for spam detection. Compared to other classification algorithms, AdaBoost is powerful and works well with small datasets, but is sometimes susceptible to model overfitting.

Artificial Immune Systems for Intrusion Detection Using C#

Dr. James McCaffrey from Microsoft Research presents a demonstration program that models biological immune systems to identify network intrusion threats. The demo illustrates challenges with artificial immune systems as well as promising new approaches.

Data Anomaly Detection Using LightGBM

Dr. James McCaffrey from Microsoft Research presents a complete Python program that uses the LightGBM system to create a custom autoencoder for data anomaly detection. You can easily adapt the demo program for your own anomaly detection scenarios.

Data Dimensionality Reduction Using a Neural Autoencoder with C#

Dr. James McCaffrey of Microsoft Research presents a full-code, step-by-step tutorial on creating an approximation of a dataset that has fewer columns.

Binary Classification Using LightGBM

Dr. James McCaffrey from Microsoft Research presents a full-code, step-by-step tutorial on using the LightGBM tree-based system to perform binary classification (predicting a discrete variable that has exactly two possible values).

Nearest Centroid Classification for Numeric Data Using C#

Here's a complete end-to-end demo of what Dr. James McCaffrey of Microsoft Research says is arguably the simplest possible classification technique.

Regression Using LightGBM

Dr. James McCaffrey of Microsoft Research presents a full-code, step-by-step tutorial on this powerful machine learning technique used to predict a single numeric value.

Clustering Mixed Categorical and Numeric Data Using k-Means with C#

Dr. James McCaffrey of Microsoft Research presents a full-code, step-by-step tutorial on a "very tricky" machine learning technique.

Multi-Class Classification Using LightGBM

Dr. James McCaffrey of Microsoft Research provides a full-code, step-by-step machine learning tutorial on how to use the LightGBM system to perform multi-class classification using Python and the scikit-learn library.

Data Anomaly Detection Using a Neural Autoencoder with C#

Dr. James McCaffrey of Microsoft Research tackles the process of examining a set of source data to find data items that are different in some way from the majority of the source items.

Just for Fun: A Five-Card Poker Library Using C#

Chances are if you've had many coding interviews you've been presented with a poker problem. Here's a great take from Dr. James McCaffrey of Microsoft Research.

The t-SNE Data Visualization Technique from Scratch Using C#

Dr. James McCaffrey of Microsoft Research presents a full-code, step-by-step example of a machine learning technique for visualizing high-dimensional data.

Data Clustering Using a Self-Organizing Map (SOM) with C#

Dr. James McCaffrey of Microsoft Research presents a full-code, step-by-step tutorial on a technique for visualizing and clustering data.

Principal Component Analysis from Scratch Using Singular Value Decomposition with C#

Dr. James McCaffrey of Microsoft Research presents a full-code, step-by-step tutorial on a classical ML technique that transforms a dataset into one with fewer columns, useful for creating a graph of data that has more than two columns, for example.

Matrix Inverse from Scratch Using SVD Decomposition with C#

Dr. James McCaffrey of Microsoft Research presents a full-code, step-by-step tutorial on an implementation of the technique that emphasizes simplicity and ease-of-modification over robustness and performance.

Principal Component Analysis (PCA) from Scratch Using the Classical Technique with C#

Transforming a dataset into one with fewer columns is more complicated than it might seem, explains Dr. James McCaffrey of Microsoft Research in this full-code, step-by-step machine learning tutorial.

Matrix Inverse from Scratch Using QR Decomposition with C#

Dr. James McCaffrey of Microsoft Research guides you through a full-code, step-by-step tutorial on "one of the most important operations in machine learning."

Spectral Data Clustering from Scratch Using C#

Spectral clustering is quite complex, but it can reveal patterns in data that aren't revealed by other clustering techniques.

K-Means Data Clustering from Scratch Using C#

K-means is comparatively simple and works well with large datasets, but it assumes clusters are circular/spherical in shape, so it can only find simple cluster geometries.

DBSCAN Data Clustering from Scratch Using C#

Compared to other clustering techniques, DBSCAN does not require you to explicitly specify how many data clusters to use, explains Dr. James McCaffrey of Microsoft Research in this full-code, step-by-step machine learning tutorial.

Gaussian Mixture Model Data Clustering from Scratch Using C#

Dr. James McCaffrey of Microsoft Research explains GMM clustering in a full-code, step-by-step tutorial, noting that his data scientist colleagues have different opinions about the complicated technique.
