The Data Science Lab


Principal Component Analysis from Scratch Using Singular Value Decomposition with C#

Dr. James McCaffrey of Microsoft Research presents a full-code, step-by-step tutorial on a classical ML technique that transforms a dataset into one with fewer columns, which is useful, for example, for graphing data that has more than two columns.

Matrix Inverse from Scratch Using SVD Decomposition with C#

Dr. James McCaffrey of Microsoft Research presents a full-code, step-by-step tutorial on an implementation of matrix inversion that emphasizes simplicity and ease of modification over robustness and performance.

Principal Component Analysis (PCA) from Scratch Using the Classical Technique with C#

Transforming a dataset into one with fewer columns is more complicated than it might seem, explains Dr. James McCaffrey of Microsoft Research in this full-code, step-by-step machine learning tutorial.

Matrix Inverse from Scratch Using QR Decomposition with C#

Dr. James McCaffrey of Microsoft Research guides you through a full-code, step-by-step tutorial on "one of the most important operations in machine learning."

Spectral Data Clustering from Scratch Using C#

Spectral clustering is quite complex, but it can reveal patterns in data that aren't revealed by other clustering techniques.

K-Means Data Clustering from Scratch Using C#

K-means is comparatively simple and works well with large datasets, but it assumes clusters are circular/spherical in shape, so it can only find simple cluster geometries.

DBSCAN Data Clustering from Scratch Using C#

Compared to other clustering techniques, DBSCAN does not require you to explicitly specify how many data clusters to use, explains Dr. James McCaffrey of Microsoft Research in this full-code, step-by-step machine learning tutorial.

Gaussian Mixture Model Data Clustering from Scratch Using C#

Dr. James McCaffrey of Microsoft Research explains GMM clustering in a full-code, step-by-step tutorial, noting that his data scientist colleagues have different opinions about the complicated technique.

Neural Network Regression from Scratch Using C#

Compared to other regression techniques, a well-tuned neural network regression system can produce the most accurate prediction model, says Dr. James McCaffrey of Microsoft Research in presenting this full-code, step-by-step tutorial.

Decision Tree Regression from Scratch Using C#

Dr. James McCaffrey of Microsoft Research says the technique is easy to tune, works well with small datasets and produces highly interpretable predictions, but it also has trade-offs.

Gaussian Process Regression from Scratch Using C#

GPR works well with small datasets and generates a metric of confidence of a predicted result, but it's moderately complex and the results are not easily interpretable, says Dr. James McCaffrey of Microsoft Research in this full-code tutorial.

Weighted k-Nearest Neighbors Regression Using C#

The main advantages of KNNR are simplicity and interpretability, says Dr. James McCaffrey of Microsoft Research in presenting this full-code, step-by-step tutorial.

Kernel Ridge Regression Using C#

KRR is especially useful when there is limited training data, says Dr. James McCaffrey of Microsoft Research in this full-code, step-by-step tutorial.

Linear Ridge Regression Using C#

Implementing LRR from scratch is harder than using a library like scikit-learn, but it helps you customize your code, makes it easier to integrate with other systems, and gives you a complete understanding of how LRR works.

Gaussian Process Regression Using the scikit Library

Dr. James McCaffrey of Microsoft Research offers a full-code, step-by-step tutorial for this technique, especially useful when there is limited training data.

Regression Using scikit Kernel Ridge Regression

Dr. James McCaffrey of Microsoft Research presents a full-code, step-by-step tutorial on this regression technique, which is especially useful when there is limited training data.

Binary Classification Using a scikit Neural Network

Machine learning with neural networks is sometimes said to be part art and part science. Dr. James McCaffrey of Microsoft Research teaches both with a full-code, step-by-step tutorial.

Gaussian Naive Bayes Classification Using the scikit Library

Dr. James McCaffrey of Microsoft Research says the main advantage of using Gaussian naive Bayes classification compared to other techniques like decision trees or neural networks is that you don't have to fine-tune model parameters.

Classification Using the scikit k-Nearest Neighbors Module

Dr. James McCaffrey of Microsoft Research uses a full-code, step-by-step demo to predict the species of a wheat seed based on seven predictor variables such as seed length, width and perimeter.

Regression Using a scikit MLPRegressor Neural Network

Dr. James McCaffrey of Microsoft Research uses a full-code, step-by-step demo to show how to predict the annual income of a person based on their sex, age, state where they live and political leaning.

Multinomial Naive Bayes Classification Using the scikit Library

A full-code demo from Dr. James McCaffrey of Microsoft Research shows how to predict the type of a college course by analyzing grade counts for each type of course.

Multi-Class Classification Using a scikit Neural Network

Dr. James McCaffrey of Microsoft Research says a neural network model is arguably the most powerful multi-class classification technique.

Multi-Class Classification Using a scikit Decision Tree

Decision trees are useful for relatively small datasets that have a relatively simple underlying structure, and when the trained model must be easily interpretable, explains Dr. James McCaffrey of Microsoft Research, who provides step-by-step instructions and full source code.

Naive Bayes Classification Using the scikit Library

Dr. James McCaffrey of Microsoft Research shows how to predict a person's sex based on their job type, eye color and country of residence.

Binary Classification Using a scikit Decision Tree

Dr. James McCaffrey of Microsoft Research says decision trees are useful for relatively small datasets and when the trained model must be easily interpretable, but they often don't work well with large datasets and can be susceptible to model overfitting.
