Dr. James McCaffrey of Microsoft Research uses Python code samples and screenshots to explain naive Bayes classification, a machine learning technique used to predict the class of an item based on two or more categorical predictor variables, such as predicting the gender (0 = male, 1 = female) of a person based on occupation, eye color and nationality.
- By James McCaffrey
- 05/14/2019
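The core of categorical naive Bayes can be sketched in plain Python without NumPy. This is a minimal sketch, not the article's code. The function names `train_naive_bayes` and `predict_naive_bayes` and the six training rows are hypothetical. The add-one (Laplace) smoothing is also an assumed choice: it stops a value that was never seen with a class from zeroing out that class's score.

```python
from collections import Counter

def train_naive_bayes(rows, labels):
    """Count class frequencies and, per feature, how often each value occurs with each class."""
    class_counts = Counter(labels)
    n_features = len(rows[0])
    # feature_counts[j][(value, label)] = times feature j had `value` in class `label`
    feature_counts = [Counter() for _ in range(n_features)]
    feature_values = [set() for _ in range(n_features)]
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            feature_counts[j][(value, label)] += 1
            feature_values[j].add(value)
    return class_counts, feature_counts, feature_values

def predict_naive_bayes(model, row):
    """Return the most likely class and the normalized class probabilities."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, c_count in class_counts.items():
        score = c_count / total  # prior P(class)
        for j, value in enumerate(row):
            # Assumed add-one (Laplace) smoothing of P(value | class)
            score *= (feature_counts[j][(value, label)] + 1) / (c_count + len(feature_values[j]))
        scores[label] = score
    norm = sum(scores.values())
    probs = {label: s / norm for label, s in scores.items()}
    return max(probs, key=probs.get), probs

# Made-up data: (occupation, eye color, nationality); 0 = male, 1 = female
rows = [("doctor", "blue", "french"), ("doctor", "brown", "french"),
        ("engineer", "brown", "german"), ("engineer", "blue", "german"),
        ("teacher", "blue", "french"), ("engineer", "brown", "french")]
labels = [1, 1, 0, 0, 1, 0]
model = train_naive_bayes(rows, labels)
pred, probs = predict_naive_bayes(model, ("doctor", "blue", "french"))
print(pred, probs)  # class 1, with P(class 1) of about 0.9
```

The model multiplies a class prior by one conditional probability per predictor. The "naive" part is the assumption that occupation, eye color and nationality are independent given the class.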
Need to predict the political party affiliation (democrat, republican, independent) of a person based on their age, annual income, gender, years of education and so on? Our resident data scientist Dr. James McCaffrey shows a technique that can help with that and much more -- with code!
- By James McCaffrey
- 04/10/2019
Our resident doctor of data science this month tackles anomaly detection, using code samples and screenshots to explain the process of finding rare items in a dataset, such as discovering fraudulent login events or fake news items.
- By James McCaffrey
- 03/04/2019
The Data Science doctor delves into support vector machines (SVMs), a machine learning technique that performs binary classification, such as creating a model to predict the gender of a person based on their age, annual income, height and weight.
- By James McCaffrey
- 03/04/2019
Dr. James McCaffrey of Microsoft Research uses a full project code sample and screenshots to detail how to use Python to work with self-organizing maps (SOM), which let you investigate the structure of a set of data.
- By James McCaffrey
- 01/15/2019
The Data Science Doctor explains how to use Q-learning, an approach from the reinforcement learning branch of machine learning, providing code that solves a maze problem as an easy-to-understand example.
- By James McCaffrey
- 10/19/2018
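The Q-learning update can be illustrated on a tiny maze. The maze here is a hypothetical six-cell graph. Several other settings are assumptions chosen for illustration rather than the article's own choices:

- the reward values (+10 for reaching the goal, -0.1 per move)
- the discount `gamma`
- the learning rate `lr`
- the purely random exploration policy

```python
import random

def train_q(moves, goal, n_states, gamma=0.5, lr=0.5, episodes=500, seed=1):
    """Tabular Q-learning. moves[s] lists the states reachable from s in one step."""
    rng = random.Random(seed)
    # q[s][t] estimates the long-term value of moving from state s to state t
    q = [[0.0] * n_states for _ in range(n_states)]
    for _ in range(episodes):
        state = rng.randrange(n_states)  # random starting cell
        while state != goal:
            nxt = rng.choice(moves[state])  # explore with a random legal move
            reward = 10.0 if nxt == goal else -0.1  # assumed rewards
            # The goal is terminal, so it contributes no future value
            future = 0.0 if nxt == goal else max(q[nxt][t] for t in moves[nxt])
            q[state][nxt] += lr * (reward + gamma * future - q[state][nxt])
            state = nxt
    return q

def greedy_path(q, moves, start, goal, max_steps=20):
    """Follow the highest-valued move from each state until the goal is reached."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        s = path[-1]
        path.append(max(moves[s], key=lambda t: q[s][t]))
    return path

# Hypothetical 6-cell maze: 0-1-2 is a dead-end branch, 0-3-4-5 leads to the goal (cell 5)
moves = {0: [1, 3], 1: [0, 2], 2: [1], 3: [0, 4], 4: [3, 5], 5: [4]}
q = train_q(moves, goal=5, n_states=6)
print(greedy_path(q, moves, start=0, goal=5))  # expected: [0, 3, 4, 5]
```

Each step away from the goal discounts a cell's value by `gamma`, and each move carries a small penalty. As a result, the moves along the shortest route end up with the highest Q-values.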
Our resident data scientist provides a hands-on example of how to make a prediction that can be one of just two possible values (binary classification), which requires a different set of techniques than classification problems where the value to predict can be one of three or more possible values.
- By James McCaffrey
- 08/30/2018
The Data Science Doctor provides a hands-on tutorial, complete with code samples, explaining deep neural networks, one of the most common methods for image classification, used, for example, to identify a photograph of an animal as a "dog" or "cat" or "monkey."
- By James McCaffrey
- 06/25/2018
The data science doctor explains everything you need to know about clustering data, the process of grouping items so those in a group (cluster) are similar and items in different groups are dissimilar.
- By James McCaffrey
- 04/30/2018
Our Data Science Lab guru explains how to implement the k-means technique for data clustering, or cluster analysis, which is the process of grouping data items so that similar items belong to the same group/cluster.
- By James McCaffrey
- 03/27/2018
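The k-means technique alternates two steps. First, each item is assigned to its nearest centroid. Second, each centroid moves to the mean of its items. Both steps can be sketched in plain Python.

The data is made up. Using the first k items as the initial centroids is a simplifying assumption; practical implementations typically use random or k-means++ initialization and several restarts.

```python
import math

def kmeans(points, k, max_iter=100):
    # Simplifying assumption: the first k points are the initial centroids
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # no point changed clusters, so the clustering has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.2, 0.8), (9.5, 8.0)]
labels, centers = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```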
Go hands-on with data scientist Dr. James McCaffrey as he explains neural network dropout, a technique that can be used during training to reduce the likelihood of model overfitting.
- By James McCaffrey
- 02/26/2018
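Here is a minimal sketch of applying dropout to one layer's outputs during training. It uses the "inverted dropout" variant, which scales the surviving values during training so that nothing needs to change at test time. This is an assumed choice; the article may instead drop nodes and rescale the weights after training. The function name `dropout` and the sample values are hypothetical.

```python
import random

def dropout(activations, drop_prob, rng):
    """Inverted dropout for one training pass: each value is zeroed with probability
    drop_prob, and survivors are scaled by 1 / (1 - drop_prob) so the expected
    value of every node is unchanged. Use drop_prob=0.0 (no dropout) at test time."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
hidden = [0.5, -1.2, 0.8, 0.3]  # made-up hidden-node outputs
print(dropout(hidden, 0.5, rng))
```

A different random subset of nodes is silenced on each training pass. This keeps the network from relying too heavily on any single node.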
Learn how to do time series regression using a neural network, with "rolling window" data, coded from scratch, using Python.
- By James McCaffrey
- 02/02/2018
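Rolling-window data turns a single time series into ordinary supervised training pairs. Each input is a fixed number of consecutive values, and the target is the value that comes next. The sketch below assumes a window of four and a made-up series; the helper name `make_rolling_windows` is hypothetical.

```python
def make_rolling_windows(series, window):
    """Return (inputs, target) pairs: `window` consecutive values, then the next value."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

series = [10, 20, 30, 40, 50, 60]  # made-up values
for inputs, target in make_rolling_windows(series, window=4):
    print(inputs, "->", target)
# [10, 20, 30, 40] -> 50
# [20, 30, 40, 50] -> 60
```

Each pair can then be fed to a regression neural network with `window` input nodes and one output node.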
The data doctor continues his exploration of Python-based machine learning techniques, explaining binary classification using logistic regression, which he likes for its simplicity.
- By James McCaffrey
- 01/08/2018
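Logistic regression computes a weighted sum of the inputs plus a bias. It then passes that sum through the logistic sigmoid function to get a probability between 0 and 1.

The sketch below trains the weights with per-item gradient descent on log loss. The data, learning rate and epoch count are illustrative assumptions, and the article may use a different training method.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, ys, lr=0.1, epochs=1000):
    """Logistic regression trained with per-item (stochastic) gradient descent on log loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of log loss with respect to the weighted sum
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return P(class = 1) for input x; predict class 1 when this is >= 0.5."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

xs = [[0.5, 1.0], [1.0, 0.5], [1.5, 1.5], [3.0, 3.5], [3.5, 3.0], [4.0, 4.0]]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print(round(predict(w, b, [1.0, 1.0]), 3), round(predict(w, b, [3.5, 3.5]), 3))
```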
The data science doctor continues his exploration of techniques used to reduce the likelihood of model overfitting, caused by training a neural network for too many iterations.
- By James McCaffrey
- 12/05/2017
Our resident data scientist explains how to train neural networks with two popular variations of the back-propagation technique: batch and online.
- By James McCaffrey
- 10/31/2017
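The two variations differ only in when the weights change:

- Online training updates the weights after every training item.
- Batch training averages the gradient over all items and updates once per pass.

To keep the sketch short, it uses a single-weight linear model with squared error instead of a full neural network. This is a simplification for illustration, not the article's code, and the data is made up.

```python
def online_epoch(w, data, lr):
    """Online (stochastic) training: update the weight after every item."""
    for x, y in data:
        grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
        w -= lr * grad
    return w

def batch_epoch(w, data, lr):
    """Batch training: average the gradient over all items, then update once."""
    grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # made-up data where y = 3x
w_online = w_batch = 0.0
for _ in range(200):
    w_online = online_epoch(w_online, data, lr=0.01)
    w_batch = batch_epoch(w_batch, data, lr=0.01)
print(round(w_online, 4), round(w_batch, 4))  # both approach 3.0
```

In a neural network the same distinction applies to every weight and bias. The gradients come from back-propagation instead of the one-line derivative used here.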
Our data science expert continues his exploration of neural network programming, explaining how regularization addresses the problem of model overfitting, caused by network overtraining.
- By James McCaffrey
- 10/05/2017
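L2 regularization is one common form of regularization; L1 is another. L2 adds a penalty of (lam / 2) times the sum of the squared weights to the error. That penalty's gradient is lam times each weight, so every update also shrinks the weights toward zero. Which form the article uses is not stated here, and `lam` and the function name `l2_update` are hypothetical.

```python
def l2_update(weights, grads, lr, lam):
    """One gradient-descent step on error + (lam / 2) * sum(w^2).
    The penalty's gradient is lam * w, so each weight also decays toward zero."""
    return [w - lr * (g + lam * w) for w, g in zip(weights, grads)]

weights = [1.0, -2.0]
grads = [0.0, 0.0]  # even with zero error gradient, the weights shrink
print(l2_update(weights, grads, lr=0.1, lam=0.5))  # about [0.95, -1.9]
```

Keeping weights small limits how sharply the network can fit noise in the training data, which is how the penalty counters overfitting.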
With the help of Python and the NumPy add-on package, I'll explain how to implement back-propagation training using momentum.
- By James McCaffrey
- 08/15/2017
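With momentum, each weight change is the ordinary gradient step plus a fraction `mu` of the previous change. As a result, updates keep moving in a consistent direction.

In this sketch the gradients are supplied directly. A toy objective, f(w) = w^2 with gradient 2w, stands in for a network's error. The values of `lr` and `mu` and the function name `momentum_step` are assumptions.

```python
def momentum_step(weights, grads, velocity, lr, mu):
    """Return updated weights and the new per-weight change (velocity)."""
    velocity = [mu * v - lr * g for v, g in zip(velocity, grads)]
    weights = [w + v for w, v in zip(weights, velocity)]
    return weights, velocity

# Toy objective f(w) = w^2 with gradient 2w stands in for a network's error
weights, velocity = [1.0], [0.0]
for step in range(300):
    grads = [2.0 * w for w in weights]
    weights, velocity = momentum_step(weights, grads, velocity, lr=0.1, mu=0.9)
print(weights)  # close to [0.0]
```

In back-propagation the `grads` list would hold the error gradients for every weight. The `velocity` list is stored between iterations so each step can reuse the previous change.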
James McCaffrey uses cross-entropy error via Python to train a neural network model that predicts the species of an iris flower.
- By James McCaffrey
- 07/20/2017
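Consider a classifier that predicts one of three iris species. Cross-entropy error compares the network's softmax output with a one-hot encoded target, and only the probability assigned to the correct class contributes.

A useful consequence, shown below, is that the gradient of this error with respect to the pre-softmax output values is simply output minus target. The function names and probabilities are illustrative.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot):
    """Cross-entropy error for one item: -sum(t * log(p)) over classes."""
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot) if t > 0)

def output_gradient(probs, one_hot):
    """Gradient of cross-entropy with respect to the pre-softmax values."""
    return [p - t for p, t in zip(probs, one_hot)]

probs = [0.7, 0.2, 0.1]  # e.g. setosa, versicolor, virginica
target = [1, 0, 0]  # the item really is setosa
print(round(cross_entropy(probs, target), 4))  # 0.3567
print(output_gradient(probs, target))
```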
You don't have to resort to writing C++ to work with popular machine learning libraries such as Microsoft's CNTK and Google's TensorFlow. Instead, we'll use some Python and NumPy to tackle the task of training neural networks.
- By James McCaffrey
- 06/15/2017
With Python and NumPy getting lots of exposure lately, I'll show how to use those tools to build a simple feed-forward neural network.
- By James McCaffrey
- 05/24/2017
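A feed-forward pass computes each hidden node as a weighted sum of the inputs plus a bias, then applies an activation function. It repeats the same process for the output layer.

This sketch assumes a tanh hidden activation and a softmax output, a common pairing for classification. It uses plain Python lists instead of NumPy arrays so it runs with no dependencies, and the weights are arbitrary.

```python
import math

def feed_forward(x, w_ih, b_h, w_ho, b_o):
    """x: inputs; w_ih[i][j]: input i -> hidden j; w_ho[j][k]: hidden j -> output k."""
    # Hidden layer: weighted sum of inputs plus bias, then tanh
    hidden = [math.tanh(sum(x[i] * w_ih[i][j] for i in range(len(x))) + b_h[j])
              for j in range(len(b_h))]
    # Output layer: weighted sum of hidden values plus bias
    logits = [sum(hidden[j] * w_ho[j][k] for j in range(len(hidden))) + b_o[k]
              for k in range(len(b_o))]
    # Softmax turns the output sums into probabilities that add to 1
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Arbitrary weights for a 2-input, 3-hidden, 2-output network
w_ih = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
b_h = [0.0, 0.0, 0.0]
w_ho = [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9]]
b_o = [0.0, 0.0]
print(feed_forward([1.0, 2.0], w_ih, b_h, w_ho, b_o))
```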
Let's explore factor analysis again, this time tapping into R's support for object-oriented programming (OOP), but without using the RC (Reference Classes) model.
- By James McCaffrey
- 05/02/2017
Let's use this classical statistics technique -- and some R, of course -- to get to some of the latent variables hiding in your data.
- By James McCaffrey
- 03/16/2017
Find the patterns in your data sets using these Clustering.R script tricks.
- By James McCaffrey
- 02/01/2017
The S3 OOP model is still widely used, so let's write S3-style OOP code in the R language.
- By James McCaffrey
- 01/10/2017
I predict you'll find this logistic regression example with R to be helpful for gleaning useful information from common binary classification problems.
- By James McCaffrey
- 12/07/2016