The Data Science Lab
Linear Regression Using JavaScript
Dr. James McCaffrey presents a complete end-to-end demonstration of linear regression using JavaScript. Linear regression is the simplest machine learning technique to predict a single numeric value, and a good way to establish baseline results for comparison with other more sophisticated regression techniques.
The goal of a machine learning regression problem is to predict a single numeric value. For example, you might want to predict an employee's salary based on age, height, years of experience, and so on. There are approximately a dozen common regression techniques. The most basic technique is called linear regression, or sometimes multiple linear regression, where the "multiple" indicates two or more predictor variables.
The form of a basic linear regression prediction model is y' = (w0 * x0) + (w1 * x1) + . . . + (wn * xn) + b, where y' is the predicted value, the xi are predictor values, the wi are weights (also called coefficients), and b is the bias (also called the intercept).
Compared to other regression techniques, such as kernel ridge regression and Gaussian process regression, which are designed to handle complex data, linear regression often has slightly worse prediction accuracy but better model interpretability.
This article presents a demo of linear regression, implemented from scratch, using the JavaScript language. A good way to see where this article is headed is to take a look at the screenshot in Figure 1. The demo program begins by loading synthetic training and test data into memory. The data looks like:
-0.1660, 0.4406, -0.9998, -0.3953, -0.7065, 0.4840
0.0776, -0.1616, 0.3704, -0.5911, 0.7562, 0.1568
-0.9452, 0.3409, -0.1654, 0.1174, -0.7192, 0.8054
0.9365, -0.3732, 0.3846, 0.7528, 0.7892, 0.1345
. . .
The first five values on each line are the x predictors. The last value on each line is the target y variable to predict. There are 200 training items and 40 test items.
The demo creates and trains a linear regression model, evaluates the model accuracy on the training and test data, and then uses the model to predict the target y value for the first training item x = [-0.1660, 0.4406, -0.9998, -0.3953, -0.7065].
Figure 1: Linear Regression Using JavaScript in Action
The first part of the demo output shows how a linear regression model is created and trained:
Creating and training model
Setting SGD lrnRate = 0.001
Setting SGD maxEpochs = 200
epoch = 0 MSE = 0.1095 acc = 0.0000
epoch = 40 MSE = 0.0027 acc = 0.5050
epoch = 80 MSE = 0.0026 acc = 0.4650
epoch = 120 MSE = 0.0026 acc = 0.4600
epoch = 160 MSE = 0.0026 acc = 0.4600
Done
In theory, a linear regression model can be trained using a closed-form solution that involves computing a matrix inverse. But in practice, a model is usually trained using iterative stochastic gradient descent (SGD), which requires a learning rate and a maximum number of training epochs. Notice that the value of the MSE (mean squared error) drops sharply over the first 40 epochs and then levels off at about 0.0026, which indicates training is working.
The next part of the demo output shows the trained model weights and the bias:
Model weights:
-0.2656 0.0333 -0.0453 0.0358 -0.1146
Model bias: 0.3620
Because the demo data has five predictor variables, there are five weights. The demo concludes by evaluating the trained model and making a prediction:
Computing model accuracy
Train acc (within 0.15) = 0.6400
Test acc (within 0.15) = 0.7750
Train MSE = 0.0026
Test MSE = 0.0020
Predicting for x =
-0.1660 0.4406 -0.9998 -0.3953 -0.7065
Predicted y = 0.5329
The model accuracy is only fair compared to many other regression techniques applied to the synthetic data -- 64.00% accuracy on the training data (128 out of 200 correct) and 77.50% accuracy on the test data (31 out of 40 correct). A prediction is scored as correct if it's within 15% of the true target value.
This article assumes you have intermediate or better programming skill but doesn't assume you know anything about linear regression. The demo is implemented using JavaScript running in a node.js environment, but you should be able to refactor the demo code to another C-family language if you wish. All normal error checking has been removed to keep the main ideas as clear as possible.
The source code for the demo program is too long to be presented in its entirety in this article. The complete code and data are available in the accompanying file download, and they're also available online.
The Demo Data
The demo data is synthetic. It was generated by a 5-10-1 neural network with random weights and bias values. The idea here is that the synthetic data does have an underlying, but complex, non-linear structure which can be predicted.
All of the predictor values are between -1 and +1. When using linear regression, technically, it's not necessary to normalize/scale your data. But normalizing usually leads to a better prediction model, especially if some raw predictor values are very large (such as employee income) and some values are small (such as employee age). Also, using normalized data allows you to interpret the model weights more easily (larger magnitudes mean more effect on the predicted y value).
The three most common techniques to normalize numeric data are min-max normalization, z-score normalization, and divide-by-constant normalization. When possible, I recommend divide-by-constant normalization. For example, if you have a predictor variable employee age, you could divide all age values by 100.
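The three normalization techniques can be sketched as small helper functions. This is a minimal illustration, not code from the demo program: the function names are hypothetical, each function works on a single column of raw values, and the z-score version assumes the population standard deviation.

```javascript
// Hypothetical helpers; each takes one column of raw values (array of numbers).

// Divide-by-constant: for example, ages divided by 100.
function divideByConstant(col, k) {
  return col.map(v => v / k);
}

// Min-max: maps the smallest value to 0 and the largest to 1.
// Assumes the column is not constant (max > min).
function minMax(col) {
  const lo = Math.min(...col);
  const hi = Math.max(...col);
  return col.map(v => (v - lo) / (hi - lo));
}

// Z-score: subtract the mean, then divide by the population
// standard deviation (an assumption of this sketch).
function zScore(col) {
  const mean = col.reduce((s, v) => s + v, 0) / col.length;
  const variance =
    col.reduce((s, v) => s + (v - mean) ** 2, 0) / col.length;
  const sd = Math.sqrt(variance);
  return col.map(v => (v - mean) / sd);
}

const ages = [25, 40, 62];
console.log(divideByConstant(ages, 100));
console.log(minMax(ages));
console.log(zScore(ages));
```

Divide-by-constant keeps the normalized values easy to map back to the raw values, which is one reason it can be attractive when a sensible constant exists.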
Linear regression is most often used with data that has strictly numeric predictor variables. It is possible to use the technique with categorical data that has an inherent order using equal-interval encoding. For example, a predictor variable height with possible values (short, medium, tall) could be encoded as short = 0.25, medium = 0.50, tall = 0.75.
For categorical data without inherent order, such as a predictor variable color with possible values (red, blue, green), you can use one-hot encoding. For example, you can encode red = 100, blue = 010, green = 001.
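Both encoding schemes can be sketched in a few lines. The function names below are hypothetical and are not part of the demo program. The equal-interval helper assumes that k ordered levels map to 1/(k+1), 2/(k+1), and so on, which reproduces the 0.25, 0.50, 0.75 values in the height example.

```javascript
// Hypothetical encoding helpers (not from the demo program).

// Equal-interval encoding for ordered categories:
// k levels are mapped to 1/(k+1), 2/(k+1), ..., k/(k+1).
function equalIntervalEncode(value, orderedLevels) {
  const idx = orderedLevels.indexOf(value);
  if (idx < 0) throw new Error("unknown level: " + value);
  return (idx + 1) / (orderedLevels.length + 1);
}

// One-hot encoding for unordered categories:
// a 1 in the position of the matching level, 0 elsewhere.
function oneHotEncode(value, levels) {
  const idx = levels.indexOf(value);
  if (idx < 0) throw new Error("unknown level: " + value);
  return levels.map((_, i) => (i === idx ? 1 : 0));
}

console.log(equalIntervalEncode("medium", ["short", "medium", "tall"])); // 0.5
console.log(oneHotEncode("blue", ["red", "blue", "green"])); // [ 0, 1, 0 ]
```

Note that a one-hot encoded variable with three levels contributes three predictor columns, and therefore three weights, to the model.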
Understanding Linear Regression
Linear regression is probably best understood by looking at a concrete example. For the demo data, suppose you want to predict the y value for x = (-0.1660, 0.4406, -0.9998, -0.3953, -0.7065), as in Figure 1. The trained model weights w are (-0.2656, 0.0333, -0.0453, 0.0358, -0.1146) and the model bias is b = 0.3620.
The predicted y is:
y' = (w0 * x0) + (w1 * x1) + (w2 * x2) + (w3 * x3) + (w4 * x4) + b
= (-0.2656 * -0.1660) +
( 0.0333 * 0.4406) +
(-0.0453 * -0.9998) +
( 0.0358 * -0.3953) +
(-0.1146 * -0.7065) + 0.3620
= 0.0441 + 0.0147 + 0.0453 + -0.0142 + 0.0810 +
0.3620
= 0.5329
In words, the predicted y is the sum of the weights times inputs, plus the bias. One of the major strengths of linear regression is interpretability. The magnitude of a weight is a measure of the importance of its associated predictor variable. The sign of a weight tells you the direction (increase or decrease) a change in the associated predictor will produce.
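The calculation above can be expressed as a short function. This is a sketch only: the demo's predict() method is not shown in the article, and the standalone function name predictLinear() is hypothetical. The weights, bias, and input are the values from the demo output.

```javascript
// Sketch of the prediction computation:
// sum of (weight * input) over all predictors, plus the bias.
function predictLinear(x, wts, bias) {
  let sum = bias;
  for (let i = 0; i < x.length; ++i) {
    sum += wts[i] * x[i];
  }
  return sum;
}

// Values taken from the demo output.
const wts = [-0.2656, 0.0333, -0.0453, 0.0358, -0.1146];
const bias = 0.3620;
const x = [-0.1660, 0.4406, -0.9998, -0.3953, -0.7065];

console.log(predictLinear(x, wts, bias).toFixed(4)); // 0.5329
```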
Training the Linear Regression Model Using SGD
The demo program uses standard stochastic gradient descent (SGD) optimization to train the regression model. In high-level pseudo-code:
initialize weights and bias to small random values
loop max_epochs times
  shuffle order of training data
  loop each training item
    get training inputs X
    get training target value actual y
    compute predicted y using curr weights
    loop each weight and the bias
      new_wt = old_wt + (-1 * lrn_rate * (pred_y - actual_y) * x)
    end-loop
  end-loop
end-loop
The idea is to adjust each weight so that the predicted y gets closer to the actual y. The (pred_y - actual_y) is a simple measure of how far off the predicted value is. If pred_y is larger than actual_y, you want to decrease the associated weight to make the pred_y smaller.
The learning rate, lrn_rate, is typically a value like 0.01 which lessens the amount of change in the weights so that there are no wild fluctuations. The learning rate must be determined by trial and error. If the learning rate is too large, SGD might jump past a good weight value. If the learning rate is too small, training might be too slow.
The -1 term controls the direction of the change in the weight being adjusted. If the difference term was (actual_y - pred_y), the -1 term could be dropped, but it's traditional to use (pred_y - actual_y). The x term in the update controls the direction of the weight change, depending on the sign of x, and the amount of change in the weight, depending on the magnitude of x.
The order in which the training data is processed is randomized in each epoch (one complete pass through all items). This prevents training from oscillating back and forth when one item's weight updates undo the previous item's weight updates.
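The SGD training loop described in this section might look like the following in JavaScript. This is a sketch under stated assumptions, not the demo's train() method: the function name trainSGD() is hypothetical, the weights start at zero rather than small random values so the result is easier to reproduce, and the shuffle uses Math.random() rather than a seeded generator. The bias is updated like a weight whose input is always 1.

```javascript
// Hypothetical SGD trainer that follows the pseudo-code;
// not the demo program's train() method.
function trainSGD(trainX, trainY, lrnRate, maxEpochs) {
  const n = trainX.length;
  const dim = trainX[0].length;
  // Assumption: start from zeros. The pseudo-code uses small
  // random values instead.
  const wts = new Array(dim).fill(0.0);
  let bias = 0.0;
  const indices = Array.from({ length: n }, (_, i) => i);

  for (let epoch = 0; epoch < maxEpochs; ++epoch) {
    // Shuffle the processing order (Fisher-Yates with Math.random).
    for (let i = n - 1; i > 0; --i) {
      const r = Math.floor(Math.random() * (i + 1));
      [indices[i], indices[r]] = [indices[r], indices[i]];
    }
    for (const idx of indices) {
      const x = trainX[idx];
      let predY = bias;
      for (let j = 0; j < dim; ++j) predY += wts[j] * x[j];
      const err = predY - trainY[idx]; // (pred_y - actual_y)
      for (let j = 0; j < dim; ++j) {
        wts[j] += -1 * lrnRate * err * x[j];
      }
      bias += -1 * lrnRate * err; // bias behaves like a weight with x = 1
    }
  }
  return { wts, bias };
}

// Tiny noise-free example: y = 2 * x0 - 3 * x1 + 0.5.
const demoX = [], demoY = [];
for (const a of [-1, 0, 1]) {
  for (const b of [-1, 0, 1]) {
    demoX.push([a, b]);
    demoY.push(2 * a - 3 * b + 0.5);
  }
}
const fitted = trainSGD(demoX, demoY, 0.05, 500);
console.log(fitted.wts.map(w => w.toFixed(4)), fitted.bias.toFixed(4));
```

Because the example data is exactly linear, the trained weights and bias should approach 2, -3, and 0.5. With real data, the values settle near the best linear fit instead.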
The Demo Program
To code the demo, I used good old Notepad. I make no apologies. I like Notepad. But you will probably want to use your regular text editor of choice.
I ran the demo program using the node.js runtime system. You can get node.js from nodejs.org/en/download. Any relatively recent version will work fine.
The overall program structure is presented in Listing 1. All the control logic is in the main() function. All the linear regression functionality is in a LinearRegressor class. The LinearRegressor class exposes five methods: a constructor(), predict(), train(), accuracy(), and meanSqError(). Methods next(), nextInt(), and shuffle() are helpers.
Listing 1: Overall Program Structure
// linear_regression.js
// basic linear regression using SGD training
// node.js
let FS = require("fs");  // for loadTxt()
// ----------------------------------------------------------
class LinearRegressor
{
  constructor(seed)
  {
    this.seed = seed + 0.5;  // avoid 0
    this.wts;  // allocated in train()
    this.bias;  // supplied in train()
  }
  predict(x) { . . }
  train(trainX, trainY, lrnRate, maxEpochs) { . . }
  accuracy(dataX, dataY, pctClose) { . . }
  meanSqError(dataX, dataY) { . . }
  next() { . . }
  nextInt(lo, hi) { . . }
  shuffle(indices) { . . }
}
// ----------------------------------------------------------
// vector and matrix functions :
function vecMake(n, val) { . . }
function matMake(nRows, nCols, val) { . . }
function vecShow(vec, dec, wid, nl) { . . }
function matShow(A, dec, wid) { . . }
function matToVec(m) { . . }
function loadTxt(fn, delimit, usecols, comment) { . . }
// ----------------------------------------------------------
function main()
{
  console.log("Begin basic linear regression " +
    "with SGD training using node.js JavaScript ");
  // 1. load data from file into memory
  // 2. create and train linear regression model
  // 3. evaluate model
  // 4. use model
  console.log("End demo");
}
main();
The LinearRegressor class has three fields. The wts and bias fields define the behavior of the model. The seed field is used to generate quasi-random numbers for use by the shuffle() function to randomize the order in which the training items are processed. The shuffle() function calls the nextInt() function, which calls the next() function, which accesses the seed field.
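The bodies of next(), nextInt(), and shuffle() are not shown in the listing. One way such a chain of helpers could work is sketched below. Everything here is an assumption for illustration: the class name SeededShuffler is hypothetical, the generator is a standard Lehmer (multiplicative congruential) generator rather than whatever the demo uses, and nextInt() is assumed to return a value in [lo, hi).

```javascript
// Hypothetical seeded generator and shuffle; not the demo's actual code.
class SeededShuffler {
  constructor(seed) {
    // Lehmer generator state must stay in [1, 2147483646], so avoid 0.
    this.state = (Math.abs(Math.floor(seed)) % 2147483646) + 1;
  }

  // Next quasi-random value in [0.0, 1.0).
  next() {
    this.state = (this.state * 48271) % 2147483647;
    return (this.state - 1) / 2147483646;
  }

  // Next quasi-random integer in [lo, hi) (assumed convention).
  nextInt(lo, hi) {
    return lo + Math.floor(this.next() * (hi - lo));
  }

  // Fisher-Yates shuffle, in place.
  shuffle(indices) {
    for (let i = indices.length - 1; i > 0; --i) {
      const r = this.nextInt(0, i + 1);
      [indices[i], indices[r]] = [indices[r], indices[i]];
    }
  }
}

const order = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
new SeededShuffler(0).shuffle(order);
console.log(order);
```

Using a seeded generator instead of Math.random() means that two runs with the same seed process the training items in the same order, which makes results reproducible.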
Loading the Data into Memory
The demo program starts by loading the 200-item training data into memory:
let trainFile = ".\\Data\\synthetic_train_200.txt";
let trainX = loadTxt(trainFile, ",", [0,1,2,3,4], "#");
let trainY = loadTxt(trainFile, ",", [5], "#");
trainY = matToVec(trainY);
The training X data is stored into an array-of-arrays style matrix of type double. The data is assumed to be in a directory named Data, which is located in the project root directory. The arguments to the loadTxt() function mean load columns 0, 1, 2, 3, 4 where the data is comma-delimited, and lines beginning with "#" are comments to be ignored. The training y data in column [5] is loaded into a matrix and then converted to a one-dimensional vector using the matToVec() helper function.
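The article does not show the bodies of loadTxt() and matToVec(). A minimal sketch of functions with the same parameters might look like the following. The names parseDelimited(), loadTxtSketch(), and matToVecSketch() are hypothetical, and the sketch assumes the whole file fits in memory and that every data line has all the requested columns.

```javascript
const fs = require("fs");

// Hypothetical parser: text -> array-of-arrays matrix of numbers.
// Skips blank lines and lines that start with the comment token.
function parseDelimited(text, delimit, usecols, comment) {
  const result = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.length === 0 || line.startsWith(comment)) continue;
    const tokens = line.split(delimit);
    result.push(usecols.map(c => parseFloat(tokens[c])));
  }
  return result;
}

// Hypothetical stand-in for loadTxt(): read the whole file, then parse it.
function loadTxtSketch(fn, delimit, usecols, comment) {
  const text = fs.readFileSync(fn, "utf8");
  return parseDelimited(text, delimit, usecols, comment);
}

// Hypothetical stand-in for matToVec(): flatten a one-column matrix.
function matToVecSketch(m) {
  return m.map(row => row[0]);
}

const sample =
  "# first two demo items\n" +
  "-0.1660, 0.4406, -0.9998, -0.3953, -0.7065, 0.4840\n" +
  "0.0776, -0.1616, 0.3704, -0.5911, 0.7562, 0.1568\n";
console.log(parseDelimited(sample, ",", [0, 1, 2, 3, 4], "#"));
console.log(matToVecSketch(parseDelimited(sample, ",", [5], "#")));
```

Separating the parsing from the file read makes the parser usable in a browser, where the text would come from a source other than the file system.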
The 40-item test data is loaded into memory using the same pattern that was used to load the training data:
let testFile = ".\\Data\\synthetic_test_40.txt";
let testX = loadTxt(testFile, ",", [0,1,2,3,4], "#");
let testY = loadTxt(testFile, ",", [5], "#");
testY = matToVec(testY);
The first three training items are displayed with four decimals like so:
console.log("First three train X: ");
for (let i = 0; i < 3; ++i)
  vecShow(trainX[i], 4, 8, true);
console.log("First three train y: ");
for (let i = 0; i < 3; ++i)
  console.log(trainY[i].toFixed(4).toString().
    padStart(9, ' '));
In a non-demo scenario, you might want to display all the training data to make sure it was correctly loaded into memory.
If you are running linear regression inside a web page, you'll need to access training and test data in some other way. My blog post shows an example of a web page that uses HTML FileReader to fetch client-side data, and displays output in an HTML textarea zone. Another possibility is to fetch server-side data from a SQL database.
Creating and Training the Model
The linear regression model is created like so:
let seed = 0;
console.log("Creating and training model ");
let model = new LinearRegressor(seed);
The constructor accepts only a seed value for the internal quasi-random number generator that's used by the train() method. The value of the seed parameter can have a significant effect on the accuracy of the resulting model, but you shouldn't tune the seed value to improve your model. Next, the demo prepares the training parameters and calls the train() method:
let lrnRate = 0.001;
let maxEpochs = 200;
console.log("Setting SGD lrnRate = " +
  lrnRate.toFixed(3).toString());
console.log("Setting SGD maxEpochs = " +
  maxEpochs.toString());
model.train(trainX, trainY, lrnRate, maxEpochs);
console.log("Done ");
The values of lrnRate and maxEpochs must be determined by trial and error. In a non-demo scenario, you can programmatically try different values using a grid search.
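A grid search can be sketched as two nested loops over candidate values. The function name gridSearch() and the trainAndScore callback are hypothetical: the callback is assumed to train a fresh model with the given settings and return an error value where lower is better.

```javascript
// Hypothetical grid search. trainAndScore(lrnRate, maxEpochs) is an
// assumed callback that trains a fresh model and returns an error
// value (lower is better), such as mean squared error.
function gridSearch(lrnRates, epochChoices, trainAndScore) {
  let best = { lrnRate: null, maxEpochs: null, score: Infinity };
  for (const lrnRate of lrnRates) {
    for (const maxEpochs of epochChoices) {
      const score = trainAndScore(lrnRate, maxEpochs);
      if (score < best.score) {
        best = { lrnRate, maxEpochs, score };
      }
    }
  }
  return best;
}

// Possible use with the demo's class (assumes LinearRegressor,
// trainX, and trainY are in scope):
// const best = gridSearch([0.0001, 0.001, 0.01], [100, 200, 400],
//   (lr, me) => {
//     const m = new LinearRegressor(0);
//     m.train(trainX, trainY, lr, me);
//     return m.meanSqError(trainX, trainY);
//   });
```

Scoring on data that was held out from training, rather than on the training data itself, usually gives a more honest comparison between settings.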
After training has completed, the demo program displays the model weights and bias:
console.log("Model weights: ");
vecShow(model.wts, 4, 8, true);
console.log("Model bias: " +
  model.bias.toFixed(4).toString());
The demo program does not use regularization, which is one of several techniques that restrict the magnitude of the resulting model weights. Large model weights often lead to model overfitting, where the model predicts well on the training data but poorly on new, previously unseen data. A simple form of regularization is weight decay. After every training epoch, all weights are reduced by multiplying each by a decay constant value like 0.995, where the decay constant must be determined by trial and error.
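Weight decay as described here takes only a few lines. The helper name applyWeightDecay() is hypothetical, and the sketch assumes the bias is left unchanged, which is a common but not universal choice.

```javascript
// Hypothetical helper: shrink every weight toward zero by a
// constant factor. Intended to be called once at the end of
// each training epoch. The bias is left alone (an assumption).
function applyWeightDecay(wts, decay) {
  for (let j = 0; j < wts.length; ++j) {
    wts[j] *= decay;
  }
}

const exampleWts = [-0.2656, 0.0333, -0.0453, 0.0358, -0.1146];
applyWeightDecay(exampleWts, 0.995);
console.log(exampleWts.map(w => w.toFixed(4)));
```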
Evaluating and Using the Model
The demo program evaluates the trained model accuracy using these statements:
console.log("Computing model accuracy ");
let trainAcc = model.accuracy(trainX, trainY, 0.15);
let testAcc = model.accuracy(testX, testY, 0.15);
console.log("Train acc (within 0.15) = " +
  trainAcc.toFixed(4).toString());
console.log("Test acc (within 0.15) = " +
  testAcc.toFixed(4).toString());
The accuracy() method scores a prediction value as correct if it's within a specified percentage of the true target value. A reasonable closeness percentage to use will vary from problem to problem.
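The closeness test can be sketched as follows. This is not the demo's accuracy() method, which takes the predictor data and calls predict() internally. The hypothetical accuracySketch() function below takes already-computed predictions so that the scoring rule stands on its own.

```javascript
// Hypothetical scoring function: a prediction counts as correct when
// its absolute error is less than pctClose times the magnitude of
// the true target value.
function accuracySketch(preds, targets, pctClose) {
  let nCorrect = 0;
  for (let i = 0; i < targets.length; ++i) {
    const err = Math.abs(preds[i] - targets[i]);
    if (err < Math.abs(pctClose * targets[i])) {
      ++nCorrect;
    }
  }
  return nCorrect / targets.length;
}

// First prediction is within 15% of its target, second is not.
console.log(accuracySketch([1.1, 2.5], [1.0, 2.0], 0.15)); // 0.5
```

One consequence of a percentage-based rule is that targets near zero are very hard to score as correct, because the allowed error shrinks with the target.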
Accuracy is not a very granular metric. The demo computes and displays mean squared error, which is a more granular metric:
let trainMSE = model.meanSqError(trainX, trainY);
let testMSE = model.meanSqError(testX, testY);
console.log("Train MSE = " +
  trainMSE.toFixed(4).toString());
console.log("Test MSE = " +
  testMSE.toFixed(4).toString());
Mean squared error is best used to compare different models. Many of the regression modules in the Python language scikit-learn library use the coefficient of determination, R-squared, as the primary evaluation metric, but in my opinion mean squared error is easier to interpret. If you wish, you can easily implement a coefficient of determination method by using the meanSqError() method as a template.
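The two metrics are closely related, which the following sketch illustrates. The function names are hypothetical, and both functions take already-computed predictions rather than calling predict() the way the demo's meanSqError() method presumably does. R-squared is computed here as 1 minus the ratio of the residual sum of squares to the total sum of squares.

```javascript
// Hypothetical metric functions operating on precomputed predictions.

// Mean squared error: average of squared differences.
function mseSketch(preds, targets) {
  let sum = 0.0;
  for (let i = 0; i < targets.length; ++i) {
    const diff = preds[i] - targets[i];
    sum += diff * diff;
  }
  return sum / targets.length;
}

// Coefficient of determination: 1 - (SS residual / SS total).
// Assumes the targets are not all identical (SS total > 0).
function rSquaredSketch(preds, targets) {
  const mean = targets.reduce((s, y) => s + y, 0) / targets.length;
  let ssRes = 0.0;
  let ssTot = 0.0;
  for (let i = 0; i < targets.length; ++i) {
    ssRes += (targets[i] - preds[i]) ** 2;
    ssTot += (targets[i] - mean) ** 2;
  }
  return 1.0 - ssRes / ssTot;
}

const somePreds = [1.0, 2.0, 3.0];
const someTargets = [1.0, 2.0, 4.0];
console.log(mseSketch(somePreds, someTargets).toFixed(4)); // 0.3333
console.log(rSquaredSketch(somePreds, someTargets).toFixed(4)); // 0.7857
```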
The demo concludes by using the trained model to predict the y value for the first training item, (-0.1660, 0.4406, -0.9998, -0.3953, -0.7065):
let x = trainX[0];
console.log("\nPredicting for x = ");
vecShow(x, 4, 9, true); // add newline
let predY = model.predict(x);
console.log("Predicted y = " +
  predY.toFixed(4).toString());
The predicted y value, 0.5329, is reasonably close to the true target value of 0.4840. The difference is 0.0489, which is about 10% of the target and well inside the 15% closeness threshold used by the accuracy() method. This is a good result because, as mentioned previously, the synthetic demo data was generated by a neural network, which has complex interactions between predictor variables.
Wrapping Up
Linear regression is the simplest form of machine learning prediction. Linear regression works well when the source data is linear, meaning the relationship between predictor variables and the target variable can be defined by the math equation y' = (w0 * x0) + . . + (wn * xn) + b. In most cases, you won't know beforehand if your data is linear.
In many scenarios, it's a good idea to start with linear regression. If your linear regression model does not predict well, then you can try more complex math-based techniques including k-nearest neighbors regression, linear regression with interactions, kernel ridge regression, and neural network regression. You can also try tree-based regression techniques including decision tree regression, random forest regression, and gradient boosting regression. All of these techniques will be explained in future Visual Studio Magazine articles.