The Data Science Lab
Linear Regression with Two-Way Interactions Using JavaScript
Dr. James McCaffrey presents a complete end-to-end demonstration of linear regression with two-way interactions between predictor variables. Standard linear regression predicts a single numeric value based only on a linear combination of predictor values. Linear regression with interactions between predictor variables can handle more complex data while retaining a high level of model interpretability.
The goal of a machine learning regression problem is to predict a single numeric value. For example, you might want to predict an employee's bank account balance based on age, height, annual income, and so on. There are approximately a dozen common regression techniques. The most basic technique is called linear regression, sometimes called multiple linear regression, where the "multiple" term indicates two or more predictor variables.
The form of a basic linear regression prediction model without interactions is y' = (w0 * x0) + (w1 * x1) + . . . + (wn * xn) + b, where y' is the predicted value, the xi are predictor values, the wi are weights (also called coefficients), and b is the bias (also called the intercept).
The form of a linear regression model with two-way interactions is y' = (w0 * x0) + . . . + (wn * xn) + (w01 * x0 * x1) + (w02 * x0 * x2) + . . . + b. The interaction terms are the multiplication products of all combinations of pairs of the predictor variables.
Compared to basic linear regression, linear regression with two-way interactions can handle more complex data. Compared to other regression techniques that are designed to handle complex data, such as kernel ridge regression and neural network regression, linear regression with interactions often has slightly worse prediction accuracy, but has better model interpretability. This can be important in scenarios where interpretability is useful, or even required by law.
This article presents a demo of linear regression with two-way interactions, implemented from scratch, using the JavaScript language. A good way to see where this article is headed is to take a look at the screenshot in Figure 1. The demo program begins by loading synthetic training and test data into memory. The data looks like:
-0.1660, 0.4406, -0.9998, -0.3953, -0.7065, 0.4840
0.0776, -0.1616, 0.3704, -0.5911, 0.7562, 0.1568
-0.9452, 0.3409, -0.1654, 0.1174, -0.7192, 0.8054
0.9365, -0.3732, 0.3846, 0.7528, 0.7892, 0.1345
. . .
The first five values on each line are the x predictors. The last value on each line is the target y variable to predict. There are 200 training items and 40 test items.
The demo creates and trains a linear regression with two-way interactions model, evaluates the model accuracy on the training and test data, and then uses the model to predict the target y value for the first training item x = [-0.1660, 0.4406, -0.9998, -0.3953, -0.7065]. The predicted y is 0.5090 which is reasonably close to the target 0.4840 value.
The first part of the demo output shows how a linear regression with interactions model is created and trained:
Creating and training model
Setting SGD lrnRate = 0.001
Setting SGD maxEpochs = 100
epoch = 0 MSE = 0.1116 acc = 0.0000
epoch = 20 MSE = 0.0033 acc = 0.4400
epoch = 40 MSE = 0.0012 acc = 0.5950
epoch = 60 MSE = 0.0009 acc = 0.6750
epoch = 80 MSE = 0.0008 acc = 0.7000
Done
In theory, a linear regression with interactions model can be trained using a closed-form solution that involves computing a matrix pseudo-inverse. But in practice, a model is often trained using iterative stochastic gradient descent (SGD), which requires a learning rate and a maximum number of training epochs. The value of the MSE (mean squared error) slowly decreases, which indicates training is working.
The next part of the demo output shows the trained model weights and the bias:
Model base weights:
-0.2625 0.0341 -0.0460 0.0324 -0.1151
Model bias/intercept: 0.3619
Model interaction weights:
0.0000 0.0000 0.0000 0.0000 0.0000
-0.0008 0.0000 0.0000 0.0000 0.0000
0.0313 0.0107 0.0000 0.0000 0.0000
0.0169 -0.0104 0.0013 0.0000 0.0000
0.0952 0.0315 -0.0445 0.0003 0.0000
Because the demo data has five predictor variables, there are five basic weights. With two-way interactions, there are an additional 10 interaction weights: w01, w02, w03, w04, w12, w13, w14, w23, w24, w34. In general, if there are n predictor variables, there are (n * (n-1)) / 2 interaction weights. The demo stores the interaction weights in the lower left part of an n-by-n matrix where the row index is the first predictor and the column index is the second predictor. For instance, the -0.0104 value at [3][1] is the weight for the x3 and x1 interaction.
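The counting rule and the lower-left storage layout can be checked with a short sketch. The function name interactionPairs() is hypothetical and is not part of the demo program; it only lists the [row][col] cells that would hold interaction weights.

```javascript
// Sketch: list the (row, col) cells that hold interaction weights
// when they are stored in the lower-left part of an n-by-n matrix.
// interactionPairs() is a hypothetical helper, not part of the demo.
function interactionPairs(n)
{
  let pairs = [];
  for (let i = 0; i < n; ++i)
    for (let j = 0; j < i; ++j)  // j < i means below the diagonal
      pairs.push([i, j]);
  return pairs;
}

let pairs = interactionPairs(5);
console.log(pairs.length);  // (5 * 4) / 2 = 10 cells
console.log(pairs[4]);      // row 3, col 1: the x3 and x1 interaction
```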
Figure 1: Linear Regression with Two-Way Interactions in Action
The demo concludes by evaluating the trained model and making a prediction:
Train acc (within 0.15) = 0.8350
Test acc (within 0.15) = 0.8000
Train MSE = 0.0008
Test MSE = 0.0006
Predicting for x =
-0.1660 0.4406 -0.9998 -0.3953 -0.7065
Predicted y = 0.5090
The model accuracy is reasonably good compared to other regression techniques -- 83.50% accuracy on the training data (167 out of 200 correct) and 80.00% accuracy on the test data (32 out of 40 correct). A prediction is scored as correct if it's within 15% of the true target value.
This article assumes you have intermediate or better programming skill but doesn't assume you know anything about linear regression with two-way interactions. The demo is implemented using JavaScript but you should be able to refactor the demo code to another C-family language if you wish. All normal error checking has been removed to keep the main ideas as clear as possible.
The source code for the demo program is too long to be presented in its entirety in this article. The complete code and data are available in the accompanying file download, and they're also available online.
The Demo Data
The demo data is synthetic. It was generated by a 5-10-1 neural network with random weights and bias values. The idea here is that the synthetic data is not just random -- it has an underlying, complex, non-linear structure which can be predicted.
All of the predictor values are between -1 and +1. When using linear regression with interactions, technically, it's not necessary to normalize/scale your data because data items are not compared using Euclidean distance (as they are, for instance, in k-nearest neighbors regression). But normalizing predictor values usually leads to a better prediction model, especially if some raw predictor values are very large (such as employee income) and some values are small (such as employee age). Also, using normalized data allows you to interpret the model weights more easily (larger magnitudes indicate greater effect on the predicted y value).
Linear regression with interactions is most often used with data that has strictly numeric predictor variables. It is possible to use the technique with categorical data that has an inherent order, using equal-interval encoding. For example, a predictor variable height with possible values (short, medium, tall) could be encoded as short = 0.25, medium = 0.50, tall = 0.75.
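Equal-interval encoding is easy to sketch in code. The helper name equalIntervalEncode() and the formula (i + 1) / (k + 1) are assumptions chosen so that three levels reproduce the 0.25, 0.50, 0.75 values in the height example; other spacing schemes are possible.

```javascript
// Sketch: equal-interval encoding of an ordered categorical variable.
// equalIntervalEncode() is a hypothetical helper, not part of the demo.
// With k ordered levels, level i (0-based) maps to (i + 1) / (k + 1).
function equalIntervalEncode(levels, value)
{
  let idx = levels.indexOf(value);
  if (idx == -1) throw new Error("unknown level: " + value);
  return (idx + 1) / (levels.length + 1);
}

let heights = ["short", "medium", "tall"];  // listed in their natural order
for (let h of heights)
  console.log(h + " = " + equalIntervalEncode(heights, h).toFixed(2));
```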
However, for categorical data without inherent order, such as color with possible values (red, blue, green), there's no obvious way to encode the values so that when multiplied together you get a meaningful number. That said, I have seen examples where non-ordered categorical predictor values were equal-interval encoded and the resulting prediction model worked quite well. But as far as I know, there are no solid research results that support or contradict the validity of this technique.
Understanding Linear Regression with Two-Way Interactions
Linear regression with two-way interactions is probably best understood by examining a concrete example. Suppose you have three predictor variables (instead of five as in the demo data). And suppose the predictor is a vector x with values (x0, x1, x2), and the base weights are w0, w1, w2, and the interaction weights are w01, w02, w12, and the bias is b. The prediction equation is y' = (w0 * x0) + (w1 * x1) + (w2 * x2) + (w01 * x0 * x1) + (w02 * x0 * x2) + (w12 * x1 * x2) + b.
If x = (x0, x1, x2) = (0.5, 0.7, 0.3), and w = (w0, w1, w2) = (0.4, 0.9, 1.5), and w01 = 1.1, w02 = 1.8, w12 = 2.2, and b = -2.1, then:
y' = (w0 * x0) + (w1 * x1) + (w2 * x2) +
(w01 * x0 * x1) +
(w02 * x0 * x2) +
(w12 * x1 * x2) + b
= (0.4 * 0.5) + (0.9 * 0.7) + (1.5 * 0.3) +
(1.1 * 0.5 * 0.7) +
(1.8 * 0.5 * 0.3) +
(2.2 * 0.7 * 0.3) + (-2.1)
= 0.20 + 0.63 + 0.45 + 0.385 + 0.270 + 0.462 - 2.1
= 0.297
The interaction effects are captured by multiplying predictor values. There are other ways to generate interaction effects but multiplication is simple and effective.
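The worked example can be verified with a few lines of JavaScript. This is a minimal stand-alone sketch; the function name predictWithInteractions() is hypothetical, and the interaction weights are assumed to sit in the lower-left part of a matrix, as in the demo output.

```javascript
// Sketch: compute y' for the three-predictor worked example.
// predictWithInteractions() is a hypothetical stand-alone function;
// iwts[i][j] (with j < i) holds the weight for the xi * xj interaction.
function predictWithInteractions(x, wts, iwts, bias)
{
  let sum = bias;
  for (let i = 0; i < x.length; ++i)
    sum += wts[i] * x[i];  // base terms
  for (let i = 0; i < x.length; ++i)
    for (let j = 0; j < i; ++j)
      sum += iwts[i][j] * x[i] * x[j];  // interaction terms
  return sum;
}

let x = [0.5, 0.7, 0.3];
let wts = [0.4, 0.9, 1.5];
let iwts = [[0.0, 0.0, 0.0],
            [1.1, 0.0, 0.0],   // w01 at [1][0]
            [1.8, 2.2, 0.0]];  // w02 at [2][0], w12 at [2][1]
let y = predictWithInteractions(x, wts, iwts, -2.1);
console.log(y.toFixed(3));  // 0.297
```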
An obvious extension of two-way interactions between all possible pairs of predictor variables is to use three-way or four-way or greater interactions. However, anything greater than two-way interactions quickly becomes a complicated mess, and in my opinion, you're usually better off using a more sophisticated regression technique that is designed for all-way interactions, specifically kernel ridge regression or neural network regression. The trade-off is that these approaches sacrifice model interpretability.
Training the Linear Regression with Interactions Model
The demo program uses standard stochastic gradient descent (SGD) optimization to train the regression model. In high-level pseudo-code:
initialize weights and bias to small random values
loop max_epochs times
  shuffle the order of training data
  loop each training item
    get training inputs X
    get training target value actual y
    compute predicted y using curr weights
    loop each weight and the bias
      new_wt = old_wt + (-1 * lrn_rate * (pred_y - actual_y) * x)
    end-loop
  end-loop
end-loop
The idea is to adjust each weight so that the predicted y gets closer to the actual y. The (pred_y - actual_y) is a simple measure of how far off the predicted value is. If pred_y is larger than actual_y, you want to decrease the associated weight to make the pred_y smaller.
The learning rate, lrn_rate, is typically a value like 0.01 that lessens the amount of change in the weights so that there are no wild fluctuations. The learning rate must be determined by trial and error. If the learning rate is too large, SGD might jump past a good weight value. If the learning rate is too small, training might be too slow.
The -1 term controls the direction of the change in the weight being adjusted. If the difference term were (actual_y - pred_y), the -1 term could be dropped, but it's traditional to use (pred_y - actual_y). The x term in the update controls the direction of the weight change, depending on the sign of x, and the amount of change in the weight, depending on the magnitude of x. For an interaction weight, x is the product of the two associated predictor values, and for the bias, x is a constant 1.
The order in which the training data is processed is randomized in each epoch (one complete pass through all items). This prevents training from oscillating back and forth when one item's weight updates undo the previous item's weight updates.
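The training loop described in this section might be coded along the following lines. This is a sketch under stated assumptions, not the demo program's actual train() method: the names makeRng(), predict(), and trainSGD() are hypothetical, and the small linear congruential generator is just a stand-in for whatever seeded generator the demo uses.

```javascript
// Sketch of SGD training for a model with two-way interactions.
// All names here, and the tiny random generator, are illustrative
// assumptions rather than the demo program's code.
function makeRng(seed)
{
  // small linear congruential generator, returns values in [0, 1)
  let state = seed >>> 0;
  return function() {
    state = (Math.imul(1664525, state) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

function predict(x, wts, iwts, bias)
{
  let sum = bias;
  for (let i = 0; i < x.length; ++i) {
    sum += wts[i] * x[i];
    for (let j = 0; j < i; ++j)
      sum += iwts[i][j] * x[i] * x[j];
  }
  return sum;
}

function trainSGD(trainX, trainY, lrnRate, maxEpochs, seed)
{
  let n = trainX[0].length;
  let rng = makeRng(seed);
  let small = () => (rng() - 0.5) * 0.02;  // in [-0.01, +0.01)
  let wts = Array.from({ length: n }, small);
  let iwts = Array.from({ length: n }, (_, i) =>
    Array.from({ length: n }, (_, j) => (j < i ? small() : 0.0)));
  let bias = small();
  let indices = trainX.map((_, i) => i);

  for (let epoch = 0; epoch < maxEpochs; ++epoch) {
    // Fisher-Yates shuffle of the processing order
    for (let i = indices.length - 1; i > 0; --i) {
      let r = Math.floor(rng() * (i + 1));
      [indices[i], indices[r]] = [indices[r], indices[i]];
    }
    for (let idx of indices) {
      let x = trainX[idx];
      let err = predict(x, wts, iwts, bias) - trainY[idx];
      for (let i = 0; i < n; ++i) {
        wts[i] += -1 * lrnRate * err * x[i];
        for (let j = 0; j < i; ++j)  // here "x" is the product xi * xj
          iwts[i][j] += -1 * lrnRate * err * x[i] * x[j];
      }
      bias += -1 * lrnRate * err;  // bias acts like a weight with x = 1
    }
  }
  return { wts: wts, iwts: iwts, bias: bias };
}
```

With this sketch, a call such as trainSGD(trainX, trainY, 0.001, 100, 0) would mirror the learning rate and epoch count shown in the demo output, although the resulting weights would not be expected to match the demo's exactly because the random generator differs.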
The Demo Program
To code the demo, I used Notepad. I like Notepad. The old Notepad that is. But you will probably want to use your regular editor of choice. Visual Studio Code is a popular choice with my colleagues.
I executed the demo program using the node.js runtime system. You can get node.js from nodejs.org/en/download. Any relatively recent version will work fine.
The overall program structure is presented in Listing 1. All the control logic is in the main() function. All the linear regression functionality is in a LinearRegressor class. The LinearRegressor class exposes five methods: a constructor(), predict(), train(), accuracy(), and meanSqError(). Methods next(), nextInt(), and shuffle() are helpers.
Listing 1: Overall Program Structure
// linear_regression_interactions.js
// linear regression with two-way interactions (using SGD)
// node.js

let FS = require("fs");  // for loadTxt()

// ----------------------------------------------------------

class LinearRegressor
{
  constructor(seed)
  {
    this.seed = seed + 0.5;  // avoid 0
    this.weights = null;  // allocated in train()
    this.interactionWts = null;
    this.bias = 0.0;
  }

  // --------------------------------------------------------

  predict(x) { . . }
  train(trainX, trainY, lrnRate, maxEpochs) { . . }
  accuracy(dataX, dataY, pctClose) { . . }
  meanSqError(dataX, dataY) { . . }
  next() { . . }
  nextInt(lo, hi) { . . }
  shuffle(indices) { . . }
}

// vector and matrix helper functions

function vecMake(n, val) { . . }
function matMake(nRows, nCols, val) { . . }
function vecShow(vec, dec, wid, nl) { . . }
function matShow(A, dec, wid) { . . }
function matToVec(M) { . . }
function loadTxt(fn, delimit, usecols, comment) { . . }

// ----------------------------------------------------------

function main()
{
  console.log("Begin linear regression with two-way " +
    "interactions using node.js JavaScript ");

  // 1. load data
  // 2. create and train linear regression model
  // 3. evaluate model
  // 4. use model

  console.log("End demo");
}

main();
The LinearRegressor class has four fields. The weights, interactionWts, and bias fields define the behavior of the model. The seed field is used to generate quasi-random numbers for use by the shuffle() function to randomize the order in which the training items are processed, and to initialize the weights. The shuffle() function calls the nextInt() function, which calls the next() function, which accesses the seed field.
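The shuffle() to nextInt() to next() call chain can be illustrated with a small stand-alone class. This is only a sketch: the class name SeededShuffler and the Lehmer-style generator are assumptions made for illustration, the seed is assumed to be a non-negative integer, and the demo's own generator may well be implemented differently.

```javascript
// Sketch of a shuffle() -> nextInt() -> next() helper chain.
// The class name and the Lehmer-style generator are assumptions;
// the demo program's generator may differ.
class SeededShuffler
{
  constructor(seed)  // seed assumed to be a non-negative integer
  {
    this.state = (seed % 2147483646) + 1;  // integer in [1, m-1]
  }

  next()  // next value in (0.0, 1.0)
  {
    this.state = (16807 * this.state) % 2147483647;
    return this.state / 2147483647;
  }

  nextInt(lo, hi)  // integer in [lo, hi)
  {
    return Math.floor(this.next() * (hi - lo)) + lo;
  }

  shuffle(indices)  // in-place Fisher-Yates shuffle
  {
    for (let i = 0; i < indices.length; ++i) {
      let ri = this.nextInt(i, indices.length);
      let tmp = indices[ri];
      indices[ri] = indices[i];
      indices[i] = tmp;
    }
  }
}

let rnd = new SeededShuffler(0);
let indices = [0, 1, 2, 3, 4, 5];
rnd.shuffle(indices);
console.log(indices);  // some permutation of 0 through 5
```

Because all randomness flows through a single seeded state, two runs with the same seed process the training items in the same order, which makes results reproducible.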
Loading the Data into Memory
The demo program starts by loading the 200-item training data into memory:
let trainFile = ".\\Data\\synthetic_train_200.txt";
let trainX = loadTxt(trainFile, ",", [0,1,2,3,4], "#");
let trainY = loadTxt(trainFile, ",", [5], "#");
trainY = matToVec(trainY);
The training X data is stored into an array-of-arrays style matrix. The data is assumed to be in a directory named Data, which is located in the project root directory. The arguments to the loadTxt() function mean load columns 0, 1, 2, 3, 4 where the data is comma-delimited, and lines beginning with "#" are comments to be ignored. The training y data in column [5] is loaded into a matrix and then converted to a one-dimensional vector using the matToVec() helper function.
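To make the loading step concrete, here is one way helpers of this kind might be written. The names parseTxt(), loadTxtSketch(), and matToVecSketch() are hypothetical and the demo's own loadTxt() and matToVec() may be implemented differently; the parsing is split out from the file read so it can be tried on an in-memory string.

```javascript
// Sketch of how loadTxt()-style and matToVec()-style helpers might
// work. parseTxt(), loadTxtSketch() and matToVecSketch() are
// hypothetical names, not the demo program's functions.
const FS = require("fs");

function parseTxt(text, delimit, usecols, comment)
{
  let result = [];
  for (let line of text.split(/\r?\n/)) {
    line = line.trim();
    if (line == "" || line.startsWith(comment)) continue;  // skip
    let tokens = line.split(delimit);
    result.push(usecols.map((c) => parseFloat(tokens[c])));
  }
  return result;  // array-of-arrays style matrix
}

function loadTxtSketch(fn, delimit, usecols, comment)
{
  return parseTxt(FS.readFileSync(fn, "utf8"), delimit, usecols, comment);
}

function matToVecSketch(M)  // flatten a matrix row by row
{
  return M.flat();
}

let text = "# x0, x1, x2, x3, x4, y\n" +
  "-0.1660, 0.4406, -0.9998, -0.3953, -0.7065, 0.4840\n" +
  "0.0776, -0.1616, 0.3704, -0.5911, 0.7562, 0.1568\n";
let X = parseTxt(text, ",", [0,1,2,3,4], "#");
let y = matToVecSketch(parseTxt(text, ",", [5], "#"));
console.log(X.length, X[0].length, y.length);  // 2 rows, 5 columns, 2 targets
```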
Notice that during development, I use "let" instead of the preferable "const" when a variable does not change. This is not good practice for production code.
The 40-item test data is loaded into memory using the same pattern that was used to load the training data:
let testFile = ".\\Data\\synthetic_test_40.txt";
let testX = loadTxt(testFile, ",", [0,1,2,3,4], "#");
let testY = loadTxt(testFile, ",", [5], "#");
testY = matToVec(testY);
The first three training items are displayed with four decimals like so:
console.log("First three train X: ");
for (let i = 0; i < 3; ++i)
  vecShow(trainX[i], 4, 8, true);

console.log("First three train y: ");
for (let i = 0; i < 3; ++i)
  console.log(trainY[i].toFixed(4).toString().
    padStart(9, ' '));
In a non-demo scenario, you might want to display all the training data to make sure it was correctly loaded into memory.
If you are running linear regression inside a Web page, you'll need to access training and test data in some other way. My blog post shows an example of a Web page that uses HTML FileReader to fetch client-side data, and displays output in an HTML textarea zone. Another of many possibilities is to fetch server-side data from a SQL database.
Creating and Training the Model
The linear regression model with two-way interactions is created like so:
let seed = 0;
console.log("Creating and training model ");
let model = new LinearRegressor(seed);
The constructor accepts only a seed value for the internal quasi-random number generator that's used by the train() method. The value of the seed parameter can have a significant effect on the accuracy of the resulting model, but you shouldn't treat the seed as a value to tune when training your model. Next, the demo prepares the training parameters and calls the train() method:
let lrnRate = 0.001;
let maxEpochs = 100;
console.log("Setting SGD lrnRate = " + lrnRate.toFixed(3).toString());
console.log("Setting SGD maxEpochs = " + maxEpochs.toString());
model.train(trainX, trainY, lrnRate, maxEpochs);
console.log("Done ");
The values of lrnRate and maxEpochs must be determined by trial and error. In a non-demo scenario, you can programmatically try different values using a grid search.
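A grid search over these two parameters can be sketched generically. The function gridSearch() and its trainAndScore callback are hypothetical; the callback is assumed to train a fresh model with the given settings and return an error value such as MSE, ideally computed on held-out validation data rather than on the test data.

```javascript
// Sketch of a grid search over lrnRate and maxEpochs. gridSearch()
// and the trainAndScore callback are hypothetical; the callback is
// assumed to train a fresh model and return an error (lower = better).
function gridSearch(lrnRates, epochsList, trainAndScore)
{
  let best = { lrnRate: null, maxEpochs: null, score: Infinity };
  for (let lr of lrnRates) {
    for (let me of epochsList) {
      let score = trainAndScore(lr, me);
      if (score < best.score)
        best = { lrnRate: lr, maxEpochs: me, score: score };
    }
  }
  return best;
}

// Possible use with a LinearRegressor-style class (not run here;
// validX and validY are assumed hold-out data):
// let best = gridSearch([0.0001, 0.001, 0.01], [50, 100, 200],
//   (lr, me) => {
//     let m = new LinearRegressor(0);
//     m.train(trainX, trainY, lr, me);
//     return m.meanSqError(validX, validY);
//   });
```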
After training has completed, the demo program displays the model weights and bias:
console.log("Model weights: ");
vecShow(model.weights, 4, 8, true);
console.log("Model bias: " + model.bias.toFixed(4).toString());
console.log("Model interaction weights: ");
matShow(model.interactionWts, 4, 8);
The demo program does not use regularization, which is one of several techniques that restrict the magnitude of the resulting model weights. Large model weights often lead to model overfitting, where the model predicts well on the training data but poorly on new, previously unseen data. A simple form of regularization is weight decay. After every training epoch, all weights are reduced by multiplying each by a decay constant value like 0.995, where the decay constant must be determined by trial and error.
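Weight decay as described here takes only a few lines. The helper applyWeightDecay() below is a hypothetical sketch, not part of the demo; it assumes the interaction weights sit in the lower-left part of a matrix and, as a common but optional choice, it leaves the bias untouched.

```javascript
// Sketch: weight decay applied once at the end of each epoch.
// applyWeightDecay() is a hypothetical helper; the demo program
// does not use regularization. The bias is left unchanged here,
// which is a design choice rather than a requirement.
function applyWeightDecay(wts, iwts, decay)
{
  for (let i = 0; i < wts.length; ++i) {
    wts[i] *= decay;  // base weight
    for (let j = 0; j < i; ++j)
      iwts[i][j] *= decay;  // lower-left interaction weights
  }
}

let wts = [1.0, -2.0];
let iwts = [[0.0, 0.0],
            [4.0, 0.0]];
applyWeightDecay(wts, iwts, 0.995);
console.log(wts, iwts[1][0]);  // each magnitude shrinks by 0.5 percent
```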
Both regular linear regression and linear regression with two-way interactions are usually much less susceptible to model overfitting than other regression techniques.
Evaluating and Using the Model
The demo program evaluates the trained model accuracy using these statements:
console.log("Computing model accuracy ");
let trainAcc = model.accuracy(trainX, trainY, 0.15);
let testAcc = model.accuracy(testX, testY, 0.15);
console.log("Train acc (within 0.15) = " +
trainAcc.toFixed(4).toString());
console.log("Test acc (within 0.15) = " +
testAcc.toFixed(4).toString());
The accuracy() method scores a prediction value as correct if it's within a specified percentage of the true target value. A reasonable closeness percentage to use will vary from problem to problem.
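The scoring rule can be sketched as a stand-alone function. The names accuracySketch() and predictFn are hypothetical, and the strict less-than comparison is an assumption; the demo's accuracy() method may handle edge cases, such as a target of exactly zero, differently.

```javascript
// Sketch of an accuracy-style metric: a prediction counts as correct
// if it is within pctClose of the true target value. accuracySketch()
// and the predictFn parameter are hypothetical names.
function accuracySketch(predictFn, dataX, dataY, pctClose)
{
  let nCorrect = 0;
  for (let i = 0; i < dataX.length; ++i) {
    let predY = predictFn(dataX[i]);
    if (Math.abs(predY - dataY[i]) < Math.abs(pctClose * dataY[i]))
      ++nCorrect;
  }
  return nCorrect / dataX.length;
}

// toy check: the "model" just returns the first predictor value
let toyX = [[1.0], [1.2], [2.0]];
let toyY = [1.1, 1.0, 2.0];
let acc = accuracySketch((x) => x[0], toyX, toyY, 0.15);
console.log(acc.toFixed(4));  // 2 of 3 predictions are within 15 percent
```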
Accuracy is not a very granular metric. The demo computes and displays mean squared error, which is a more granular metric:
let trainMSE = model.meanSqError(trainX, trainY);
let testMSE = model.meanSqError(testX, testY);
console.log("Train MSE = " + trainMSE.toFixed(4).toString());
console.log("Test MSE = " + testMSE.toFixed(4).toString());
Mean squared error is best used to compare different models. Many of the regression modules in the Python language scikit-learn library use the coefficient of determination, R2, as the primary evaluation metric, but in my opinion mean squared error is easier to interpret. If you wish, you can easily implement a coefficient of determination method. See jamesmccaffreyblog.com/2025/09/23/an-example-of-coefficient-of-determination-using-javascript/.
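For comparison, both metrics can be sketched in a few lines. The function names meanSqErrorSketch() and rSquaredSketch() and the predictFn parameter are hypothetical; the R2 version uses the standard definition, 1 minus the ratio of the residual sum of squares to the total sum of squares.

```javascript
// Sketch: mean squared error and coefficient of determination (R2).
// Function names and the predictFn parameter are hypothetical.
function meanSqErrorSketch(predictFn, dataX, dataY)
{
  let sum = 0.0;
  for (let i = 0; i < dataX.length; ++i) {
    let diff = predictFn(dataX[i]) - dataY[i];
    sum += diff * diff;
  }
  return sum / dataX.length;
}

function rSquaredSketch(predictFn, dataX, dataY)
{
  let mean = dataY.reduce((a, b) => a + b, 0.0) / dataY.length;
  let ssRes = 0.0;  // sum of squared residuals
  let ssTot = 0.0;  // total sum of squares around the mean
  for (let i = 0; i < dataX.length; ++i) {
    ssRes += (dataY[i] - predictFn(dataX[i])) ** 2;
    ssTot += (dataY[i] - mean) ** 2;
  }
  return 1.0 - ssRes / ssTot;
}

// toy check: the "model" just returns the first predictor value
let toyX = [[1], [2], [3]];
let toyY = [1, 2, 4];
let mse = meanSqErrorSketch((x) => x[0], toyX, toyY);
let r2 = rSquaredSketch((x) => x[0], toyX, toyY);
console.log(mse.toFixed(4), r2.toFixed(4));  // 0.3333 0.7857
```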
The demo concludes by using the trained model to predict the y value for the first training item, (-0.1660, 0.4406, -0.9998, -0.3953, -0.7065):
let x = trainX[0];
console.log("Predicting for x = ");
vecShow(x, 4, 9, true); // add newline
let predY = model.predict(x);
console.log("Predicted y = " +
predY.toFixed(4).toString());
The predicted y value, 0.5090, is reasonably close (within six percent) to the true target value of 0.4840. This is a good result because, as mentioned previously, the synthetic demo data was generated by a neural network, which has complex interactions between predictor variables.
Wrapping Up
Linear regression with two-way interactions has a nice balance of prediction power and interpretability. The model weights/coefficients are easy to interpret. If the predictor values have been normalized to the same scale, larger magnitudes mean larger effect, and the sign of each weight indicates the direction of the effect. You should be a bit careful interpreting the interaction weights. If an interaction weight is positive, it can indicate an increase in y when the corresponding pair of predictor values are both positive or both negative.
Linear regression with two-way interactions is not always effective -- if it were, it would have replaced basic linear regression. Even so, linear regression with two-way interactions can sometimes provide a big improvement in model quality for a relatively small investment in effort, and so it's usually worth exploring.