Tolerance for Multicollinearity Detection
Tolerance (T) is a diagnostic measure for assessing multicollinearity in regression models.
Tolerance is calculated as $T = 1 - R^2$, where $R^2$ (R-Squared) is the coefficient of determination obtained by regressing the predictor of interest on the remaining predictor variables in the model.
Tolerance is also the reciprocal of the variance inflation factor (VIF), which is calculated as $\text{VIF} = 1 / (1 - R^2)$.
For example, if the R-Squared for some predictor is 0, then none of its variance can be explained by the remaining predictors, i.e. it is not linearly correlated with them. This gives VIF = 1 and tolerance = 1, and suggests no multicollinearity. Conversely, R-Squared = 1 gives tolerance = 0 and an undefined (infinite) VIF, which indicates exact multicollinearity.
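The definition can be checked by hand with an auxiliary regression. The following is a minimal sketch on simulated data, not part of the Boston example; the variables `x1`, `x2`, and `x3` are hypothetical, and `x3` is deliberately built from `x1` so that its tolerance comes out low.

```r
set.seed(42)  # reproducible simulated data
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + rnorm(n, sd = 0.3)  # x3 is constructed to be strongly correlated with x1

# Auxiliary regression: predictor of interest on the remaining predictors
aux <- lm(x3 ~ x1 + x2)
r2  <- summary(aux)$r.squared

tol <- 1 - r2   # tolerance
vif <- 1 / tol  # variance inflation factor

cat("R-squared:", round(r2, 3), "\n")
cat("Tolerance:", round(tol, 3), "\n")
cat("VIF:      ", round(vif, 3), "\n")
```

Because most of the variance of `x3` is explained by `x1`, the R-Squared is close to 1, the tolerance is close to 0, and the VIF is large.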
The following example shows how you can calculate tolerance to detect multicollinearity in R.
Input dataset
We will use the Boston dataset for regression analysis and calculating the tolerance for predictors.
This dataset contains the housing prices and various features influencing the housing prices.
# load data (the Boston dataset ships with the MASS package)
library(MASS)
data("Boston")
# view data
head(Boston)
crim zn indus chas nox rm age dis rad tax ptratio black lstat medv
1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98 24.0
2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14 21.6
3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7
4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4
5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33 36.2
6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21 28.7
The Boston dataset has all numerical predictors. medv is the target variable, which indicates the median home value in $1000s.
We will use all variables except medv as predictors to fit the linear regression and calculate the tolerance.
Calculate tolerance
You can use the ols_coll_diag function from the olsrr package to calculate the tolerance values for detecting multicollinearity in the model.
# load package
# install.packages("olsrr")
library(olsrr)
# fit regression model
model <- lm(formula = medv ~ ., data = Boston)
# Calculate tolerance
ols_coll_diag(model)
# output
Tolerance and Variance Inflation Factor
---------------------------------------
Variables Tolerance VIF
1 crim 0.5579761 1.792192
2 zn 0.4350175 2.298758
3 indus 0.2505263 3.991596
4 chas 0.9311027 1.073995
5 nox 0.2275976 4.393720
6 rm 0.5171314 1.933744
7 age 0.3224948 3.100826
8 dis 0.2527841 3.955945
9 rad 0.1336095 7.484496
10 tax 0.1110056 9.008554
11 ptratio 0.5558384 1.799084
12 black 0.7415531 1.348521
13 lstat 0.3399636 2.941491
The output contains the tolerance and VIF values for all 13 predictors.
Three predictors (nox, rad, and tax) have tolerance values <= 0.25, and two more (indus and dis) sit just above that cutoff at roughly 0.25. Such low values suggest moderate to strong multicollinearity.
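Which predictors get flagged depends on how strictly the cutoff is applied. The sketch below copies the tolerance values from the output above and filters them against a cutoff. The helper `flag_low_tolerance` and the looser cutoff of 0.26 are illustrative assumptions, not part of olsrr.

```r
# Tolerance values copied from the ols_coll_diag() output above
tolerance <- c(
  crim = 0.5579761, zn = 0.4350175, indus = 0.2505263, chas = 0.9311027,
  nox = 0.2275976, rm = 0.5171314, age = 0.3224948, dis = 0.2527841,
  rad = 0.1336095, tax = 0.1110056, ptratio = 0.5558384, black = 0.7415531,
  lstat = 0.3399636
)

# Hypothetical helper: names of predictors at or below a tolerance cutoff
flag_low_tolerance <- function(tol, cutoff) names(tol)[tol <= cutoff]

strict <- flag_low_tolerance(tolerance, 0.25)  # cutoff used in the text
loose  <- flag_low_tolerance(tolerance, 0.26)  # slightly looser, illustrative only

print(strict)
print(loose)
```

With the strict cutoff only nox, rad, and tax are flagged. The looser cutoff also picks up indus and dis, whose tolerance is only marginally above 0.25.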
Ideally, you should remove one or more of the predictors that cause strong multicollinearity from the model. This can improve the stability and reliability of the regression model.
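To see the effect of removing a predictor, the tolerances can be recomputed after dropping one of the flagged variables. This is a rough sketch that assumes the MASS package is installed. The helper `tolerance_table` is hypothetical and computes tolerance by hand with auxiliary regressions, and dropping tax (the predictor with the lowest tolerance) is only one possible choice.

```r
library(MASS)  # provides the Boston data frame

# Hypothetical helper: tolerance of every column in a data frame of predictors
tolerance_table <- function(X) {
  sapply(names(X), function(p) {
    # regress predictor p on all other predictors
    aux <- lm(reformulate(setdiff(names(X), p), response = p), data = X)
    1 - summary(aux)$r.squared
  })
}

X_full    <- Boston[, setdiff(names(Boston), "medv")]
X_reduced <- X_full[, setdiff(names(X_full), "tax")]  # drop tax as an illustration

tol_full    <- tolerance_table(X_full)
tol_reduced <- tolerance_table(X_reduced)

# Compare tolerance before and after for the predictors that remain
print(round(cbind(before = tol_full[names(tol_reduced)],
                  after  = tol_reduced), 4))
```

Removing a predictor can only leave the tolerance of the remaining predictors unchanged or raise it, because each auxiliary regression then has one fewer explanatory variable. Which variable to drop remains a modelling decision that should also take subject-matter knowledge into account.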