
Tolerance for Multicollinearity Detection

Tolerance (T) is a diagnostic measure used to assess multicollinearity among the predictors of a regression model.

For a given predictor j, tolerance is calculated as $T_j = 1 - R_j^2$, where $R_j^2$ is the coefficient of determination (R-squared) obtained by regressing predictor j on the remaining predictor variables in the model. This auxiliary R-squared is not the R-squared of the main regression model.
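The definition above can be sketched in base R. The helper tolerance_for() is hypothetical, written only for this illustration: it regresses one predictor on all the others with lm() and returns 1 - R² from that auxiliary regression. The simulated data frame is also an assumption, built so that x1 and x2 are strongly correlated while x3 is independent of both.

```r
# Hypothetical helper: tolerance of one predictor, computed as 1 - R^2
# from regressing that predictor on all other columns of `data`
tolerance_for <- function(data, predictor) {
  others <- setdiff(names(data), predictor)
  aux_fit <- lm(reformulate(others, response = predictor), data = data)
  1 - summary(aux_fit)$r.squared
}

# Simulated predictors (assumption for illustration only)
set.seed(1)
n <- 200
x1 <- rnorm(n)
preds <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.3),  # strongly correlated with x1
  x3 = rnorm(n)                  # independent of x1 and x2
)

# Tolerance for every predictor
tol <- sapply(names(preds), function(p) tolerance_for(preds, p))
print(round(tol, 3))
```

With this setup you should see tolerance values well below 0.5 for x1 and x2 and a value close to 1 for x3; the exact numbers depend on the random seed.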

Tolerance is also the reciprocal of the variance inflation factor (VIF), which is calculated as $\mathrm{VIF}_j = \frac{1}{1 - R_j^2} = \frac{1}{T_j}$.
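A small numeric check of these two formulas, as a sketch in R: tolerance_from_r2() and vif_from_r2() are hypothetical helper names, not functions from any package. An auxiliary R-squared of 0.75 gives a tolerance of 0.25 and a VIF of 4, and the product of the two measures is 1 for any R-squared below 1.

```r
# Hypothetical helpers: tolerance and VIF from an auxiliary R^2
tolerance_from_r2 <- function(r2) 1 - r2
vif_from_r2 <- function(r2) 1 / (1 - r2)

r2 <- 0.75
tolerance_from_r2(r2)  # 0.25
vif_from_r2(r2)        # 4

# Tolerance and VIF are reciprocals for any R^2 below 1
r2_grid <- c(0.1, 0.5, 0.9)
all.equal(tolerance_from_r2(r2_grid) * vif_from_r2(r2_grid), rep(1, 3))  # TRUE
```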

Tip
Tolerance values close to 1 indicate low multicollinearity, while values close to 0 indicate moderate to strong multicollinearity. As a rule of thumb, a tolerance value below 0.25 (equivalently, a VIF above 4) suggests that there could be moderate to strong multicollinearity.

For example, if the R-squared for a predictor is 0, none of its variance can be explained by the remaining predictors, i.e., it is not linearly related to them. This gives VIF = 1 and tolerance = 1, suggesting no multicollinearity. At the other extreme, an R-squared of 1 gives tolerance = 0 (and an infinite VIF), which indicates exact multicollinearity.
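The two extremes can be evaluated directly with the tolerance and VIF formulas. This base R sketch uses illustrative R-squared values of 0 and 1; note that R returns Inf rather than an error for 1 / 0.

```r
# Auxiliary R^2 at the two extremes (illustrative values)
r2 <- c(no_collinearity = 0, exact_collinearity = 1)

tolerance <- 1 - r2   # 1 and 0
vif <- 1 / (1 - r2)   # 1 and Inf (R evaluates 1 / 0 as Inf)

tolerance
vif
```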

The following example shows how to calculate tolerance in R to detect multicollinearity.

Input dataset

We will use the Boston dataset from the MASS package to fit a regression model and calculate the tolerance for each predictor.

This dataset contains housing values in suburbs of Boston along with various features that may influence them.

# load data (the Boston dataset is provided by the MASS package)
library(MASS)
data("Boston")

# view data
head(Boston)

    crim zn indus chas   nox    rm  age    dis rad tax ptratio  black lstat medv
1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90  4.98 24.0
2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90  9.14 21.6
3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83  4.03 34.7
4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63  2.94 33.4
5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90  5.33 36.2
6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12  5.21 28.7

All variables in the Boston dataset are numeric (chas is a 0/1 dummy variable). medv is the target variable, giving the median value of owner-occupied homes in thousands of dollars.
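To check the column types yourself, a short sketch using base R and the MASS package is shown below; the expected dimensions (506 rows, 14 columns) are those of MASS::Boston.

```r
# The Boston data ship with the MASS package
library(MASS)

dim(Boston)                      # 506 rows, 14 columns
all(sapply(Boston, is.numeric))  # TRUE: every column is numeric
sort(unique(Boston$chas))        # 0 1: chas is a dummy variable
```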

We will regress medv on all other variables in the dataset and calculate the tolerance for each of the 13 predictors.

Calculate tolerance

You can use the ols_coll_diag() function from the olsrr package to calculate the tolerance and VIF values for detecting multicollinearity in the model.

# load package
# install.packages("olsrr")
library(olsrr)

# fit regression model
model <- lm(formula = medv ~ ., data = Boston)

# Calculate tolerance
ols_coll_diag(model)

# output
Tolerance and Variance Inflation Factor
---------------------------------------
   Variables Tolerance      VIF
1       crim 0.5579761 1.792192
2         zn 0.4350175 2.298758
3      indus 0.2505263 3.991596
4       chas 0.9311027 1.073995
5        nox 0.2275976 4.393720
6         rm 0.5171314 1.933744
7        age 0.3224948 3.100826
8        dis 0.2527841 3.955945
9        rad 0.1336095 7.484496
10       tax 0.1110056 9.008554
11   ptratio 0.5558384 1.799084
12     black 0.7415531 1.348521
13     lstat 0.3399636 2.941491

The output contains the tolerance and VIF values for all 13 predictors (ols_coll_diag() also prints an eigenvalue and condition index table, which is omitted here).
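To see where these numbers come from, the tolerance for each predictor can be reproduced without olsrr by running one auxiliary regression per predictor, following the definition of tolerance given earlier. This is a sketch in base R plus MASS, not olsrr's own code; the values should agree with the table above up to rounding.

```r
library(MASS)

# Predictors used in medv ~ . (every column except the response)
predictors <- setdiff(names(Boston), "medv")

# Regress each predictor on the remaining predictors (medv is excluded)
# and take 1 - R^2 from that auxiliary regression
tol <- sapply(predictors, function(p) {
  aux_fit <- lm(reformulate(setdiff(predictors, p), response = p), data = Boston)
  1 - summary(aux_fit)$r.squared
})

# Tolerance and VIF side by side, comparable with the ols_coll_diag() table
round(data.frame(Tolerance = tol, VIF = 1 / tol), 6)
```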

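Three predictors (nox, rad, and tax) have tolerance values below 0.25, and two more (indus and dis) sit just above the cutoff at about 0.25. Together, these suggest moderate to strong multicollinearity among these variables.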
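Applying the 0.25 cutoff to the table is easy to automate. The sketch below hard-codes the tolerance values printed above; the second filter uses an arbitrary band of 0.25 to 0.26, chosen only to show the predictors that sit just above the cutoff.

```r
# Tolerance values copied from the ols_coll_diag() output above
tol <- c(crim = 0.5579761, zn = 0.4350175, indus = 0.2505263, chas = 0.9311027,
         nox = 0.2275976, rm = 0.5171314, age = 0.3224948, dis = 0.2527841,
         rad = 0.1336095, tax = 0.1110056, ptratio = 0.5558384,
         black = 0.7415531, lstat = 0.3399636)

names(tol)[tol < 0.25]                # "nox" "rad" "tax"
names(tol)[tol >= 0.25 & tol < 0.26]  # "indus" "dis" (just above the cutoff)
```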

Consider removing one or more of the predictors that cause strong multicollinearity (for example, tax, which has the lowest tolerance) and refitting the model. This can make the estimated regression coefficients more stable and reliable.
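As one illustration, the sketch below drops tax and recomputes the tolerance of the remaining predictors with auxiliary regressions. The helper tolerances() is hypothetical, and removing tax is only one possible choice; which predictor to drop depends on the goals of the analysis.

```r
library(MASS)

# Hypothetical helper: tolerance of each predictor via auxiliary regressions
tolerances <- function(data, predictors) {
  sapply(predictors, function(p) {
    aux_fit <- lm(reformulate(setdiff(predictors, p), response = p), data = data)
    1 - summary(aux_fit)$r.squared
  })
}

all_preds <- setdiff(names(Boston), "medv")
reduced_preds <- setdiff(all_preds, "tax")

tol_full <- tolerances(Boston, all_preds)
tol_reduced <- tolerances(Boston, reduced_preds)

# Tolerance of the remaining predictors before and after dropping tax
round(cbind(full = tol_full[reduced_preds], without_tax = tol_reduced), 3)

# Refit the regression model without tax
model_reduced <- lm(medv ~ . - tax, data = Boston)
```

Dropping a regressor can never increase the R-squared of an auxiliary regression, so no remaining predictor's tolerance can go down. The largest gain is expected for rad, which is highly correlated with tax in this dataset.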