Condition Index for Multicollinearity Detection

Similar to the variance inflation factor (VIF), the Condition Index (CI) is used for detecting multicollinearity in regression models.

The condition indices are calculated from the eigenvalues of the scaled predictor matrix (each eigenvalue is the variance of a linear combination of the scaled predictors). Each condition index is the square root of the ratio of the largest eigenvalue to one of the eigenvalues, so there is one condition index per eigenvalue.

The largest condition index is also known as the condition number.
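The definition above can be sketched in base R. This is a minimal sketch, assuming the columns of the design matrix (intercept included) are scaled to unit length without centering, which is a common convention; other implementations may scale differently, so their numbers need not match. The built-in mtcars data is used only as a stand-in, and the names X_scaled, lambda, ci and condition_number are illustrative.

```r
# Design matrix with an intercept column, as lm() would build it
# (mtcars is only a stand-in dataset for this sketch)
X <- model.matrix(mpg ~ ., data = mtcars)

# Assumption: scale every column to unit length, without centering
X_scaled <- sweep(X, 2, sqrt(colSums(X^2)), "/")

# Eigenvalues of the scaled cross-product matrix, in decreasing order
lambda <- eigen(crossprod(X_scaled), symmetric = TRUE, only.values = TRUE)$values

# Condition index: square root of (largest eigenvalue / each eigenvalue)
ci <- sqrt(max(lambda) / lambda)

# The condition number is the largest condition index
condition_number <- max(ci)

print(round(ci, 3))
print(condition_number)
```

Because the largest eigenvalue is divided by itself, the first condition index is always 1.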

Tip
As a rule of thumb, a condition index above 30 suggests strong multicollinearity, while values between 10 and 30 indicate low to moderate multicollinearity.

The following example shows how you can calculate the condition index to detect multicollinearity in R.

Input dataset

We will use the Boston dataset to fit a regression model and calculate the condition indices.

This dataset contains housing prices and various features that influence them.

# load data (the Boston dataset ships with the MASS package)
library(MASS)
data("Boston")

# view data
head(Boston)

    crim zn indus chas   nox    rm  age    dis rad tax ptratio  black lstat medv
1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90  4.98 24.0
2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90  9.14 21.6
3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83  4.03 34.7
4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63  2.94 33.4
5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90  5.33 36.2
6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12  5.21 28.7

All predictors in the Boston dataset are numerical. The target variable, medv, is the median home value in $1000s.

We will use all numerical variables except medv to fit the linear regression and calculate the condition index.

Calculate condition index

You can use the cond.index function from the klaR package to calculate the condition indices and detect multicollinearity in the model.

# import package 
# install.packages("klaR")
library(klaR)

# Calculate condition index
cond.index(formula = medv ~ ., data = Boston)

# output
 [1]  1.000000  2.516245  3.242633  3.905405  6.469852  7.785597  9.651169 11.640227 15.580784 19.840440 27.662710 28.912366 37.417710 87.318288

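The output contains 14 condition indices, one for each eigenvalue of the scaled design matrix (13 predictors plus the intercept). The indices belong to these eigenvalue dimensions rather than to individual predictors, and the first value (1.00) corresponds to the largest eigenvalue.

Seven condition indices are greater than 10: five lie between 10 and 30 (suggesting moderate multicollinearity) and two are greater than 30 (suggesting strong multicollinearity).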
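The counts can be checked by binning the printed values with the rule-of-thumb thresholds. This is a small sketch: the vector is copied from the output above, and the labels "weak", "moderate" and "strong" are illustrative names, not part of klaR.

```r
# Condition indices copied from the cond.index output above
ci <- c(1.000000, 2.516245, 3.242633, 3.905405, 6.469852, 7.785597,
        9.651169, 11.640227, 15.580784, 19.840440, 27.662710,
        28.912366, 37.417710, 87.318288)

# Bin with the rule-of-thumb cut-offs: <= 10, (10, 30], > 30
severity <- cut(ci,
                breaks = c(-Inf, 10, 30, Inf),
                labels = c("weak", "moderate", "strong"))

# Number of condition indices in each band
counts <- table(severity)
print(counts)
```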

You can also use the VIF and pairwise correlation analysis to identify the predictors that cause the strong multicollinearity.
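Both follow-up checks can be sketched with base R and MASS only. The VIF of each predictor equals the corresponding diagonal element of the inverse of the predictor correlation matrix, so no extra package is needed here. The 0.7 correlation cut-off and the names cor_mat, vif and high_pairs are assumptions made for illustration.

```r
library(MASS)  # provides the Boston dataset

# All predictors, i.e. every column except the target medv
predictors <- Boston[, setdiff(names(Boston), "medv")]

# Pairwise correlation matrix of the predictors
cor_mat <- cor(predictors)

# VIF: diagonal of the inverse correlation matrix
vif <- diag(solve(cor_mat))
print(sort(vif, decreasing = TRUE))

# Predictor pairs with absolute correlation above an illustrative cut-off of 0.7
idx <- which(abs(cor_mat) > 0.7 & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
print(high_pairs)
```

Predictors that show both a large VIF and several high pairwise correlations are natural candidates to inspect first.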

Ideally, remove or combine the predictors that cause strong multicollinearity. This improves the stability and reliability of the regression model.
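The effect of dropping a predictor can be checked by recomputing the condition number. This is a rough sketch in base R: condition_number() is a hypothetical helper, not a klaR function, and it assumes unit-length column scaling without centering, so its values need not match cond.index. Dropping tax is only an illustration, not a recommendation derived from the analysis above.

```r
library(MASS)  # provides the Boston dataset

# Hypothetical helper: condition number of the scaled design matrix
condition_number <- function(formula, data) {
  X <- model.matrix(formula, data = data)
  # Assumption: unit-length column scaling, no centering
  X_scaled <- sweep(X, 2, sqrt(colSums(X^2)), "/")
  lambda <- eigen(crossprod(X_scaled), symmetric = TRUE, only.values = TRUE)$values
  sqrt(max(lambda) / min(lambda))
}

# Full model versus a model without one predictor (tax, chosen for illustration)
cn_full    <- condition_number(medv ~ ., Boston)
cn_reduced <- condition_number(medv ~ . - tax, Boston)

print(c(full = cn_full, reduced = cn_reduced))
```

With this scaling, removing a column cannot increase the condition number, but how much it drops depends on which predictor is removed.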