Condition Index for Multicollinearity Detection
Similar to the variance inflation factor (VIF), the Condition Index (CI) is used for detecting multicollinearity in regression models.
The condition indices are calculated from the eigenvalues of the scaled predictor matrix (each eigenvalue measures the variation along one linear combination of the scaled predictors). Each condition index is the square root of the ratio of the largest eigenvalue to one of the eigenvalues, so there is one index per eigenvalue and the smallest index is always 1.
The largest condition index is also known as the condition number.
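The definition above can be sketched in a few lines of base R. This is a minimal illustration, not the implementation used by any particular package: the helper name condition_indices is hypothetical, the columns are assumed to be scaled to unit length without centering, and the built-in mtcars data is used only to keep the example self-contained. A package function may scale the matrix differently, so its values need not match this sketch exactly.

```r
# Hypothetical helper: condition indices from a model matrix X
condition_indices <- function(X) {
  # Assumption: scale each column to unit length, without centering
  X_scaled <- sweep(X, 2, sqrt(colSums(X^2)), "/")
  # Eigenvalues of the cross-product matrix, returned in decreasing order
  ev <- eigen(crossprod(X_scaled), symmetric = TRUE, only.values = TRUE)$values
  # Square root of (largest eigenvalue / each eigenvalue)
  sqrt(max(ev) / ev)
}

# Model matrix (intercept column included) from the built-in mtcars data
X <- model.matrix(mpg ~ wt + hp + disp, data = mtcars)
ci <- condition_indices(X)

ci       # one index per column of X; the first is always 1
max(ci)  # the condition number
```

Because the indices are ratios against the largest eigenvalue, a large index signals a direction in which the scaled predictors vary very little, which is the signature of a near-linear dependency.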
The following example shows how to calculate the condition index to detect multicollinearity in R.
Input dataset
We will use the Boston dataset for regression analysis and calculating the condition index for predictors.
This dataset contains the housing prices and various features influencing the housing prices.
# load data (the Boston dataset ships with the MASS package)
library(MASS)
data("Boston")
# view data
head(Boston)
     crim zn indus chas   nox    rm  age    dis rad tax ptratio  black lstat medv
1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90  4.98 24.0
2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90  9.14 21.6
3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83  4.03 34.7
4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63  2.94 33.4
5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90  5.33 36.2
6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12  5.21 28.7
All predictors in the Boston dataset are numerical. The medv variable is the target; it gives the median value of homes in $1000s.
We will use all variables except medv as predictors to fit the linear regression and calculate the condition index.
Calculate condition index
You can use the cond.index function from the klaR package to calculate the condition index values for detecting multicollinearity in the model.
# import package
# install.packages("klaR")
library(klaR)
# Calculate condition index
cond.index(formula = medv ~ ., data = Boston)
# output
[1] 1.000000 2.516245 3.242633 3.905405 6.469852 7.785597 9.651169 11.640227 15.580784 19.840440 27.662710 28.912366 37.417710 87.318288
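The output contains 14 condition indices, one for each column of the model matrix (13 predictors plus the intercept). The indices correspond to eigenvalues rather than to individual predictors, and the first value is always 1.00 because it is the ratio of the largest eigenvalue to itself.
Seven condition indices are greater than 10 (suggesting moderate multicollinearity), and two of them are greater than 30 (suggesting strong multicollinearity).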
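The counts against the two thresholds can be checked directly from the printed values. The vector below is simply copied from the output shown above, so the snippet runs without klaR.

```r
# Condition indices copied from the cond.index output above
ci <- c(1.000000, 2.516245, 3.242633, 3.905405, 6.469852, 7.785597, 9.651169,
        11.640227, 15.580784, 19.840440, 27.662710, 28.912366, 37.417710,
        87.318288)

sum(ci > 10)  # 7 indices above 10
sum(ci > 30)  # 2 indices above 30
max(ci)       # condition number: 87.318288
```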
You can also use VIF and pairwise correlation analysis to identify the predictors causing the strong multicollinearity.
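As a rough sketch of that follow-up analysis, the VIF of each predictor can be read from the diagonal of the inverse of the predictor correlation matrix, and highly correlated pairs can be listed from the same matrix. The 0.7 cutoff for "highly correlated" is an arbitrary illustrative choice, not a value from this article, and the variable names are hypothetical.

```r
library(MASS)  # provides the Boston data

# Predictors only (everything except the target medv)
X <- Boston[, setdiff(names(Boston), "medv")]

# Pairwise correlation matrix of the predictors
R <- cor(X)

# VIF of each predictor = corresponding diagonal element of the inverse of R
vif <- diag(solve(R))
sort(vif, decreasing = TRUE)

# Predictor pairs with absolute correlation above an assumed cutoff of 0.7
high <- which(abs(R) > 0.7 & upper.tri(R), arr.ind = TRUE)
data.frame(var1 = rownames(R)[high[, 1]],
           var2 = colnames(R)[high[, 2]],
           r    = R[high])
```

Predictors that appear with a large VIF and in several high-correlation pairs are natural candidates to inspect first.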
Ideally, remove or combine the predictors that cause strong multicollinearity. This can improve the stability and reliability of the regression model.