Axis and Dimension in NumPy

Axis and dimensions are fundamental concepts in NumPy for understanding and manipulating the multi-dimensional arrays. This article describes the basics of array, dimension, and axis, and how to use them for manipulation multi-dimensional arrays. Array It is essential to understand array before learning about axes and dimensions. An array is a homogeneous container of numerical elements (int, float, or a combination of them). Example of a simple NumPy 1-dimensional (D) array:

DBSCAN vs. HDBSCAN. Which One You Should Use?

The two density-based clustering algorithms DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) share many similarities, but some key differences make it easier to choose the right one. This article describes some key differences between DBSCAN and HDBSCAN and helps you choose the best algorithm for your clustering application. Distance scale eps parameter DBSCAN uses two main parameters viz. epsilon (eps) and minPts (min_samples) for clustering.

Calculate VIF for Categorical Variable

Variance Inflation Factor (VIF) is a commonly used method for detecting multicollinearity in regression models. VIF is generally calculated for the continuous variables. When there are categorical variables in the dataset, the VIF calculation can be tricky, and we may need to consider additional metrics such as Generalized Variance Inflation Factor (GVIF) for evaluating the multicollinearity for categorical variables. GVIF is an extension of VIF and is generally used for detecting the multicollinearity in regression models which include categorical variables with more than two levels.

Condition Index for Multicollinearity Detection

Similar to the variance inflation factor (VIF), the Condition Index (CI) is used for detecting multicollinearity in regression models. The condition indices are calculated based on the eigenvalues (variance of linear combinations of the scaled matrix) of the predictors. The condition index is the square root of the ratio of the maximum eigenvalue of the predictors to each eigenvalue of the predictors. The largest condition index is also known as the condition number.

Tolerance for Multicollinearity Detection

Tolerance (T) is a diagnostic measure to assess the multicollinearity in regression models. Tolerance is calculated as 1 − R2, where R-Squared is the coefficient of determination obtained from the regression model. R-Squared is obtained by regressing the predictor of interest on the remaining predictor variables in the model. Tolerance is also a reciprocal of the variance inflation factor (VIF) which is calculated as 1/1 − R2. Tip Tolerance values closer to 1 indicate low multicollinearity, while values close to 0 indicate moderate to strong multicollinearity.

Python: Replace Value at Specific Index in List and Array

You can replace a value at a specific index in a list in Python by using direct indexing. The basic syntax for replacing a value at a specific index using direct indexing: # replacing the value at index 3 (remember in python index start at 0) ex_list[3] = 100 Tip In Python, lists and numpy arrays are mutable and you can modify list or array elements directly by updating the indexes.