# DBSCAN vs. HDBSCAN: Which One Should You Use?

The two density-based clustering algorithms **DBSCAN** (Density-Based Spatial Clustering of Applications with Noise) and **HDBSCAN**
(Hierarchical Density-Based Spatial Clustering of Applications with Noise) share many similarities, but some key differences make it easier to choose the right one.

This article describes some key differences between DBSCAN and HDBSCAN and helps you choose the best algorithm for your clustering application.

## Distance scale `eps` parameter

DBSCAN uses two main parameters for clustering: epsilon (`eps`) and `minPts` (`min_samples` in scikit-learn). `eps` is the neighborhood radius, i.e., the maximum distance at which two points are considered neighbors, whereas `minPts` is a density threshold: the minimum number of points that must lie within `eps` of a point for it to be a core point (part of a dense region).
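To make the roles of `eps` and `minPts` concrete, here is a minimal pure-Python sketch of the core-point test only (not the full DBSCAN algorithm). The helper name `core_points` and the toy data are my own illustration. The point itself is counted as one of its neighbors, which matches how scikit-learn documents `min_samples`.

```python
from math import dist

def core_points(points, eps, min_pts):
    """Return indices of points that have at least `min_pts` points
    (the point itself included) within distance `eps`."""
    core = []
    for i, p in enumerate(points):
        n_neighbors = sum(1 for q in points if dist(p, q) <= eps)
        if n_neighbors >= min_pts:
            core.append(i)
    return core

# Four points packed closely together plus one isolated point
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
core = core_points(points, eps=0.2, min_pts=4)
print(core)  # the four packed points are core points; the isolated one is not
```

In scikit-learn, the same two values would be passed as `DBSCAN(eps=0.2, min_samples=4)`.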

HDBSCAN is an extension of DBSCAN and uses three main parameters, `min_cluster_size`, `min_samples`, and `cluster_selection_epsilon`, which may have a significant effect on clustering. I have covered the details of each of these parameters in a separate article.

In HDBSCAN, you can get good clustering using only the `min_cluster_size` and `min_samples` parameters. Instead of fixing a single distance scale, HDBSCAN considers clusterings across all possible `eps` values and keeps the clusters that are most stable.

Hence, HDBSCAN eliminates the need to set the `eps` parameter, which makes it easier to use than DBSCAN.

## Clusters with variable densities

Density-based clustering algorithms can discover clusters of arbitrary shapes and effectively identify noise or outliers.

DBSCAN estimates the density around each data point using a single global `eps` and can identify high-density clusters. However, because the same `eps` applies to the whole dataset, it struggles to find clusters with varying densities.

HDBSCAN addresses this limitation and can find clusters with varying densities. This is mostly due to its ability to autotune the `eps` parameter. HDBSCAN constructs a hierarchy of clusters, prunes it to find stable clusters, and detects clusters at different scales.

## Prediction of new points

DBSCAN does not support predicting cluster labels for new points from a fitted model. The scikit-learn implementation of DBSCAN has no prediction function for assigning new points to existing clusters.
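You can check this quickly yourself. The short snippet below only inspects the scikit-learn estimator. The parameter values are placeholders.

```python
from sklearn.cluster import DBSCAN

model = DBSCAN(eps=0.5, min_samples=5)

# fit_predict clusters the data it is fitted on; there is no separate predict
has_fit_predict = hasattr(model, "fit_predict")
has_predict = hasattr(model, "predict")
print(has_fit_predict, has_predict)
```

In other words, `fit_predict` returns labels for the training data only and cannot be used to label unseen points without refitting.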

However, you can make predictions for new points with HDBSCAN. You can use the `approximate_predict` function from the hdbscan package, which requires the model to have been fitted with `prediction_data=True`.

Please read this article which covers the prediction of new points using the HDBSCAN.

## Packages

Both DBSCAN and HDBSCAN are implemented in the scikit-learn Python package (HDBSCAN since version 1.3). You can also use the hdbscan Python package for HDBSCAN.