Bedtools: Merge Genomic Intervals With Minimum Overlap

bedtools merge is a useful bioinformatics tool that combines overlapping or book-ended genomic intervals in a BED file into single intervals.

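By default, bedtools merge merges intervals that overlap by at least 1 bp, as well as book-ended intervals (where one interval ends exactly where the next one begins). The -d parameter sets the maximum distance allowed between two intervals for them to be merged, so a positive value also merges nearby intervals separated by a gap of up to that many base pairs.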

The -d parameter can also specify a minimum overlap: give it a negative value equal to the number of base pairs two intervals must share. For example, -d -10 merges only intervals that overlap by at least 10 bp.
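To make the rule concrete, here is a minimal sketch that emulates it in plain awk, without calling bedtools. It assumes the input is already sorted by chromosome and start, and that the next interval is joined to the current merged interval when its start is at most the current merged end plus d. This rule is inferred from the behaviour described in this article (book-ended intervals merge with the default of 0, and -d -N requires N bp of overlap), and `merge_sketch` is a hypothetical helper name, not a bedtools command.

```shell
#!/usr/bin/env bash
# merge_sketch FILE [D]
# Hypothetical helper, NOT part of bedtools: emulates the assumed merge rule
# "join the next interval if start <= current_end + D" on a BED file that is
# already sorted by chromosome and start. D defaults to 0 (book-ended merge).
merge_sketch() {
  local bed="$1" d="${2:-0}"
  awk -v d="$d" 'BEGIN { OFS = "\t" }
  {
    chrom = $1; start = $2 + 0; end = $3 + 0
    if (have && chrom == cchrom && start <= cend + d) {
      # Gap small enough (or overlap large enough): extend the current interval.
      if (end > cend) cend = end
    } else {
      # Otherwise report the current interval and start a new one.
      if (have) print cchrom, cstart, cend
      cchrom = chrom; cstart = start; cend = end; have = 1
    }
  }
  END { if (have) print cchrom, cstart, cend }' "$bed"
}

# Small example: a pair overlapping by 5 bp, plus a book-ended pair
# (20-30 ends exactly where 30-35 begins).
tmp=$(mktemp)
printf 'chr1\t1\t10\nchr1\t5\t15\nchr1\t20\t30\nchr1\t30\t35\n' > "$tmp"
merge_sketch "$tmp" 0    # chr1 1 15 and chr1 20 35 (book-ended pair merged)
merge_sketch "$tmp" -5   # chr1 1 15, chr1 20 30 and chr1 30 35
rm -f "$tmp"
```

With d = 0 the condition reduces to "start <= end", which is why book-ended intervals are merged; with d = -N it becomes "end - start >= N", i.e. at least N bp of overlap.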

The basic syntax of bedtools merge for each scenario:

# merge intervals that overlap by at least 1 bp or are book-ended (default)
bedtools merge -i file.bed

# also merge intervals separated by a gap of up to 10 bp
bedtools merge -i file.bed -d 10

# merge only intervals that overlap by at least 10 bp
bedtools merge -i file.bed -d -10

Here, file.bed contains the genomic intervals in BED format.

Note
bedtools merge requires the input BED file to be sorted by chromosome and then by start position. Please read this article on how to sort a BED file effectively.
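As a quick reference, one common way to produce this order is the standard sort utility, using the chromosome as a text key and the start position as a numeric key (bedtools sort -i file.bed is an alternative). A minimal sketch; the temporary file and its contents are illustrative only:

```shell
#!/usr/bin/env bash
# A small, deliberately unsorted example written to a temporary file.
unsorted=$(mktemp)
printf 'chr1\t30\t40\nchr1\t5\t15\nchr1\t1\t10\n' > "$unsorted"

# -k1,1  : sort on column 1 (chromosome) as text
# -k2,2n : then on column 2 (start position) numerically, so 5 sorts before 30
sorted=$(sort -k1,1 -k2,2n "$unsorted")
printf '%s\n' "$sorted"
rm -f "$unsorted"
```

In practice you would redirect the result to a file, for example sort -k1,1 -k2,2n file.bed > file.sorted.bed, and pass that file to bedtools merge.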

The following examples show how to use bedtools merge to merge only those genomic intervals that overlap by a minimum amount.

An example BED file with genomic intervals:

# BED file
cat file.bed
chr1    1       10
chr1    5       15
chr1    30      40
chr1    38      50

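The file.bed contains four genomic intervals that form two overlapping pairs. BED coordinates are 0-based and half-open, so the overlap of two sorted intervals is the earlier end minus the later start: chr1 1-10 and chr1 5-15 overlap by 5 bp (10 - 5), while chr1 30-40 and chr1 38-50 overlap by 2 bp (40 - 38). The two pairs do not overlap each other.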
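Before choosing a value for -d, it can help to print how much each interval overlaps the one before it. The sketch below is a minimal awk approach, assuming a BED file sorted by chromosome and start; `overlap_with_previous` is a hypothetical helper name, and it compares each interval only with its immediate predecessor, not with the whole merged interval that bedtools extends.

```shell
#!/usr/bin/env bash
# overlap_with_previous FILE
# Hypothetical helper: for each interval in a sorted BED file, append the
# overlap in bp with the previous interval on the same chromosome
# (previous end - current start). Positive = overlap, 0 = book-ended,
# negative = gap; the first interval of each chromosome gets "NA".
overlap_with_previous() {
  awk 'BEGIN { OFS = "\t" }
  {
    ov = ($1 == pchrom) ? pend - $2 : "NA"
    print $1, $2, $3, ov
    pchrom = $1; pend = $3
  }' "$1"
}

# The four example intervals, written to a temporary file.
bed=$(mktemp)
printf 'chr1\t1\t10\nchr1\t5\t15\nchr1\t30\t40\nchr1\t38\t50\n' > "$bed"
overlap_with_previous "$bed"
# chr1  1   10  NA
# chr1  5   15  5
# chr1  30  40  -15
# chr1  38  50  2
rm -f "$bed"
```

Read against the examples that follow: -d -5 should merge only pairs whose value is at least 5 (the first pair), while -d -2 should merge both pairs.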

If we run bedtools merge with the default settings, intervals that overlap by at least 1 bp (or are book-ended) are merged.

bedtools merge -i file.bed

# output
chr1    1       15
chr1    30      50

Both overlapping pairs were merged, each into a single interval.

To merge only intervals that overlap by at least 5 bp, set the -d parameter to -5.

bedtools merge -i file.bed -d -5

# output
chr1    1       15
chr1    30      40
chr1    38      50

Only the first pair, which overlaps by 5 bp, was merged. The second pair overlaps by just 2 bp, so chr1 30-40 and chr1 38-50 are reported as separate intervals.

Similarly, to merge intervals that overlap by at least 2 bp, set the -d parameter to -2.

bedtools merge -i file.bed -d -2

# output
chr1    1       15
chr1    30      50

Both pairs overlap by at least 2 bp (5 bp and 2 bp), so both were merged, giving the same result as the default run.

In addition to merging the intervals based on overlap, you can also use the name column in the BED file to merge the intervals.