Bedtools: Merge Genomic Intervals With Minimum Overlap
bedtools merge is a bioinformatics tool for merging overlapping or book-ended genomic intervals in a BED file.
By default, bedtools merge combines intervals that overlap by at least 1 bp or are book-ended (directly adjacent, with no gap between them). The -d parameter sets the maximum distance allowed between two intervals for them to be merged, so a positive value also merges intervals separated by a small gap.
The -d parameter can also be used to require a minimum overlap between intervals: pass a negative value whose magnitude equals the required overlap.
The basic syntax of bedtools merge for different scenarios:
# merge intervals that overlap by at least 1 bp or are book-ended (default)
bedtools merge -i file.bed
# merge intervals separated by at most 10 bp
bedtools merge -i file.bed -d 10
# merge intervals only if they overlap by at least 10 bp
bedtools merge -i file.bed -d -10
Here, file.bed contains the genomic intervals in BED format.
The bedtools merge command requires a BED file sorted by chromosome and then by start position. Please read this article on how to sort a BED file effectively.
The following examples focus on how to use bedtools merge to merge genomic intervals that have a minimum amount of overlap.
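As a side note on the sorting requirement, here is a minimal sketch of one common way to sort a BED file with the standard sort utility. The file names unsorted.bed and sorted.bed are hypothetical and only serve this illustration.

```shell
# Create a small unsorted, tab-separated example (hypothetical file name)
printf 'chr1\t30\t40\nchr1\t1\t10\nchr1\t38\t50\nchr1\t5\t15\n' > unsorted.bed

# Sort by chromosome (column 1), then numerically by start position (column 2)
sort -k1,1 -k2,2n unsorted.bed > sorted.bed

cat sorted.bed
```

The sorted file can then be passed to bedtools merge with -i sorted.bed.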
The example BED file with genomic intervals:
# BED file
cat file.bed
chr1 1 10
chr1 5 15
chr1 30 40
chr1 38 50
file.bed contains four genomic intervals: the first two overlap by 5 bp, the last two overlap by 2 bp, and the two pairs do not overlap each other.
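To check how much neighbouring intervals overlap before choosing a -d value, a rough awk sketch such as the one below may help. It is not part of bedtools. It assumes a sorted, tab-separated file and compares each interval only with the line directly before it, which is a simplification.

```shell
# Recreate the example file as tab-separated BED
printf 'chr1\t1\t10\nchr1\t5\t15\nchr1\t30\t40\nchr1\t38\t50\n' > file.bed

# For each interval, print the previous interval, the current interval, and
# (previous end - current start); a value of 0 or less means no overlap
overlaps=$(awk 'BEGIN { OFS = "\t" }
  NR > 1 && $1 == chr { print prev, $1 ":" $2 "-" $3, pend - $2 }
  { chr = $1; pend = $3; prev = $1 ":" $2 "-" $3 }' file.bed)

echo "$overlaps"
# chr1:1-10   chr1:5-15   5
# chr1:5-15   chr1:30-40  -15
# chr1:30-40  chr1:38-50  2
```

The third column shows a 5 bp overlap for the first pair and a 2 bp overlap for the last pair, which is what the -d examples in this article rely on.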
If we run bedtools merge with its default settings, the intervals that have at least 1 bp overlap will be merged.
bedtools merge -i file.bed
# output
chr1 1 15
chr1 30 50
You can see that the intervals which had at least 1 bp overlap got merged into a single interval.
If you want to merge only intervals that have at least 5 bp overlap, you can use the -d parameter with a value of -5.
bedtools merge -i file.bed -d -5
# output
chr1 1 15
chr1 30 40
chr1 38 50
You can see that the intervals which had at least 5 bp overlap got merged into a single interval, while the last two intervals, which overlap by only 2 bp, were left separate.
Similarly, if you want to merge intervals that have at least 2 bp overlap, you can use the -d parameter with a value of -2.
bedtools merge -i file.bed -d -2
# output
chr1 1 15
chr1 30 50
You can see that the intervals which had at least 2 bp overlap got merged into a single interval.
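The effect of -d in the examples above can be mimicked with a short awk sketch. This is not the bedtools implementation. It is a simplified model that assumes a sorted, tab-separated BED file and starts a new interval whenever (current start - running end) is greater than d. The function name merge_sketch is hypothetical.

```shell
# Recreate the example file as tab-separated BED
printf 'chr1\t1\t10\nchr1\t5\t15\nchr1\t30\t40\nchr1\t38\t50\n' > file.bed

# merge_sketch D FILE: simplified model of merging with distance threshold D
merge_sketch() {
  awk -v d="$1" 'BEGIN { OFS = "\t" }
    # Start a new block on the first line, on a new chromosome,
    # or when the gap to the running end exceeds d
    NR == 1 || $1 != chr || $2 - stop > d {
      if (NR > 1) print chr, start, stop
      chr = $1; start = $2; stop = $3
      next
    }
    # Otherwise extend the current block if this interval reaches further
    { if ($3 > stop) stop = $3 }
    END { if (NR > 0) print chr, start, stop }' "$2"
}

merge_sketch -5 file.bed   # mimics: bedtools merge -i file.bed -d -5
merge_sketch -2 file.bed   # mimics: bedtools merge -i file.bed -d -2
```

With a threshold of -5 the sketch prints three intervals, and with -2 it prints two, matching the outputs shown in this article.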
In addition to merging the intervals based on overlap, you can also use the name column in the BED file to merge the intervals.