How to Filter Out Overlapping Regions in a BED File

2024-08-11

You can use bedtools merge to filter out overlapping regions from a BED file, i.e. keep only the regions that do not overlap any other region.

For example, suppose you have the following BED file and want to keep only its non-overlapping regions.

cat file.bed

# output

chr1    1       100     exon1
chr1    80      300     exon1
chr1    400     700     exon1
chr1    900     1000    exon1

This BED file (file.bed) has four regions, of which the first two overlap.

bedtools merge combines overlapping regions into a single interval. By counting how many input regions went into each merged interval, you can tell which regions do not overlap any other and keep only those.
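bedtools merge expects its input to be sorted by chromosome and then by start position. As a minimal sketch, the example file can be recreated and sorted as below; the output name file.sorted.bed is only an illustrative choice, and the last line is a simple sanity check that every row has at least three tab-separated columns.

```shell
# Recreate the example file with tab-separated columns.
printf 'chr1\t1\t100\texon1\nchr1\t80\t300\texon1\nchr1\t400\t700\texon1\nchr1\t900\t1000\texon1\n' > file.bed

# Sort by chromosome, then numerically by start position.
sort -k1,1 -k2,2n file.bed > file.sorted.bed

# Sanity check: exit non-zero if any line has fewer than three fields.
awk -F'\t' 'NF < 3 { bad++ } END { exit (bad > 0) }' file.sorted.bed && echo "ok"
```

The example file is already in sorted order, so file.sorted.bed is identical to file.bed here.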

bedtools merge -i file.bed -c 2,4 -o count_distinct,collapse 

# output

chr1    1       300     2       exon1,exon1
chr1    400     700     1       exon1
chr1    900     1000    1       exon1

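In the above command, we used bedtools merge with the -c and -o parameters. The -c parameter specifies the columns to operate on, and the -o parameter specifies the operation to apply to each of those columns.

Here, we counted the distinct start positions (count_distinct on column 2) of the regions merged into each interval and reported their feature names from column 4 using collapse. Please read this article on using collapse to retain the name column. Note that count_distinct counts unique values, so overlapping regions that share the same start position are counted only once; the count operation counts every merged region instead.

The output of the above command no longer has the four-column layout of the input BED file: column 4 now holds the count and column 5 holds the collapsed names.

You can use a Unix pipe to pass the output to awk, which keeps only the non-overlapping regions (those with a count of 1) along with their feature names.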
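To see how the choice of operation affects the count column, here is a small sketch using a hypothetical file, shared_start.bed, in which the first two regions overlap and also share the same start position. It runs the merge once with count_distinct and once with count, so the two count columns can be compared side by side. The script skips the bedtools calls if bedtools is not on the PATH.

```shell
# Hypothetical input: the first two regions overlap and both start at 1.
printf 'chr1\t1\t100\texonA\nchr1\t1\t200\texonB\nchr1\t400\t700\texonC\n' > shared_start.bed

if command -v bedtools >/dev/null 2>&1; then
    # count_distinct: number of unique values in column 2 (start positions)
    bedtools merge -i shared_start.bed -c 2,4 -o count_distinct,collapse
    # count: number of input regions that went into each merged interval
    bedtools merge -i shared_start.bed -c 2,4 -o count,collapse
else
    echo "bedtools not found; skipping" >&2
fi
```

If regions in your data can share a start coordinate, filtering on the count column from the second command is the safer choice.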

bedtools merge -i file.bed -c 2,4 -o count_distinct,collapse | awk '{OFS="\t"}{if ($4 == 1) print $1,$2, $3, $5}'

# output

chr1    400     700     exon1
chr1    900     1000    exon1

As shown above, combining bedtools merge with awk filters out the overlapping regions from the BED file.

If you want to save the output to a file, redirect it as shown below:

bedtools merge -i file.bed -c 2,4 -o count_distinct,collapse | awk '{OFS="\t"}{if ($4 == 1) print $1,$2, $3, $5}' > filtered.bed

# see the output
cat filtered.bed

chr1    400     700     exon1
chr1    900     1000    exon1