How to Filter Out Overlapping Regions in a BED File
You can use bedtools merge to filter out overlapping regions (i.e., keep only the regions that do not overlap any other region) from a BED file.
For example, suppose you have the following BED file and want to keep only its non-overlapping regions.
cat file.bed
# output
chr1 1 100 exon1
chr1 80 300 exon1
chr1 400 700 exon1
chr1 900 1000 exon1
This BED file (file.bed) has four regions, two of which overlap (chr1 1-100 and chr1 80-300).
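bedtools merge expects its input to be sorted by chromosome and then by start position. The example file is already in that order; for a file that is not, a minimal sketch of the sorting step could look like the following. The file names unsorted.bed and sorted.bed are hypothetical and used only for illustration.

```shell
# Hypothetical unsorted input, created here only for illustration (tab-separated).
printf 'chr1\t900\t1000\texon1\nchr1\t80\t300\texon1\nchr1\t400\t700\texon1\nchr1\t1\t100\texon1\n' > unsorted.bed

# Sort by chromosome (column 1), then numerically by start position (column 2).
sort -k1,1 -k2,2n unsorted.bed > sorted.bed
cat sorted.bed
```

The sorted file can then be passed to bedtools merge with -i in place of file.bed.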
Here, you can use bedtools merge to filter out the overlapping regions and keep only the non-overlapping ones.
bedtools merge -i file.bed -c 2,4 -o count_distinct,collapse
# output
chr1 1 300 2 exon1,exon1
chr1 400 700 1 exon1
chr1 900 1000 1 exon1
In the above command, we used bedtools merge with the -c and -o parameters. The -c parameter specifies which columns to operate on, and the -o parameter specifies the operation to apply to each of those columns.
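Here, count_distinct counts the distinct start coordinates (column 2) among the regions merged into each interval, and collapse reports their feature names (column 4) as a comma-separated list. Because count_distinct counts distinct values, overlapping regions that share the same start coordinate are counted only once; the count operation counts every merged region instead.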
Please read this article on using collapse to retain the name column.
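To make the roles of the count and name columns more concrete, the same grouping can be imitated with plain awk. This is only an illustration of the merge, count and collapse logic, not how bedtools is implemented. It assumes tab-separated input sorted by chromosome and start, treats a region as part of the current group when its start is less than or equal to the running end, and counts rows per group, which for this example gives the same numbers as count_distinct on column 2.

```shell
# Recreate the example input (tab-separated, already sorted by chromosome and start).
printf 'chr1\t1\t100\texon1\nchr1\t80\t300\texon1\nchr1\t400\t700\texon1\nchr1\t900\t1000\texon1\n' > file.bed

# Group rows while the next start is <= the running end; per group, count the
# rows and join the names in column 4 with commas.
merged=$(awk 'BEGIN { FS = OFS = "\t" }
  function emit() { if (n > 0) print chrom, start, stop, n, names }
  $1 != chrom || $2 + 0 > stop {      # new chromosome or a gap: start a new group
    emit(); chrom = $1; start = $2; stop = $3 + 0; n = 1; names = $4; next
  }
  {                                   # overlapping row: extend the current group
    if ($3 + 0 > stop) stop = $3 + 0
    n++; names = names "," $4
  }
  END { emit() }' file.bed)
printf '%s\n' "$merged"
```

Groups with a count of 1 are the regions that did not overlap anything else.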
The output of this bedtools merge command contains an extra count column, so it no longer matches the four-column layout of the input BED file.
You can use a Unix pipe to send the output to awk, which keeps only the unique (non-overlapping) regions and their feature names.
bedtools merge -i file.bed -c 2,4 -o count_distinct,collapse | awk 'BEGIN{OFS="\t"} $4 == 1 {print $1, $2, $3, $5}'
# output
chr1 400 700 exon1
chr1 900 1000 exon1
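To see what the awk step does in isolation, it can be run on a saved copy of the five-column merge output. The sketch below assumes a hypothetical file merged.txt that holds that output, tab-separated.

```shell
# Hypothetical file holding the five-column merge output shown earlier (tab-separated).
printf 'chr1\t1\t300\t2\texon1,exon1\nchr1\t400\t700\t1\texon1\nchr1\t900\t1000\t1\texon1\n' > merged.txt

# Keep rows whose count (column 4) is 1 and drop the count column,
# which restores the chrom, start, end, name layout.
kept=$(awk 'BEGIN { FS = OFS = "\t" } $4 == 1 { print $1, $2, $3, $5 }' merged.txt)
printf '%s\n' "$kept"
```

Rows with a count above 1 are merged intervals built from overlapping regions, so they are dropped.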
You can see that we have used bedtools merge and awk to filter out the overlapping regions from the BED file.
To save the output to a file, redirect it as shown below:
bedtools merge -i file.bed -c 2,4 -o count_distinct,collapse | awk 'BEGIN{OFS="\t"} $4 == 1 {print $1, $2, $3, $5}' > filtered.bed
# see the output
cat filtered.bed
chr1 400 700 exon1
chr1 900 1000 exon1
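As an optional sanity check, the saved file can be scanned to confirm that no two regions overlap. This is a minimal sketch that assumes filtered.bed is tab-separated. It counts a record as overlapping when it starts before the running end of earlier records on the same chromosome, and the variable name overlaps is chosen only for illustration.

```shell
# Recreate filtered.bed from the example output above (tab-separated).
printf 'chr1\t400\t700\texon1\nchr1\t900\t1000\texon1\n' > filtered.bed

# Sort, then count records that start before the running end on the same chromosome.
overlaps=$(sort -k1,1 -k2,2n filtered.bed | awk -F'\t' '
  $1 != chrom { chrom = $1; stop = $3 + 0; next }   # first record on a new chromosome
  $2 + 0 < stop { n++ }                             # starts inside an earlier region
  $3 + 0 > stop { stop = $3 + 0 }                   # extend the running end
  END { print n + 0 }')
echo "overlapping records: $overlaps"
```

A result of 0 means every remaining region is separate from the others.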