How to Merge BED Files And Retain Other Columns

bedtools merge is a useful bioinformatics tool for combining overlapping or book-ended genomic intervals in a BED file into single intervals.

Most BED files contain the three required columns (chrom, start, and end), but they often also include a fourth column, name, that holds the feature annotation.

When you merge a BED file with four or more columns, bedtools merge by default reports only the chrom, start, and end of each merged interval, so the information in the fourth column is lost.

However, bedtools merge provides two additional parameters, -c and -o, for retaining information from additional columns.

The following examples show how to use bedtools merge to merge overlapping intervals from BED files while retaining information from the additional columns.

Here are two example BED files, each with a fourth column for feature annotation:

# BED file
cat file1.bed
chr1    10      100     exon1_f1
chr1    400     500     exon3_f1

cat file2.bed
chr1    50      200     exon1_f2
chr1    600     700     exon4_f2

These example BED files contain overlapping genomic intervals: exon1_f1 (chr1:10-100) in file1.bed overlaps exon1_f2 (chr1:50-200) in file2.bed.

Now, use bedtools merge to combine the overlapping intervals from both BED files into single intervals.

Note
The bedtools merge command requires the input BED file to be sorted by chromosome and then by start position. In the command below, bedtools sort takes care of this before merging. Please read this article on how to sort BED files effectively.
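If you prefer to sort outside of bedtools, the standard Unix sort command with -k1,1 -k2,2n is a commonly used equivalent: it orders lines by chromosome and then numerically by start position. The sketch below assumes a POSIX-compatible sort and writes to a hypothetical file named combined.sorted.bed; LC_ALL=C is set so that chromosome names are compared byte by byte regardless of the system locale.

```shell
#!/usr/bin/env bash
# Sketch: sort concatenated BED files by chromosome, then by start position,
# using coreutils sort instead of bedtools sort.
# combined.sorted.bed is a hypothetical output file name.
set -euo pipefail

# Recreate the two example files from this article
printf 'chr1\t10\t100\texon1_f1\nchr1\t400\t500\texon3_f1\n' > file1.bed
printf 'chr1\t50\t200\texon1_f2\nchr1\t600\t700\texon4_f2\n' > file2.bed

# -k1,1  : sort on column 1 (chromosome) only
# -k2,2n : break ties on column 2 (start), compared numerically
cat file1.bed file2.bed | LC_ALL=C sort -k1,1 -k2,2n > combined.sorted.bed

cat combined.sorted.bed
```

The sorted file can then be given to bedtools merge with -i, in place of the bedtools sort step.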
cat file1.bed file2.bed | bedtools sort | bedtools merge

# output

chr1    10      200
chr1    400     500
chr1    600     700

You can see that the overlapping intervals (chr1:10-100 and chr1:50-200) are merged into a single interval, chr1:10-200. However, the name column (fourth column) is not retained in the merged output.

To retain the name column (fourth column), use the -c parameter with bedtools merge. The -c parameter specifies which columns of the input BED file to summarize, and the -o parameter specifies the operation to apply to each of those columns.

To keep the names in the output, use the collapse operation as the value of -o. It reports the values from all intervals in a merged region as a comma-separated list.

cat file1.bed file2.bed | bedtools sort | bedtools merge -c 4 -o collapse

# output

chr1    10      200     exon1_f1,exon1_f2
chr1    400     500     exon3_f1
chr1    600     700     exon4_f2

You can see that the output contains the merged intervals and feature annotation information from both BED files.
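To make the behaviour of -c 4 -o collapse concrete, here is an illustrative awk sketch of what that command computes. It is not how bedtools is implemented. It assumes the input is already sorted by chromosome and start, and it mimics the default merge distance of 0, under which overlapping or book-ended intervals (next start less than or equal to the current end) are joined. The output file name merged.bed is hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: an awk version of what
#   bedtools merge -c 4 -o collapse
# computes. Assumes sorted input and mimics the default rule that
# overlapping or book-ended intervals are merged.
set -euo pipefail

# Recreate the two example files from this article
printf 'chr1\t10\t100\texon1_f1\nchr1\t400\t500\texon3_f1\n' > file1.bed
printf 'chr1\t50\t200\texon1_f2\nchr1\t600\t700\texon4_f2\n' > file2.bed

cat file1.bed file2.bed | LC_ALL=C sort -k1,1 -k2,2n | awk '
BEGIN { OFS = "\t" }
{
    if (NR > 1 && $1 == chrom && $2 + 0 <= end) {
        # Overlapping or book-ended: extend the end and append the name
        if ($3 + 0 > end) end = $3 + 0
        names = names "," $4
    } else {
        # Start of a new region: print the finished one first
        if (NR > 1) print chrom, start, end, names
        chrom = $1; start = $2 + 0; end = $3 + 0; names = $4
    }
}
END { if (NR > 0) print chrom, start, end, names }' > merged.bed

cat merged.bed
```

On the example files this yields the same merged intervals and comma-separated names as the bedtools merge command with -c 4 -o collapse.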

Similarly, you can use bedtools merge to merge overlapping intervals within a single BED file.

# BED file
cat file3.bed
chr1    100     200     exon1
chr1    150     300     exon1
chr1    500     600     exon2

bedtools merge -i file3.bed -c 4 -o collapse

# output
chr1    100     300     exon1,exon1
chr1    500     600     exon2

You can also use the bedtools merge command to filter out overlapping regions and keep only the non-overlapping regions of a BED file.
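One way to do this with bedtools is to count how many input intervals fall into each merged region with -c 1 -o count and keep only the regions whose count is 1, for example bedtools merge -i file3.bed -c 1 -o count | awk '$4 == 1'. The sketch below reproduces that logic in plain awk so the idea can be followed step by step. It assumes sorted input, uses the same overlapping-or-book-ended rule as bedtools' default, and, unlike the bedtools version, prints the original name instead of the count. The output file name non_overlapping.bed is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep only intervals that do not overlap (or touch) any other
# interval, i.e. merged regions built from exactly one input interval.
set -euo pipefail

# Recreate file3.bed from this article
printf 'chr1\t100\t200\texon1\nchr1\t150\t300\texon1\nchr1\t500\t600\texon2\n' > file3.bed

LC_ALL=C sort -k1,1 -k2,2n file3.bed | awk '
BEGIN { OFS = "\t" }
# Print the current merged region only if it was built from one interval
function flush() { if (n == 1) print chrom, start, end, name }
{
    if (n > 0 && $1 == chrom && $2 + 0 <= end) {
        # Overlapping or book-ended: extend the region and count the interval
        if ($3 + 0 > end) end = $3 + 0
        n++
    } else {
        # New region: emit the previous one if it was a singleton
        flush()
        chrom = $1; start = $2 + 0; end = $3 + 0; name = $4; n = 1
    }
}
END { flush() }' > non_overlapping.bed

cat non_overlapping.bed
```

For file3.bed, the two exon1 intervals overlap and are dropped, leaving only the exon2 interval (chr1:500-600).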