How to Sort BED Files Effectively

BED (Browser Extensible Data) files are text-formatted files widely used in genomic interval analysis and visualization.

The BED format represents genomic features such as genes, exons, coding sequences (CDS), and other custom regions.

Bioinformatics tools such as bedtools and the UCSC Genome Browser use the BED format for tasks such as finding overlapping genomic intervals, merging genomic intervals, and visualizing genomic features.

BED files must be sorted by chromosome and then by start position for several analyses, such as merging intervals and visualization.

BED files can be sorted with several methods, such as bedtools sort and the UNIX sort command.

The following examples show how to use bedtools sort and the UNIX sort command to sort BED files.

bedtools sort

bedtools provides a sort subcommand that sorts BED files by chromosome and then by start position.

The example BED file with genomic intervals:

# BED file
cat file.bed
chr1    1       10
chr1    30      35
chr1    20      25
chr1    38      50

This example BED file is not sorted, so we will sort it with the bedtools sort command:

bedtools sort -i file.bed

# output
chr1    1       10
chr1    20      25
chr1    30      35
chr1    38      50

The output is now sorted by chromosome and start position.

Tip
The bedtools sort command can be slow for large BED files. For a large BED file, the UNIX sort command is recommended because it is faster and consumes less memory.
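
For large inputs, the tip above might be applied along these lines. This is a minimal sketch that assumes GNU sort: -S (buffer size), -T (temporary directory), and --parallel are GNU sort options, and the values 1G and 4, as well as the file names large.bed and large.sorted.bed, are placeholders chosen for illustration. Setting LC_ALL=C makes sort compare bytes rather than locale-aware characters, which is usually faster and gives the same order on every machine.

```shell
# Tiny stand-in for a large BED file (hypothetical file name)
printf 'chr1\t30\t35\nchr1\t1\t10\nchr1\t20\t25\n' > large.bed

# Sort by chromosome, then numerically by start position.
# -S 1G         upper limit for the in-memory buffer (placeholder value)
# --parallel=4  number of sort threads (placeholder value)
# -T            directory for temporary files when data spills to disk
# -o            write the result to a file instead of standard output
LC_ALL=C sort -k1,1 -k2,2n \
    -S 1G \
    --parallel=4 \
    -T "${TMPDIR:-/tmp}" \
    -o large.sorted.bed large.bed

cat large.sorted.bed
```

The buffer size and thread count are worth tuning to the machine at hand; the numbers here are not recommendations.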

UNIX sort

In addition to bedtools sort, you can also use the UNIX sort command to sort the BED file by the chromosome and start position.

The example BED file with genomic intervals:

# BED file
cat file.bed
chr1    1       10
chr1    30      35
chr1    20      25
chr1    38      50

This example BED file is not sorted, so we will sort it with the UNIX sort command:

sort -k1,1 -k2,2n file.bed

# output
chr1    1       10
chr1    20      25
chr1    30      35
chr1    38      50

The UNIX sort command sorted the BED file by chromosome and start position.
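
Before sorting, it can be useful to check whether a file is already in the required order. The sketch below uses the -c option of sort, which checks the order without producing sorted output and returns a non-zero exit status if the input is out of order. The helper name is_sorted and the file name file.sorted.bed are hypothetical and used only for illustration.

```shell
# Recreate the unsorted example file (tab-separated)
printf 'chr1\t1\t10\nchr1\t30\t35\nchr1\t20\t25\nchr1\t38\t50\n' > file.bed

# Hypothetical helper: succeeds only if the file is sorted by
# chromosome and then numerically by start position
is_sorted() {
    sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}

if is_sorted file.bed; then
    echo "file.bed is sorted"
else
    echo "file.bed is not sorted"
fi

# Sort the file, then check the result with the same keys
sort -k1,1 -k2,2n file.bed > file.sorted.bed

if is_sorted file.sorted.bed; then
    echo "file.sorted.bed is sorted"
else
    echo "file.sorted.bed is not sorted"
fi
```

The check must use the same key definitions as the sort itself; otherwise a correctly sorted file can be reported as unsorted.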

Note
The above commands sort chromosome names lexicographically, so chr10 is placed before chr2. To sort chromosome names in natural (alphanumeric) order and to order intervals with the same start position by their end coordinate, use the following command.

The following example shows how to sort a BED file alphanumerically by chromosome, start, and end coordinates.

The example BED file with genomic intervals:

# BED file
cat file.bed
chr1    1       10
chr10   30      99
chr10   30      60
chr2    40      50
chrX    60      80
chrX    60      70

Sort the BED file alphanumerically by chromosome, start, and end coordinates:

sort -k1,1V -k2,2n -k3,3n file.bed

# output
chr1    1       10
chr2    40      50
chr10   30      60
chr10   30      99
chrX    60      70
chrX    60      80

In the above command, -k1,1V sorts the chromosome column alphanumerically (version sort, as provided by GNU sort), -k2,2n sorts the start position numerically, and -k3,3n sorts the end position numerically.
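
To see what the V modifier changes, the same file can be sorted with and without it and the resulting chromosome order compared. This is a small sketch that assumes GNU sort for the V modifier; the variable names lexical and natural are made up for this illustration.

```shell
# Recreate the example file (tab-separated)
printf 'chr1\t1\t10\nchr10\t30\t99\nchr10\t30\t60\nchr2\t40\t50\nchrX\t60\t80\nchrX\t60\t70\n' > file.bed

# Plain key on column 1: names are compared character by character,
# so chr10 sorts before chr2
lexical=$(LC_ALL=C sort -k1,1 -k2,2n -k3,3n file.bed | cut -f1 | uniq | paste -sd' ' -)

# Version sort on column 1: the numeric part is compared as a number,
# so chr2 sorts before chr10
natural=$(LC_ALL=C sort -k1,1V -k2,2n -k3,3n file.bed | cut -f1 | uniq | paste -sd' ' -)

echo "lexicographic: $lexical"   # chr1 chr10 chr2 chrX
echo "version:       $natural"   # chr1 chr2 chr10 chrX
```

Whichever order is chosen, downstream tools that expect sorted input generally need all files in the same order.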