Does anyone know a program that can split a BED file by chromosome? I have generated a BED file that contains data for all chromosomes, but it is not sorted. When I sorted it with BedSort, the output was not in numeric order: it always gives chr10 at the top, followed by chr11, and so on up to chr19. It seems I have to sort each chromosome separately, so I wonder whether there is a program that can split a BED file by chromosome. Thanks
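The ordering described above looks like a plain lexicographic (character-by-character) sort, where "chr1" < "chr10" < ... < "chr19" < "chr2". A minimal sketch contrasting that with version sort (`sort -V`) on a handful of made-up chromosome names; it assumes a `sort` that supports `-V`, such as the one in GNU coreutils.

```shell
#!/usr/bin/env bash
# Made-up chromosome names, deliberately out of order
names='chr2 chr10 chr1 chr19 chrX chr11'

# $names is left unquoted on purpose so each name becomes its own line
lexical=$(printf '%s\n' $names | LC_ALL=C sort | tr '\n' ' ')
natural=$(printf '%s\n' $names | LC_ALL=C sort -V | tr '\n' ' ')

echo "plain sort:   $lexical"
echo "version sort: $natural"
```

Version sort compares runs of digits as numbers, which is why chr2 lands before chr10 without having to split the file first.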
You could try the following with your bed file:
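```
# -V (version sort, e.g. GNU coreutils) puts chr2 before chr10; -n sorts start positions numerically
sort -k 1V,1 -k 2n,2 file.bed -o file.sorted.bed
```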
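A quick way to confirm the sort did what you expect is `sort -C`, which exits with status 0 when its input is already ordered by the given keys. This is a sketch on a tiny made-up stand-in for file.bed, created in a temporary directory, and it assumes a `sort` with `-V` and `-C` (GNU coreutils has both).

```shell
#!/usr/bin/env bash
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical unsorted BED3 records standing in for file.bed
printf 'chr10\t500\t600\nchr2\t300\t400\nchr1\t200\t300\nchr2\t100\t200\n' > "$tmp/file.bed"

# Same command as above: chromosome in version order, then start numerically
sort -k 1V,1 -k 2n,2 "$tmp/file.bed" -o "$tmp/file.sorted.bed"

# sort -C checks order silently and reports the result only via its exit status
if sort -C -k 1V,1 -k 2n,2 "$tmp/file.sorted.bed"; then
    sorted_ok=yes
else
    sorted_ok=no
fi
chroms=$(cut -f 1 "$tmp/file.sorted.bed" | tr '\n' ' ')
echo "sorted: $sorted_ok, chromosome order: $chroms"
```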
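To split the file into one BED file per chromosome:
```
mkdir -p split_results
for chr in $(cut -f 1 file.bed | sort -u); do
    # compare column 1 only; grep -w "$chr" would also match the name in other columns
    awk -v c="$chr" '$1 == c' file.bed > "split_results/${chr}.output.bed"
done
```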
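One pitfall with a grep-based split is that `grep -w chr2` matches "chr2" anywhere on the line, for example in the name column of a chr1 record. A minimal sketch on made-up data that splits by comparing column 1 with awk instead, then checks that the pieces together contain exactly as many lines as the input. The file names follow the loop above; the data are hypothetical.

```shell
#!/usr/bin/env bash
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Made-up BED4 data; the name column of the first record happens to read "chr2"
printf 'chr1\t100\t200\tchr2\nchr2\t50\t80\tpeakA\nchr1\t300\t400\tpeakB\n' > file.bed

mkdir -p split_results
for chr in $(cut -f 1 file.bed | sort -u); do
    # match on column 1 only, so the first record goes to chr1 and not to chr2
    awk -v c="$chr" '$1 == c' file.bed > "split_results/${chr}.output.bed"
done

# Sanity check: no line lost or duplicated across the per-chromosome files
total_in=$(wc -l < file.bed)
total_out=$(cat split_results/*.output.bed | wc -l)
echo "input: $total_in lines, split: $total_out lines"
```

Note that this loop rereads the whole file once per chromosome, which is fine for small files but slow for large ones; the single-pass awk approach avoids that.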
Similar to adamdeluca's suggestion, here is another simple awk solution. Note that ">>" creates and appends to files named CHROM.bed, where CHROM is column 1 of the input BED file (in this case, example.bed).
So, in plain English, the awk command prints each entire line ($0) from example.bed to distinct files that are each named by the chrom field ($1).
This strategy is useful in many other cases where you want to do a context-based "grep", and route the results to distinct files.
```
$ # parentheses around the file name: some awk versions report a syntax error without them
$ awk '{ print $0 >> ($1 ".bed") }' example.bed
$ ls -1 *.bed
chr1.bed
chr2.bed
... (snip)
chrY.bed
example.bed
```
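The same routing idea works for any column, not just the chromosome. A minimal sketch, assuming BED6 input where column 6 holds the strand; the output file names (plus.bed, minus.bed, unstranded.bed) and the data are made up for illustration.

```shell
#!/usr/bin/env bash
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Made-up BED6 records: chrom, start, end, name, score, strand
printf 'chr1\t10\t20\ta\t0\t+\nchr1\t30\t40\tb\t0\t-\nchr2\t5\t9\tc\t0\t+\n' > example.bed

# Route each line to a file chosen from column 6 (hypothetical file names)
awk -F '\t' '{
    out = ($6 == "+") ? "plus.bed" : (($6 == "-") ? "minus.bed" : "unstranded.bed")
    print > out
}' example.bed
```

This sketch uses `>` rather than `>>`: within a single awk run, `>` truncates a file only the first time it is opened and appends afterwards, so rerunning the command does not pile duplicate lines onto output from an earlier run, as `>>` would.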