SEQanswers

07-19-2013, 04:57 AM #1
gmarco
Member
Location: Spain
Join Date: Oct 2012
Posts: 36
Sort BED file by chromosome order, not lexicographically

I've been trying to sort a BED file with bedtools and BEDOPS.

Both seem to sort the chromosome names lexicographically.

For example, if my input test file is:

input:
Code:
chr1 10 30
chr2 30 10
chr3 10 30
chr13 30 40
chr5 10 20
the output is:

Code:
chr1 10 30
chr13 30 40
chr2 30 10
chr3 10 30
chr5 10 20
and the desired output is:

Code:
chr1 10 30
chr2 30 10
chr3 10 30
chr5 10 20
chr13 30 40
Any ideas how to achieve that?
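The ordering in the output above matches a plain byte-by-byte string comparison of the first column: "chr1" comes first because it is a prefix of "chr13", and "chr13" sorts before "chr2" because the character "1" precedes "2". A minimal sketch reproducing it with coreutils sort, assuming tab-separated input; the function name lexicographic_sort is hypothetical, and this is not a claim about how bedtools or BEDOPS sort internally:

```shell
#!/usr/bin/env bash
# Reproduce the lexicographic chromosome order from the question.
# LC_ALL=C forces a plain byte-wise comparison of column 1, so "chr13"
# lands before "chr2"; start and end are then compared numerically.
lexicographic_sort() {
  LC_ALL=C sort -k1,1 -k2,2n -k3,3n
}

# The test records from the question, tab-separated as in a BED file.
printf 'chr1\t10\t30\nchr2\t30\t10\nchr3\t10\t30\nchr13\t30\t40\nchr5\t10\t20\n' |
  lexicographic_sort
```

This prints chr1, chr13, chr2, chr3, chr5 in that order, the same as the unwanted output shown in the question.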

07-19-2013, 05:06 AM #2
Heisman
Senior Member
Location: St. Louis
Join Date: Dec 2010
Posts: 535

I convert the chrX and chrY to chr23 and chr24 with "sed -e 's,chrX,chr23,' -e 's,chrY,chr24,' input.file" and then pipe that into "sort -k 1.4,1n -k 2,2n" and then change the chr23 and chr24 back. This has to be altered if you have other "chr" values (M and all the small contigs).
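A minimal sketch of that round trip as a shell function, assuming tab-separated BED lines whose chromosome names start with "chr" and no real chr23/chr24 in the data. The function name sort_bed_by_chrom_number is hypothetical; the "^" anchors are an addition so that only the first column is rewritten (not, say, a name column that happens to contain "chrX"), and the third sort key is the correction given later in the thread:

```shell
#!/usr/bin/env bash
# Sketch of the rename / sort / rename-back approach.
# Assumes tab-separated BED input, chromosome names beginning with "chr",
# and no genuine chr23 or chr24 entries.
sort_bed_by_chrom_number() {
  # 1. Rename chrX/chrY to chr23/chr24; "^" restricts the match to column 1.
  # 2. Sort on the number after "chr" (4th character of column 1 onward),
  #    then on start, then on end, all numerically.
  # 3. Rename chr23/chr24 back to chrX/chrY.
  sed -e 's,^chrX,chr23,' -e 's,^chrY,chr24,' \
    | sort -k1.4,1n -k2,2n -k3,3n \
    | sed -e 's,^chr23,chrX,' -e 's,^chr24,chrY,'
}

# Made-up example records, including chrX, chrY and a two-digit chromosome.
printf 'chrY\t5\t10\nchr10\t1\t2\nchrX\t1\t2\nchr2\t7\t9\nchr2\t3\t4\n' |
  sort_bed_by_chrom_number
```

This prints chr2 3 4, chr2 7 9, chr10 1 2, chrX 1 2, chrY 5 10. Non-numeric names such as chrM or unplaced contigs are still not placed sensibly by this sketch, which is the alteration mentioned in the post.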

07-19-2013, 05:17 AM #3
gmarco
Member
Location: Spain
Join Date: Oct 2012
Posts: 36

Don't you have to also remove "chr" from line start to make the sort work?

07-19-2013, 05:42 AM #4
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 7,077

More here: http://www.biostars.org/p/64687/

07-19-2013, 08:22 AM #5
Heisman
Senior Member
Location: St. Louis
Join Date: Dec 2010
Posts: 535

Quote:
Originally Posted by gmarco View Post
Don't you have to also remove "chr" from line start to make the sort work?
No. If you use 'sort -k 1.4,1n', the "1.4" tells sort to start the key at the 4th character of the 1st column (just after "chr") and the "n" compares it numerically, so you don't have to remove "chr". The thread GenoMax linked has other approaches worth looking at.

Also, I typed "sort -k 1.4,1n -k 2,2n" above; it should be "sort -k 1.4,1n -k 2,2n -k 3,3n", so that lines with the same chromosome and start are also ordered by end position.
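To see what each key does, here is the corrected command applied to the question's test records plus one made-up chr1 interval that shares its start with the first record, so the third key matters. The function name sort_by_chr_start_end is hypothetical, and the sketch assumes tab-separated input:

```shell
#!/usr/bin/env bash
# Corrected command: chromosome number, then start, then end.
#   -k1.4,1n : key runs from the 4th character of column 1 (just after "chr")
#              to the end of column 1, compared numerically.
#   -k2,2n   : start coordinate, numerically.
#   -k3,3n   : end coordinate, numerically, to break ties on equal starts.
sort_by_chr_start_end() {
  sort -k1.4,1n -k2,2n -k3,3n
}

# The question's test records plus one extra, made-up chr1 interval.
printf 'chr1\t10\t30\nchr2\t30\t10\nchr3\t10\t30\nchr13\t30\t40\nchr5\t10\t20\nchr1\t10\t100\n' |
  sort_by_chr_start_end
```

This prints chr1 10 30, chr1 10 100, chr2, chr3, chr5, chr13, matching the desired order from the question. Without the third key, sort would fall back to comparing whole lines as strings for the two chr1 records, where "100" sorts before "30".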

07-22-2013, 01:42 AM #6
gmarco
Member
Location: Spain
Join Date: Oct 2012
Posts: 36

Quote:
Originally Posted by Heisman View Post
I convert the chrX and chrY to chr23 and chr24 with "sed -e 's,chrX,chr23,' -e 's,chrY,chr24,' input.file" and then pipe that into "sort -k 1.4,1n -k 2,2n" and then change the chr23 and chr24 back. This has to be altered if you have other "chr" values (M and all the small contigs).
Hello Heisman,

I'm trying the sed command with no success:

Code:
sed -e 's,chrX,chr23,' -e 's,chrY,chr24' input.bed
sed: -e expression #2, char 12: unterminated `s' command
Fixed: I was missing a comma after chr24 in the second sed expression; it should be -e 's,chrY,chr24,'.

07-22-2013, 05:22 AM #7
Heisman
Senior Member
Location: St. Louis
Join Date: Dec 2010
Posts: 535

Yeah, sorry about that. I probably make that typo more than any other.

Tags
bed, chromosome, lexicographically, sort