SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
another stupid question about dindel libiyagirl Bioinformatics 7 07-30-2012 07:38 AM
Newbie. May ask a lot of stupid questions. entropy Introductions 1 03-07-2011 06:12 PM
BWA mapping colorspace reads Todd Scheetz SOLiD 2 08-25-2010 07:16 PM
sam output from bwa for SOLiD reads in colorspace? nisha SOLiD 19 01-07-2010 05:05 AM
sam output from bwa colorspace alignment Mr Mutundes Bioinformatics 0 12-15-2009 04:02 AM

Reply
 
Thread Tools
Old 09-19-2010, 08:11 PM   #1
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
Default bwa of colorspace: obviously I am doing something stupid

I'm trying to run bwa (0.5.8a, the last version on SourceForge) on a 64-bit box running Enterprise Oracle Linux with colorspace data (using this guide) & am getting an obviously bogus result. These reads are known to align with Bowtie; here is one

Code:
@SRR040290.24278 VAB_ugc_85__100_137__138_121__123_bc_Frag50_solid0032_20090715_ugc_121__1232_543_1118 length=50
T32310003110000202001010330012330302002022323210221
+SRR040290.24278 VAB_ugc_85__100_137__138_121__123_bc_Frag50_solid0032_20090715_ugc_121__1232_543_1118 length=50
!:8>>;=6=<9<9>>799;>::9:7<<=<<<7<:=8;7;:;/,81<)9;:&
First I indexed hg18 in colorspace
Code:
bwa index -a bwtsw -c hg18.fasta > build-cs.sh.out
Which produced STDERR output which seemed to indicate success
Code:
[bwa_index] Pack nucleotide FASTA... 41.87 sec
[bwa_index] Convert nucleotide PAC to color PAC... 16.71 sec
[bwa_index] Reverse the packed sequence... 11.21 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 1675.10 seconds elapse.
[bwa_index] Construct BWT for the reverse packed sequence...
[bwa_index] 1658.72 seconds elapse.
[bwa_index] Update BWT... 12.65 sec
[bwa_index] Update reverse BWT... 11.94 sec
[bwa_index] Construct SA from BWT and Occ... 616.15 sec
[bwa_index] Construct SA from reverse BWT and Occ... 618.90 sec
Then I have this code
Code:
bwa aln -c -t 8 ./hg18.fasta test-reads.fastq > test-reads.sai
bwa samse ./hg18.fasta test-reads.sai test-reads.fastq > test-reads.sam
with the usual sort of STDERR output
Code:
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln_core] calculate SA coordinate... 0.00 sec
[bwa_aln_core] write to the disk... 0.00 sec
[bwa_aln_core] 482 sequences have been processed.
[bwa_aln_core] convert to sequence coordinate... 3.14 sec
[bwa_aln_core] refine gapped alignments... 1.45 sec
[bwa_aln_core] print alignments... 0.00 sec
[bwa_aln_core] 482 sequences have been processed.
with the rather peculiar SAM output of everything looking like the below (i.e. nothing aligned)
Code:
SRR040290.24278 4       *       0       0       *       *       0       0       TNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN     !:8>>;=6=<9<9>>799;>::9:7<<=<<<7<:=8;7;:;/,81<)9;:&
SRR040290.113515        4       *       0       0       *       *       0       0       TNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN     !5;2.25:15/2//77*:.48:9466195.<7949.:9)4)/-,1*1/4,4
SRR040290.129188        4       *       0       0       *       *       0       0       TNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN     !796878778;:<:8:5::95;<331:=397A;7;998)4:<;/,936&9.
SRR040290.182661        4       *       0       0       *       *       0       0       TNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN     !:==?A><8>>?<9>>>/>:<?/)?7>3:<<<)3=<>/;::<,<:31<282
SRR040290.188969        4       *       0       0       *       *       0       0       TNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN     !:<:<:?:>>9<<;<<(:57=2@<<<689>>8<<>=9:):8*4)7</<8&,
Any clue what I did wrong?
krobison is offline   Reply With Quote
Old 09-19-2010, 10:32 PM   #2
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

Hi Krobison,

I had the exact same problem and the problem is that you have to double-encode your colorspace reads before mapping them with BWA. NovoalignCS, Bowtie and BFAST will take these as CSFASTQ as-is.
For BWA I ran a preprocessing step with a perl script to do the job then mapped the reads. Your example read with double encoding would look like this:

Code:
@SRR040290.24278 VAB_ugc_85__100_137__138_121__123_bc_Frag50_solid0032_20090715_ugc_121__1232_543_1118 length=50
TGTCAAATCCAAAAGAGAACACATTAACGTTATAGAAGAGGTGTGCAGGC
+SRR040290.24278 VAB_ugc_85__100_137__138_121__123_bc_Frag50_solid0032_20090715_ugc_121__1232_543_1118 length=50
:8>>;=6=<9<9>>799;>::9:7<<=<<<7<:=8;7;:;/,81<)9;:&

My code is below and only works for colorspace fastq. Just copy and paste if you would like to use it (no warranty ofcourse):

Code:
#!/usr/bin/perl -w
my $f=shift or die "No input";
if ($f =~ /.gz/) {
        open(IN,"zcat $f|") or die "$!";
}else {
        open(IN,$f) or die "$!";
}
while(<IN>) {
        my $h=$_;
        my $s=<IN>;
        my $c=<IN>;
        my $d=<IN>;
        my $seq= substr($s,1,length($s));
        my $qual= substr($d,1,length($d));
        $seq=~tr/[0123.]/[ACGTN]/;
        
        print $h;
        print $seq;
        print $c;
        print $qual;

}
close IN;

Last edited by zee; 09-19-2010 at 11:03 PM.
zee is offline   Reply With Quote
Old 09-20-2010, 05:31 AM   #3
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
Default

Thank you for the rapid & informative reply!

I must say "UGH!" though about rewriting colorspace so it masquerades as basespace. It will be too easy to get confused with that.
krobison is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 11:44 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2022, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO