SEQanswers

Old 07-28-2009, 06:22 PM   #1
nisha
Junior Member
 
Location: US

Join Date: Jun 2009
Posts: 5
Default sam output from bwa for SOLiD reads in colorspace?

Hi all,

I'm using bwa for mapping SOLiD paired reads to the reference genome. After going through the bwa aln and bwa samse/sampe stages I get output in the SAM format. Is this SAM output in colorspace? If so are there tools to convert the SAM format to nucleotide space so that I can generate pile ups in nucleotide space?

Eg. of the output I'm getting for a mapped single-end read is as follows:

./fastq_files/Part_0:6_32_1000 0 chr6 112832228 37 49M = 112832228 0 TNTAGGTAGTGTATTAAATGGCGACAGGACTGGGGGACCCCAGCGCCAA @!9:79,676=*+98:&2(>;5&315+(9:41+8>58-5<18745;0)+ XT:A:U NM:i:3 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:1T35A5T5

Here, columns 10 and 11, which report the query sequence and the qualities, are shown in color space.
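To look at just those two columns, a small awk filter is enough. This is only a sketch: the function name `sam_seq_qual` is made up for illustration, and it relies on awk's default whitespace splitting so that it also works on the space-flattened record pasted above.

```shell
# Print column 10 (SEQ) and column 11 (QUAL) of each SAM alignment line,
# each followed by its length. Header lines (starting with '@') are skipped.
sam_seq_qual() {
    awk '!/^@/ { printf "%s\t%d\n%s\t%d\n", $10, length($10), $11, length($11) }' "$@"
}

# Demo on the record from this post. The heredoc delimiter is quoted so the
# shell leaves the quality string untouched.
sam_seq_qual <<'EOF'
./fastq_files/Part_0:6_32_1000 0 chr6 112832228 37 49M = 112832228 0 TNTAGGTAGTGTATTAAATGGCGACAGGACTGGGGGACCCCAGCGCCAA @!9:79,676=*+98:&2(>;5&315+(9:41+8>58-5<18745;0)+ XT:A:U NM:i:3 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:1T35A5T5
EOF
```

On a real file the call would be `sam_seq_qual r1.sam`. Seeing only A, C, G, T and N in column 10 does not settle the question by itself, because double-encoded colors use the same letters.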

Any help would be appreciated.

Thanks
N
Old 07-28-2009, 07:02 PM   #2
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by nisha View Post
Hi all,

I'm using bwa for mapping SOLiD paired reads to the reference genome. After going through the bwa aln and bwa samse/sampe stages I get output in the SAM format. Is this SAM output in colorspace? If so are there tools to convert the SAM format to nucleotide space so that I can generate pile ups in nucleotide space?

Eg. of the output I'm getting for a mapped single-end read is as follows:

./fastq_files/Part_0:6_32_1000 0 chr6 112832228 37 49M = 112832228 0 TNTAGGTAGTGTATTAAATGGCGACAGGACTGGGGGACCCCAGCGCCAA @!9:79,676=*+98:&2(>;5&315+(9:41+8>58-5<18745;0)+ XT:A:U NM:i:3 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:1T35A5T5

Here, columns 10 and 11, which report the query sequence and the qualities, are shown in color space.

Any help would be appreciated.

Thanks
N
I tried aligning the above read "TNTAGGTAGTGTATTAAATGGCGACAGGACTGGGGGACCCCAGCGCCAA" to the reference (both in cs and nt), but it did not match anywhere (some local homology) with high confidence so it looks like it is still in color space to me (double encoded). Did BWA not come with a tool to convert the output from color space to nt space (for example BFAST does this natively and MAQ has the "maq csmapnt" command)?
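For readers unfamiliar with the term, "double encoded" means the colors 0 to 3 are written as the letters A, C, G, T so that base-space tools accept the string. The sketch below contrasts that with real decoding, where each color is the XOR of two adjacent bases (A=0, C=1, G=2, T=3). The function names and the read `T0120` are invented for illustration and are not part of BWA, MAQ or BFAST.

```shell
# Double encoding: colors rewritten as letters ('.' becomes N).
# The result is still a color string, not a base sequence.
double_encode() {
    printf '%s\n' "$1" | tr '0123.' 'ACGTN'
}

# Decode a color-space read (primer base followed by colors, e.g. T0120)
# into nucleotides: next_base = previous_base XOR color.
# The XOR is done with one lookup row per previous base.
decode_cs() {
    printf '%s\n' "$1" | awk '
    BEGIN { row["A"] = "ACGT"; row["C"] = "CATG"; row["G"] = "GTAC"; row["T"] = "TGCA" }
    {
        prev = substr($0, 1, 1); out = ""
        for (i = 2; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c !~ /^[0-3]$/) exit 1     # give up on missing colors such as "."
            prev = substr(row[prev], c + 1, 1)
            out = out prev
        }
        print out
    }'
}

double_encode 0120   # prints ACGA (colors written as letters)
decode_cs T0120      # prints TGAA (bases following the primer T)
```

The two outputs differ, which is why a double-encoded read aligned as if it were nucleotides finds no confident match.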
Old 09-05-2009, 05:24 AM   #3
xgai
Junior Member
 
Location: Chicago

Join Date: May 2008
Posts: 9
Default

I have the same question here. Does anyone know the answer?
Old 09-05-2009, 10:19 AM   #4
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324
Default

Quote:
Originally Posted by xgai View Post
I have the same question here. Does anyone know the answer?
It reports mapped reads in nucleotide space for me. Unmapped reads are in CS. What does your output look like?
Old 09-05-2009, 11:55 AM   #5
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by Chipper View Post
It reports mapped reads in nucleotide space for me. Unmapped reads are in CS. What does your output look like?
I am surprised that BWA and MAQ do not use the "CS" and "CZ" fields.
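One way to check whether an aligner kept the original colors is to look for the `CS:Z:` optional field, which the SAM format uses for the color read sequence. A minimal sketch follows; the function name `count_cs_tags` is hypothetical, and the input is assumed to be a regular tab-separated SAM file.

```shell
# Count alignment records that carry a CS:Z (color read sequence) field.
# Optional fields start at column 12 of a tab-separated SAM line.
count_cs_tags() {
    awk -F '\t' '
    /^@/ { next }                                   # skip header lines
    {
        total++
        for (i = 12; i <= NF; i++)
            if ($i ~ /^CS:Z:/) { with_cs++; break }
    }
    END { printf "%d of %d records have CS:Z\n", with_cs, total }' "$@"
}
```

Usage would be `count_cs_tags r1.sam`. A result of zero matches what is described here for BWA output.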
Old 09-05-2009, 02:41 PM   #6
xgai
Junior Member
 
Location: Chicago

Join Date: May 2008
Posts: 9
Default

Does it? I am attaching a screenshot of the alignment (using tview). It just does not make sense to me. And the pileup file I got from the "samtools pileup" command shows that the consensus differs from the reference sequence at almost every position.
Old 09-05-2009, 03:01 PM   #7
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324
Default

I can't see the screenshot, but have you checked that you are using the correct FASTQ format and an index in cs format, and that you are running aln with -c? I think I have made all these mistakes at some point, with strange results...
Old 09-05-2009, 03:29 PM   #8
xgai
Junior Member
 
Location: Chicago

Join Date: May 2008
Posts: 9
Default

Thanks, Chipper. I have not been able to attach it for some reason.

Regarding the NGS exercise, I might have done something wrong at some step then. There wasn't any error or warning along the way, so there was no clue. I tried to post the same question on the samtools-help list. I am copying it below and see if it helps you see my question better. Thanks in advance.

Can someone give me some pointers on the SAM format in color space and the correct way to use samtools to process such SAM files, especially for SNP and indel calling? I looked everywhere but could not find any documentation.

Specifically, as an exercise, here is what I did:

- Simulated some SOLiD reads using wgsim (-c option) from a reference sequence.
- Generated the bwa index with the following command: bwa index -c ref.fa -a is
- Align the reads (in fastq format) back to the reference sequence using bwa:
bwa aln -c ref.fa r1.fq > r1.sai
bwa samse ref.fa r1.sai r1.fq > r1.sam

And I ran the usual faidx, import, sort, index, and pileup commands of samtools and they went smoothly with no errors or warnings. I can view it with samtools tview. Nonetheless, the pileup file just does not make sense to me, as the consensus sequence differs from the reference sequence at almost every position. And tview seems to be showing the reads still in color space (double encoded?), which is hard or impossible for me to interpret.
Old 09-05-2009, 03:56 PM   #9
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by xgai View Post
Thanks, Chipper. I have not been able to attach it for some reason.

Regarding the NGS exercise, I might have done something wrong at some step then. There wasn't any error or warning along the way, so there was no clue. I tried to post the same question on the samtools-help list. I am copying it below and see if it helps you see my question better. Thanks in advance.

Can someone give me some pointers on the SAM format in color space and the correct way to use samtools to process such SAM files, especially for SNP and indel calling? I looked everywhere but could not find any documentation.

Specifically, as an exercise, here is what I did:

- Simulated some SOLiD reads using wgsim (-c option) from a reference sequence.
- Generated the bwa index with the following command: bwa index -c ref.fa -a is
- Align the reads (in fastq format) back to the reference sequence using bwa:
bwa aln -c ref.fa r1.fq > r1.sai
bwa samse ref.fa r1.sai r1.fq > r1.sam

And I ran the usual faidx, import, sort, index, and pileup commands of samtools and they went smoothly with no errors or warnings. I can view it with samtools tview. Nonetheless, the pileup file just does not make sense to me, as the consensus sequence differs from the reference sequence at almost every position. And tview seems to be showing the reads still in color space (double encoded?), which is hard or impossible for me to interpret.
This might be the problem, could you see if there are any files named "ref.cs.fa.*"?

Here are the files I have for hg18. Instead of the ref.fa above, I would use ref.cs.fa for both the aln and samse commands!

Code:
[bash$] ls -1
hg18.cs.fa.amb
hg18.cs.fa.ann
hg18.cs.fa.bwt
hg18.cs.fa.nt.amb
hg18.cs.fa.nt.ann
hg18.cs.fa.nt.pac
hg18.cs.fa.pac
hg18.cs.fa.rbwt
hg18.cs.fa.rpac
hg18.cs.fa.rsa
hg18.cs.fa.sa
hg18.fa
Check out my own aligner BFAST if you get completely frustrated.
Old 09-05-2009, 04:34 PM   #10
xgai
Junior Member
 
Location: Chicago

Join Date: May 2008
Posts: 9
Default

Thanks, Nils.

Did you have to do something first to generate the .cs.fa file? I ran the command:

> bwa index -a is -c ref.fa

And I got the following files:

ref.fa.amb
ref.fa.bwt
ref.fa.nt.ann
ref.fa.pac
ref.fa.rpac
ref.fa.sa
ref.fa.ann
ref.fa.nt.amb
ref.fa.nt.pac
ref.fa.rbwt
ref.fa.rsa

And there is no ref.cs.fa to be found anywhere.

Btw, I did manage to compile bfast a couple of hours ago on my MacBook Pro. I might have some questions for you if you don't mind.
Old 09-05-2009, 05:13 PM   #11
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by xgai View Post
Thanks, Nils.

Did you have to do something first to generate the .cs.fa file? I ran the command:

> bwa index -a is -c ref.fa

And I got the following files:

ref.fa.amb
ref.fa.bwt
ref.fa.nt.ann
ref.fa.pac
ref.fa.rpac
ref.fa.sa
ref.fa.ann
ref.fa.nt.amb
ref.fa.nt.pac
ref.fa.rbwt
ref.fa.rsa

And there is no ref.cs.fa to be found anywhere.

Btw, I did manage to compile bfast a couple of hours ago on my MacBook Pro. I might have some questions for you if you don't mind.
Sorry, it was a product of my prefix (-p). Try specifying a prefix like mine.
Code:
/share/apps/bwa-0.4.9/bwa index -a bwtsw -p hg18.cs.fa -c hg18.fa
Feel free to post questions about BFAST (in a different thread) or to the BFAST help mailing list (bfast-help@lists.sourceforge.net).
Old 09-05-2009, 05:59 PM   #12
xgai
Junior Member
 
Location: Chicago

Join Date: May 2008
Posts: 9
Default

The -p option was indeed the reason. You have to specify it explicitly, even though it appears to be optional (the default is said to be the FASTA file name). It fixed my problem, although I am still puzzled by the alignment result that I got previously. I wish I could figure out how to attach the file here, so you could see what I meant. Thanks, Nils.
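Putting the commands from this thread together, the sequence that worked can be sketched as below. The helper only prints the command lines so they can be reviewed first. The function name `cs_align_cmds` and the output name `r1` are made up for illustration, the bwa options are the ones quoted in the posts above (BWA 0.4.9), and file names are assumed to contain no spaces.

```shell
# Print the bwa commands for a color-space single-end alignment.
#   $1 nucleotide FASTA   $2 explicit index prefix (-p)
#   $3 reads (FASTQ)      $4 output basename
cs_align_cmds() {
    ref=$1; prefix=$2; reads=$3; out=$4
    cat <<EOF
bwa index -a is -p $prefix -c $ref
bwa aln -c $prefix $reads > $out.sai
bwa samse $prefix $out.sai $reads > $out.sam
EOF
}

cs_align_cmds ref.fa ref.cs.fa r1.fq r1             # review the commands
# cs_align_cmds ref.fa ref.cs.fa r1.fq r1 | sh -e   # run them once bwa is on PATH
```

Note that the same prefix is passed to index, aln and samse. For a large genome, `-a bwtsw` would replace `-a is`, as in the hg18 command posted earlier.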
Old 09-05-2009, 06:08 PM   #13
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by xgai View Post
The -p option was indeed the reason. You have to specify it explicitly, even though it appears to be optional (the default is said to be the FASTA file name). It fixed my problem, although I am still puzzled by the alignment result that I got previously. I wish I could figure out how to attach the file here, so you could see what I meant. Thanks, Nils.
Bug lh3 (via PM), who is the author of BWA. You can also try the maq mailing lists and bug tracker on maq.sourceforge.net.
Old 01-06-2010, 09:04 AM   #14
ikrier
Member
 
Location: Lausanne

Join Date: Dec 2009
Posts: 19
Default bwa samse problem as well

I've reposted this somewhere more appropriate (the Bioinformatics section), because it's not about SOLiD reads.

Hi, I did the indexing with bwtsw and no -p and I got the following files :
Mouse_genome.fa.amb
Mouse_genome.fa.ann
Mouse_genome.fa.bwt
Mouse_genome.fa.pac
Mouse_genome.fa.rbwt
Mouse_genome.fa.rpac
Mouse_genome.fa.rsa
Mouse_genome.fa.sa

I managed to get the .sai file from the aln command, but now I'm stuck because the samse command gives me the error:
fail to open file '../Mouse_genome.fa.nt.ann'. Abort!

But I never get the .nt.ann file with indexing. I'm confused.
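A quick way to see which index files are present for a given prefix is sketched below. The suffix list is copied from the directory listings posted in this thread (BWA 0.4.x era); in those listings the `.nt.*` files only show up when the index was built with `-c`, and other BWA versions may write a different set. The function name `check_bwa_index` is hypothetical.

```shell
# Report which BWA index files exist for a prefix.
# Returns non-zero if any file from the list is missing.
check_bwa_index() {
    prefix=$1
    missing=0
    for suffix in amb ann bwt pac rbwt rpac rsa sa nt.amb nt.ann nt.pac; do
        if [ -e "$prefix.$suffix" ]; then
            echo "found    $prefix.$suffix"
        else
            echo "missing  $prefix.$suffix"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Usage would be `check_bwa_index ../Mouse_genome.fa`, using the same string that is passed to `bwa aln` and `bwa samse`.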

Last edited by ikrier; 01-07-2010 at 05:06 AM.
Old 01-06-2010, 09:08 AM   #15
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by ikrier View Post
Hi, I did the indexing with bwtsw and no -p and I got the following files :
Mouse_genome.fa.amb
Mouse_genome.fa.ann
Mouse_genome.fa.bwt
Mouse_genome.fa.pac
Mouse_genome.fa.rbwt
Mouse_genome.fa.rpac
Mouse_genome.fa.rsa
Mouse_genome.fa.sa

I managed to get the .sai file from the aln command, but now I'm stuck because the samse command gives me the error:
fail to open file '../Mouse_genome.fa.nt.ann'. Abort!

But I never get the .nt.ann file with indexing. I'm confused.
It looks like you are specifying the wrong prefix. Can you give us your full samse command?
Old 01-06-2010, 09:18 AM   #16
ikrier
Member
 
Location: Lausanne

Join Date: Dec 2009
Posts: 19
Default

bwa samse ../Mouse_genome.fa tags_all.sai tags_all.fastq > tags_all.sam
Old 01-06-2010, 09:31 AM   #17
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by ikrier View Post
bwa samse ../Mouse_genome.fa tags_all.sai tags_all.fastq > tags_all.sam
What was your aln command? Are you running this on Illumina or SOLiD reads?
Old 01-06-2010, 09:37 AM   #18
ikrier
Member
 
Location: Lausanne

Join Date: Dec 2009
Posts: 19
Default

bwa aln ../Mouse_genome.fa tags_all.fastq > tags_all.sai
Old 01-06-2010, 07:42 PM   #19
DNAANDDAN
Junior Member
 
Location: china

Join Date: Jan 2010
Posts: 2
Default edition of BWA is concern

What is your version of BWA?
BWA works very well for Solexa reads but has some bugs with SOLiD reads: in some versions, reads on the minus strand are reported with a complemented sequence. Both single-end and paired-end reads are affected.
Try again with 0.4.9; it gives stable results for me.
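To check minus-strand reads by hand, it helps to pull out the records with FLAG bit 0x10 set and compare their SEQ with the original read. In SAM, SEQ for such a record is the reverse complement of the read as sequenced, not the plain complement. The sketch below is only an aid for that comparison; the function names are invented, and it does not confirm or rule out the bug described above.

```shell
# Reverse-complement a nucleotide string (reversal done in awk, so no
# external 'rev' is needed).
revcomp() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
        print out
    }' | tr 'ACGTacgt' 'TGCAtgca'
}

# Print name and SEQ of reads aligned to the reverse strand (FLAG bit 0x10).
# Integer arithmetic is used because POSIX awk has no bitwise AND.
minus_strand_reads() {
    awk -F '\t' '!/^@/ && int($2 / 16) % 2 == 1 { print $1 "\t" $10 }' "$@"
}

revcomp AACG   # prints CGTT
```

Usage would be `minus_strand_reads r1.sam`, followed by `revcomp` on a reported SEQ to compare it with the decoded read.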


Quote:
Originally Posted by xgai View Post
Thanks, Chipper. I have not been able to attach it for some reason.

Regarding the NGS exercise, I might have done something wrong at some step then. There wasn't any error or warning along the way, so there was no clue. I tried to post the same question on the samtools-help list. I am copying it below and see if it helps you see my question better. Thanks in advance.

Can someone give me some pointers on the SAM format in color space and the correct way to use samtools to process such SAM files, especially for SNP and indel calling? I looked everywhere but could not find any documentation.

Specifically, as an exercise, here is what I did:

- Simulated some SOLiD reads using wgsim (-c option) from a reference sequence.
- Generated the bwa index with the following command: bwa index -c ref.fa -a is
- Align the reads (in fastq format) back to the reference sequence using bwa:
bwa aln -c ref.fa r1.fq > r1.sai
bwa samse ref.fa r1.sai r1.fq > r1.sam

And I ran the usual faidx, import, sort, index, and pileup commands of samtools and they went smoothly with no errors or warnings. I can view it with samtools tview. Nonetheless, the pileup file just does not make sense to me, as the consensus sequence differs from the reference sequence at almost every position. And tview seems to be showing the reads still in color space (double encoded?), which is hard or impossible for me to interpret.
Old 01-07-2010, 05:05 AM   #20
ikrier
Member
 
Location: Lausanne

Join Date: Dec 2009
Posts: 19
Default

I'm removing my post here because it's not about Solid reads...