**nilshomer** · 07-28-2009, 06:02 PM

Originally posted by nisha:

Hi all,

I'm using bwa to map SOLiD paired reads to the reference genome. After going through the bwa aln and bwa samse/sampe stages, I get output in the SAM format. Is this SAM output in color space? If so, are there tools to convert the SAM output to nucleotide space so that I can generate pileups in nucleotide space?

An example of the output I'm getting for a mapped single-end read:

./fastq_files/Part_0:6_32_1000 0 chr6 112832228 37 49M = 112832228 0 TNTAGGTAGTGTATTAAATGGCGACAGGACTGGGGGACCCCAGCGCCAA @!9:79,676=*+98:&2(>;5&315+(9:41+8>58-5<18745;0)+ XT:A:U NM:i:3 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:1T35A5T5

Here columns 10 and 11, which report the query sequence and the qualities, are shown in color space.

Any help would be appreciated.

Thanks
N

I tried aligning the above read "TNTAGGTAGTGTATTAAATGGCGACAGGACTGGGGGACCCCAGCGCCAA" to the reference (in both color space and nucleotide space), but it did not match anywhere with high confidence (only some local homology), so it looks to me like it is still in color space (double encoded). Does BWA not come with a tool to convert the output from color space to nucleotide space? For example, BFAST does this natively and MAQ has the "maq csmapnt" command.
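For readers unfamiliar with the term, here is a minimal Python sketch of what "double encoded" means and how colors relate to bases. It is not BWA, BFAST, or MAQ code; the function names are hypothetical. It assumes the standard SOLiD scheme, in which each color is the XOR of the 2-bit codes (A=0, C=1, G=2, T=3) of two adjacent bases, and double encoding simply rewrites the color digits 0/1/2/3 as the letters A/C/G/T. Naive decoding like this needs a known starting base and propagates any color error to the rest of the read, which is why aligners decode against the reference instead.

```python
# 2-bit base codes; a SOLiD color is the XOR of the codes of two adjacent bases.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def double_encode(colors):
    """Rewrite color digits 0/1/2/3 as letters A/C/G/T; anything else becomes N."""
    return "".join(CODE_TO_BASE[int(c)] if c in "0123" else "N" for c in colors)


def decode_colors(start_base, colors):
    """Naively translate a color string to bases, given the base preceding it.

    start_base is an assumption of this sketch (for example the primer base
    of a csfasta read); it is not part of the decoded output.
    """
    code = BASE_TO_CODE[start_base]
    bases = []
    for c in colors:
        if c not in "0123":
            raise ValueError("cannot decode past an unknown color: %r" % c)
        code ^= int(c)  # next base = previous base XOR color
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


if __name__ == "__main__":
    print(double_encode("01230"))       # ACGTA
    print(decode_colors("T", "01230"))  # TGATT
```

Note that a double-encoded string such as "ACGTA" looks like DNA but is not, which is why such reads fail to align in nucleotide space.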

**xgai** · 09-05-2009, 04:24 AM

I have the same question here. Does anyone know the answer?

**Chipper** · 09-05-2009, 09:19 AM

It reports mapped reads in nucleotide space for me. Unmapped reads are in CS. What does your output look like?

**nilshomer** · 09-05-2009, 10:55 AM

I am surprised that BWA and MAQ do not use the "CS" and "CZ" fields.

**xgai** · 09-05-2009, 01:41 PM

Does it? I am attaching a screenshot of the alignment (using tview). It just does not make sense to me. And the pileup file I got from the "samtools pileup" command shows that the consensus differs from the reference sequence at almost every position.

**Chipper** · 09-05-2009, 02:01 PM

I can't see the screenshot, but have you checked that you are using the correct FASTQ format and an index in cs format, and that you are using aln -c? I think I have made all these mistakes at some point, with strange results...

**xgai** · 09-05-2009, 02:29 PM

Thanks, Chipper. I have not been able to attach it for some reason.

Regarding the NGS exercise, I might have done something wrong at some step then. There wasn't any error or warning along the way, so there was no clue. I tried to post the same question on the samtools-help list. I am copying it below to see if it helps you understand my question better. Thanks in advance.

Can someone give me some pointers regarding the SAM format in color space and the correct ways to use samtools for processing such SAM files, especially for SNP and indel calling? I looked everywhere but could not find any documentation.

Specifically, as an exercise, here is what I did:

- Simulated some SOLiD reads using wgsim (-c option) from a reference sequence.
- Generated the bwa index with the following command: bwa index -c ref.fa -a is
- Aligned the reads (in fastq format) back to the reference sequence using bwa:
bwa aln -c ref.fa r1.fq > r1.sai
bwa samse ref.fa r1.sai r1.fq > r1.sam

Then I ran the usual faidx, import, sort, index, and pileup commands of samtools, and they went smoothly with no errors or warnings. I can view the result with samtools tview. Nonetheless, the pileup file just does not make sense to me, as the consensus sequence differs from the reference sequence at almost every position. Also, tview seems to be showing the reads still in color space (double encoded?), which is hard or impossible for me to interpret.
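One way to put a number on "differs at almost every position" is to count the disagreeing sites in the pileup. The following Python sketch does that under stated assumptions: it expects tab-separated consensus pileup lines (as produced by the legacy `samtools pileup -c`) with the reference base in column 3 and the consensus base in column 4, and it skips indel lines, which are assumed to carry `*` in column 3. The function name is hypothetical, and heterozygous IUPAC consensus codes are counted as differences.

```python
def consensus_mismatch_fraction(pileup_lines):
    """Return the fraction of pileup sites whose consensus differs from the reference.

    Assumes tab-separated lines with the reference base in column 3 and the
    consensus base in column 4; lines with '*' in column 3 (indels) are skipped.
    """
    total = 0
    differing = 0
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not a consensus pileup line
        ref_base = fields[2].upper()
        consensus = fields[3].upper()
        if ref_base == "*":
            continue  # indel line, not a single-base site
        total += 1
        if consensus != ref_base:
            differing += 1
    return differing / total if total else 0.0


if __name__ == "__main__":
    # "r1.pileup" is a hypothetical file name for the pileup output.
    with open("r1.pileup") as handle:
        print(consensus_mismatch_fraction(handle))
```

For reads simulated from the same reference, a value close to 1 suggests a color space versus nucleotide space mix-up rather than real variation.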

**nilshomer** · 09-05-2009, 02:56 PM

This might be the problem: could you check whether there are any files named "ref.cs.fa.*"?

Here are the files I have for hg18. Instead of the ref.fa above, I would use ref.cs.fa for both the aln and samse commands!

Code:

[bash$] ls -1
hg18.cs.fa.amb
hg18.cs.fa.ann
hg18.cs.fa.bwt
hg18.cs.fa.nt.amb
hg18.cs.fa.nt.ann
hg18.cs.fa.nt.pac
hg18.cs.fa.pac
hg18.cs.fa.rbwt
hg18.cs.fa.rpac
hg18.cs.fa.rsa
hg18.cs.fa.sa
hg18.fa

Check out my own aligner BFAST if you get completely frustrated.

**xgai** · 09-05-2009, 03:34 PM

Thanks, Nils.

Did you have to do something first to generate the .cs.fa file? I ran the command:

> bwa index -a is -c ref.fa

And I got the following files:

ref.fa.amb
ref.fa.bwt
ref.fa.nt.ann
ref.fa.pac
ref.fa.rpac
ref.fa.sa
ref.fa.ann
ref.fa.nt.amb
ref.fa.nt.pac
ref.fa.rbwt
ref.fa.rsa

And there is no ref.cs.fa to be found anywhere.

Btw, I did manage to compile bfast a couple of hours ago on my MacBook Pro. I might have some questions for you if you don't mind.

**nilshomer** · 09-05-2009, 04:13 PM

Sorry, that was a product of my prefix (-p). Try specifying a prefix like mine.

Code:

/share/apps/bwa-0.4.9/bwa index -a bwtsw -p hg18.cs.fa -c hg18.fa

Feel free to post questions about BFAST (in a different thread) or to the BFAST help mailing list ([email protected]).

**xgai** · 09-05-2009, 04:59 PM

The -p option was indeed the reason. You have to specify it, even though it appears to be optional (the default is said to be the FASTA name). It fixed my problem, although I am still puzzled by the alignment result that I got previously. I wish I could figure out how to attach the file here, so you could see what I meant. Thanks, Nils.
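To keep the prefix consistent across the steps discussed in this thread, the three command lines can be generated from a single value. This is a minimal Python sketch, not an official wrapper: the function name and the file names are hypothetical, and the only bwa options used are the ones that appear in the posts above (`index -a`, `-p`, `-c`; `aln -c`; `samse`). As in the thread, the output of `aln` and `samse` goes to standard output and would be redirected to the .sai and .sam files.

```python
import shlex


def bwa_colorspace_commands(ref_fasta, prefix, reads_fastq, sai_path, algorithm="bwtsw"):
    """Build index, aln, and samse argument lists that all share one index prefix."""
    index_cmd = ["bwa", "index", "-a", algorithm, "-p", prefix, "-c", ref_fasta]
    aln_cmd = ["bwa", "aln", "-c", prefix, reads_fastq]  # stdout -> sai_path
    samse_cmd = ["bwa", "samse", prefix, sai_path, reads_fastq]  # stdout -> SAM
    return index_cmd, aln_cmd, samse_cmd


if __name__ == "__main__":
    commands = bwa_colorspace_commands("hg18.fa", "hg18.cs.fa", "r1.fq", "r1.sai")
    for cmd in commands:
        print(shlex.join(cmd))
```

The point of the design is that the reference FASTA is named only once, at indexing time; the later steps refer to the prefix.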

**nilshomer** · 09-05-2009, 05:08 PM

Bug lh3 (via PM), who is the author of BWA. You can also try the maq mailing lists and bug tracker on maq.sourceforge.net.

**ikrier** · 01-06-2010, 09:04 AM

bwa samse problem as well

I've also posted this somewhere more appropriate (in the bioinformatics section), because it's not about SOLiD reads.

Hi, I did the indexing with bwtsw and no -p, and I got the following files:
Mouse_genome.fa.amb
Mouse_genome.fa.ann
Mouse_genome.fa.bwt
Mouse_genome.fa.pac
Mouse_genome.fa.rbwt
Mouse_genome.fa.rpac
Mouse_genome.fa.rsa
Mouse_genome.fa.sa

I managed to get the .sai file from the aln command, but now I'm stuck because the samse command gives me the error:
fail to open file '../Mouse_genome.fa.nt.ann'. Abort!

But I never get the .nt.ann file with indexing. I'm confused.

**nilshomer** · 01-06-2010, 09:08 AM

It looks like you are specifying the wrong prefix. Can you give us your full samse command?
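A quick way to test a prefix before running samse is to list which index files are missing for it. The Python sketch below is only an illustration: the suffix lists are copied from the directory listings in this thread (other bwa versions may write a different set), and the function name and the `exists` parameter are hypothetical.

```python
import os

# Suffixes seen in the listings in this thread; the ".nt.*" files appeared
# only for indexes built with -c (color space).
BASE_SUFFIXES = [".amb", ".ann", ".bwt", ".pac", ".rbwt", ".rpac", ".rsa", ".sa"]
NT_SUFFIXES = [".nt.amb", ".nt.ann", ".nt.pac"]


def missing_index_files(prefix, colorspace=False, exists=os.path.isfile):
    """Return the expected index files for this prefix that are not present.

    exists is injectable so the check can be tried without touching the disk.
    """
    suffixes = BASE_SUFFIXES + (NT_SUFFIXES if colorspace else [])
    return [prefix + suffix for suffix in suffixes if not exists(prefix + suffix)]


if __name__ == "__main__":
    # Use exactly the prefix passed to samse, including any relative path.
    for path in missing_index_files("../Mouse_genome.fa"):
        print("missing:", path)
```

If every file is reported missing, the path or prefix is wrong; if only the ".nt.*" files are missing with `colorspace=True`, the index was not built in color space.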

sam output from bwa for SOLiD reads in colorspace?
