  • SAM output from bwa for SOLiD reads in color space?

    Hi all,

    I'm using bwa to map SOLiD paired reads to the reference genome. After running bwa aln and then bwa samse/sampe, I get output in SAM format. Is this SAM output in color space? If so, are there tools to convert the SAM output to nucleotide space so that I can generate pileups in nucleotide space?

    E.g., this is the output I get for a mapped single-end read:

    ./fastq_files/Part_0:6_32_1000 0 chr6 112832228 37 49M = 112832228 0 TNTAGGTAGTGTATTAAATGGCGACAGGACTGGGGGACCCCAGCGCCAA @!9:79,676=*+98:&2(>;5&315+(9:41+8>58-5<18745;0)+ XT:A:U NM:i:3 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:1T35A5T5

    Here columns 10 and 11, which report the query sequence and the qualities, appear to be in color space.

    Any help would be appreciated.

    Thanks
    N
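To check what columns 10 and 11 actually contain, it can help to pull the record apart. Below is a minimal Python sketch that parses the line above, assuming the standard SAM layout (11 mandatory columns followed by TAG:TYPE:VALUE fields) and Phred+33 qualities; the function names are made up for illustration. It also summarizes the MD tag, which describes how the read aligns against the reference.

```python
import re

# The 11 mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# The record posted above (pasted with spaces instead of tabs).
EXAMPLE = ("./fastq_files/Part_0:6_32_1000 0 chr6 112832228 37 49M = 112832228 0 "
           "TNTAGGTAGTGTATTAAATGGCGACAGGACTGGGGGACCCCAGCGCCAA "
           "@!9:79,676=*+98:&2(>;5&315+(9:41+8>58-5<18745;0)+ "
           "XT:A:U NM:i:3 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:1T35A5T5")


def parse_sam_line(line):
    """Split one SAM alignment line into mandatory fields plus raw optional tags.

    Real SAM is tab-delimited; split() also copes with a space-separated
    copy like the one above, since none of these fields contain spaces.
    """
    parts = line.split()
    record = dict(zip(SAM_FIELDS, parts[:11]))
    record["TAGS"] = parts[11:]
    return record


def phred33(qual):
    """Decode a SAM QUAL string (ASCII Phred+33) into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def summarize_md(md):
    """Count matched bases, mismatched bases and deleted reference bases in an MD string."""
    summary = {"matches": 0, "mismatches": 0, "deleted": 0}
    for number, deletion, mismatch in re.findall(r"(\d+)|\^([A-Z]+)|([A-Z])", md):
        if number:
            summary["matches"] += int(number)
        elif deletion:
            summary["deleted"] += len(deletion)
        else:
            summary["mismatches"] += 1
    return summary


if __name__ == "__main__":
    rec = parse_sam_line(EXAMPLE)
    print(rec["CIGAR"], len(rec["SEQ"]), phred33(rec["QUAL"])[:3])  # 49M 49 [31, 0, 24]
    print(summarize_md("1T35A5T5"))  # {'matches': 46, 'mismatches': 3, 'deleted': 0}
```

For this record the MD summary gives 46 matches and 3 mismatches, i.e. 49 aligned reference bases, which is consistent with the 49M CIGAR and NM:i:3.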

  • #2
    I tried aligning the above read "TNTAGGTAGTGTATTAAATGGCGACAGGACTGGGGGACCCCAGCGCCAA" to the reference (in both color space and nucleotide space), but it did not match anywhere with high confidence (only some local homology), so it looks to me like it is still in color space (double encoded). Doesn't BWA come with a tool to convert its output from color space to nucleotide space (for example, BFAST does this natively and MAQ has the "maq csmapnt" command)?

    • #3
      I have the same question here. Does anyone know the answer?

      • #4
        Originally posted by xgai
        I have the same question here. Does anyone know the answer?
        It reports mapped reads in nucleotide space for me. Unmapped reads are in CS. What does your output look like?

        • #5
          Originally posted by Chipper
          It reports mapped reads in nucleotide space for me. Unmapped reads are in CS. What does your output look like?
          I am surprised that BWA and MAQ do not use the "CS" and "CQ" tags.
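The SAM specification defines CS (color read sequence) and CQ (color read qualities) as optional Z-type tags. A quick way to see whether an aligner emitted them is to parse the optional fields; the Python sketch below does that (function names are hypothetical).

```python
def parse_optional_tags(fields):
    """Turn SAM optional fields (TAG:TYPE:VALUE) into a dict.

    Integer (i) and float (f) values are converted; A, Z, H and B values
    are kept as strings in this sketch.
    """
    tags = {}
    for field in fields:
        tag, value_type, value = field.split(":", 2)
        if value_type == "i":
            tags[tag] = int(value)
        elif value_type == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value
    return tags


def color_space_tags(sam_line):
    """Return the CS and CQ tag values of one SAM line (None when absent)."""
    tags = parse_optional_tags(sam_line.split()[11:])
    return {"CS": tags.get("CS"), "CQ": tags.get("CQ")}


if __name__ == "__main__":
    # Optional fields from the record posted at the top of the thread.
    posted_tags = "XT:A:U NM:i:3 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:1T35A5T5".split()
    tags = parse_optional_tags(posted_tags)
    print(tags["NM"], tags["MD"], "CS" in tags, "CQ" in tags)  # 3 1T35A5T5 False False
```

On the record at the top of the thread neither tag is present; it only carries XT, NM, X0, X1, XM, XO, XG and MD.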

          • #6
            Does it? I am attaching a screenshot of the alignment (using tview). It just does not make sense to me. And the pileup file I got from the "samtools pileup" command shows that the consensus differs from the reference sequence at almost every position.

            • #7
              I can't see the screenshot, but have you checked that you are using the correct FASTQ format, an index in color-space format, and aln -c? I think I have made all of these mistakes at some point, with strange results...
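As a rough first check on the FASTQ side, the Python sketch below classifies each read's sequence line. It assumes unwrapped 4-line FASTQ records, and the helper names are hypothetical. The alphabet alone cannot distinguish nucleotide reads from double-encoded colors, so the "letters" class covers both; which form bwa aln -c expects should be checked against the documentation of your BWA version.

```python
import re
from collections import Counter

# Raw SOLiD color-space calls: a leading base (e.g. the primer base) followed
# by colors 0-3, with "." for a missing call.
RAW_COLORSPACE = re.compile(r"^[ACGT][0-3.]+$")
# Letters only: either nucleotide reads or double-encoded colors.
LETTERS_ONLY = re.compile(r"^[ACGTN]+$")


def classify_read(seq):
    """Label one sequence string by the alphabet it uses."""
    if RAW_COLORSPACE.match(seq):
        return "raw_colorspace"
    if LETTERS_ONLY.match(seq):
        return "letters"
    return "unrecognized"


def summarize_fastq(lines):
    """Classify the sequence line (2nd of every 4) of each FASTQ record."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:
            counts[classify_read(line.strip().upper())] += 1
    return counts
```

Usage would be along the lines of `with open("r1.fq") as fh: print(summarize_fastq(fh))`, where r1.fq is just the file name from the exercise below.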

              • #8
                Thanks, Chipper. I have not been able to attach it for some reason.

                Regarding the NGS exercise, I might have done something wrong at some step, then. There wasn't any error or warning along the way, so I had no clue. I also posted the same question on the samtools-help list; I am copying it below in case it makes my question clearer. Thanks in advance.

                Can someone give me some pointers on SAM format in color space and the correct way to use samtools to process such SAM files, especially for SNP and indel calling? I looked everywhere but could not find any documentation.

                Specifically, as an exercise, here is what I did:

                - Simulated some SOLiD reads from a reference sequence using wgsim (-c option).
                - Generated the bwa index with the following command: bwa index -c ref.fa -a is
                - Aligned the reads (in FASTQ format) back to the reference sequence using bwa:
                bwa aln -c ref.fa r1.fq > r1.sai
                bwa samse ref.fa r1.sai r1.fq > r1.sam

                Then I ran the usual samtools faidx, import, sort, index, and pileup commands, and they went smoothly with no errors or warnings. I can view the result with samtools tview. Nonetheless, the pileup file does not make sense to me, as the consensus sequence differs from the reference sequence at almost every position. Also, tview seems to show the reads still in color space (double encoded?), which is hard or impossible for me to interpret.
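To put a number on "differs at almost every position", a sketch like the one below counts how often the consensus disagrees with the reference. It assumes the consensus-calling output of the old samtools pileup -c, with the reference base in column 3 and the consensus base in column 4; adjust the column indices if your pileup layout differs. The function name is hypothetical.

```python
def consensus_mismatch_rate(lines):
    """Fraction of pileup positions whose consensus base differs from the reference.

    Assumes the consensus-calling pileup layout: column 3 is the reference
    base and column 4 the consensus base. Lines whose reference column is
    "*" (indel lines in that format) are skipped. IUPAC heterozygous codes
    (R, Y, ...) count as differences in this sketch.
    """
    total = differing = 0
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 4 or cols[2] == "*":
            continue
        total += 1
        if cols[3].upper() != cols[2].upper():
            differing += 1
    return differing / total if total else 0.0
```

Usage would be something like `with open("r1.pileup") as fh: print(f"{consensus_mismatch_rate(fh):.1%}")` (hypothetical file name). With correctly mapped simulated reads I would expect this fraction to be small, so a value close to 100% matches the symptom described above.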

                • #9
                  This might be the problem: could you check whether there are any files named "ref.cs.fa.*"?

                  Here are the files I have for hg18. Instead of the ref.fa above, I would use ref.cs.fa for both the aln and samse commands!

                  Code:
                  [bash$] ls -1
                  hg18.cs.fa.amb
                  hg18.cs.fa.ann
                  hg18.cs.fa.bwt
                  hg18.cs.fa.nt.amb
                  hg18.cs.fa.nt.ann
                  hg18.cs.fa.nt.pac
                  hg18.cs.fa.pac
                  hg18.cs.fa.rbwt
                  hg18.cs.fa.rpac
                  hg18.cs.fa.rsa
                  hg18.cs.fa.sa
                  hg18.fa
                  Check out my own aligner BFAST if you get completely frustrated.

                  • #10
                    Thanks, Nils.

                    Did you have to do something first to generate the .cs.fa file? I ran the command:

                    > bwa index -a is -c ref.fa

                    And I got the following files:

                    ref.fa.amb
                    ref.fa.bwt
                    ref.fa.nt.ann
                    ref.fa.pac
                    ref.fa.rpac
                    ref.fa.sa
                    ref.fa.ann
                    ref.fa.nt.amb
                    ref.fa.nt.pac
                    ref.fa.rbwt
                    ref.fa.rsa

                    And there is no ref.cs.fa to be found anywhere.

                    Btw, I did manage to compile bfast a couple of hours ago on my MacBook Pro. I might have some questions for you if you don't mind.

                    • #11
                      Sorry, that was a product of my prefix (-p). Try specifying a prefix like mine.
                      Code:
                      /share/apps/bwa-0.4.9/bwa index -a bwtsw -p hg18.cs.fa -c hg18.fa
                      Feel free to post questions about BFAST (in a different thread) or to the BFAST help mailing list.
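The underlying issue in this exchange is that index, aln and samse must all point at the same prefix. Below is a minimal Python sketch that builds the three bwa invocations from a single prefix variable. The flags are only the ones used in this thread (bwa 0.4.9-era color-space options), the helper names are hypothetical, and the usage reuses Nils' hg18 names and xgai's r1 file names.

```python
import subprocess


def bwa_colorspace_commands(ref_fa, reads_fq, prefix, sai_path, sam_path,
                            algorithm="bwtsw"):
    """Build the index/aln/samse steps so that all three share ONE index prefix.

    Returns (argv, stdout_path) pairs; stdout_path is None when the command
    writes its own output files (bwa index).
    """
    return [
        (["bwa", "index", "-a", algorithm, "-p", prefix, "-c", ref_fa], None),
        (["bwa", "aln", "-c", prefix, reads_fq], sai_path),
        (["bwa", "samse", prefix, sai_path, reads_fq], sam_path),
    ]


def run_commands(commands):
    """Run each step in order, redirecting stdout to a file where needed."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, check=True, stdout=out)


if __name__ == "__main__":
    steps = bwa_colorspace_commands("hg18.fa", "r1.fq", "hg18.cs.fa", "r1.sai", "r1.sam")
    for argv, stdout_path in steps:
        print(" ".join(argv) + (f" > {stdout_path}" if stdout_path else ""))
    # run_commands(steps)  # uncomment to actually call bwa
```

This prints the three shell commands, each using hg18.cs.fa as the prefix; keeping the prefix in one variable is the point, so aln and samse can never drift away from what index wrote.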

                      • #12
                        The -p option was indeed the reason. You have to specify it, even though it appears to be optional (the default is said to be the FASTA name). It fixed my problem, although I am still puzzled by the alignment result I got previously. I wish I could figure out how to attach the file here so you could see what I meant. Thanks, Nils.

                        • #13
                          Bug lh3 (via PM), the author of BWA. You can also try the maq mailing lists and bug tracker on maq.sourceforge.net.

                          • #14
                            bwa samse problem as well

                            I've also posted this somewhere more appropriate (the bioinformatics section), because it's not about SOLiD reads.

                            Hi, I did the indexing with bwtsw and without -p, and I got the following files:
                            Mouse_genome.fa.amb
                            Mouse_genome.fa.ann
                            Mouse_genome.fa.bwt
                            Mouse_genome.fa.pac
                            Mouse_genome.fa.rbwt
                            Mouse_genome.fa.rpac
                            Mouse_genome.fa.rsa
                            Mouse_genome.fa.sa

                            I managed to get the .sai file from the aln command, but now I'm stuck because the samse command gives me the error:
                            fail to open file '../Mouse_genome.fa.nt.ann'. Abort!

                            But indexing never gives me a .nt.ann file. I'm confused.
                            Last edited by ikrier; 01-07-2010, 05:06 AM.

                            • #15
                              It looks like you are specifying the wrong prefix. Can you give us your full samse command?
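Comparing the listings in this thread, the color-space index built with -p hg18.cs.fa includes .nt.amb, .nt.ann and .nt.pac files, while the Mouse_genome.fa listing does not, and the error names exactly such a file. The Python sketch below (hypothetical helper names) derives the prefix from the path in the error message and lists which expected files are missing at that prefix. The expected-suffix lists are taken from the listings above, not from BWA's documentation.

```python
import os

# Suffixes present in every index listing in this thread.
CORE_SUFFIXES = [".amb", ".ann", ".bwt", ".pac", ".rbwt", ".rpac", ".rsa", ".sa"]
# Extra files present in the color-space index listing (hg18.cs.fa).
NT_SUFFIXES = [".nt.amb", ".nt.ann", ".nt.pac"]


def prefix_from_index_file(path):
    """Strip a known index suffix from a path, e.g. one quoted in an error message.

    NT_SUFFIXES are tried first because ".nt.ann" also ends with ".ann".
    """
    for suffix in NT_SUFFIXES + CORE_SUFFIXES:
        if path.endswith(suffix):
            return path[: -len(suffix)]
    return path


def missing_index_files(prefix, colorspace, exists=os.path.exists):
    """List the expected index files for `prefix` that do not exist.

    `exists` can be swapped out (e.g. for a set lookup) to test without files.
    """
    suffixes = CORE_SUFFIXES + (NT_SUFFIXES if colorspace else [])
    return [prefix + s for s in suffixes if not exists(prefix + s)]


if __name__ == "__main__":
    prefix = prefix_from_index_file("../Mouse_genome.fa.nt.ann")
    print(prefix, missing_index_files(prefix, colorspace=True))
```

With the Mouse_genome.fa files listed above, the color-space check would report only the three .nt.* files as missing, which is consistent with samse looking for a color-space index at that prefix; it does not by itself show which prefix the samse command should have used.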
