
  • Read qualities in a SAM file produced by BWA

    Hopefully someone can help me with this one.

    I am using Bowtie2 and BWA to align paired FASTQ files from an Illumina HiSeq 2000.


    When I open the resulting SAM files, the big difference I notice is in SAM field 11, which holds the read qualities.

    Here is how it looks. I am pasting the first read from each file. As you can see, the mapping quality differs (42 against 37), which I understand because the algorithms are different, but SAM field 11 is completely different! I have read that Bowtie is aware of base quality and BWA is not, but can anyone explain what BWA wrote in the 11th field of my SAM file?

    bowtie2 (settings: --end-to-end --very-fast):
    SRR385642.1 73 chr14 21216082 42 101M = 21216082 0 ACANAGAGCAGAAGCTTCAGCTACATTGAATTCCATTGTGGCGTAGATGGATATGTTGATAACATAGAAGACCTGAGGATTATAGAACCTATCAGCAANAN 380#2>/+138678.@@56?=972486:7:.'22*/*8+=74<(6;0%770//'8((1**4163(/./10-,'**20%/5350(001//6.+..**11#(# AS:i:-5 XN:i:0 XM:i:4 XO:i:0 XG:i:0 NM:i:4 MD:Z:3G94C0T0A0 YT:Z:UP RG:Z:@RGtID:SRR385642tLB:ExercisetSM:1tPL:ILLUMINA

    bwa (settings: standard; I run bwa aln on each FASTQ file, then bwa sampe to get the SAM):
    SRR385642.1 73 chr14 21216082 37 101M = 21216082 0 ACANAGAGCAGAAGCTTCAGCTACATTGAATTCCATTGTGGCGTAGATGGATATGTTGATAACATAGAAGACCTGAGGATTATAGAACCTATCAGCAANAN
    ^T^Y^Q^D^S^_^P^L^R^T^Y^W^X^Y^O!!^V^W ^^^Z^X^S^U^Y^WESC^XESC^O^H^S^S^K^P^K^Y^L^^^X^U^] ^W^\^Q^F^X^X^Q^P^P^H^Y ^R^K^K^U^R^W^T ^P^O^P^R^Q^N^M^H^K^K^S^Q^F^P^V^T^V^Q ^Q^Q^R^P^P^W^O^L^O^O^K^K
    ^R^R^D ^D RG:Z:SRR385642 XT:A:U NM:i:4 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:4 XO:i:0 XG:i:0 MD:Z:3G94C0T0A0


    Thank you very much!

  • #2
    The read quality scores in the second example look very broken (I'm assuming the caret ^ symbols are where the terminal has tried to show non-printing characters). This could have happened if the original FASTQ scores were misinterpreted - for example a mixup between the obsolete Solexa/Illumina encodings and the standard Sanger FASTQ encoding?

    Do you have the original FASTQ file for this read?

    Did you note the exact command lines used to run bwa?



    • #3
      Thank you very much for having a look at this. I have looked up this read in the FASTQ file, and its qualities are exactly the ones reported by bowtie2. Obviously bwa goes wrong somewhere.

      The commands are the following:

      bwa aln -t 4 -f input1.sai -I hg19 my_file1.fastq
      bwa aln -t 4 -f input2.sai -I hg19 my_file2.fastq
      bwa sampe -f out.sam -r "@RG\tID.........." hg19 input1.sai input2.sai my_file1.fastq my_file2.fastq

      This has also been running for a frustrating 5 days.

      Any ideas?



      • #4
        What version of bwa do you have?

        Could you post the first few reads from the FASTQ to clarify this? Use the [ code ] and [ /code ] tags (via the advanced editor view) to prevent the forum messing up the display.

        My hunch is you've wrongly told bwa you have the obsolete Illumina 1.3 to 1.7 FASTQ encoding (by using the -I switch). Illumina 1.8 onwards adopted the original Sanger standard encoding.
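To illustrate the hunch, here is a minimal Python sketch. It assumes that the -I switch converts Illumina 1.3+ (Phred+64) qualities to Sanger (Phred+33) by subtracting 31 from every quality character; that is my reading of the behaviour, not something checked against the bwa source. The helper names are made up for this example. Applied to qualities that are already Sanger encoded, the shift pushes most characters into the non-printing ASCII range, which a terminal or editor then shows in caret notation.

```python
# Assumption: `bwa aln -I` shifts each quality character down by 64 - 33 = 31.
ILLUMINA_TO_SANGER_SHIFT = 64 - 33


def shift_as_if_illumina(qual: str) -> list[int]:
    """Byte values obtained by treating a quality string as Phred+64."""
    return [ord(c) - ILLUMINA_TO_SANGER_SHIFT for c in qual]


def caret(byte: int) -> str:
    """Render a byte roughly the way `cat -v` or vi shows control characters."""
    if byte < 32:
        return "^" + chr(byte + 64)
    return chr(byte)


# First six quality characters of SRR385642.1 as they appear in the bowtie2 SAM line.
original = "380#2>"
shifted = shift_as_if_illumina(original)
rendered = "".join(caret(b) for b in shifted)

print(shifted)   # [20, 25, 17, 4, 19, 31]
print(rendered)  # ^T^Y^Q^D^S^_
```

The rendered string matches the start of the quality field in the bwa SAM line posted above, which is consistent with an unwanted 31-character shift.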



        • #5
          The BWA version is bwa-0.7.4.

          Here is an extract from one of my FASTQ files:
          Code:
          @SRR385642.1 HWI-ST513_0136:1:1:3631:2202/1
          ACANAGAGCAGAAGCTTCAGCTACATTGAATTCCATTGTGGCGTAGATGGATATGTTGATAACATAGAAGACCTGAGGATTATAGAACCTATCAGCAANAN
          +
          380#2>/+138678.@@56?=972486:7:.'22*/*8+=74<(6;0%770//'8((1**4163(/./10-,'**20%/5350(001//6.+..**11#(#
          @SRR385642.2 HWI-ST513_0136:1:1:3774:2213/1
          CCANCAGGGGAGTCATTAAATCTTCAAGAGCCAAAATAATTTCCTTTTACTCCATGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGCCCGATAACNAN
          +
          8;+#7*+*.++(>*)/--*((:2(:**2(/6*/*'()(%173.5777(%:486(4>182<9776672*)861677305760636659,(,7667:)(2#*#
          @SRR385642.3 HWI-ST513_0136:1:1:4217:2180/1
          TAANTAAATTAACCATTTAATTCTTCCTCGTTGACCCTTCCCTTACATACCTCACCCGCCTGAGAGTCTGAAAGTTCTAGCAGGAAAGGAATTTCTGTNCN
          +
          >67#>608>?675;0@?1(69?.>>3)=)%**=69*/+**500*897+33*-/270+/(.*0,.+%(0(/0,)(-*30,)02.)/./(((,2*((/(+#(#
          @SRR385642.4 HWI-ST513_0136:1:1:6084:2239/1
          GAANAAACTCATGCACCAAACTCGAACTGGGCATATGTGATGTTACCTCGGCATTATTCTTGCAGCATTTCTGACTGCCAATAAAGATCTGAAGAGAGNGN
          +
          >/*#/*((0(7+99862256=<:9;0/007(%*((%(%.(**(('**(%'*+*((*((%.%((**(*(*''*'*-.+(--**%(('%(*(%((('(%(#%#



          • #6
            The ASCII characters in your FASTQ quality strings range from '#' (ASCII 35) to '@' (ASCII 64), which is consistent with the Sanger quality encoding. You do NOT have either of the older Solexa/Illumina specific FASTQ encodings.

            Therefore I think you need to rerun bwa WITHOUT using the -I switch.
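The range check described here can be sketched in a few lines of Python. This is only a rough heuristic with hypothetical function names: the thresholds assume that Solexa/Illumina Phred+64 files never contain characters below ';' (ASCII 59), and that characters above 'J' (ASCII 74) are unusual in Sanger-encoded Illumina data. The quality strings are the first characters of reads 1 and 3 from the extract above.

```python
def quality_range(quality_strings):
    """Return the lowest and highest ASCII codes seen across all quality strings."""
    codes = [ord(c) for q in quality_strings for c in q]
    return min(codes), max(codes)


def guess_offset(lo, hi):
    """Guess the Phred offset from the observed range (heuristic, may be ambiguous)."""
    if lo < 59:
        # Characters below ';' cannot occur in the Phred+64 style encodings.
        return 33
    if hi > 74:
        # Characters above 'J' are rare in Sanger-encoded Illumina reads.
        return 64
    return None  # not enough evidence either way


quals = [
    "380#2>/+138678.@@56?=97",  # start of SRR385642.1
    ">67#>608>?675;0@?1(69?",   # start of SRR385642.3
]
lo, hi = quality_range(quals)
print(lo, hi)                # 35 64
print(guess_offset(lo, hi))  # 33
```

In practice you would scan many more reads than this before trusting the guess.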



            • #7
              I can see that now, thank you very much. I will re-run this. Thanks again!



              • #8
                Personally, I am surprised that bwa doesn't appear to check for this and treat it as an error. It must happen quite often.
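As a sketch of the kind of check meant here (a hypothetical helper written for this post, not anything bwa provides), a tool could refuse a quality string containing characters below the declared offset, since those would decode to negative scores. Note that this simple version ignores the old Solexa encoding, which legitimately allowed scores down to -5.

```python
def check_declared_offset(qual: str, offset: int) -> None:
    """Raise ValueError if any quality character lies below the declared offset."""
    lowest = min(qual)
    if ord(lowest) < offset:
        raise ValueError(
            f"quality character {lowest!r} (ASCII {ord(lowest)}) is below "
            f"the declared offset {offset}; wrong FASTQ encoding?"
        )


qual = "380#2>/+138678.@@56"  # start of SRR385642.1

check_declared_offset(qual, 33)  # passes silently: valid as Phred+33

try:
    check_declared_offset(qual, 64)  # what the -I switch effectively claims
except ValueError as err:
    print("rejected:", err)
```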
