  • Can someone explain how BWA does its trimming?

    In BWA manual, there is a -q option.

    Parameter for read trimming. BWA trims a read down to argmax_x{\sum_{i=x+1}^l(INT-q_i)} if q_l<INT where l is the original read length.


    I am not really sure what this means. If I say -q 15, what does it really mean?

  • #2
    Originally posted by foxyg
    In BWA manual, there is a -q option.

    Parameter for read trimming. BWA trims a read down to argmax_x{\sum_{i=x+1}^l(INT-q_i)} if q_l<INT where l is the original read length.


    I am not really sure what this means. If I say -q 15, what does it really mean?
    I believe it looks for the position x that maximizes the sum of (INT-q_i) over the bases from x+1 to the end of the read, and trims the read there. In practice that means it cuts back from the 3' end as long as the qualities are, on balance, below your threshold. This is pretty smart: suppose you have two or three bad qualities early in your read (say from 5 to 8 bp). Hard trimming at the first base below a threshold would leave you with a 5 bp read. The bwa method instead checks whether you have better qualities after that point and trims later.
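
The rule quoted from the manual can be sketched in a few lines of Python. This is only an illustration of the formula as written, not bwa's actual code: the function name `bwa_style_trim_length` is made up for this example, the qualities are assumed to be already decoded to Phred integers, and details of the real implementation (such as any minimum retained read length) are not modeled.

```python
def bwa_style_trim_length(quals, threshold):
    """Return how many bases to keep under the manual's argmax rule.

    quals: list of Phred quality integers, 5' to 3' (assumed already decoded).
    threshold: the INT passed to -q.
    """
    l = len(quals)
    # The manual only trims when the last base is below the threshold.
    if l == 0 or quals[-1] >= threshold:
        return l
    running = 0    # sum of (threshold - q_i) over the bases that would be cut
    best_sum = 0
    best_x = l     # x = number of bases kept
    # Walk from the 3' end towards the 5' end.
    for x in range(l - 1, -1, -1):
        running += threshold - quals[x]
        if running > best_sum:  # strict '>' keeps the longer read on ties
            best_sum = running
            best_x = x
    return best_x


# One isolated low-quality base (10) in the middle is not enough to trim there.
print(bwa_style_trim_length([30, 30, 30, 30, 10, 30, 5, 5, 5], 15))  # 6
# Two very bad bases can outweigh a single good base further along.
print(bwa_style_trim_length([30, 30, 2, 2, 30, 5, 5], 15))           # 2
```

The running sum is accumulated from the 3' end, which is why a short dip in quality inside an otherwise good read does not cause everything after it to be discarded.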

    d


    • #3
      So does BWA start scanning from left or right?

      Also what is the common trim parameter here I should use if I have offset 64 FASTQ data?


      • #4
        bwa learned from phred.

        @foxyg

        Use Sanger fastq. That is the standard. Use -q15 or -q20. Usually the threshold does not matter too much.


        • #5
          Hi again!

          Originally posted by foxyg
          So does BWA start scanning from left or right?
          Well... How else? :-)

          Originally posted by foxyg
          Also what is the common trim parameter here I should use if I have offset 64 FASTQ data?
          As lh3 pointed out, you should always have your scores in Sanger format; then you can apply a threshold of 15-20 (which corresponds to a base-call error probability of ~0.03-0.01).
          BTW, if your fastq is in Illumina format (Pipeline 1.3+), you may try this patch I've written. It adds a '-I' option to bwa aln so that you can use Illumina reads and have them trimmed (and output) as if they were in Sanger scale.
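
For reference, the arithmetic behind those numbers can be sketched as follows. The helper names `phred_to_error_prob` and `illumina64_to_sanger` are hypothetical, chosen for this example; the sketch assumes a Phred-scaled, offset-64 quality string (Illumina Pipeline 1.3+) and simply shifts it to the offset-33 Sanger encoding.

```python
def phred_to_error_prob(q):
    """Error probability for a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def illumina64_to_sanger(qual_string):
    """Re-encode an offset-64 quality string as offset-33 (Sanger).

    Assumes the input is Phred-scaled with offset 64; the scores
    themselves are unchanged, only the ASCII offset moves by 31.
    """
    return "".join(chr(ord(c) - 64 + 33) for c in qual_string)


print(round(phred_to_error_prob(15), 4))  # 0.0316
print(round(phred_to_error_prob(20), 4))  # 0.01
print(illumina64_to_sanger("hhTO"))       # II50  (scores 40, 40, 20, 15)
```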

          d


          • #6
            Originally posted by lh3
            bwa learned from phred.
            Use Sanger fastq. That is the standard. Use -q15 or -q20. Usually the threshold does not matter too much.
            Hi,
            I wonder: when does the threshold matter? And could anybody explain why it usually doesn't?
            Thanks!


            • #7
              Originally posted by dawe
              As lh3 pointed out, you should always have your scores in Sanger format; then you can apply a threshold of 15-20 (which corresponds to a base-call error probability of ~0.03-0.01).
              lh3 also pointed out that
              Originally posted by lh3
              Usually the threshold does not matter too much.
              So, along the same lines as ElMichael: why would the threshold not matter? I mean, if the quality threshold is higher, you would keep fewer bases, since more of them fall below it, right?

              Dawe, what did you base your choice of read trimming threshold (15-20) upon? Is there a specific paper saying "this is a commonly used threshold value", like the use of p-value=0.05 for hypothesis testing? I just want to have some confirmation of the threshold selection.
              "Though it may seem that all's been said and done, originality still lives on" - some unoriginal guy who had nothing better to write as his signature


              • #8
                I suppose if the probability of a base call being wrong is less than .01, you'd still want to keep it.
