  • foxyg
    Member
    • May 2010
    • 54

    Can someone explain how BWA does its trimming?

    In the BWA manual, there is a -q option:

    Parameter for read trimming. BWA trims a read down to $\arg\max_x \sum_{i=x+1}^{l} (\mathrm{INT} - q_i)$ if $q_l < \mathrm{INT}$, where $l$ is the original read length.


    I am not really sure what this means. If I say -q 15, what does it really mean?
  • dawe
    Senior Member
    • Apr 2009
    • 258

    #2
    I believe it looks for local maxima of the (INT - q_i) function and chooses the rightmost one. That means it trims where the quality starts to decrease monotonically below your threshold. This is pretty smart: suppose you have two or three bad qualities near the beginning of your read (say at positions 5 to 8). Hard trimming at the first base below a given threshold would leave you with a 5 bp read, whereas bwa's method checks whether better qualities follow and trims later.

    d
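The manual's formula can be checked by evaluating it directly: for every possible kept length x, add up (INT - q_i) over the bases that would be cut, and keep the x with the largest total. The Python below is a minimal brute-force sketch of that formula, not bwa's own code. It assumes the qualities are already decoded to integer Phred scores; the function name `trim_length` is hypothetical, and breaking ties in favour of the longer read is my assumption.

```python
def trim_length(quals, threshold):
    """Return how many bases to keep, evaluating the manual's formula
    argmax_x sum_{i=x+1}^{l} (threshold - q_i) by brute force.

    quals holds decoded Phred scores. The formula is 1-based, the list is
    0-based, so q_i == quals[i - 1] and the cut tail is quals[x:].
    """
    l = len(quals)
    # The manual only trims when the last base is below the threshold.
    if l == 0 or quals[-1] >= threshold:
        return l
    best_x, best_sum = l, 0  # x == l means "cut nothing": the tail sum is empty (0)
    for x in range(l, -1, -1):  # from keeping everything down to keeping nothing
        tail_sum = sum(threshold - q for q in quals[x:])
        if tail_sum > best_sum:  # strict '>' keeps the longest read on ties
            best_x, best_sum = x, tail_sum
    return best_x
```

For example, `trim_length([30, 30, 30, 10, 30, 5, 5], 15)` returns 5: the Q10 base at position 4 survives because the Q30 base after it outweighs it, which is the behaviour dawe describes. Cutting at the first base below 15 would have left only 3 bases.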


    • foxyg
      Member
      • May 2010
      • 54

      #3
      So does BWA start scanning from left or right?

      Also what is the common trim parameter here I should use if I have offset 64 FASTQ data?


      • lh3
        Senior Member
        • Feb 2008
        • 686

        #4
        bwa learned from phred.

        @foxyg

        Use Sanger fastq. That is the standard. Use -q15 or -q20. Usually the threshold does not matter too much.
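For foxyg's offset-64 data, converting to Sanger FASTQ is a fixed shift: decode each quality character with offset 64 and re-encode it with offset 33. The sketch below assumes Illumina 1.3+ style Phred+64 qualities (the older Solexa scale is not Phred and would need a non-linear conversion) and plain 4-line FASTQ records without line wrapping; both function names are hypothetical.

```python
def phred64_to_sanger(qual_line):
    """Re-encode a Phred+64 quality string as Sanger (Phred+33)."""
    out = []
    for ch in qual_line:
        q = ord(ch) - 64  # decode the Phred score from the offset-64 character
        if q < 0:
            raise ValueError(f"character {ch!r} is below the Phred+64 range")
        out.append(chr(q + 33))  # re-encode with the Sanger offset
    return "".join(out)


def convert_fastq(lines):
    """Yield FASTQ lines, re-encoding the quality line (4th line of each record)."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield phred64_to_sanger(line.rstrip("\n")) + "\n"
        else:
            yield line
```

For instance, `phred64_to_sanger("hhhB")` gives `"III#"` (Q40, Q40, Q40, Q2).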


        • dawe
          Senior Member
          • Apr 2009
          • 258

          #5
          Hi again!

          Originally posted by foxyg
          So does BWA start scanning from left or right?
          Well... from the right (3') end. How else? :-)
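On the direction question: the formula sums over positions x+1 to l, so bases are removed from the 3' (right) end, and the natural way to evaluate it is a single pass from the last base toward the first with a running total. The sketch below follows my reading of bwa's C routine `bwa_trim_read`: it stops as soon as the running total goes negative and never trims the read below a minimum length. Treat both the early stop and the 35 bp floor (`MIN_READ_LEN`) as assumptions; the names are hypothetical and qualities are decoded Phred scores.

```python
MIN_READ_LEN = 35  # assumption: the minimum length I believe bwa refuses to trim below


def bwa_style_trim(quals, threshold, min_len=MIN_READ_LEN):
    """Return how many bases to keep, scanning once from the 3' end."""
    running, best, keep = 0, 0, len(quals)
    # Walk from the last base toward the 5' end, never cutting below min_len.
    for pos in range(len(quals) - 1, min_len - 1, -1):
        running += threshold - quals[pos]
        if running < 0:  # good bases now outweigh the bad tail: stop looking
            break
        if running > best:  # new best cut point: keep bases [0, pos)
            best, keep = running, pos
    return keep
```

With `min_len=0`, `bwa_style_trim([30, 30, 30, 10, 30, 5, 5], 15)` also returns 5; with the default floor, that 7 bp read is returned untouched. Because of the early stop, this pass can disagree with the brute-force formula on some reads, for example when a good stretch sits between two bad ones.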

          Originally posted by foxyg
          Also what is the common trim parameter here I should use if I have offset 64 FASTQ data?
          As lh3 pointed out, you should always have your scores in Sanger format, and then you may apply a threshold of 15-20 (which corresponds to an error probability of ~0.03-0.01).
          BTW, if your FASTQ is in Illumina format (Pipeline 1.3+), you may try this patch I've written. It adds an '-I' option to bwa aln so that you can use Illumina reads and trim (and output) them as if they were on the Sanger scale.

          d
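The probabilities dawe quotes follow from the definition of a Phred score, P(error) = 10^(-Q/10). A small sketch to check the numbers for the suggested -q values; the function name is hypothetical.

```python
def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


for q in (15, 20):
    # prints "Q15: 0.0316" and then "Q20: 0.0100"
    print(f"Q{q}: {phred_to_error_prob(q):.4f}")
```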


          • ElMichael
            Member
            • Jun 2009
            • 31

            #6
            Originally posted by lh3
            bwa learned from phred.
            Use Sanger fastq. That is the standard. Use -q15 or -q20. Usually the threshold does not matter too much.
            Hi,
            I wonder when the threshold does matter much. And could anybody explain why it usually doesn't?
            Thanks!


            • Orr Shomroni
              Member
              • Oct 2011
              • 26

              #7
              Originally posted by dawe
              As lh3 pointed out, you should always have your scores in Sanger format, and then you may apply a threshold of 15-20 (which corresponds to an error probability of ~0.03-0.01).
              lh3 also pointed out that
              Originally posted by lh3
              Usually the threshold does not matter too much.
              So, in the same vein as ElMichael: why would the threshold not matter? I mean, if the quality threshold is higher, you would keep fewer bases, since more of them would fall below it, right?

              Dawe, what did you base your choice of read trimming threshold (15-20) upon? Is there a specific paper saying "this is a commonly used threshold value", like the use of p-value=0.05 for hypothesis testing? I just want to have some confirmation of the threshold selection.

              Comment

              • carmeyeii
                Senior Member
                • Mar 2011
                • 137

                #8
                I suppose if the probability of a base call being wrong is less than .01, you'd still want to keep it.
