  • Bio.X2Y
    Member
    • Apr 2010
    • 46

    BWA Soft Clipping

    Hi,

    When I run BWA without specifying a "q" value (which defaults to 0 as I understand it from the manual), I would not expect any trimming to occur.

    However, the resulting alignments have lots of soft-clippings at the edges. Aren't these trimmings?

    Thanks!
  • dawe
    Senior Member
    • Apr 2009
    • 258

    #2
    Which value have you specified? Why would you expect trimming not to occur?
    Also, if you specify a q value, you should see information about trimming while bwa is running.

    d

    • Bio.X2Y
      Member
      • Apr 2010
      • 46

      #3
      Hi, I didn't specify a "q" value, and the BWA manual implies that this means a default value of "0" is used.

      The official description of "q" is a bit cryptic for a non-mathematician, but I thought that the default value of "0" would lead to no trimming? If this isn't the case, how can I prevent trimming?

      Thanks.

      • dawe
        Senior Member
        • Apr 2009
        • 258

        #4
        Whoops! Sorry for misreading your post.
        Can you post a soft-clipped entry? Could it be some effect of SW alignment instead?

        d

        • Bio.X2Y
          Member
          • Apr 2010
          • 46

          #5
          Hi,
          Below is an example (both ends shown).

          I'm not sure what you mean by this being an artefact of SW alignment? I would have thought that trimming would either (a) be allowed or (b) not allowed.

          Thanks for your help!

          SRR018256.13099683 83 RN28S1|NR_003287.2 4925 29 51M 4550 -426 CCCCCCGTCACGCACCGCACGTTCGTGGGGAACCTGGCGCTAAACCATTCG #%#&&$($($&'%$,#&+%+'+&)((0,**.0++,+1)65.7C+II<@II. XT:A:U NM:i:2 SM:i:29 AM:i:29 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:0T1G48
          SRR018256.13099683 163 RN28S1|NR_003287.2 4550 29 45M6S 4925 426 GTTAGTTTTACCCTACTGATGATGTGTTGTTGCCATAGTAATCCTNTNTAG I+I;-77I=,10>9/55I)*;%1+%*++%0+))&$%#'$&"'%))!#!$"% XT:A:M NM:i:1 SM:i:29 AM:i:29 XM:i:1 XO:i:0 XG:i:0 MD:Z:36G8
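
For readers following along, here is a minimal Python sketch of how the soft clipping in records like the two above can be read off the CIGAR column. The function name `soft_clips` is made up for illustration and is not part of BWA or samtools.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clips(cigar):
    """Return (left, right) soft-clipped base counts for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H), if present, sit outside the soft clips, so drop them first.
    core = [(n, op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right

for cigar in ("51M", "45M6S"):
    print(cigar, soft_clips(cigar))
# 51M (0, 0)
# 45M6S (0, 6)
```

Note that "left" and "right" are in the orientation in which the sequence is stored in the SAM line. Flag 163 does not have the 0x10 (reverse strand) bit set, so for the second record the six clipped bases are the 3' end of the read as sequenced.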

          • dawe
            Senior Member
            • Apr 2009
            • 258

            #6
            I don't mean it's an artifact. bwa extends your match by Smith-Waterman alignment. I guess the terminal part of a read may be soft-clipped if that gives a higher alignment score.
            Trimming is quite different: it is performed at alignment time, based on the read qualities.

            d
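
A toy illustration of the point above, as a sketch only: with local-alignment scoring, dropping a noisy tail can score higher than aligning the read end to end. The scores (+1 match, -3 mismatch), the ungapped model, and the match pattern are all assumptions chosen for the example; they are not BWA's actual parameters or algorithm.

```python
def best_unclipped_prefix(matches, match=1, mismatch=-3):
    """Given per-base flags (True = base agrees with the reference),
    return (kept_bases, score) for the highest-scoring prefix.
    Bases after that prefix would be soft-clipped."""
    best_len, best_score, score = 0, 0, 0
    for i, ok in enumerate(matches, start=1):
        score += match if ok else mismatch
        if score > best_score:
            best_len, best_score = i, score
    return best_len, best_score

# Hypothetical read: 45 matching bases followed by a noisy 6-base tail.
flags = [True] * 45 + [False, True, False, True, False, False]
kept, score = best_unclipped_prefix(flags)
print(kept, len(flags) - kept, score)
# 45 6 45
```

Under these assumed scores, aligning all 51 bases would score 35, while keeping 45 bases and clipping 6 scores 45, so the clipped alignment wins. No base qualities are involved, which is why this is a different mechanism from `-q` trimming.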

            • pparg
              Member
              • Aug 2008
              • 19

              #7
              How do I know that the -q INT option I set has taken effect? I have 51-nt paired-end reads too. It seems to me that nothing has been trimmed, as the read lengths indicated by the CIGAR string column are all 51. Is there any other column to check to see whether quality trimming has occurred?
              A related question is the same as Bio.X2Y's: the resulting alignments have lots of soft clipping at the edges. Aren't these trimmings? Based on the description of the -q INT option in the BWA documentation, I would expect soft clipping (due to trimming) to occur only at the right end of sequences, not at both ends. But I frequently see soft clipping at both ends.
              Thanks a lot for any input! It would also be great if anyone could clarify how BWA quality trimming works, as quite a few people here have similar questions.

              • lh3
                Senior Member
                • Feb 2008
                • 686

                #8
                bwa may do Smith-Waterman alignment, which produces soft clipping.

                • pparg
                  Member
                  • Aug 2008
                  • 19

                  #9
                  What about the quality trimming? Does it actually happen, and does it produce soft clipping too? Thanks!

                  • CNVboy
                    Member
                    • Jun 2011
                    • 27

                    #10
                    This may be a late answer.
                    To my understanding, you guys are confusing the "-q" option (quality trimming) with soft clipping.

                    The -q option trims the low-quality ends of reads, i.e. bases with very low Phred scores, which can be due to sequencing errors. Such trimming acts as a pre-processing step applied to the read before BWA aligns it.

                    "Soft-clipped", on the other hand, refers to reads where some part finds nowhere to align, for example split reads covering breakpoints. BWA still preserves the "unmapped" part for downstream analysis because it could be caused by, say, a translocation, a deletion, and so on.

                    So basically you are talking about two different things.
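
To make the `-q` rule less cryptic, here is a minimal Python sketch of the formula as I recall it from the BWA manual: a read of length l is trimmed down to argmax_x of the sum over i = x+1..l of (INT - q_i), and only if q_l < INT. The function name `trimmed_length` is made up, and the real bwa code may differ in details (for example, it may enforce a minimum retained length), so treat this as an illustration of the formula rather than of the implementation.

```python
def trimmed_length(quals, q):
    """Number of bases kept under the manual's rule.

    quals: Phred qualities in read order; q: the -q threshold (INT).
    """
    l = len(quals)
    if l == 0 or quals[-1] >= q:
        return l  # the rule only applies when the last base is below q
    best_x, best_sum, running = l, 0, 0
    # Walk back from the 3' end; 'running' is sum(q - quals[i]) for i >= x.
    for x in range(l - 1, -1, -1):
        running += q - quals[x]
        if running > best_sum:
            best_x, best_sum = x, running
    return best_x

quals = [30, 30, 30, 30, 10, 25, 5, 2]   # hypothetical qualities
print(trimmed_length(quals, 20))  # 4
print(trimmed_length(quals, 0))   # 8
```

With the default of 0, the condition q_l < 0 can never hold for non-negative qualities, so under this reading nothing is trimmed. That is consistent with the soft clipping in this thread coming from the alignment step rather than from `-q`.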

                    • xiangwulu
                      Member
                      • Apr 2014
                      • 18

                      #11
                      Hi, I think we know that trimming and soft clipping serve different purposes, but in the SAM file the CIGAR string only shows the clipping itself (e.g. 4S26M), not the reason why the read was clipped.

                      The problem here is: why does bwa clip or trim reads when the -q option is not specified? Is soft clipping just part of how bwa works?

                      I have also noticed that lots of alignment tools do soft clipping even when it is not an option stated in the manual or parameters. On the one hand, soft clipping would generate more alignments, or maybe a "higher" alignment rate, but what if we want alignment results with exactly 1 mismatch?

                      I think soft clipping conflicts somewhat with the mismatch option. For "4S26M", would the 4 clipped bases also count as 4 allowed mismatches?
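
One way to check this on real records is to compare the MD tag with the CIGAR string. The sketch below counts substitutions in an MD value and the number of read bases that MD covers; the helper names are made up for illustration. As I read the SAM specification, soft-clipped bases lie outside the alignment and so are not described by MD.

```python
import re

def _strip_deletions(md):
    # Deleted reference bases appear as ^ACGT...; replace them with a space
    # so the numbers on either side are not merged.
    return re.sub(r"\^[A-Za-z]+", " ", md)

def md_mismatches(md):
    """Count substituted bases recorded in an MD tag value."""
    return len(re.findall(r"[A-Za-z]", _strip_deletions(md)))

def md_covered_read_bases(md):
    """Read bases described by MD: matched runs plus substitutions."""
    matched = sum(int(n) for n in re.findall(r"\d+", _strip_deletions(md)))
    return matched + md_mismatches(md)

# MD values from the two records posted earlier in this thread.
print(md_mismatches("0T1G48"), md_covered_read_bases("0T1G48"))  # 2 51
print(md_mismatches("36G8"), md_covered_read_bases("36G8"))      # 1 45
```

For the 45M6S record, MD covers 45 bases with 1 mismatch, matching XM:i:1, so the 6 clipped bases are not being counted as mismatches there.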

                      • Brian Bushnell
                        Super Moderator
                        • Jan 2014
                        • 2709

                        #12
                        I don't know if this is the case for that specific read, since you didn't post the whole line, but the SAM specification requires clipping if a read goes off the end of a reference sequence.
