  • BWA Soft Clipping

    Hi,

    When I run BWA without specifying a "q" value (which defaults to 0 as I understand it from the manual), I would not expect any trimming to occur.

    However, the resulting alignments have lots of soft-clippings at the edges. Aren't these trimmings?

    Thanks!

  • #2
    Which value have you specified? Why would you expect trimming not to occur?
    Also, if you specify a q value, you should see information about trimming while bwa is running.

    d

    • #3
      Hi, I didn't specify a "q" value, and the BWA manual implies that this means a default value of "0" is used.

      The official description of "q" is a bit cryptic for a non-mathematician, but I thought that the default value of "0" would lead to no trimming? If this isn't the case, how can I prevent trimming?

      Thanks.

      • #4
        Whoops! Sorry for misreading your post.
        Can you post a soft-clipped entry? Could it be some effect of SW alignment instead?

        d

        • #5
          Hi,
          Below is an example (both ends shown).

          I'm not sure what you mean by this being an artefact of SW alignment? I would have thought that trimming would either (a) be allowed or (b) not allowed.

          Thanks for your help!

          SRR018256.13099683 83 RN28S1|NR_003287.2 4925 29 51M 4550 -426 CCCCCCGTCACGCACCGCACGTTCGTGGGGAACCTGGCGCTAAACCATTCG #%#&&$($($&'%$,#&+%+'+&)((0,**.0++,+1)65.7C+II<@II. XT:A:U NM:i:2 SM:i:29 AM:i:29 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:0T1G48
          SRR018256.13099683 163 RN28S1|NR_003287.2 4550 29 45M6S 4925 426 GTTAGTTTTACCCTACTGATGATGTGTTGTTGCCATAGTAATCCTNTNTAG I+I;-77I=,10>9/55I)*;%1+%*++%0+))&$%#'$&"'%))!#!$"% XT:A:M NM:i:1 SM:i:29 AM:i:29 XM:i:1 XO:i:0 XG:i:0 MD:Z:36G8

          • #6
            I don't mean that it's an artifact. BWA extends your match by Smith-Waterman alignment, so I guess the terminal part of a read may be soft-clipped if that gives a higher alignment score.
            Trimming is quite different: it is performed at alignment time, based on the read qualities.

            d
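
As a rough illustration of how soft-clipping shows up in the CIGAR column, here is a minimal Python sketch. The function name `cigar_summary` is made up for this example and is not part of bwa or samtools. It splits a CIGAR string into its operations and reports how many bases were soft-clipped at each end. Note that S operations still count towards the read length, so a soft-clipped read keeps its full length in the SAM record.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_summary(cigar):
    """Summarise read length and soft-clipping for one CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard-clipped bases (H) are not stored in SEQ, so ignore them here.
    core = [(n, op) for n, op in ops if op != "H"]
    # M, I, S, = and X all consume bases of the read as stored in SEQ.
    read_len = sum(n for n, op in core if op in "MIS=X")
    # The aligned part of the read excludes the soft-clipped bases.
    aligned = sum(n for n, op in core if op in "MI=X")
    left_clip = core[0][0] if core and core[0][1] == "S" else 0
    right_clip = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return {
        "read_len": read_len,
        "aligned": aligned,
        "left_clip": left_clip,
        "right_clip": right_clip,
    }
```

For the second record posted above, `cigar_summary("45M6S")` reports 6 clipped bases on the right and a total read length of 51, the same length as the unclipped 51M mate.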

            • #7
              How do I know that the -q INT option I set has taken effect? I have 51-nt paired-end reads too. It seems to me that nothing has been trimmed, as the read lengths indicated by the CIGAR string column are all 51. Is there any other column I can check to see whether quality trimming has occurred?
              A related question is the same as Bio.X2Y's: the resulting alignments have lots of soft-clipping at the edges. Isn't this trimming? Based on the description of the -q INT option in the BWA documentation, I would expect soft-clipping (due to trimming) to occur only at the right end of sequences, not at both ends. But I frequently see soft-clipping at both ends.
              Thanks a lot for any input! It would also be great if anyone could clarify how BWA quality trimming works, as quite a few people here have similar questions.

              • #8
                BWA may perform Smith-Waterman alignment, which produces soft clipping.

                • #9
                  What about the quality trimming? Does it actually happen, or does it produce soft-clipping too? Thanks!

                  • #10
                    This may be a late answer.
                    To my understanding, you are confusing the "-q" option (quality trimming) with soft-clipping.

                    The -q option trims the ends of reads that have very low Phred scores, i.e. bad quality, which can be due to sequencing errors. Such trimming serves as a pre-processing step before BWA aligns the read.

                    "Soft-clipped", by contrast, refers to reads where part of the sequence finds nowhere to align, for example split reads covering breakpoints. BWA still preserves those "unmapped" parts for downstream analysis because they could be caused by, say, a translocation or a deletion.

                    So basically you are talking about two different things.
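
For reference, the -q description in the bwa manual says, as far as I can tell, that a read is trimmed down to argmax_x of the sum over i = x+1..l of (INT - q_i) when q_l < INT, where l is the original read length. Below is a minimal Python sketch of that rule. The function name and the plain list of Phred scores are assumptions made for illustration, and the actual bwa code may differ in details (for example, it may enforce a minimum retained length).

```python
def trimmed_length(quals, threshold):
    """Number of bases kept under the argmax rule quoted from the manual.

    quals: Phred scores in read order (5' to 3').
    threshold: the INT passed to -q (assumed 0 when the option is omitted).
    """
    length = len(quals)
    # No trimming if disabled, or if the last base already passes the threshold.
    if threshold <= 0 or length == 0 or quals[-1] >= threshold:
        return length
    best_sum = 0
    best_keep = length
    running = 0
    # Walk from the 3' end; 'running' is the sum of (threshold - q_i)
    # over the bases that would be removed if 'keep' bases were retained.
    for keep in range(length - 1, -1, -1):
        running += threshold - quals[keep]
        if running > best_sum:
            best_sum = running
            best_keep = keep
    return best_keep
```

With a threshold of 0, every term INT - q_i is zero or negative, so the sketch keeps the whole read. That is consistent with the default of 0 meaning no quality trimming.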

                    • #11
                      Hi, I think we know that trimming and soft-clipping serve different purposes, but in the SAM file the CIGAR string only shows the clipping information (e.g. 4S26M), not the reason why the read was clipped.

                      The problem here is: why does bwa clip/trim reads when the -q option is not specified? Is soft-clipping simply part of bwa's nature?

                      I have also noticed that lots of alignment tools do soft-clipping, even when it is not an option stated in the manual or parameters. On one hand, soft-clipping would generate more alignments, or maybe a 'higher' alignment rate, but what if we want alignments with exactly 1 mismatch?

                      I think soft-clipping conflicts somewhat with the mismatch option. For "4S26M", would the '4' also count towards the allowed mismatches, i.e. mismatches allowed = 4?
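
One way to see whether soft-clipped bases are being counted as mismatches is to look at the MD tag of the records posted earlier in the thread. The sketch below uses a hypothetical helper, `md_summary`, written for this illustration only. It adds up the reference bases described by an MD string and counts the mismatches in it.

```python
import re

# An MD token is a run of matches (number), a deletion (^ACGT...),
# or a single mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def md_summary(md):
    """Return (reference bases covered, mismatch count) for an MD value."""
    matched = mismatches = deleted = 0
    for number, deletion, mismatch in MD_TOKEN.findall(md):
        if number:
            matched += int(number)
        elif deletion:
            deleted += len(deletion)
        else:
            mismatches += 1
    return matched + mismatches + deleted, mismatches
```

For the record with CIGAR 45M6S, `md_summary("36G8")` gives 45 reference bases and one mismatch, which matches the 45M part and NM:i:1. The 6 soft-clipped bases do not appear in MD at all, which suggests they are not counted as mismatches in that record.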

                      • #12
                        I don't know if that is the case for that specific read, since you didn't post the whole line, but the SAM specification requires clipping if a read goes off the end of a reference sequence.
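
To check whether a read runs off the end of its reference, one can compare the last reference position covered by the alignment with the reference length. Here is a minimal sketch, assuming the reference length is known (for example from the LN field of the @SQ header line). The helper names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_end(pos, cigar):
    """Last 1-based reference position covered by the alignment."""
    # Only M, D, N, = and X consume reference bases; S, H, I and P do not.
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    return pos + span - 1


def overhang(pos, cigar, ref_length):
    """Number of aligned bases that would fall past the reference end."""
    return max(0, reference_end(pos, cigar) - ref_length)
```

For the two records posted earlier, `reference_end(4925, "51M")` is 4975 and `reference_end(4550, "45M6S")` is 4594. Whether either read reaches the end of the reference depends on the length of that sequence, which is not given in the thread.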
