SEQanswers

Old 10-26-2010, 09:57 AM   #1
Bio.X2Y
Member
 
Location: Europe

Join Date: Apr 2010
Posts: 46
BWA Soft Clipping

Hi,

When I run BWA without specifying a "q" value (which defaults to 0 as I understand it from the manual), I would not expect any trimming to occur.

However, the resulting alignments have lots of soft-clippings at the edges. Aren't these trimmings?

Thanks!
Old 10-26-2010, 08:57 PM   #2
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258
Default

Which value have you specified? Why would you expect trimming not to occur?
Also, if you specify a q value, you should see information about trimming while bwa is running.

d
Old 10-27-2010, 02:08 AM   #3
Bio.X2Y
Member
 
Location: Europe

Join Date: Apr 2010
Posts: 46
Default

Hi, I didn't specify a "q" value, and the BWA manual implies that this means a default value of "0" is used.

The official description of "q" is a bit cryptic for a non-mathematician, but I thought that the default value of "0" would lead to no trimming? If this isn't the case, how can I prevent trimming?

Thanks.
Old 10-27-2010, 02:11 AM   #4
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258
Default

Quote:
Originally Posted by Bio.X2Y View Post
Hi, I didn't specify a "q" value, and the BWA manual implies that this means a default value of "0" is used.

The official description of "q" is a bit cryptic for a non-mathematician, but I thought that the default value of "0" would lead to no trimming? If this isn't the case, how can I prevent trimming?

Thanks.
Whoops! Sorry for misreading your post.
Can you post a soft-clipped entry? Could it be some effect of SW alignment instead?

d
Old 10-27-2010, 02:46 AM   #5
Bio.X2Y
Member
 
Location: Europe

Join Date: Apr 2010
Posts: 46
Default

Hi,
Below is an example (both ends shown).

I'm not sure what you mean by this being an artefact of SW alignment? I would have thought that trimming would either (a) be allowed or (b) not allowed.

Thanks for your help!

SRR018256.13099683 83 RN28S1|NR_003287.2 4925 29 51M 4550 -426 CCCCCCGTCACGCACCGCACGTTCGTGGGGAACCTGGCGCTAAACCATTCG #%#&&$($($&'%$,#&+%+'+&)((0,**.0++,+1)65.7C+II<@II. XT:A:U NM:i:2 SM:i:29 AM:i:29 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:0T1G48
SRR018256.13099683 163 RN28S1|NR_003287.2 4550 29 45M6S 4925 426 GTTAGTTTTACCCTACTGATGATGTGTTGTTGCCATAGTAATCCTNTNTAG I+I;-77I=,10>9/55I)*;%1+%*++%0+))&$%#'$&"'%))!#!$"% XT:A:M NM:i:1 SM:i:29 AM:i:29 XM:i:1 XO:i:0 XG:i:0 MD:Z:36G8
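
For anyone who wants to check clipping programmatically, here is a minimal Python sketch that reads the soft-clip lengths off a CIGAR string. The function name `soft_clips` is made up for this illustration; the only inputs are the two CIGAR strings from the records above (51M and 45M6S) plus the 4S26M pattern.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clips(cigar):
    """Return (left, right) soft-clipped base counts for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips, so ignore them here.
    ops = [(n, op) for n, op in ops if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right

for cigar in ("51M", "45M6S", "4S26M"):
    print(cigar, soft_clips(cigar))
```

Note that "left" and "right" here are in reference orientation. For a read mapped to the reverse strand (flag bit 0x10, as in the first record with flag 83), SAM stores the reverse-complemented sequence, so a left-hand clip corresponds to the 3' end of the original read. Also, the clipped record carries XT:A:M, which the BWA manual describes as the mate Smith-Waterman type.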
Old 10-27-2010, 04:35 AM   #6
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258
Default

Quote:
Originally Posted by Bio.X2Y View Post
Hi,
Below is an example (both ends shown).

I'm not sure what you mean by this being an artefact of SW alignment? I would have thought that trimming would either (a) be allowed or (b) not allowed.
I don't mean that it's an artifact. bwa extends your match with Smith-Waterman alignment, and I guess the terminal part of a read may be soft-clipped if that gives a higher alignment score.
Trimming is quite different: it is performed at alignment time by evaluating the read qualities.
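
To illustrate why a local alignment can prefer clipping, here is a rough Python sketch. It is not BWA's code: the scores (+1 per match, -3 per mismatch), the function name `best_end_clip` and the match pattern are all assumptions chosen for the example. It walks along a read and keeps the prefix with the highest running score; whatever follows that prefix would be soft-clipped.

```python
def best_end_clip(matches, match=1, mismatch=-3):
    """matches: booleans in read order, True where the base agrees with the reference.

    Returns (kept_length, score) for the highest-scoring prefix.
    Scores are illustrative assumptions, not BWA's actual parameters.
    """
    best_len, best_score, score = 0, 0, 0
    for i, ok in enumerate(matches, start=1):
        score += match if ok else mismatch
        if score > best_score:
            best_len, best_score = i, score
    return best_len, best_score

# Hypothetical 51-base read: 45 matching bases, then a noisy tail.
pattern = [True] * 45 + [False, True, False, True, True, True]
kept, score = best_end_clip(pattern)
print(f"{kept}M{len(pattern) - kept}S", "score", score)
```

With these assumed scores, aligning all 51 bases would score 43, while stopping after base 45 scores 45, so the last 6 bases end up clipped.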

d
Old 11-24-2010, 08:41 AM   #7
pparg
Member
 
Location: NY

Join Date: Aug 2008
Posts: 19
Default

How do I know that the -q INT option I set has taken effect? I have 51-nt paired-end reads too. It seems to me that nothing has been trimmed, as the read lengths indicated by the CIGAR string column are all 51. Is there any other column to check whether quality trimming has occurred?
A related question is the same as Bio.X2Y's: the resulting alignments have lots of soft-clippings at the edges. Aren't these trimmings? Based on the description of the -q INT option in the BWA documentation, I would expect soft-clippings (due to trimming) to occur only at the right end of sequences, not at both ends. But I frequently see soft-clippings at both ends.
Thanks a lot for any input! It would also be great if anyone could clarify the BWA quality trimming issue a bit, as quite a few people here have similar questions.
Old 11-24-2010, 07:01 PM   #8
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

bwa may do Smith-Waterman alignment, which produces soft clipping.
Old 11-25-2010, 01:11 PM   #9
pparg
Member
 
Location: NY

Join Date: Aug 2008
Posts: 19
Default

What about the quality trimming? Does it actually happen, or does it produce soft-clippings too? Thanks!
Old 03-17-2012, 12:41 PM   #10
CNVboy
Member
 
Location: boston

Join Date: Jun 2011
Posts: 27
Default

Quote:
Originally Posted by pparg View Post
How do I know that the -q INT option I set has taken effect? I have 51-nt paired-end reads too. It seems to me that nothing has been trimmed, as the read lengths indicated by the CIGAR string column are all 51. Is there any other column to check whether quality trimming has occurred?
A related question is the same as Bio.X2Y's: the resulting alignments have lots of soft-clippings at the edges. Aren't these trimmings? Based on the description of the -q INT option in the BWA documentation, I would expect soft-clippings (due to trimming) to occur only at the right end of sequences, not at both ends. But I frequently see soft-clippings at both ends.
Thanks a lot for any input! It would also be great if anyone could clarify the BWA quality trimming issue a bit, as quite a few people here have similar questions.

This may be a late answer.
To my understanding, you guys are confusing the "-q" option (quality trimming) with "soft-clipping".

The -q option trims the ends of reads that have very low Phred scores, i.e. bad quality, which can be due to sequencing errors. Such trimming serves as a pre-processing step applied to each read before BWA aligns it.

"Soft-clipped", on the other hand, refers to reads in which some part finds nowhere to align, for example split reads covering breakpoints. BWA still preserves the "unmapped" part for downstream analysis, because it could be caused by, say, a translocation or a deletion.

So basically you are talking about two different things.
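
For the trimming side, the BWA manual gives the -q rule as: trim the read down to argmax_x of the sum over i = x+1..l of (INT - q_i), if q_l < INT, where l is the original read length. Below is a minimal Python sketch of that formula as written. The function name `trim_length` and the quality values are made up for illustration, and the real implementation may add details (for example a minimum read length) that this sketch leaves out.

```python
def trim_length(quals, threshold):
    """Return how many bases are kept from the 5' end under the manual's formula.

    quals: Phred scores in read order; threshold: the -q INT value.
    """
    l = len(quals)
    # No trimming unless the last base is below the threshold.
    # With threshold 0 this is never true for non-negative Phred scores.
    if l == 0 or quals[-1] >= threshold:
        return l
    best_x, best_sum, running = l, 0, 0
    for x in range(l - 1, -1, -1):        # x = number of bases kept
        running += threshold - quals[x]   # term for the base at 1-based position x+1
        if running > best_sum:
            best_x, best_sum = x, running
    return best_x

quals = [30, 30, 30, 30, 20, 5, 12, 3, 2]  # hypothetical read with a poor 3' tail
print(trim_length(quals, 10))  # keeps the first 5 bases
print(trim_length(quals, 0))   # default q = 0: nothing is trimmed
```

The rule only ever removes bases from the 3' end of the read as sequenced, so it cannot by itself explain clipping at the 5' end.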
Old 03-09-2015, 04:56 AM   #11
xiangwulu
Member
 
Location: ireland

Join Date: Apr 2014
Posts: 18
Default

Quote:
Originally Posted by CNVboy View Post
This may be a late answer.
To my understanding, you guys are confusing the "-q" option (quality trimming) with "soft-clipping".

The -q option trims the ends of reads that have very low Phred scores, i.e. bad quality, which can be due to sequencing errors. Such trimming serves as a pre-processing step applied to each read before BWA aligns it.

"Soft-clipped", on the other hand, refers to reads in which some part finds nowhere to align, for example split reads covering breakpoints. BWA still preserves the "unmapped" part for downstream analysis, because it could be caused by, say, a translocation or a deletion.

So basically you are talking about two different things.
Hi, I think we know that trimming and soft-clipping serve different purposes, but in the SAM file the CIGAR string shows only the clipping information (e.g. 4S26M), not the reason why it's clipped.

The problem here is: why does bwa clip/trim reads when the -q option is not specified? Is soft-clipping part of bwa's nature?

I have also noticed that lots of alignment tools do soft-clipping, even if it is not an option stated in the manual or parameters. On the one hand, soft-clipping would generate more alignments, or maybe a 'higher' alignment rate, but what if we want alignment results with exactly 1 mismatch?

I think soft-clipping conflicts somewhat with the mismatch option. For "4S26M", would the '4' also count as mismatch allowed = 4?
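
One way to approach the "exactly 1 mismatch" case is to filter on both the NM tag and the CIGAR. As far as I understand the SAM tag definitions, NM counts differences over the aligned part of the read only, so the 4 bases in "4S26M" would not be included in it. Here is a minimal Python sketch under that assumption. The helper names and the records built by `make_record` are hypothetical, not real data.

```python
import re

def nm_and_soft_clipped(sam_line):
    """Return (NM value or None, total soft-clipped bases) for one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar = fields[5]  # column 6 of a SAM record is the CIGAR string
    clipped = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
                  if op == "S")
    # Optional tags start at column 12; NM:i:<n> is the edit distance.
    nm = next((int(tag[5:]) for tag in fields[11:] if tag.startswith("NM:i:")), None)
    return nm, clipped

def one_difference_no_clip(sam_line):
    """True if the record has NM == 1 and no soft-clipped bases."""
    nm, clipped = nm_and_soft_clipped(sam_line)
    return nm == 1 and clipped == 0

def make_record(cigar, nm):
    # Hypothetical 30-base record, built only for this illustration.
    return "\t".join(["read1", "0", "ref", "100", "29", cigar, "*", "0", "0",
                      "A" * 30, "I" * 30, f"NM:i:{nm}"])

for cigar, nm in (("30M", 1), ("4S26M", 1), ("30M", 2)):
    rec = make_record(cigar, nm)
    print(cigar, nm_and_soft_clipped(rec), one_difference_no_clip(rec))
```

Keep in mind that NM is an edit distance, so NM of 1 can also mean a single-base insertion or deletion; checking the CIGAR for I and D operations would be needed to separate those cases from a true mismatch.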
Old 03-09-2015, 09:08 AM   #12
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

I don't know if this is the case for that specific read, since you didn't post the whole line, but the SAM specification requires clipping if a read goes off the end of a reference sequence.

Tags
bwa, soft-clipping
