SEQanswers

Old 02-03-2015, 06:34 AM   #1
svos
Member
 
Location: Germany

Join Date: Feb 2014
Posts: 16
No soft-clipping in BWA 0.7.10 anymore?

Hi all!

After moving from bwa version 0.5.8c to 0.7.10, I discovered differences that look like errors in the newer version: it seems as if BWA no longer performs soft-clipping, resulting in false-positive variant calls:

[Screenshot not preserved]

Is that true? I could find parameters to disable soft-clipping, but no parameter to explicitly turn on soft-clipping.

Both versions were used with default parameters (for both "bwa aln" and "bwa sampe").


Any help would be appreciated!

Thank you in advance,


Sebastian
Old 02-03-2015, 06:46 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

Please don't cross-post on here, biostars and the BWA mailing list.
Old 02-03-2015, 02:31 PM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

Given that you have already cross-posted, please at least take the time to add the URLs here (and there) for cross-referencing. People here don't like to waste their volunteered time repeating an answer you've already heard on another forum/platform.

BioStars duplicate: https://www.biostars.org/p/129443/

Mailing list duplicate: http://sourceforge.net/p/bio-bwa/mai...sage/33329232/ where Heng Li replied that it was a bug fixed in 0.7.12

Last edited by maubp; 02-04-2015 at 08:40 AM. Reason: Adding links
Old 02-04-2015, 08:32 AM   #4
Zaag
Senior Member
 
Location: Amsterdam

Join Date: Nov 2009
Posts: 112

Use bwa-mem.
Old 02-04-2015, 09:12 AM   #5
svos
Member
 
Location: Germany

Join Date: Feb 2014
Posts: 16

First, sorry for cross-posting!

Second, I use reference genome GRCh37, so this bug should not make any difference for me. I think this is another problem.

Third, bwa mem didn't work at all for our data (2x100bp): it produced obviously wrong read mappings, so I returned to bwa aln and bwa sampe.


To be more clear: I do have soft-clipped alignments in my SAM file, but I would expect the reads in the screenshot to be soft-clipped, too. Or am I wrong?
Old 02-04-2015, 09:17 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,814

Check your preferences for "alignments" in IGV (if the screenshot is from IGV): http://www.broadinstitute.org/igv/Pr...ces#Alignments

You have likely selected "show" soft-clipped bases.

Last edited by GenoMax; 02-04-2015 at 09:20 AM.
Old 02-04-2015, 11:39 PM   #7
svos
Member
 
Location: Germany

Join Date: Feb 2014
Posts: 16

No, soft-clipped alignments are not shown in IGV. Additionally, here is an entry of the SAM file:


HISEQ:136:C5L2YANXX:3:1104:20703:26673 81 chr7 100682889 25 100M = 100682893 -96 GGGAACCTACAACTGCTGAAGGTACCAGCATGCGAATCTCAACTCCTAGTGATGGAAGTACTCCATTAACAAGTATACTTGTCAGCACCCTGCCAGTGGC FC0F>GGGGGGEF@BFGGGGGGGGGGGGGDGGFFGFCGEFF>E: DGGGGGFCGGEGGEF<F=GGFGGGGGGGFGGGGGGGGGGGGGGGGFCEGGFBBBBB X0:i:1 X1:i:0 MD:Z:0C0T0T0C0T95 XG:i:0 AM:i:25 NM:i:5 SM:i:25 XM:i:5 XO:i:0 XT:A:U


The CIGAR string tells me that there was no soft-clipping, although 100M isn't correct either (?). Is it possible that it's due to the insert size (-96)? At least this is where the mismatched bases are coming from, as the DNA fragment was shorter than the read length (2x101bp)...
Old 02-05-2015, 12:27 AM   #8
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

The sequence is 100 bases long, so 100M is correct, though I'm not sure how a tlen of 96 would be possible given that. This does seem a bit like a bug. If you can whittle this down to just a handful of reads that is sufficient to reproduce the problem, then consider filing a bug report on GitHub.
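
One possible way to look at the tlen question is to work out the geometry of the pair from the fields of the SAM record quoted in post #7. The Python sketch below does that under stated assumptions: the field values are copied from that record, the function names `decode_flag` and `fragment_overhang` are made up for illustration, and it assumes the template length is measured between the 5' ends of the two mates. It is not taken from BWA's source.

```python
# Field values copied from the SAM record quoted in post #7.
FLAG, POS, ALIGNED_LEN, PNEXT, TLEN = 81, 100682889, 100, 100682893, -96

def decode_flag(flag):
    """Return the SAM flag bits that matter for the pair orientation."""
    return {
        "paired": bool(flag & 0x1),
        "read_reverse": bool(flag & 0x10),
        "mate_reverse": bool(flag & 0x20),
        "first_in_pair": bool(flag & 0x40),
    }

def fragment_overhang(pos, aligned_len, pnext):
    """Return (fragment_len, overhang) for a reverse read with a forward mate.

    Assumption: the fragment runs from the mate's start (its 5' end) to the
    last reference base covered by the reverse-strand read (its 5' end).
    The overhang is the number of aligned bases left of the mate's start.
    """
    read_end = pos + aligned_len - 1
    fragment_len = read_end - pnext + 1
    overhang = max(0, pnext - pos)
    return fragment_len, overhang

bits = decode_flag(FLAG)
fragment_len, overhang = fragment_overhang(POS, ALIGNED_LEN, PNEXT)
print(bits)
print("fragment length:", fragment_len, "reported |TLEN|:", abs(TLEN))
print("aligned bases outside the fragment:", overhang)
```

Under these assumptions the fragment comes out at 96 bp, which agrees with the reported tlen, and 4 aligned bases of the read lie outside the fragment. That would fit the idea that the read ran past the end of a fragment shorter than the read length, but it is only one reading of the record.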
Old 02-10-2015, 07:19 AM   #9
Zaag
Senior Member
 
Location: Amsterdam

Join Date: Nov 2009
Posts: 112

Quote:
Originally Posted by svos
To be more clear: I do have soft-clipped alignments in my SAM file, but I would expect the reads in the screenshot to be soft-clipped, too. Or am I wrong?

I guess 4 mismatches at the end of the read incur a smaller penalty than clipping those 4 bases. What are the qualities of the 4 bases?
Old 02-10-2015, 07:21 AM   #10
Zaag
Senior Member
 
Location: Amsterdam

Join Date: Nov 2009
Posts: 112

Quote:
Originally Posted by svos
The CIGAR string tells me that there was no soft-clipping, although 100M isn't correct either (?). Is it possible that it's due to the insert size (-96)? At least this is where the mismatched bases are coming from, as the DNA fragment was shorter than the read length (2x101bp)...

M stands for match or mismatch in the CIGAR.
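
To make that concrete, here is a small Python sketch that reads the CIGAR and the MD tag of the record from post #7 side by side. The CIGAR only says that 100 bases are aligned; the mismatches have to be recovered from MD (or NM). The helper names `cigar_ops` and `md_mismatch_offsets` are invented for this example, and the MD parser only handles the plain cases (match runs, mismatches, deletions).

```python
import re

def cigar_ops(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

def md_mismatch_offsets(md):
    """Return 0-based reference offsets (from POS) of mismatches in an MD tag."""
    offsets, ref_pos = [], 0
    for run, deletion, base in re.findall(r"(\d+)|\^([A-Z]+)|([A-Z])", md):
        if run:                      # run of matching bases
            ref_pos += int(run)
        elif deletion:               # bases deleted from the reference
            ref_pos += len(deletion)
        else:                        # single mismatching reference base
            offsets.append(ref_pos)
            ref_pos += 1
    return offsets

# Values copied from the SAM record in post #7.
ops = cigar_ops("100M")
mismatches = md_mismatch_offsets("0C0T0T0C0T95")
soft_clipped = sum(n for n, op in ops if op == "S")

print("CIGAR operations:", ops)
print("soft-clipped bases:", soft_clipped)
print("mismatch offsets from MD:", mismatches)
```

For this record the CIGAR reports no soft-clipped bases, while the MD tag lists mismatches at the first five aligned reference positions, which matches NM:i:5 in the same line.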

Old 02-10-2015, 08:33 AM   #11
svos
Member
 
Location: Germany

Join Date: Feb 2014
Posts: 16

Quote:
Originally Posted by Zaag
M stands for match or mismatch in the CIGAR

Yes, you are right! M means alignment match, not sequence match... Quality values are good (>30).


Just to understand soft-clipping correctly: every base at either end of a read that no longer matches the reference sequence should be soft-clipped, right?
Old 02-10-2015, 08:42 AM   #12
Zaag
Senior Member
 
Location: Amsterdam

Join Date: Nov 2009
Posts: 112

Not every base.

BWA gives every possible alignment a score, and I can imagine that having 4 high-quality mismatches at the end of the read yields a higher score than clipping off the 4 bases.

If the quality is below 10 (or there are a lot of mismatching bases), it would really surprise me if they didn't get clipped.
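
The trade-off can be illustrated with a toy model in Python. This is only a sketch of the reasoning, not BWA's actual scoring: the function names and all score and penalty values are hypothetical, and base qualities are ignored.

```python
def end_to_end_score(n_match, n_mismatch, match_score, mismatch_penalty):
    """Score when the mismatching end bases stay in the alignment."""
    return n_match * match_score - n_mismatch * mismatch_penalty

def clipped_score(n_match, match_score, clip_penalty):
    """Score when the mismatching end bases are soft-clipped instead."""
    return n_match * match_score - clip_penalty

def prefers_clipping(n_match, n_mismatch, match_score, mismatch_penalty, clip_penalty):
    """True if clipping the end scores strictly higher than keeping it."""
    keep = end_to_end_score(n_match, n_mismatch, match_score, mismatch_penalty)
    clip = clipped_score(n_match, match_score, clip_penalty)
    return clip > keep

# 96 matching bases plus 4 mismatches at the end of the read.
# Hypothetical penalties: cheap mismatches, so the end is kept.
print(prefers_clipping(96, 4, match_score=1, mismatch_penalty=1, clip_penalty=5))
# Hypothetical penalties: expensive mismatches, so the end is clipped.
print(prefers_clipping(96, 4, match_score=1, mismatch_penalty=3, clip_penalty=5))
```

In this toy model the end is clipped only when the clipping penalty is smaller than the summed mismatch penalties, so the same read can come out either way depending on the parameter values.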
Old 02-10-2015, 09:10 AM   #13
svos
Member
 
Location: Germany

Join Date: Feb 2014
Posts: 16

Thank you Zaag for explaining this. I will pay attention to these bases and filter them out by manual alignment inspection!

Since some default parameter values changed between versions, that might explain the difference...
All times are GMT -8.

