SEQanswers

Old 08-22-2011, 09:19 PM   #1
trickytank
Member
 
Location: Melbourne

Join Date: Dec 2010
Posts: 19
Novoalign with GATK recalibration

Hey everyone,

I've been using Novoalign to map Illumina reads (TruSeq capture, HiSeq paired-end sequencing), then running GATK base quality recalibration in the hope of getting better results. Strangely, after GATK base quality recalibration both ends of the reads show very high reported quality scores. Novoalign features its own recalibration, which does not show this effect, but if GATK base quality recalibration is then run on top of it, the very high quality scores at both ends appear again.

These quality scores, particularly at the 3' end, don't seem right for this data. In addition, the same effect is not seen with a BWA alignment of the same data. I have seen it on every dataset I have tried so far (all TruSeq, HiSeq). Because the -H option was set in Novoalign, many reads were trimmed (some down to only 16 bases), but the effect is still observed after removing all trimmed reads.

Novoalign mapped reads (without Novoalign recalibration) before GATK recalibration


Novoalign mapped reads (without Novoalign recalibration) after GATK recalibration


Novoalign mapped reads (with Novoalign recalibration) before GATK recalibration


Novoalign mapped reads (with Novoalign recalibration) after GATK recalibration


BWA mapped reads before GATK recalibration


BWA mapped reads after GATK recalibration



My pipeline has been:
alignment, sort/order, FastQC, duplicate removal (MarkDuplicates), GATK base quality recalibration, FastQC. I would have preferred to run the first FastQC step after duplicate removal, but this order was easier to implement in this case, and I don't think it is hiding anything (duplicate levels ~17%).

Any enlightenment would be appreciated.

Last edited by trickytank; 08-23-2011 at 04:16 PM. Reason: labels wrong on figures
Old 08-23-2011, 07:12 AM   #2
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249

Hi Trickytank,

We did a similar study but looked at dbSNP concordance rather than the FASTQC quality profile.

I think your figure legends (from top to bottom) in figures 2 and 4 probably mean "After" calibration. Is that right?

This is quite an interesting observation. I wonder why GATK would do this.
Old 08-23-2011, 04:20 PM   #3
trickytank
Member
 
Location: Melbourne

Join Date: Dec 2010
Posts: 19

Quote:
Originally Posted by zee View Post
I think your figure legends (from top to bottom) in figures 2 and 4 probably mean "After" calibration. Is that right?
Thanks, I've fixed that now.

Do you have a link/article for your study?
Old 08-23-2011, 06:58 PM   #4
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126

Trickytank,
There could be a simple explanation. Novoalign can clip alignments, trimming them back to the best local alignment. This means a mismatch in the last few bases is likely to be clipped.
Novoalign's quality calibration works on alignments before clipping, so it won't show this effect.
Clipping is done to improve the accuracy of SNP calling. With dynamic programming algorithms like Smith-Waterman and Needleman-Wunsch there are often suboptimal alignments that differ only slightly in score from the optimal alignment. This happens especially near the ends of alignments. For example, a true indel of 1 bp in the last few bp of a read may be aligned as mismatches. Clipping ensures there are enough matching bases after a SNP or indel to be confident the alignment is optimal.
Clipping can be turned off with the option -o FULLNW

Colin
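
One way to see why clipping could inflate the recalibrated end qualities is a toy simulation. The sketch below assumes (this is not confirmed anywhere in this thread) that recalibration estimates an empirical quality per read cycle from mismatches among aligned bases only, so soft-clipped mismatches never get counted. The read length, error rate, clipping window and clipping rule are all made up for illustration; this is neither Novoalign's clipping algorithm nor GATK's recalibration model, and only the 3' end is clipped to keep it short.

```python
import math
import random

READ_LEN = 50      # hypothetical read length
ERROR_RATE = 0.01  # hypothetical true per-base error rate, identical at every cycle
WINDOW = 5         # hypothetical: a mismatch in the last WINDOW bases triggers clipping


def empirical_quality(mismatches, bases):
    # Phred-scaled empirical error rate, with a small pseudocount to avoid log(0)
    return -10 * math.log10((mismatches + 1) / (bases + 2))


def aligned_length(is_mismatch):
    # Toy clipping rule: clip from the first mismatch found in the last WINDOW bases
    for i in range(READ_LEN - WINDOW, READ_LEN):
        if is_mismatch[i]:
            return i
    return READ_LEN


def per_cycle_quality(reads, clip):
    # Count mismatches and observed bases per cycle, over aligned bases only
    mismatches = [0] * READ_LEN
    bases = [0] * READ_LEN
    for is_mismatch in reads:
        kept = aligned_length(is_mismatch) if clip else READ_LEN
        for i in range(kept):
            bases[i] += 1
            mismatches[i] += is_mismatch[i]
    return [empirical_quality(m, b) for m, b in zip(mismatches, bases)]


rng = random.Random(0)
reads = [[rng.random() < ERROR_RATE for _ in range(READ_LEN)] for _ in range(20000)]
q_full = per_cycle_quality(reads, clip=False)
q_clipped = per_cycle_quality(reads, clip=True)
print(f"mid-read cycle, unclipped vs clipped: {q_full[25]:.1f} vs {q_clipped[25]:.1f}")
print(f"last cycle, unclipped vs clipped:     {q_full[-1]:.1f} vs {q_clipped[-1]:.1f}")
```

With the same true error rate at every cycle, the unclipped estimate stays flat, while the clipped estimate jumps in the last few cycles because every mismatch there has been clipped away before it could be counted.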
Old 08-23-2011, 07:16 PM   #5
trickytank
Member
 
Location: Melbourne

Join Date: Dec 2010
Posts: 19

Hey thanks for that. By the sounds of it I would be better off not using:
-o FULLNW

I found that, when Novoalign recalibration has already been used, running GATK recalibration changes the number of SNP variants very little: fewer than 30 of ~100,000, conditioning on depth >4.
With BWA, GATK recalibration changes the SNP variants by around 1,000 to 2,000. I'm thinking of just not using GATK recalibration on Novoalign runs.
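
For comparisons like this, a small set-based tally is often enough. The sketch below is only an illustration: the (chrom, pos, alt) keys, the function names and the toy call sets are hypothetical, and the concordance here is simply the fraction of calls found in a known-sites set such as dbSNP, which may differ from how others in this thread measured it.

```python
def compare_calls(calls_a, calls_b):
    # Tally SNP calls shared between two call sets and unique to each
    a, b = set(calls_a), set(calls_b)
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


def concordance(calls, known_sites):
    # Fraction of calls that are also present in a known-sites set
    calls = set(calls)
    if not calls:
        return 0.0
    return len(calls & set(known_sites)) / len(calls)


# Hypothetical toy data: SNP calls before and after recalibration
before = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 75, "G")}
after = {("chr1", 100, "A"), ("chr2", 75, "G"), ("chr2", 300, "C")}
known = {("chr1", 100, "A"), ("chr2", 75, "G")}

summary = compare_calls(before, after)
print(summary)
print(f"concordance after recalibration: {concordance(after, known):.2f}")
```

The number of calls that change between two runs is then only_a + only_b, which is the kind of figure quoted above.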
Old 08-23-2011, 09:00 PM   #7
trickytank
Member
 
Location: Melbourne

Join Date: Dec 2010
Posts: 19

I'm going to try the -o FULLNW option to see if it removes what I have observed. I'll post my results here.
Old 08-23-2011, 10:44 PM   #8
trickytank
Member
 
Location: Melbourne

Join Date: Dec 2010
Posts: 19

To clarify, does this mean that by default Novoalign clips mismatches at the ends of reads which are not seen in the reference index?
Old 08-24-2011, 01:34 AM   #9
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126

Yes, mismatches near the ends of the alignment will be clipped so that the best local alignment is reported. This wouldn't seem right if all we had were SNPs, but if our sample includes indels and structural variations and these occur near the ends of the read, they may get aligned as mismatches. This can then cause erroneous SNP calls. Clipping avoids this problem and improves the specificity of SNP and indel calls, but it may reduce sensitivity a bit.
It would be interesting to see the effect of clipping on dbSNP concordance; we haven't done this yet.
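
Soft clipping shows up in the CIGAR string of each SAM/BAM record as S operations, so the amount of clipping at each end can be tallied directly. A minimal sketch, with a hypothetical helper name and made-up CIGAR strings; left and right are in reference orientation, so which one is the 3' end depends on the strand:

```python
import re

# CIGAR operations as defined in the SAM format: a length followed by an op code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_ends(cigar):
    # Return (left, right) counts of soft-clipped bases for one CIGAR string
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips, so drop them first
    ops = [(n, op) for n, op in ops if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


# Made-up CIGAR strings for illustration
for cigar in ["50M", "3S45M2S", "48M2S", "5H3S42M"]:
    print(cigar, soft_clipped_ends(cigar))
```

Summing these per read position before and after turning clipping off would show how many end bases are being excluded from the reported alignment.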
Old 09-05-2011, 10:14 PM   #10
trickytank
Member
 
Location: Melbourne

Join Date: Dec 2010
Posts: 19
Using the -o FULLNW option

And with the -o FULLNW option, the FastQC plots are no longer worrying.

Novoalign with recalibration and -o FULLNW option, before GATK recalibration BAM file:


Novoalign with recalibration and -o FULLNW option, after GATK recalibration BAM file:


I was under the impression that BAQ, as implemented in SAMtools, is designed to overcome the problem of misalignments caused by indels near the ends of reads, and shouldn't affect sensitivity as much as clipping at the alignment stage? (Local realignment around indels seems like an alternative too.)
Old 09-12-2011, 07:50 PM   #11
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126

Local realignment should help if you use -o FULLNW, though I haven't looked into this. It would be interesting to see the effect on dbSNP concordance.
We added soft clipping before these tools were readily available.