08-07-2012, 07:23 AM   #1
Junior Member
Location: Netherlands

Join Date: Feb 2012
Posts: 3
Filtering strategy in exome sequencing and quality control

To filter exome sequencing data and remove false positives, I know that read depth and Phred score are routinely applied.

But there are also the following quality-related annotations. Is there a threshold/cutoff for each of them, and at which step of the filtering strategy should I apply them?

GC = GC content within 20 bp +/- the variant

FS = Phred-scaled p-value using Fisher's exact test to detect strand bias. If the reference-carrying reads are balanced between forward and reverse strands, then the alternate-carrying reads should be as well

HRun = Largest Contiguous Homopolymer Run of Variant Allele In Either Direction

HW = Phred-scaled p-value for Hardy-Weinberg violation. Extreme variations on heterozygous calls indicate a false positive call

HaplotypeScore = Consistency of the site with at most two segregating haplotypes (probability that the reads in a window around the variant can be explained by at most two haplotypes)

MQ = RMS (root mean square, also known as quadratic mean) mapping quality. Regions of excessively low mapping quality are ambiguously mapped, and variants called within them are suspicious. (MQ0Fraction, by contrast, is the fraction of reads at the site with mapping quality zero.)

MQRankSum = Z-score from Wilcoxon rank sum test of Alt vs. Ref read mapping qualities. If the alternate bases are more likely to be found on reads with lower MQ than reference bases then the site is likely mismapped

QD = Variant confidence/quality by depth

ReadPosRankSum = Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias. If the alternate bases are biased towards the beginning or end of the reads then the site is likely a mapping artifact

SB = Strand Bias

BaseQualityRankSumTest = The u-based z-approximation from the Mann-Whitney rank sum test for base qualities (ref bases vs. bases of the alternate allele).
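To make the "Phred-scaled p-value" wording used for FS (and HW) concrete, here is a minimal sketch of the idea: run a two-sided Fisher's exact test on the 2x2 table of ref/alt reads by forward/reverse strand, then convert the p-value with -10 * log10(p). The function names and the example read counts are made up for illustration, and this is a simplified version of the idea rather than the exact calculation any particular caller performs.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Here: a, b = ref reads on forward/reverse strand,
          c, d = alt reads on forward/reverse strand.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one
    p = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p)


def phred_scale(p, floor=1e-300):
    """Convert a p-value to the Phred scale; the floor avoids log10(0)."""
    return max(0.0, -10.0 * log10(max(p, floor)))


# Hypothetical counts: ref reads are 20 forward / 20 reverse in both cases.
fs_balanced = phred_scale(fisher_exact_two_sided(20, 20, 10, 10))  # alt 10/10
fs_skewed = phred_scale(fisher_exact_two_sided(20, 20, 15, 0))     # alt 15/0
print(f"balanced: {fs_balanced:.2f}  skewed: {fs_skewed:.2f}")
```

The higher the Phred-scaled value, the smaller the p-value, so a large FS means the alternate reads are distributed across strands very differently from the reference reads.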
Groningen
03-12-2013, 12:40 PM   #2
Location: Canada

Join Date: Sep 2012
Posts: 21


If you're using GATK, check their Best Practices. For smaller experiments (low depth or a small number of samples), they recommend "hard filters" on annotations like the ones you listed.

You should apply these hard filters after you've called your variants with either UnifiedGenotyper or HaplotypeCaller.
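As a rough sketch of what hard filtering on an already-called VCF amounts to, the Python below checks a few INFO annotations against fixed cutoffs and fills in the FILTER column. The thresholds are illustrative assumptions in the range usually quoted for SNPs, not values taken from this thread, so check them against the current GATK documentation before using them; the function names are made up for the example, and in practice you would use GATK's own filtering tool rather than a script like this.

```python
# Illustrative SNP cutoffs only (assumptions, not authoritative values):
# each entry maps an INFO annotation to a test that is True when the site fails.
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def parse_info(info_field):
    """Turn a VCF INFO string such as 'QD=12.3;FS=1.2;DB' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True  # flags like DB carry no value
    return info


def failed_filters(info_field):
    """Return the names of the annotations whose cutoff is violated."""
    info = parse_info(info_field)
    failed = []
    for name, is_bad in HARD_FILTERS.items():
        raw = info.get(name)
        if raw is None or raw is True:
            continue  # annotation absent: skip it rather than fail the site
        try:
            value = float(raw)
        except ValueError:
            continue  # e.g. comma-separated values; ignored in this sketch
        if is_bad(value):
            failed.append(name)
    return failed


def filter_vcf_line(line):
    """Fill the FILTER column (7th field) of one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    failed = failed_filters(fields[7])  # INFO is the 8th field
    fields[6] = ";".join(failed) if failed else "PASS"
    return "\t".join(fields)


# Hypothetical records for illustration
bad = "1\t12345\t.\tA\tG\t50.0\t.\tQD=1.5;FS=70.2;MQ=58.0;DB"
good = "1\t22345\t.\tC\tT\t900.0\t.\tQD=15.0;FS=2.1;MQ=59.0;MQRankSum=0.3"
print(filter_vcf_line(bad))
print(filter_vcf_line(good))
```

Sites missing an annotation are left alone here, since rank-sum annotations are only computed when both ref and alt reads are present; failing every such site would throw away homozygous calls.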

For larger studies (30 or more exomes), you don't need to apply these hard filters; instead, you use these annotations to perform variant quality score recalibration (VQSR). As with hard filtering, you recalibrate variant scores after you've called your variants.
chongm

