SEQanswers


lazyworm 08-11-2010 09:44 AM

HiSeq 2000 paired-end capture data analysis problem: too many variants!
 
Hi,
We are trying to analyze HiSeq 2000 paired-end whole-exome capture sequencing data. The quality of the data is very good: we get an average depth of coverage of around 120x, and the FASTQ files look perfect. We used BWA for paired-end alignment, Picard to remove duplicates, and Samtools for variant calling. The problem is that we get too many SNV and indel variants from this data: around 140,000 SNVs and indels after filtering (mapping quality >= 45, read depth >= 10, and the standard varFilter in Samtools). Is anyone else on this board doing a similar analysis? How many SNVs and indels did you get? Can BWA and Samtools be used on HiSeq data, or is there other software we should try? Is there anything else we should know about HiSeq data?
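For readers reproducing the depth and mapping-quality cutoffs described above, here is a minimal sketch. It assumes (the post does not say) that the calls are in VCF format with read depth in the `DP` key and RMS mapping quality in the `MQ` key of the INFO column. It only applies the two numeric thresholds; it does not reimplement the other checks that Samtools' varFilter performs.

```python
"""Hypothetical sketch: apply the MQ >= 45 and DP >= 10 cutoffs to VCF data lines.

Assumption: DP (read depth) and MQ (mapping quality) are present in the
INFO column (column 8). Records missing either value are dropped.
"""

MIN_MQ = 45
MIN_DP = 10


def parse_info(info: str) -> dict:
    """Turn 'INDEL;DP=25;MQ=30' into {'INDEL': True, 'DP': '25', 'MQ': '30'}."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item:
            fields[item] = True  # flag-style keys such as INDEL
    return fields


def passes_filter(vcf_line: str, min_mq: int = MIN_MQ, min_dp: int = MIN_DP) -> bool:
    """Return True if a VCF data line meets both the depth and mapping-quality cutoffs."""
    if vcf_line.startswith("#"):
        return False  # header lines are not variant records
    cols = vcf_line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    try:
        depth = int(info["DP"])
        mapq = float(info["MQ"])
    except (KeyError, ValueError):
        return False  # missing or malformed annotations: drop conservatively
    return depth >= min_dp and mapq >= min_mq


# Made-up example records: CHROM POS ID REF ALT QUAL FILTER INFO
records = [
    "chr1\t12345\t.\tA\tG\t60\t.\tDP=120;MQ=55",
    "chr1\t12400\t.\tC\tT\t30\t.\tDP=8;MQ=60",
    "chr2\t500\t.\tAT\tA\t40\t.\tINDEL;DP=25;MQ=30",
]
kept = [r for r in records if passes_filter(r)]
print(len(kept))  # 1
```

Only the first example record survives: the second fails the depth cutoff and the third (an indel) fails the mapping-quality cutoff.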

Thanks

Lee Sam 08-11-2010 10:03 AM

Have you checked your calls against dbSNP? Have you filtered your candidates down to just those within known exons after alignment? PE reads often "splash over" into intronic regions, where variation is likely more liberally tolerated.
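The two checks suggested in this reply (restricting calls to known exons, then comparing against dbSNP) can be sketched as follows. Assumptions not taken from the thread: exons are BED-style `(chrom, start, end)` tuples with 0-based, half-open coordinates and no overlaps within a chromosome; variants use 1-based VCF-style positions; and dbSNP has been reduced to a set of `(chrom, pos)` keys. The helper names are hypothetical.

```python
"""Hypothetical sketch: keep on-target variants, then split them into dbSNP-known vs. novel.

Assumptions: exons are (chrom, start, end), 0-based half-open, non-overlapping
per chromosome; variants are (chrom, pos) with 1-based positions; dbSNP is a
set of (chrom, pos) keys.
"""
from bisect import bisect_right
from collections import defaultdict


def build_index(exons):
    """Group exon intervals by chromosome and sort them by start."""
    index = defaultdict(list)
    for chrom, start, end in exons:
        index[chrom].append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def in_exon(index, chrom, pos):
    """True if 1-based position `pos` falls inside a 0-based half-open exon."""
    intervals = index.get(chrom, [])
    zero_based = pos - 1
    # Find the rightmost interval whose start <= zero_based.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and zero_based < intervals[i][1]


def classify(variants, exons, dbsnp):
    """Return (known, novel) lists of variants that fall inside exons."""
    index = build_index(exons)
    known, novel = [], []
    for chrom, pos in variants:
        if not in_exon(index, chrom, pos):
            continue  # off-target "splash over", e.g. intronic
        (known if (chrom, pos) in dbsnp else novel).append((chrom, pos))
    return known, novel


exons = [("chr1", 100, 200), ("chr1", 300, 400)]
variants = [("chr1", 150), ("chr1", 250), ("chr1", 301), ("chr2", 10)]
dbsnp = {("chr1", 150)}
known, novel = classify(variants, exons, dbsnp)
print(known, novel)  # [('chr1', 150)] [('chr1', 301)]
```

Sorting the intervals once and using binary search keeps each lookup logarithmic in the number of exons per chromosome, which matters when tens of thousands of calls are checked against a whole-exome target list. A real dbSNP comparison should also match alleles, not just positions.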

EDIT: I'm actually really curious to hear how many reads you got, what read length you used, and how many lanes you ran the sample on. We just got our HiSeq2k installed last week and are running our first samples through it. Details would be fantastic.


All times are GMT -8.
