SEQanswers

11-17-2009, 07:15 AM   #1
knott76
Junior Member
 
Location: Florida

Join Date: Sep 2008
Posts: 2
Very high depth of coverage

I have done Illumina GAII sequencing that involved tiled long-range PCR products over a 200kb region of genomic DNA.
Even with multiplexing within lanes, the output of sequencing gives me an average of 1500X coverage of the region per individual (some regions up to 3000X).
What would be the best tool to do alignment and accurately call variants with this type of coverage?
I have used CLC Genomics Workbench, and the alignment is OK, but SNP calling detects many apparently false-positive variants (for example, at a position with 1000X coverage, 950 A calls and 50 C calls). Fifty calls seems like a lot to attribute to sequencing error, but independent data (SNP genotyping and Sanger sequencing) show the site is homozygous.
Are there programs better equipped for this type of very deep coverage? Thanks.
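To put numbers on it: under an independent-error model with a typical per-base error rate, 50 non-reference calls out of 1000 would be essentially impossible by chance, which is presumably why the caller flags it. A rough back-of-the-envelope sketch in Python (the 1% error rate here is just an assumed figure, not measured from my data):

Code:
from scipy.stats import binom

depth = 1000        # total base calls at the position
alt_count = 50      # the suspect non-reference calls
error_rate = 0.01   # assumed per-base miscall rate, purely illustrative

# Chance of seeing >= 50 miscalls at a truly homozygous position if every
# error were independent
print(f"P(>= {alt_count} errors at {depth}X): {binom.sf(alt_count - 1, depth, error_rate):.2e}")

# The same 5% non-reference fraction at 100X (5/100) is far less extreme
print(f"P(>= 5 errors at 100X): {binom.sf(4, 100, error_rate):.2e}")

Which makes me wonder whether those 50 C calls are really 50 independent observations in the first place.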
11-18-2009, 06:09 AM   #2
What_Da_Seq
Member
 
Location: RTP

Join Date: Jul 2008
Posts: 28

It is of course a statistical problem. What if you downsampled your read coverage (while keeping the allele proportions) to a lower, perhaps even uniform, level - essentially taking extreme depth out of the equation? Just speculating here.
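Something like this is what I have in mind, at the level of a single position (a toy sketch; the numbers just mirror the 950/50 example above and the target depth is arbitrary):

Code:
import random

random.seed(0)

# Toy pileup at one position: 950 reference calls and 50 alternate calls
pileup = ["A"] * 950 + ["C"] * 50
target_depth = 100

# Random downsampling to a fixed depth; allele proportions are preserved in
# expectation, only the absolute counts shrink
subsample = random.sample(pileup, target_depth)
print("Alt fraction after downsampling:", subsample.count("C") / target_depth)

In practice you would subsample whole reads from the alignment before calling rather than individual bases, but the idea is the same.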
11-18-2009, 09:46 AM   #3
dwmohr
Junior Member
 
Location: Baltimore, MD

Join Date: Aug 2008
Posts: 6

Have you tried filtering your sequences for duplicates? We find this essential when dealing with long-range PCR libraries. We've used bwa/Picard/samtools and the FASTX-Toolkit/CLC bio with success.
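If it helps, the gist of what duplicate marking does is roughly the following (a crude pysam sketch, not a substitute for Picard MarkDuplicates, which also keeps the best-quality read rather than the first; the file names are made up):

Code:
import pysam  # assumes a coordinate-sorted BAM, e.g. from bwa + samtools sort

def mark_duplicates(in_path, out_path):
    """Flag reads sharing reference, start, strand and mate start as
    duplicates (SAM flag 0x400), keeping the first one encountered."""
    seen = set()
    with pysam.AlignmentFile(in_path, "rb") as src, \
         pysam.AlignmentFile(out_path, "wb", template=src) as dst:
        for read in src:
            if read.is_unmapped:
                dst.write(read)
                continue
            key = (read.reference_id, read.reference_start, read.is_reverse,
                   read.next_reference_start if read.is_paired else None)
            if key in seen:
                read.flag |= 0x400  # mark as PCR/optical duplicate
            else:
                seen.add(key)
            dst.write(read)

mark_duplicates("tiled_pcr.sorted.bam", "tiled_pcr.dedup.bam")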
11-18-2009, 05:01 PM   #4
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by dwmohr
Have you tried filtering your sequences for duplicates? We find this essential when dealing with long-range PCR libraries. We've used bwa/Picard/samtools and the FASTX-Toolkit/CLC bio with success.
How do you identify duplicates when you expect multiple independent reads to share the same starting position anyway? Even if you require that both ends of a pair have the same positions, at >1500X coverage you would still expect independent fragments whose two ends coincide by chance.

Anybody have any other ideas to identify PCR duplicates on high coverage data? I don't think it is possible.
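A back-of-the-envelope calculation shows why (all of the parameters here are made-up illustrations):

Code:
# Birthday-problem style estimate of chance collisions at one locus,
# assuming uniform random fragmentation (parameters are invented).
depth = 1500       # read coverage at a base
read_len = 75      # read length
insert_bins = 60   # assumed number of distinct plausible insert sizes

starts_here = int(depth / read_len)  # ~20 independent fragments start at this base

# Probability that at least two of those fragments also share an insert size,
# i.e. have identical coordinates at both ends
p_no_collision = 1.0
for i in range(starts_here):
    p_no_collision *= 1 - i / insert_bins
print(f"Independent fragments starting here: {starts_here}")
print(f"P(two share both endpoints by chance): {1 - p_no_collision:.2f}")

So at this depth, two reads with identical endpoints are not necessarily copies of the same molecule.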
11-19-2009, 12:15 AM   #5
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Quote:
Originally Posted by nilshomer
Anybody have any other ideas to identify PCR duplicates on high coverage data? I don't think it is possible.
I suppose you'd have to take an observed/expected approach. If you know the number and size distribution of your sequences, you can work out the likelihood of exact overlaps at different depths (assuming reads are randomly distributed). Anything falling too far outside the expected range would be suspicious.

You could also look at the ratio of exact overlaps to non-exact overlaps. If a region is composed mostly of exact overlaps, then something is not right for a randomly fragmented library. This should work even with unevenly distributed reads.

Neither of these is going to detect small PCR effects, but normally we'd expect that when the PCR goes wrong it goes very wrong - and those are the problems we're most interested in sorting out.
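As a concrete sketch of the first idea: under random fragmentation the number of reads per start position is roughly Poisson with mean coverage/read-length, so starts far above that are suspicious. The function name, cutoff and toy numbers below are only illustrative:

Code:
from collections import Counter
from scipy.stats import poisson

def flag_suspicious_starts(read_starts, depth, read_len, alpha=1e-6):
    """Flag start positions whose read count is improbably high under a
    Poisson(depth / read_len) model of random fragmentation."""
    expected = depth / read_len            # mean reads per start position
    cutoff = poisson.isf(alpha, expected)  # upper-tail threshold
    counts = Counter(read_starts)
    return {pos: n for pos, n in counts.items() if n > cutoff}, cutoff

# Toy example: position 101 has far more reads than random fragmentation allows
starts = [100] * 25 + [101] * 180 + [102] * 22
flagged, cutoff = flag_suspicious_starts(starts, depth=1500, read_len=75)
print(f"Flagging starts with > {cutoff:.0f} reads: {flagged}")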
11-19-2009, 12:27 AM   #6
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by simonandrews
I suppose you'd have to take an observed/expected approach. If you know the number and size distribution of your sequences, you can work out the likelihood of exact overlaps at different depths (assuming reads are randomly distributed). Anything falling too far outside the expected range would be suspicious.

You could also look at the ratio of exact overlaps to non-exact overlaps. If a region is composed mostly of exact overlaps, then something is not right for a randomly fragmented library. This should work even with unevenly distributed reads.

Neither of these is going to detect small PCR effects, but normally we'd expect that when the PCR goes wrong it goes very wrong - and those are the problems we're most interested in sorting out.
That should work. I am also thinking about clonal reads in SOLiD data; in that case it won't be as bad as when the PCR goes wrong during library prep.