Old 07-02-2010, 12:59 PM   #1
JohnK
Senior Member
 
Location: Los Angeles, China.

Join Date: Feb 2010
Posts: 106
Source of duplicate reads and possible ways of reducing them

Hi,

I've been pondering possible sources of duplicate reads, as well as ways of reducing them as you scale up the raw number of sequenced reads. I imagine it's related to the ePCR phase. It would be nice to cut the number of duplicates significantly. Any ideas? Thanks!

J
Old 07-02-2010, 02:01 PM   #2
pzumbo
Member
 
Location: NY

Join Date: Mar 2009
Posts: 11

Remove PCR entirely!
Old 07-06-2010, 06:47 AM   #3
JohnK
Senior Member
 
Location: Los Angeles, China.

Join Date: Feb 2010
Posts: 106

That's funny.
Old 07-09-2010, 02:47 PM   #4
snetmcom
Senior Member
 
Location: USA

Join Date: Oct 2008
Posts: 158

It's more likely your library construction, not ePCR. While you can get errors during ePCR, over-amplification there doesn't make sense: ePCR takes place inside a microreactor with a single molecule, so it would not explain over-representation.

What library type, and how much starting material?
Old 07-10-2010, 07:09 AM   #5
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

I think pzumbo meant remove any pre-ePCR PCR steps. That should not be necessary, although reducing those PCR steps would probably help. Also look for any "bottlenecks" in the library prep: if the total number of input molecules drops drastically at any step, you are also drastically reducing the complexity of your library. That is the trouble with the pre-ePCR PCR steps: they make everything look okay, but once you sequence, you discover your library is terribly bottlenecked...
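To put a number on that, here is a back-of-the-envelope Poisson sketch (the model and the numbers are my own illustration, not from any protocol): with C unique molecules in the library and N sequenced reads, you expect roughly C(1 - e^(-N/C)) distinct molecules, so the duplicate fraction climbs steeply once N approaches C.

Code:
import math

def expected_unique(n_reads, complexity):
    # Poisson approximation: expected number of distinct molecules
    # observed after drawing n_reads uniformly at random from
    # `complexity` unique library molecules.
    return complexity * (1.0 - math.exp(-n_reads / complexity))

n = 10_000_000  # reads sequenced
for c in (1e6, 1e7, 1e8):  # library complexity (unique molecules)
    dup = 1.0 - expected_unique(n, c) / n
    print(f"complexity {c:.0e}: ~{dup:.0%} duplicates")

At 10 million reads, a 1-million-molecule library comes back roughly 90% duplicates, while a 100-million-molecule library stays under 5%. That is what a bottleneck costs you.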

--
Phillip
Old 07-10-2010, 09:34 AM   #6
pzumbo
Member
 
Location: NY

Join Date: Mar 2009
Posts: 11

'Traditional' PCR is a known cause of duplicate sequences; see:

FRT-seq: amplification-free, strand-specific transcriptome sequencing (Lira Mamanova, Robert M Andrews, Keith D James, Elizabeth M Sheridan, Peter D Ellis, Cordelia F Langford, Tobias W B Ost, John E Collins & Daniel J Turner)

ePCR is *thought* to be able to remove the biases typically associated with traditional PCR.
I'm sure, however, that there is a great divide between practice and theory. In fact, Prüfer et al. report:

"As previously described, emulsion PCR can produce a substantial number of clusters of identical fragments if a low concentration of DNA is used. We identify these emulsion PCR duplicates using the following algorithm: reads are sorted into buckets according to the first six positive flow values. A new cluster containing two reads from a bucket is formed if these reads have at least 89% sequence similarity over the full length of the shorter read including the 454 adapter sequence. A read is added to an existing cluster if the same condition is met by any one of the sequences in the cluster (single-linkage clustering). The algorithm identified 736,426 of a total of 2,796,944 reads, or 26%, to be duplicates of other sequences." ("Computational challenges in the analysis of ancient DNA")

As I said: remove PCR entirely!
Old 07-12-2010, 05:13 AM   #7
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

The so-called duplicate reads in your Prüfer et al. excerpt might not result from emPCR at all; they might just be repetitive DNA. Short repetitive elements like SINEs compose a significant fraction of mammalian genomes.

That said, some of the duplicates probably did come from multiple beads in a single microreactor or background amplicon contamination of their lab.

I think most regard third-generation sequencers as defined by their ability to sequence single molecules, so we are heading in the no-PCR direction.

But for the moment, except for the few locations where single molecule sequencers are placed, limiting the number of cycles of pre-em/bridge PCR will suffice for most.

And there are plenty of biases likely to derive from the new single molecule technologies. So there is a "better the devil you know" argument to be made for using PCR judiciously.

--
Phillip
Old 07-14-2010, 03:39 PM   #8
snetmcom
Senior Member
 
Location: USA

Join Date: Oct 2008
Posts: 158

Yes, the theoretical is fun to quote, but I live in the real world. And in the real world, multiple templates in a microreactor result in a weak/mixed-signal bead that is easily thrown out by software, and in no way create repetitive elements.
Old 07-14-2010, 03:45 PM   #9
paul z
Junior Member
 
Location: New York, NY

Join Date: Aug 2008
Posts: 7

So previously published results to the effect that "emulsion PCR can produce a substantial number of clusters of identical fragments" are a lie? Interesting; perhaps there are multiple "real" worlds, then?
Old 07-14-2010, 03:49 PM   #10
paul z
Junior Member
 
Location: New York, NY

Join Date: Aug 2008
Posts: 7

Also, in the pretend world in which emulsion PCR produces duplicate fragments, it is thought to be a result not of multiple templates in a single microreactor, but of emulsion inclusions containing multiple beads.
Old 07-14-2010, 07:48 PM   #11
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

Quote:
Originally Posted by paul z
Also, in the pretend world in which emulsion PCR produces duplicate fragments, it is thought to be a result not of multiple templates in a single microreactor, but of emulsion inclusions containing multiple beads.
How many 1 µm beads can you get in a microreactor that efficiently amplify with limited primers/reagents? I would guess that results in far fewer duplicates than the bulk amplification.

See this paper if you haven't already: http://seqanswers.com/forums/showthread.php?t=1370
Old 07-15-2010, 04:11 AM   #12
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

Quote:
Originally Posted by snetmcom
Yes, the theoretical is fun to quote, but I live in the real world. And in the real world, multiple templates in a microreactor result in a weak/mixed-signal bead that is easily thrown out by software, and in no way create repetitive elements.
Trolling now, snetmcom? Yes, that would be one way you could go...

Here is what I wrote:

Quote:
That said, some of the duplicates probably did come from multiple beads in a single microreactor or background amplicon contamination of their lab.
Nothing about multiple templates in a single microreactor.
Multiple beads in a single microreactor would create twin beads, identically templated.

--
Phillip
Old 07-15-2010, 04:34 AM   #13
nidzee
Junior Member
 
Location: USA

Join Date: Jul 2010
Posts: 1

Use the CD-HIT program to remove duplicates.
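For example (a sketch only; cd-hit-est's -i/-o/-c options are standard, but the filenames and the identity threshold here are placeholders to adapt):

Code:
import subprocess

# Collapse (near-)identical reads with CD-HIT-EST. Assumes cd-hit is
# installed and the reads are in FASTA format. -c 1.00 collapses only
# 100%-identity sequences; lower it to catch near-duplicates.
subprocess.run(
    ["cd-hit-est",
     "-i", "reads.fasta",        # input reads
     "-o", "reads_dedup.fasta",  # representative sequences out
     "-c", "1.00"],              # sequence identity threshold
    check=True,
)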
Old 07-15-2010, 04:44 AM   #14
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

Quote:
Originally Posted by ECO
How many 1 µm beads can you get in a microreactor that efficiently amplify with limited primers/reagents? I would guess that results in far fewer duplicates than the bulk amplification.
It would depend on the diameter of the microreactor, obviously. Not much QC is done upon creation of the microreactors; it is a numbers game. Some of the microreactors will be too small to template a single bead, while others will be large enough to hold several. Alternatively, microreactors could coalesce at some point during thermal cycling. Most adjacent microreactors will contain no template, so such a coalescence will frequently result in identically templated beads.

Another way to get a duplicate read, likely well understood by anyone running a sequencing core producing 3730xl Sanger reads, is signal bleed. I have heard of this occurring in 454 runs. You could look for these because SOLiD (and 454) reads carry bead coordinates.
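For instance, a sketch of such a coordinate check (assuming SOLiD-style "panel_x_y" read names; the distance cutoff is an arbitrary placeholder):

Code:
def possible_signal_bleed(name_a, name_b, max_dist=5):
    # Two duplicate reads whose beads sit close together on the same
    # panel are candidates for signal bleed rather than library/PCR
    # duplication. Adjust the name parsing for your platform.
    panel_a, xa, ya = (int(v) for v in name_a.split("_")[:3])
    panel_b, xb, yb = (int(v) for v in name_b.split("_")[:3])
    return (panel_a == panel_b
            and abs(xa - xb) <= max_dist
            and abs(ya - yb) <= max_dist)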

Quote:
Originally Posted by ECO
See this paper if you haven't already: http://seqanswers.com/forums/showthread.php?t=1370
Okay, but 454 is already there: most of their library construction protocols use no pre-emPCR amplification steps. Certainly the Neanderthal genome paper pzumbo is invoking does not, and they still see duplicate reads.

Also, at the risk of confusing the issue: if you are talking about RNA-seq using the Ambion Whole Transcriptome kit, there is another source of "duplicate" reads, namely the RNase III digestion used to fragment the RNA. It will have strongly biased cleavage sites in most RNAs. But you would certainly not want to remove these reads if you were doing DGE; it would throw your counts off!
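A toy illustration of the danger (my own example, not from the kit's documentation): position-based deduplication collapses exactly the read stacks that DGE counting depends on.

Code:
from collections import Counter

# Toy alignments: (transcript, start position). Biased RNase III
# cleavage piles many genuine reads onto the same start positions.
alignments = ([("geneA", 100)] * 50
              + [("geneA", 250)] * 30
              + [("geneB", 40)] * 5)

raw = Counter(t for t, _ in alignments)
dedup = Counter(t for t, _ in set(alignments))  # keep one read per site

print(raw)    # Counter({'geneA': 80, 'geneB': 5})
print(dedup)  # Counter({'geneA': 2, 'geneB': 1})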

--
Phillip
Old 07-15-2010, 02:27 PM   #15
VanessaS
Member
 
Location: Dallass

Join Date: Nov 2009
Posts: 49

This is one of my libraries in question; maybe I can shed a little light. This was an exon capture library, my first. I followed the protocol, which calls for 12 cycles during the nick-translation step; regular fragment libraries call for 2-10 cycles depending on the amount of starting DNA, which in this case was 3 µg. So I think that might be a little overkill: you only need 0.5 µg going into hybridization, and we ended up with closer to 2 µg after nick translation. There is also a 12-cycle post-hybridization amplification. Since this was my first go at the exon capture I did it by the book, but for the next batch of samples we are going to reduce the post-hyb amplification for sure, and it will be interesting to see how much that reduces duplication. We ended up with between 10 and 20 ng/µl for each sample, which seems really, really high to me.
Old 07-16-2010, 04:40 AM   #16
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

Was this using Agilent SureSelect, or some other exon capture method?
Was there any DNA fragmentation (by sonication or any other method) at any stage?

--
Phillip
Old 07-16-2010, 07:25 AM   #17
VanessaS
Member
 
Location: Dallass

Join Date: Nov 2009
Posts: 49

Yes, it's Agilent SureSelect, and the DNA is sonicated before library prep.