SEQanswers

Old 10-17-2011, 10:10 PM   #1
qiudao
Member
 
Location: TX

Join Date: May 2008
Posts: 23
How were the duplicate reads generated?

Hi guys,
I am doing a ChIP-seq assay, and I am wondering how duplicate reads are generated.
I know that sonication generates random fragments. However, there is a step of 18 PCR cycles to amplify the library DNA. So, for each fragment, there should be 18 copies, i.e., each fragment should ideally be duplicated 18 times. If this is the case, why do we need to remove them? And why do we generally not see 18 copies of the same read after mapping?

Thank you.

-Q
Old 10-17-2011, 10:42 PM   #2
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

You need to read up on PCR. Moving this to general.
Old 10-18-2011, 06:15 AM   #3
qiudao
Member
 
Location: TX

Join Date: May 2008
Posts: 23

ECO,
I have done PCR for years, but I still don't know what I am missing.
PCR is used to amplify the template, and this is not random priming. Of course, the final product will contain copies of each template, i.e., every template that was amplified should have at least one duplicate read. Why do people say we should see only a very small number of duplicate reads?
Could you elaborate on this? I am really confused.
Thank you.
-Q

Last edited by qiudao; 10-18-2011 at 07:42 AM.
Old 10-18-2011, 10:53 AM   #4
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

First, the "CR" in PCR stands for "Chain Reaction". That is, each new product strand created in one cycle becomes a potential template strand for subsequent cycles. Hence, it is theoretically possible for the number of amplified molecules to double each cycle. So 18 cycles could result in a 2^18 (two raised to the eighteenth power) fold increase in the amount of template initially present. That is roughly a 260 thousand fold increase over the initial amount of template, not 18x.
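
To put numbers on the difference between linear and exponential amplification, here is a minimal sketch of idealized PCR arithmetic. It assumes perfect doubling every cycle, which real reactions do not reach, and the function name `fold_amplification` and its `efficiency` parameter are made up for illustration.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Fold increase after `cycles` rounds of PCR.

    efficiency=1.0 is the idealized case where every molecule is copied
    once per cycle (an assumption; real efficiency is lower).
    """
    return (1.0 + efficiency) ** cycles


cycles = 18
linear_guess = cycles                     # the "18 copies per fragment" intuition
exponential = fold_amplification(cycles)  # chain reaction: 2**18

print(linear_guess)      # 18
print(int(exponential))  # 262144
```

With a lower assumed efficiency, for example `fold_amplification(18, 0.8)`, the fold increase drops considerably but is still far above 18.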

Well, I'll just leave it at that, for now.

--
Phillip
Old 10-18-2011, 10:59 AM   #5
qiudao
Member
 
Location: TX

Join Date: May 2008
Posts: 23

Hi Phillip,
Thank you for your reply. That part is what I am confused about. Since we have so many copies of the original templates, I assumed we should see a lot of duplicate reads for those amplified templates. Why do people always say we should see only a small proportion of duplicate reads?
Thanks.
-Q
Old 10-18-2011, 11:13 AM   #6
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

Well how many total sequence reads did you generate?

--
Phillip
Old 10-18-2011, 11:16 AM   #7
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509

Quote:
Originally Posted by qiudao
Hi Phillip,
Thank you for your reply. That part is what I am confused about. Since we have so many copies of the original templates, I assumed we should see a lot of duplicate reads for those amplified templates. Why do people always say we should see only a small proportion of duplicate reads?
Thanks.
-Q

As for why you shouldn't see a lot of duplicate reads, consider the size of your genome vs. the number of sequence tags. If you assume random shearing and equal recovery of every fragment, then you're sampling only a subset of the fragments (so duplicates should be rare).

PCR does generate duplicates; however, consider the number of molecules you're sequencing vs. the number of molecules in your library. Again, you're sampling only a small subset of the total.
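
A rough way to see this is to compute the expected duplicate fraction when reads are drawn at random from the library. This is only a sketch under simplifying assumptions: every original fragment is amplified equally, each read is an independent uniform draw, and the read counts and library sizes below are hypothetical numbers, not values from this thread.

```python
import math


def expected_duplicate_fraction(n_reads: int, n_unique_molecules: int) -> float:
    """Expected fraction of reads that repeat an already-sampled original fragment.

    Assumes n_unique_molecules > 1 and uniform sampling with replacement,
    which is reasonable when each fragment has many PCR copies in the library.
    """
    m = n_unique_molecules
    # P(a given fragment is never sampled) = (1 - 1/m) ** n_reads,
    # so expected distinct fragments seen = m * (1 - (1 - 1/m) ** n_reads).
    expected_distinct = -m * math.expm1(n_reads * math.log1p(-1.0 / m))
    return 1.0 - expected_distinct / n_reads


# Hypothetical: 20 million reads from a library of 1 billion distinct fragments.
print(round(expected_duplicate_fraction(20_000_000, 1_000_000_000), 3))  # about 0.01

# Hypothetical: the same 20 million reads from only 10 million distinct fragments.
print(round(expected_duplicate_fraction(20_000_000, 10_000_000), 2))     # about 0.57
```

Note that the number of PCR cycles does not appear in the formula: under these assumptions, what matters is the number of reads relative to the number of distinct fragments that went into the PCR.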
Old 10-18-2011, 11:28 AM   #8
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

There are two different PCR reactions involved in most sequencing technologies. Figure out when they occur and you will have your answer.
Old 10-18-2011, 11:35 AM   #9
qiudao
Member
 
Location: TX

Join Date: May 2008
Posts: 23

Hi HESmith,

That's the answer I am looking for. Thank you very much.
Thank you pmiguel too.
PCR actually increases the chance of sampling the whole population (individual reads).

-Q
Old 10-18-2011, 05:39 PM   #10
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535

Quote:
Originally Posted by qiudao
PCR actually increases the chance of sampling the whole population (individual reads).

-Q
No, that's not correct. If anything, the fewer fragments you have, the more likely you will sequence the whole population. What PCR does do is help ensure that you submit the correct amount of properly prepared DNA for the sequencing itself to function optimally, and for some applications it helps you enrich for the types of fragments that you specifically want to sequence.
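
The point about fewer fragments can be put in numbers: for a fixed number of reads, a smaller pool of distinct fragments gets covered more completely. A minimal sketch, assuming reads are independent uniform draws and using the Poisson approximation 1 - exp(-N/M); the function name and the sizes are hypothetical.

```python
import math


def fraction_of_library_seen(n_reads: float, n_unique_molecules: float) -> float:
    """Approximate fraction of distinct fragments sampled at least once.

    Uses the Poisson approximation 1 - exp(-N/M), where N is the number of
    reads and M the number of distinct fragments in the library.
    """
    return -math.expm1(-n_reads / n_unique_molecules)


n_reads = 20_000_000  # hypothetical sequencing depth
for n_fragments in (1_000_000, 100_000_000, 1_000_000_000):
    seen = fraction_of_library_seen(n_reads, n_fragments)
    print(f"{n_fragments:>13,} distinct fragments: {seen:.1%} seen at least once")
```

PCR copies do not enter this calculation at all, since amplifying every fragment equally leaves the number of distinct fragments unchanged.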
Old 10-19-2011, 03:41 AM   #11
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

Quote:
Originally Posted by Heisman
No, that's not correct. If anything, the fewer fragments you have, the more likely you will sequence the whole population. What PCR does do is help ensure that you submit the correct amount of properly prepared DNA for the sequencing itself to function optimally, and for some applications it helps you enrich for the types of fragments that you specifically want to sequence.
It also moves you out of the alien sub-nanogram "90% of my sample bound to the plastic in my pipette tip" realm into the more familiar microgram realm where I have plenty of sample to spare.

--
Phillip
Old 10-19-2011, 06:10 AM   #12
qiudao
Member
 
Location: TX

Join Date: May 2008
Posts: 23

Heisman,
Sorry for the confusion. By "whole population" I actually meant the whole genome, or the target region, that we are trying to sequence. Thank you for the clarification anyway.
Old 10-24-2011, 06:44 PM   #13
rskr
Senior Member
 
Location: Santa Fe, NM

Join Date: Oct 2010
Posts: 250

Quote:
Originally Posted by qiudao
Hi guys,
I am doing a ChIP-seq assay.
Poor-yielding reactions are the problem with ChIP-seq (chromatin immunoprecipitation sequencing). So even if the genome is huge, the amount recovered from that reaction is probably small, and that small amount is what you subsequently amplified.

It's like taking a 16-bit number modulo 256 and multiplying the remainder by 256: sure, it has approximately the same magnitude at the end, but you lost something.
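
For anyone who wants the analogy spelled out, here is a small sketch. The modulo step stands in for the low-yield recovery and the multiplication for the PCR; the function name `bottleneck_then_amplify` and the example values are just illustrations.

```python
def bottleneck_then_amplify(x: int) -> int:
    """Keep only the low byte of a 16-bit number, then scale it back up by 256."""
    if not 0 <= x < 2**16:
        raise ValueError("expected a 16-bit unsigned value")
    return (x % 256) * 256


# Two different inputs that share a low byte become indistinguishable.
print(hex(bottleneck_then_amplify(0x12CD)))  # 0xcd00
print(hex(bottleneck_then_amplify(0xABCD)))  # 0xcd00

# All 65536 possible inputs collapse onto only 256 distinct outputs.
distinct_outputs = {bottleneck_then_amplify(x) for x in range(2**16)}
print(len(distinct_outputs))  # 256
```

The output is still a 16-bit number, but the high byte of the input is gone for good, just as amplification restores the mass of a library but not the fragments that were never recovered.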
Old 10-25-2011, 04:15 AM   #14
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

Quote:
Originally Posted by rskr
It's like taking a 16-bit number modulo 256 and multiplying the remainder by 256: sure, it has approximately the same magnitude at the end, but you lost something.
This will be really helpful for the person unclear on the limitations imposed by limited starting sample on a sequence data set, but adept at modulo arithmetic. Thanks rskr for providing clarity to this often overlooked minority! (If they exist at all, I suppose they would be reading this site...)

--
Phillip
Old 10-25-2011, 08:16 AM   #15
rskr
Senior Member
 
Location: Santa Fe, NM

Join Date: Oct 2010
Posts: 250

Quote:
Originally Posted by pmiguel
This will be really helpful for the person unclear on the limitations imposed by limited starting sample on a sequence data set, but adept at modulo arithmetic. Thanks rskr for providing clarity to this often overlooked minority! (If they exist at all, I suppose they would be reading this site...)

--
Phillip
Ah, tough crowd. For my next topic I will address why log ratios amplify noise for expression values near zero.

All times are GMT -8.