SEQanswers

Old 02-05-2015, 03:01 AM   #1
mareen_engel
Junior Member
 
Location: europe

Join Date: Nov 2014
Posts: 3
High number of optical duplicates on MiSeq

Dear all,

We've run an Illumina TruSeq mRNA non-stranded library (mouse brain, lower end of the input recommendation) on a MiSeq for quality control, using the v3 Reagent Kit, 2×75 PE. The run has very good QC metrics, good cluster density, and some 22 million reads.

After pairing the reads, Picard MarkDuplicates reports more than 3 million optical duplicates (i.e. duplicates less than 100 pixels apart on the flow cell; results are similar for 10 pixels) alongside roughly the same number of "real" PCR duplicates (50% of QC20 reads after discarding the optical duplicates).
When the data are analyzed as single-end reads without pairing, the number of optical duplicates drops to 0, so it does not seem to be a cluster-reading failure.
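
To make the pixel-distance criterion concrete, here is a minimal Python sketch of how reads within one duplicate set could be checked for optical proximity. It is a simplified illustration, not Picard's actual implementation: it assumes colon-separated Illumina read names whose last four fields are lane, tile, x and y, and the function names and example read names are hypothetical.

```python
from itertools import combinations


def parse_coords(read_name):
    """Return (lane, tile, x, y) from a colon-separated Illumina read name.

    Assumes the last four fields are lane:tile:x:y.
    """
    lane, tile, x, y = read_name.split(":")[-4:]
    return lane, int(tile), int(x), int(y)


def optical_pairs(duplicate_set, max_dist=100):
    """Return pairs of read names from one duplicate set that lie on the
    same lane and tile and differ by at most max_dist on both axes."""
    pairs = []
    for a, b in combinations(duplicate_set, 2):
        lane_a, tile_a, xa, ya = parse_coords(a)
        lane_b, tile_b, xb, yb = parse_coords(b)
        if (lane_a, tile_a) != (lane_b, tile_b):
            continue  # clusters on different tiles are never optical duplicates
        if abs(xa - xb) <= max_dist and abs(ya - yb) <= max_dist:
            pairs.append((a, b))
    return pairs


# Hypothetical read names that all map to the same position (one duplicate set).
names = [
    "M00001:12:000000000-AAAAA:1:1101:15000:2000",
    "M00001:12:000000000-AAAAA:1:1101:15040:2030",
    "M00001:12:000000000-AAAAA:1:2105:15000:2000",
]
close_pairs = optical_pairs(names, max_dist=100)
```

With these made-up names, only the first two reads share a tile and sit within 100 pixels of each other; tightening `max_dist` to 10 would no longer flag them.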

To me, this indicates that there are 3 million clusters on the flow cell sitting next to a cluster of their reverse-complement strand, which seems far above chance level even for an RNA-Seq library with many PCR duplicates.
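
The chance-level argument can be put into rough numbers. The sketch below estimates how many duplicate pairs would land within the pixel window purely by chance if clusters seeded independently and uniformly across the flow cell. The tile count and tile dimensions used here are placeholders for illustration only, not MiSeq specifications, and edge effects are ignored.

```python
def expected_chance_neighbors(n_duplicate_pairs, n_tiles, tile_width,
                              tile_height, max_dist=100):
    """Expected number of duplicate pairs that fall within max_dist of each
    other on both axes by chance, assuming uniform, independent seeding."""
    # Probability that the second cluster lands on the same tile as the first.
    p_same_tile = 1.0 / n_tiles
    # Probability that it lands inside a square window of side 2 * max_dist
    # centred on the first cluster (edge effects ignored).
    p_close = min(1.0, (2 * max_dist) ** 2 / (tile_width * tile_height))
    return n_duplicate_pairs * p_same_tile * p_close


# Placeholder geometry (an assumption, not instrument specifications):
# 38 tiles of 20000 x 20000 coordinate units, 3 million duplicate pairs.
estimate = expected_chance_neighbors(
    n_duplicate_pairs=3_000_000,
    n_tiles=38,
    tile_width=20_000,
    tile_height=20_000,
    max_dist=100,
)
```

Under these placeholder values the estimate comes out in the single digits, which is in line with the "0 to low hundreds" normally seen and nowhere near 3 million; the exact figure depends entirely on the assumed geometry.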

Potential explanations put forward by the representative we contacted are incomplete library denaturation prior to loading (so that partially double-stranded library molecules hybridize to the flow cell and seed two reverse-complement clusters) or low library complexity (however, 3 million sounds well above chance level to me, and the complexity of the library doesn't seem to be that bad).

Has anybody ever seen this? Any ideas what is causing this high number of optical duplicates?
Or is this number of duplicates simply to be expected for a standard RNA-Seq library with low input (~200 ng)?

Any ideas or recommendations are very much appreciated!
Many thanks!

Regards,
Mareen



Edit 1:
PS: Not sure which forum this thread belongs in; please move it if you have a better idea. Thanks!
Edit 2:
Library PCR was 12 cycles.

Last edited by mareen_engel; 02-05-2015 at 03:13 AM.
Old 02-05-2015, 07:38 AM   #2
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

I don't know -- I don't really trust programs that claim to be able to find optical or PCR "duplicates". How do these programs know they are not just biological duplicates? RNA will have a highly non-normal distribution with the possibility of just a few transcripts making up a large percentage of the library. If there is any fragmentation/ligation bias in library construction then you will see lots of the same sequence popping up irrespective of any instrument/PCR duplication.

--
Phillip
Old 02-05-2015, 01:04 PM   #3
mareen_engel
Junior Member
 
Location: europe

Join Date: Nov 2014
Posts: 3

Thank you Phillip.
While I agree that we may sometimes overrate the problem of duplicates in RNA-Seq, I think this is a slightly different case: I see 3 million duplicates lying directly next to each other on the flow cell, while the overall duplicate numbers are, in my eyes, not high enough to make this statistically plausible. Usually we see anywhere from zero to a few hundred of these optical duplicates, even in our low-complexity runs.
To me this sounds like a problem in library prep/denaturation/clustering. Would you disagree?

Best, Mareen
Old 02-05-2015, 04:10 PM   #4
austinso
Member
 
Location: Bay area

Join Date: Jun 2012
Posts: 77

Quote:
Originally Posted by mareen_engel
Thank you Phillip.
While I agree that we may sometimes overrate the problem of duplicates in RNA-Seq, I think this is a slightly different case: I see 3 million duplicates lying directly next to each other on the flow cell, while the overall duplicate numbers are, in my eyes, not high enough to make this statistically plausible. Usually we see anywhere from zero to a few hundred of these optical duplicates, even in our low-complexity runs.
To me this sounds like a problem in library prep/denaturation/clustering. Would you disagree?

Best, Mareen

We've seen similar things, and tended to skip dedup'ing for our particular application, which drove everybody (well...just the keyboard warriors) apesh*t.

Denaturing the library with NaOH for 2 min at 96°C, as recommended for TruSeq libraries (http://support.illumina.com/content/...15027983-c.pdf, p. 36), did the trick in removing these duplicates.

A.

Last edited by austinso; 02-05-2015 at 04:12 PM. Reason: clarification
Old 02-05-2015, 11:34 PM   #5
mareen_engel
Junior Member
 
Location: europe

Join Date: Nov 2014
Posts: 3

Thanks austinso,

I think denaturation at a higher temperature sounds like a very good way to tackle denaturation problems. (We did the standard NaOH denaturation for 5 min at RT, then mixed with cold HT1.)
Just to be safe: for TruSeq RNA libraries stored as dsDNA, do you suggest mixing with NaOH, 5 min at RT, ice, mixing with cold HT1, 2 min at 95°C, ice, pool, load?
Would you suggest doing the NaOH step before storage in later library preps, as they suggest for the SGP plate?

Thanks, Mareen
All times are GMT -8.