SEQanswers

05-14-2015, 05:38 PM   #1
PhDstudent
Junior Member
 
Location: Ottawa

Join Date: Aug 2014
Posts: 3
High duplicates in mRNA-seq data

I extracted total RNA from drug- and vehicle-treated primary neurons (mouse) and used the Kapa Stranded mRNA-Seq kit to generate libraries.

The goal is differential expression analysis: primarily looking at roughly 60 neuronal genes, and also at the more general effect of our drugs on the transcriptional output of neuronal genes.

Input RNA: 1.5 µg. PCR cycles: 8. RNA RIN was always over 8, with a good electropherogram trace.

Sequencing info: Illumina HiSeq2100, 5 libraries multiplexed into 1 lane.

The problem: a duplication rate of 55-60% for all libraries, very consistent across the board. According to QC data from the sequencing core, the largest share of duplicates comes from poly-A and poly-T tracts.

I could really use some advice here. Is this rate of duplication a problem for a DE experiment such as this? What rate of duplication would be more acceptable?

Thanks so much for any input, I'm really worried that my whole PhD project is toast...
05-15-2015, 12:01 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,449

This sounds fairly typical; one expects a high level of apparent duplication in RNA-seq. Note that I wrote "apparent duplications", since these are likely not real PCR or optical duplicates but independent fragments from highly expressed transcripts that happen to start at the same position. A bias toward the 3' end is also not that uncommon, at least if you did any polyA enrichment (I'm not familiar with the Kapa kit).
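To put a rough number on "apparent duplications", here is a minimal sketch. It assumes single-end reads whose start sites are drawn uniformly and independently from a fixed number of possible positions within one transcript. That is a simplification (real fragmentation is not uniform), and the function name and the 2000-position transcript are made up for illustration.

```python
def expected_duplicate_fraction(n_reads: int, n_positions: int) -> float:
    """Expected fraction of reads that look like positional duplicates when
    n_reads start sites are drawn uniformly from n_positions possibilities,
    with no PCR or optical duplication at all."""
    if n_reads <= 0:
        return 0.0
    # Expected number of distinct start sites that are hit at least once.
    expected_unique = n_positions * (1.0 - (1.0 - 1.0 / n_positions) ** n_reads)
    # Every read beyond the first at a given site is counted as a duplicate.
    return 1.0 - expected_unique / n_reads


if __name__ == "__main__":
    # Hypothetical transcript offering 2000 possible start sites.
    for n_reads in (100, 2_000, 50_000):
        frac = expected_duplicate_fraction(n_reads, n_positions=2_000)
        print(f"{n_reads:>6} reads: {frac:.1%} apparent duplicates")
```

Under these assumptions a lowly expressed gene shows almost no duplicates while a highly expressed one is dominated by them, even though every read comes from a distinct molecule.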

BTW, it's a bit premature to worry that your PhD is toast after one experiment (hint, most experiments don't work).
05-15-2015, 12:20 AM   #3
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 304

Quote:
Originally Posted by dpryan
Note that I wrote "apparent duplications", since these are likely not real PCR or optical duplicates
Slightly off-topic... I've been wondering why Illumina or any other company hasn't commercialized a library prep kit where each read gets its own random barcode. In principle it shouldn't be that difficult to generate adapters with a random k-mer long enough to distinguish millions of reads. I'm not saying it's going to be easy in practice, but this issue of what to do with positional duplicates recurs so often, and it seems to me that any workaround is less than ideal.
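As a back-of-the-envelope check on "long enough", here is a small sketch assuming barcodes are uniformly random over A/C/G/T, so a k-mer has 4**k possible values. The function names are made up for illustration. The second function rests on a further assumption: the barcode only has to separate molecules that share the same mapping position, which is a far smaller number than the library total.

```python
def min_barcode_length(n_molecules: int) -> int:
    """Smallest k such that there are at least n_molecules distinct k-mers."""
    k = 0
    while 4 ** k < n_molecules:
        k += 1
    return k


def collision_probability(k: int, m: int) -> float:
    """Probability that at least two of m molecules receive the same
    uniformly random k-mer (the classic birthday problem)."""
    n = 4 ** k
    if m > n:
        return 1.0
    p_all_distinct = 1.0
    for i in range(m):
        p_all_distinct *= (n - i) / n
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    print("k needed for 1,000,000 distinct barcodes:", min_barcode_length(1_000_000))
    # 50 molecules sharing one start position, tagged with random 8-mers.
    print(f"collision chance, k=8, m=50: {collision_probability(8, 50):.2%}")
```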
05-15-2015, 12:36 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,449

In a sense that's what 10x is doing, but for whole genome sequencing, so presumably it's possible.
05-15-2015, 03:18 AM   #5
nucacidhunter
Senior Member
 
Location: Iran

Join Date: Jan 2013
Posts: 889

Quote:
Originally Posted by dariober
Slightly off-topic... I've been wondering why Illumina or any other company hasn't commercialized a library prep kit where each read gets its own random barcode. In principle it shouldn't be that difficult to generate adapters with a random k-mer long enough to distinguish millions of reads. I'm not saying it's going to be easy in practice, but this issue of what to do with positional duplicates recurs so often, and it seems to me that any workaround is less than ideal.
There is at least one kit that has implemented molecular tagging, but I can think of a few reasons why this approach has not been adopted more widely:
1- With the majority of current kits, the adapter ends that ligate to the insert are double-stranded, so using random sequences would result in fewer complementary ends and low ligation efficiency
2- It seems a logical approach at first glance, but its practical value is questionable. For more info, see: http://journals.plos.org/plosone/art...l.pone.0119123 and http://www.pnas.org/content/109/21/E1330.full
11-28-2016, 02:02 AM   #6
aleferna
Senior Member
 
Location: sweden

Join Date: Sep 2009
Posts: 121
% of duplicates per gene

One thing I've looked at is the % of duplicates per gene. If you have a high number of duplicates in only a few genes you should be fine, but if you have low-expression genes with high duplication then you should look into this more closely, as you might have PCR amplification biases. All of this depends on whether the data are paired-end (PE) and on coverage, but calculating the % of duplicates per gene (as opposed to the library total) should help show whether you have a problem or not.

Check this out:
http://www.nature.com/articles/srep25533

By the way, here they use the "random" barcode method mentioned above (better known as a UMI, or unique molecular identifier).
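A minimal sketch of the per-gene calculation, assuming reads have already been aligned and assigned to genes. The `Read` record and its fields are hypothetical, not the output format of any particular tool. Reads count as duplicates when they share gene, start, strand and (if present) UMI; for paired-end data the mate start would presumably need to be part of the key as well.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Read(NamedTuple):
    """Hypothetical, already-annotated read record (illustration only)."""
    gene: str
    start: int
    strand: str
    umi: str = ""  # left empty when the library carries no UMIs


def duplicate_fraction_per_gene(reads: Iterable[Read]) -> Dict[str, float]:
    """Fraction of reads per gene that repeat an already-seen
    (start, strand, umi) combination."""
    totals = defaultdict(int)
    unique = defaultdict(set)
    for r in reads:
        totals[r.gene] += 1
        unique[r.gene].add((r.start, r.strand, r.umi))
    return {gene: 1.0 - len(unique[gene]) / totals[gene] for gene in totals}


if __name__ == "__main__":
    reads = [
        Read("Actb", 100, "+"), Read("Actb", 100, "+"),
        Read("Actb", 100, "+"), Read("Actb", 250, "+"),
        Read("Bdnf", 10, "-"), Read("Bdnf", 40, "-"),
    ]
    for gene, frac in sorted(duplicate_fraction_per_gene(reads).items()):
        print(f"{gene}: {frac:.0%} duplicates")
```

Plotting this fraction against each gene's read count is one way to spot low-expression genes with unexpectedly high duplication.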

Last edited by aleferna; 11-28-2016 at 02:05 AM.