SEQanswers

01-03-2011, 11:40 PM   #1
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 150
How to differentiate between PCR duplicates and real data?

Hi,

After what I think was an intensive search on the internet, I must admit that I couldn't find anything very helpful, so I would like to ask here whether there is a reasonable explanation or method for this.

I am working with mRNA-Seq data from Drosophila, and after running the FastQC software I got a very high duplication level.
I explained it in more detail here (http://seqanswers.com/forums/showthr...1804#post31804), but as I think it is an important question (at least for me), I would like to ask it separately here.
It would be nice to see data from both sides of the analysis. If someone has data with PCR duplicates that he/she can share, I would like to have a look at it.

Is something like this expected in an RNA-Seq experiment?

How can someone reliably differentiate between the two cases?

Thanks for the help

Assa
09-13-2011, 08:00 PM   #2
son_nexg
Junior Member
 
Location: Australia

Join Date: Jul 2011
Posts: 8

Hi, just to add to the above question: can one estimate the amount of PCR duplication in RNA-seq data?

Thank you!
09-13-2011, 08:15 PM   #3
robs
Senior Member
 
Location: San Diego, CA

Join Date: May 2010
Posts: 116

There are some papers on this topic if you search Google Scholar, and other posts on SEQanswers that discuss it (use the search function).

The short answer is that you can't tell for sure if the read is artificial or real. It is dependent on a number of factors such as sequencing technology used, expected coverage, read length, etc. There are some approaches that make some assumptions to identify artificial duplicates (e.g. metagenomic reads starting with the same bases are assumed to be duplicates).
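To make the "same starting bases" assumption concrete, here is a minimal Python sketch. It is only an illustration of that heuristic, not a published tool: the function name `flag_prefix_duplicates` and the `prefix_len` parameter are made up for this example, and the right prefix length would depend on your read length and data.

```python
from collections import defaultdict

def flag_prefix_duplicates(reads, prefix_len=20):
    """Return the indices of reads that share their first `prefix_len`
    bases with an earlier read (hypothetical helper for illustration)."""
    groups = defaultdict(list)
    for idx, seq in enumerate(reads):
        # Reads with an identical prefix end up in the same group.
        groups[seq[:prefix_len]].append(idx)

    duplicates = set()
    for indices in groups.values():
        # Keep the first read of each group, flag all later ones.
        duplicates.update(indices[1:])
    return duplicates

# Toy example with made-up sequences.
reads = ["ACGTACGTAA", "ACGTACGTCC", "TTGTACGTAA", "ACGTACGTAA"]
print(sorted(flag_prefix_duplicates(reads, prefix_len=8)))
```

Note that for RNA-Seq this heuristic will also flag genuine reads from highly expressed transcripts, which is exactly why it cannot tell the two cases apart for sure.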

I see a similar number of duplicates for 454/Roche sequencing independent of the type of sample sequenced (metagenome, metatranscriptome, ...).

Maybe you can give some more details about your data.
09-13-2011, 08:25 PM   #4
son_nexg
Junior Member
 
Location: Australia

Join Date: Jul 2011
Posts: 8

Thanks for your reply 'robs'.

I will have a look at the literature on this.
I was just wondering about it ... so far I have been dealing with DNA-seq data, where I would expect roughly 10% duplicates in a typical run. But with RNA-seq the story is a little different. We start with a very, very low amount of poly-A captured RNA and then have to amplify it many-fold to get a decent amount for the sequencing run, which makes it more prone to PCR duplicates in the final data.

I can see people are working on protocols for transcriptome data that do away with the PCR amplification step (e.g. http://www.nature.com/nmeth/journal/...meth.1417.html), but as of now Illumina's protocols use PCR and we need reasonable filters to get some real information out of the sequence data.
09-15-2011, 12:47 PM   #5
james hadfield
Moderator
Cambridge, UK
Community Forum
 
Location: Cambridge, UK

Join Date: Feb 2008
Posts: 221

You could add a 4 bp random sequence in your barcode read or at the 5' end of your oligo for ligation. This way you can see whether a read is a PCR duplicate: you should not see the same random sequence on otherwise identical reads unless PCR has amplified it.
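As a rough sketch of how such random tags could be used downstream: reads that share both the tag and the mapping position are counted as PCR copies of one original molecule. The tuple layout `(tag, chrom, start, strand)` and the function name `count_pcr_duplicates` are assumptions made for this example; extracting the tag and the alignment coordinates from real data is not shown.

```python
from collections import Counter

def count_pcr_duplicates(reads):
    """reads: iterable of (tag, chrom, start, strand) tuples.

    Reads that agree on all four fields are treated as PCR copies of a
    single molecule. Returns (n_duplicates, n_unique_molecules).
    Hypothetical helper for illustration only.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    unique_molecules = len(counts)
    return total - unique_molecules, unique_molecules

# Toy example with made-up tags and coordinates.
reads = [
    ("ACGT", "2L", 1000, "+"),
    ("ACGT", "2L", 1000, "+"),  # same tag, same position: counted as duplicate
    ("TTAG", "2L", 1000, "+"),  # same position, different tag: kept
    ("ACGT", "2L", 2000, "+"),  # same tag, different position: kept
]
n_dup, n_unique = count_pcr_duplicates(reads)
print(n_dup, n_unique)
```

One caveat: a 4 bp tag has only 4^4 = 256 possible sequences, so at very highly covered positions two independent molecules can carry the same tag by chance and would be miscounted as duplicates.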
09-15-2011, 01:05 PM   #6
rskr
Senior Member
 
Location: Santa Fe, NM

Join Date: Oct 2010
Posts: 250

If 90% of the reads in your data are identical to one read, then they are probably duplicates.
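A quick way to check a figure like that is to count identical sequences directly. The sketch below is a simple exact-match count in Python and makes no claim to reproduce how FastQC calculates its duplication level; `duplication_summary` is a name invented for this example, and it expects a list of read sequences already loaded into memory.

```python
from collections import Counter

def duplication_summary(seqs):
    """Summarise exact-sequence duplication in a list of reads
    (hypothetical helper for illustration)."""
    seqs = list(seqs)
    if not seqs:
        raise ValueError("need at least one read")
    counts = Counter(seqs)
    total = len(seqs)
    # Sequence that occurs most often, and how many times it occurs.
    top_seq, top_count = counts.most_common(1)[0]
    return {
        "total": total,
        "distinct": len(counts),
        # Share of reads that are extra copies of an already-seen sequence.
        "duplicate_fraction": 1 - len(counts) / total,
        # Share of reads identical to the single most common sequence.
        "top_fraction": top_count / total,
    }

# Toy example: nine copies of one made-up read plus one other read.
summary = duplication_summary(["AAAA"] * 9 + ["CCCC"])
print(summary)
```

A high `top_fraction` points at one dominant sequence, but on its own it still does not say whether that sequence is a PCR artefact or a very abundant transcript.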
Tags
duplicates, fastqc, pcr

All times are GMT -8.

