SEQanswers





Old 09-08-2011, 06:10 AM   #1
flobpf
Member
 
Location: USA

Join Date: Apr 2010
Posts: 76
High duplication levels in FASTQC

Hi,

I was using FASTQC to QC my directional mRNA-Seq data obtained from suspension culture cells. I have about 30 million reads.

Although most of my QC stats are fine, I see a big uptick in the "Duplicate sequences" section for sequences with duplication levels > 10 (see below). The reported Sequence Duplication Level is >84.56%.

I was wondering what could be wrong. There were 2 possibilities I could think of:
1) Some amplification bias in PCR and/or
2) Since the RNA is not very diverse (it's from suspension cells, all the same cell type) and was sequenced to high coverage, many sequences got sequenced multiple times.

I wonder whether the second reason makes sense. If it is true, by extension, it also means that we have successfully sequenced even the very low abundance transcripts. However, if it was PCR bias, that wouldn't be true. Is there a way to distinguish between these two possibilities?

I'd appreciate any suggestions.

Thanks
Old 09-08-2011, 08:59 AM   #2
volks
Member
 
Location: hd.de

Join Date: Jun 2010
Posts: 81

Have a look at the alignments and you will know. Generally I wouldn't trust FastQC duplication levels for mRNA-Seq too much.
Old 09-08-2011, 11:48 PM   #3
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

The overall duplication level reported by FastQC needs to be taken in context with the shape of the profile you're seeing and also the results of the overrepresented sequence plot. There's a big difference between having a generally oversequenced sample (which often happens with RNA-Seq so you can see low expressed transcripts), and having a small number of sequences accounting for large chunks of your library.
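To make the "shape of the profile" idea concrete, here is a minimal sketch of how a duplication profile can be tallied from a list of read sequences. It is not FastQC's actual implementation, and the function name `duplication_profile`, the top bin of "10 or more", and the definition of the percentage (share of reads that would be dropped if every sequence were kept only once) are assumptions made for illustration.

```python
from collections import Counter


def duplication_profile(sequences, max_bin=10):
    """Tally reads by how many copies of their exact sequence exist.

    Returns (profile, percent_duplicated) where profile maps a
    duplication level (1, 2, ..., max_bin meaning "max_bin or more")
    to the number of reads at that level. Illustrative only.
    """
    copies = Counter(sequences)            # copies per distinct sequence
    total_reads = sum(copies.values())
    if total_reads == 0:
        return {}, 0.0

    profile = Counter()
    for n in copies.values():
        level = min(n, max_bin)            # fold high levels into the top bin
        profile[level] += n                # count reads, not distinct sequences

    # Share of reads that are extra copies of an already seen sequence.
    percent_duplicated = 100.0 * (total_reads - len(copies)) / total_reads
    return dict(profile), percent_duplicated


reads = ["AAA"] * 12 + ["CCC"] * 2 + ["GGG"]
profile, pct = duplication_profile(reads)
print(profile, pct)
```

In this toy input a single sequence accounts for most of the reads, so nearly everything lands in the top bin. A generally oversequenced library would instead spread reads across many distinct sequences at moderate duplication levels.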

What FastQC can't do is put the duplication in any kind of context. For libraries with expected uneven coverage (such as RNA-Seq) you'd need to look at the positions of the mapped data to see if you were getting even coverage over highly duplicated regions, which would suggest you simply have really high coverage, or duplicated patchy coverage, which would indicate a technical problem.
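A rough way to check this on mapped data is to summarise the read start positions inside one highly covered region. The sketch below assumes you have already extracted the start coordinates from your alignments; the function `start_position_summary` and the two summary fractions are hypothetical helpers, not part of FastQC or any aligner, and what counts as "patchy" is left to your judgement.

```python
from collections import Counter


def start_position_summary(starts, region_start, region_end):
    """Summarise read start positions in [region_start, region_end).

    Even, deep coverage gives many distinct starts with small piles.
    Patchy duplication gives few distinct starts with very large piles.
    Returns None if no read starts inside the region.
    """
    piles = Counter(s for s in starts if region_start <= s < region_end)
    n_reads = sum(piles.values())
    if n_reads == 0:
        return None

    region_length = region_end - region_start
    return {
        "reads": n_reads,
        "distinct_starts": len(piles),
        # Fraction of positions in the region used as a read start.
        "fraction_positions_used": len(piles) / region_length,
        # Share of reads sitting in the single biggest pile.
        "largest_pile_fraction": max(piles.values()) / n_reads,
    }


even = list(range(100, 200)) * 5          # 5 reads starting at every position
patchy = [120] * 400 + [170] * 100        # same depth, only two start sites
print(start_position_summary(even, 100, 200))
print(start_position_summary(patchy, 100, 200))
```

Both toy regions hold 500 reads, but the first uses every position as a start while the second piles everything onto two sites, which is the pattern that would point to a technical problem rather than plain high coverage.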

If you haven't seen it already I wrote up a more detailed explanation of this on my blog since this is such a common thing to come up (the duplicate sequence plot is probably the least intuitive module to interpret in the FastQC output).
Old 11-27-2013, 12:28 PM   #4
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Fast QC Duplication

Hello.

I read your blog http://proteo.me.uk/2011/05/interpre...lot-in-fastqc/

and find it helpful. I have the same problem.

At the end of the blog post you suggest considering the per base quality plot to get a realistic assessment of the duplication.

In my case, my per base sequence quality is great, but I see the same plot as the one posted above. What does this imply?

If my per base sequence quality passes, and I have high sequence duplication levels caused by an overrepresented TruSeq adapter sequence, can I then conclude that the quality is okay?

Thank you

Tags
duplication, fastqc, illumina, mrna-seq
