Old 09-08-2011, 07:10 AM   #1
Location: USA

Join Date: Apr 2010
Posts: 76
High duplication levels in FASTQC


I was using FASTQC to QC my directional mRNA-Seq data obtained from suspension culture cells. I have about 30 million reads.

Although most of my QC stats are fine, the "Duplicate sequences" section shows a big uptick in sequences with duplication levels > 10 (see below). The reported sequence duplication level is >84.56%.

I was wondering what could be wrong. There were 2 possibilities I could think of:
1) Some amplification bias in PCR and/or
2) Since the RNA is not very diverse (it's from suspension cells, all the same cell type) and was sequenced to high coverage, many sequences got sequenced multiple times.

Does the second reason make sense? If it is true, by extension it also means that we have successfully sequenced even the very low abundance transcripts. However, if it were PCR bias, that wouldn't be true. Is there a way to distinguish between these two possibilities?

I'd appreciate any suggestions.

flobpf
Old 09-08-2011, 09:59 AM   #2

Join Date: Jun 2010
Posts: 81

Have a look at the alignments and you will know. Generally I wouldn't trust FastQC duplication levels for mRNA-Seq too much.
volks
Old 09-09-2011, 12:48 AM   #3
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

The overall duplication level reported by FastQC needs to be taken in context with the shape of the profile you're seeing and also the results of the overrepresented sequence plot. There's a big difference between having a generally oversequenced sample (which often happens with RNA-Seq so you can see low expressed transcripts), and having a small number of sequences accounting for large chunks of your library.
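To make that difference concrete, here is a minimal Python sketch. It is a simplified illustration and not FastQC's actual implementation. The function names `duplication_profile` and `top_sequence_share`, and the toy reads, are made up for this example. It reports what fraction of reads falls into each duplication level, and how much of the library the single most common sequence accounts for.

```python
from collections import Counter


def duplication_profile(sequences, max_level=10):
    """Fraction of reads at each duplication level.

    Level n means "this read's sequence occurs n times in total".
    Levels above max_level are pooled under the key max_level + 1.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return {}
    reads_per_level = Counter()
    for n in counts.values():
        level = min(n, max_level + 1)
        reads_per_level[level] += n  # n reads share this sequence
    return {level: reads / total for level, reads in sorted(reads_per_level.items())}


def top_sequence_share(sequences, top_n=1):
    """Fraction of all reads taken up by the top_n most common sequences."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(top_n)) / total


# Toy data (hypothetical): every sequence seen 3 times vs. one dominant sequence.
oversequenced = ["AAAA", "CCCC", "GGGG", "TTTT"] * 3
dominated = ["AAAA"] * 9 + ["CCCC", "GGGG", "TTTT"]

for name, reads in [("oversequenced", oversequenced), ("dominated", dominated)]:
    print(name, duplication_profile(reads), top_sequence_share(reads))
```

Both toy libraries contain 12 reads from 4 distinct sequences, but in the first the duplication is spread evenly, while in the second a single sequence makes up three quarters of the reads.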

What FastQC can't do is to put the duplication in any kind of context. For libraries with expected uneven coverage (such as RNA-Seq) you'd need to look at the positions of the mapped data to see if you were getting even coverage over highly duplicated regions, which would suggest you simply have really high coverage, or duplicated patchy coverage, which would indicate a technical problem.
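As a rough sketch of that kind of check, the Python below summarises read start positions within one region. It assumes you have already extracted the start coordinates of mapped reads from your alignments by whatever means you prefer. The function name `start_position_summary`, the half-open region convention, and the toy coordinates are all assumptions for illustration, and no particular cutoff is implied.

```python
from collections import Counter


def start_position_summary(starts, region_start, region_end):
    """Summarise how reads stack up on start positions in [region_start, region_end)."""
    if region_end <= region_start:
        raise ValueError("region_end must be greater than region_start")
    in_region = [s for s in starts if region_start <= s < region_end]
    stacks = Counter(in_region)  # start position -> number of reads starting there
    n_reads = len(in_region)
    n_distinct = len(stacks)
    return {
        "reads": n_reads,
        "distinct_starts": n_distinct,
        # Share of possible start positions in the region that are actually used.
        "fraction_positions_used": n_distinct / (region_end - region_start),
        "mean_reads_per_start": n_reads / n_distinct if n_distinct else 0.0,
        # Share of the region's reads sitting in the single tallest stack.
        "largest_stack_share": max(stacks.values()) / n_reads if n_reads else 0.0,
    }


# Toy data (hypothetical): 300 reads over the same 100 bp region in both cases.
even_starts = list(range(100, 200)) * 3      # 3 reads at every position
patchy_starts = [120] * 150 + [170] * 150    # two tall stacks, nothing elsewhere

print(start_position_summary(even_starts, 100, 200))
print(start_position_summary(patchy_starts, 100, 200))
```

In the even case the reads use every position with small stacks, which is what plain high coverage looks like. In the patchy case the same number of reads sits on two positions only, which is the pattern that would make me suspect a technical problem.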

If you haven't seen it already, I wrote up a more detailed explanation of this on my blog, since this is such a common thing to come up (the duplicate sequence plot is probably the least intuitive module to interpret in the FastQC output).
simonandrews
Old 11-27-2013, 01:28 PM   #4
Senior Member
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Fast QC Duplication


I read your blog and found it helpful. I have the same problem.

At the end of the blog post you suggest considering the per base quality plot to get a realistic assessment of the duplication.

In my case, my per base sequence quality is great, but I get the same duplication plot as the one posted above. What does this imply?

If my per base sequence quality passes, and I have high sequence duplication levels caused by an overrepresented TruSeq adapter sequence, can I then conclude that the quality is okay?

Thank you
arcolombo698

duplication, fastqc, illumina, mrna-seq
