Old 01-03-2012, 06:59 AM   #1
Location: Malmö

Join Date: Sep 2008
Posts: 37
fastqc sequence duplication level


I was wondering whether it is normal for the FastQC sequence duplication level plot to show around 60% duplication for a high-depth human exome sequencing run (82,461,815 reads, i.e. roughly 2 × 41,230,908 paired-end reads).
fadista
Old 01-03-2012, 07:31 AM   #2
Senior Member
Location: St. Louis

Join Date: Dec 2010
Posts: 535

It depends on the number of PCR cycles done throughout the wet lab process (and the amount of starting input DNA). 60% is higher than desirable but definitely realistic if a lot of PCR cycles were done.
Heisman
Old 01-04-2012, 12:17 AM   #3
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

'High-depth' is the key here. If you're sequencing to a depth where you're expecting to see multiple reads starting at the same point by chance then you'll get higher duplication levels reported in this plot. Our PhiX lanes for example have horrific duplication levels, but this isn't a problem.

If you haven't already read this blog post then it might be worth a look as it goes through some of the details about how to interpret this plot.
simonandrews
Old 01-04-2012, 04:00 AM   #4
Location: Malmö

Join Date: Sep 2008
Posts: 37

Thanks for the replies.

The input was 10 pM, and with that number of reads I estimate a mean exome coverage of around 134-fold (assuming all PF reads align):
(82,461,815 reads * 101 bp) / 62,000,000 enriched bp ≈ 134
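
The arithmetic above can be checked with a few lines of Python. This is only a back-of-envelope sketch: the function name `mean_coverage` is made up for illustration, and it assumes, as the post does, that every PF read aligns in full to the enriched target.

```python
def mean_coverage(n_reads: int, read_length_bp: int, target_bp: int) -> float:
    """Mean fold coverage, assuming every read aligns fully on target."""
    return n_reads * read_length_bp / target_bp


# Figures quoted in the post: read count, read length, enriched target size.
coverage = mean_coverage(82_461_815, 101, 62_000_000)
print(f"{coverage:.1f}x")  # prints 134.3x
```

In practice some reads fall off target or fail to align, so the real mean coverage would be somewhat lower than this upper bound.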

I was also wondering whether anyone has a plot of the FastQC sequence duplication level against coverage (or against sequenced reads/targeted regions).
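
In the absence of such a plot, the "duplicates by chance" effect mentioned in reply #3 can be roughed out with a toy model. The sketch below is not how FastQC computes its duplication plot. It assumes single-end reads whose start sites are drawn uniformly at random, counts a read as a duplicate only when its start site has already been seen, ignores PCR duplicates entirely, and uses a Poisson approximation for the number of distinct start sites. The names `expected_duplicate_fraction` and `START_SITES`, and the choice of two strands times the 62 Mb target as the number of possible start sites, are assumptions for illustration.

```python
import math


def expected_duplicate_fraction(n_reads: float, n_start_sites: float) -> float:
    """Expected fraction of reads whose start site was already used.

    Toy model: start sites are sampled uniformly and independently, so the
    expected number of distinct sites hit is about M * (1 - exp(-N / M)).
    """
    if n_reads <= 0 or n_start_sites <= 0:
        raise ValueError("n_reads and n_start_sites must be positive")
    expected_distinct = -n_start_sites * math.expm1(-n_reads / n_start_sites)
    return 1.0 - expected_distinct / n_reads


TARGET_BP = 62_000_000        # enriched bases, from the post
READ_LENGTH_BP = 101          # read length, from the post
START_SITES = 2 * TARGET_BP   # assumption: one start site per base per strand

rows = []
for coverage in (10, 50, 134, 300):
    n_reads = coverage * TARGET_BP / READ_LENGTH_BP
    rows.append((coverage, expected_duplicate_fraction(n_reads, START_SITES)))

for coverage, dup in rows:
    print(f"{coverage:>4}x  {dup:6.1%}")
```

Under these assumptions the chance-only duplicate fraction rises steadily with depth, which is the qualitative point. Real capture data is far from uniform, so observed values could differ substantially in either direction.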
fadista
Old 01-11-2012, 10:17 AM   #5
Aeolus Huios
Junior Member
Location: hyderabad, India

Join Date: Nov 2011
Posts: 9

Hi, my question is not related to this thread, but I could not find another place to ask it. I want to extract the sequences that are exactly identical from a FASTA file in which the headers are different. Is there any software or script I can use to extract these identical sequences?

With Regards,
Aeolus Huios
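
One possible approach to the question in post #5, sketched in plain Python with no external libraries: read the FASTA records, then group headers by their exact sequence and keep only sequences that occur under more than one header. The helper names `read_fasta` and `group_identical` are made up for this example, and treating upper- and lower-case bases as identical is an assumption that may or may not suit the data.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_identical(records):
    """Map each sequence seen more than once to the headers that carry it."""
    groups = defaultdict(list)
    for header, seq in records:
        groups[seq.upper()].append(header)  # assumption: case-insensitive match
    return {seq: headers for seq, headers in groups.items() if len(headers) > 1}


example = """>seq1
ACGT
ACGT
>seq2
TTTT
>seq3
acgtacgt
""".splitlines()

duplicates = group_identical(read_fasta(example))
for seq, headers in duplicates.items():
    print(", ".join(headers), seq)  # prints: seq1, seq3 ACGTACGT
```

For a real file, an open file handle can be passed in place of `example`, since iterating over a handle yields lines. This keeps every sequence in memory, so very large files may need a hashing or sorting approach instead.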

duplication, exome statistics, fastqc

All times are GMT -8.
