#1 |
Member
Location: Malmö Join Date: Sep 2008
Posts: 37
Hi,
I was wondering whether it is normal for a FastQC plot to show a sequence duplication level of around 60% for high-depth human exome sequencing (82,461,815 reads, or 41,230,908 × 2 paired-end reads).
#2 |
Senior Member
Location: St. Louis Join Date: Dec 2010
Posts: 535
It depends on the number of PCR cycles done throughout the wet lab process (and the amount of starting input DNA). 60% is higher than desirable but definitely realistic if a lot of PCR cycles were done.
#3 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
'High-depth' is the key here. If you're sequencing to a depth where you're expecting to see multiple reads starting at the same point by chance then you'll get higher duplication levels reported in this plot. Our PhiX lanes for example have horrific duplication levels, but this isn't a problem.
If you haven't already read this blog post then it might be worth a look, as it goes through some of the details about how to interpret this plot.
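To illustrate the point about depth, here is a minimal sketch under a toy model. The model is an assumption for illustration only and is not how FastQC computes its plot: reads are drawn uniformly at random from `n_positions` equally likely distinct fragments, with no PCR duplication at all. The function name and the numbers in the loop are hypothetical.

```python
import math


def expected_duplicate_fraction(n_reads, n_positions):
    """Expected fraction of reads that repeat an already-seen fragment.

    Toy model (assumption): each read is an independent uniform draw
    from n_positions possible fragments; no PCR duplicates are modelled.
    """
    # Expected number of distinct fragments observed after n_reads draws,
    # using the approximation K * (1 - exp(-N / K)).
    expected_distinct = n_positions * -math.expm1(-n_reads / n_positions)
    return 1.0 - expected_distinct / n_reads


# Hypothetical depths relative to a fixed number of possible fragments.
n_positions = 1_000_000
for depth in (0.1, 0.5, 1, 2, 5):
    n_reads = int(depth * n_positions)
    frac = expected_duplicate_fraction(n_reads, n_positions)
    print(f"reads per possible fragment = {depth}: {frac:.1%} duplicates expected")
```

Even with no PCR artefacts, the expected duplicate fraction in this model rises steadily as the number of reads approaches and exceeds the number of distinct fragments available, which is the effect described for very deep libraries.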
#4 |
Member
Location: Malmö Join Date: Sep 2008
Posts: 37
Thanks for the replies.
The input was 10 pM, and with that number of reads I estimate a mean exome coverage of around 134-fold (assuming all PF reads align):
(82,461,815 reads × 101 bp) / 62,000,000 enriched bp ≈ 134
I am just wondering whether anyone has a plot showing the FastQC sequence duplication level versus coverage (or sequenced reads/targeted regions).
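The coverage estimate in the post above can be checked with a few lines of Python. This is only a sketch of the same arithmetic. The function and variable names are made up for illustration, and it keeps the post's assumption that every PF read aligns within the enriched region.

```python
def mean_coverage(n_reads, read_length_bp, target_bp):
    """Mean fold coverage, assuming every read aligns inside the target."""
    return n_reads * read_length_bp / target_bp


# Numbers taken from the post: total reads, read length, enriched bases.
coverage = mean_coverage(82_461_815, 101, 62_000_000)
print(f"Estimated mean coverage: {coverage:.1f}x")
```

In practice, off-target and unaligned reads would lower the real on-target coverage, so this figure is best read as an upper bound.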
#5 |
Junior Member
Location: hyderabad, India Join Date: Nov 2011
Posts: 9
Hi, my question is not related to this thread, but I could not find a better place to ask it. I want to extract the identical sequences from a FASTA file in which the headers are different. Is there any software or script that can extract these exactly matching sequences?
With Regards, Aeolus
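One possible approach to the question above, as a minimal sketch using only the Python standard library: read the FASTA records, group the headers by sequence, and keep the groups that contain more than one header. The function names are hypothetical, and comparing sequences case-insensitively is an assumption that may not suit every dataset.

```python
import io
from collections import defaultdict


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_identical(handle):
    """Map each sequence seen more than once to the list of its headers."""
    groups = defaultdict(list)
    for header, seq in read_fasta(handle):
        # Assumption: upper- and lower-case bases count as identical.
        groups[seq.upper()].append(header)
    return {seq: names for seq, names in groups.items() if len(names) > 1}


# Small in-memory example; a real file would be passed via open(path).
example = io.StringIO(">a\nACGT\n>b\nac\ngt\n>c\nTTTT\n")
duplicates = group_identical(example)
for seq, names in duplicates.items():
    print(", ".join(names), "share the sequence", seq)
```

This holds every distinct sequence in memory, so very large files might need hashing or sorting on disk instead.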
Tags: duplication, exome statistics, fastqc