
**gcarbajosa** · 12-13-2011, 08:37 AM

Something more about this. Going through the SEQanswers posts related to FastQC, I found a link to this page:

Interpreting the duplicate sequence plot in FastQC

http://proteo.me.uk/2011/05/interpreting-the-duplicate-sequence-plot-in-fastqc/

where Simon Andrews mentions that FastQC only uses the first 50bp of each sequence to search for duplicates. Since the reads in my dataset are 100bp long, I guess the duplication levels can be inflated by considering only the first 50bp when looking for identical reads. So now I'm thinking that the correct answer is the 2nd possibility.
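To illustrate the effect, here is a minimal sketch (not FastQC's code) that compares the duplicated fraction of a toy read set when whole reads are compared versus only their first 50bp. The function name `duplicated_fraction` and the toy reads are made up for the example.

```python
from collections import Counter


def duplicated_fraction(reads, prefix_len=None):
    """Fraction of reads that are extra copies of an already-seen key.

    prefix_len=None compares whole reads; otherwise only the first
    prefix_len bases are compared (hypothetical helper for illustration).
    """
    keys = [read if prefix_len is None else read[:prefix_len] for read in reads]
    if not keys:
        return 0.0
    distinct = len(Counter(keys))
    return (len(keys) - distinct) / len(keys)


# Toy 100bp reads: the first two share their first 50 bases but differ afterwards.
shared_start = "ACGTACGTAC" * 5  # 50 bases
reads = [
    shared_start + "A" * 50,
    shared_start + "C" * 50,
    "G" * 100,
    "T" * 100,
]

full_length = duplicated_fraction(reads)               # whole reads compared
first_50 = duplicated_fraction(reads, prefix_len=50)   # only first 50bp compared
```

With these toy reads no two full-length reads are identical, but two of them collapse into one key once only the first 50bp are compared, so the apparent duplication goes up.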

**fkrueger** · 12-13-2011, 09:43 AM

Hi gcarbajosa,

As you mentioned, FastQC determines an approximate level of sequence duplication by storing the first 50bp of the first 200,000 different sequences it encounters in a sequencing file. These duplicated sequences may, for example, be adapter contamination (which would not map at all in Bismark), but they could also be duplicate reads that were amplified by PCR during library construction. Such reads might align perfectly well and uniquely to the genome even though they are technical duplicates.

So essentially, reads that map non-uniquely (and are therefore discarded) and duplicated reads are not the same thing, and Bismark does not specifically output anything regarding duplication levels. I hope this helps.
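The approximate duplication estimate described above could be sketched roughly as follows. This is only an illustration of the idea as stated in this post (first 50bp, first 200,000 different sequences), not FastQC's actual implementation; the function name `estimate_duplication`, its parameters, and the way the percentage is computed are assumptions.

```python
def estimate_duplication(reads, prefix_len=50, max_tracked=200_000):
    """Approximate percentage of duplicated reads (illustrative sketch).

    Only the first prefix_len bases of each read are used as the key, and
    only the first max_tracked distinct keys are ever tracked. Reads whose
    key is new after that cap is reached are ignored.
    """
    counts = {}
    counted = 0
    for read in reads:
        key = read[:prefix_len]
        if key in counts:
            counts[key] += 1
            counted += 1
        elif len(counts) < max_tracked:
            counts[key] = 1
            counted += 1
        # else: a new sequence seen after the cap, not tracked at all
    if counted == 0:
        return 0.0
    # Every copy beyond the first of each tracked key counts as a duplicate.
    return 100.0 * (counted - len(counts)) / counted


# Toy input: three identical reads plus one different read.
toy_reads = ["A" * 60] * 3 + ["C" * 60]
pct_default = estimate_duplication(toy_reads)
pct_capped = estimate_duplication(toy_reads, max_tracked=1)
```

Note that nothing here involves a reference genome: a read counted as a duplicate by this kind of estimate may still align uniquely, which is why the figure is not comparable to the number of non-uniquely mapping reads reported by an aligner.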


Apparent duplication levels incongruence between bismark and fastqc with BS-Seq data
