We recently ran a sequencing experiment on tumor samples where we needed to perform whole genome amplification (WGA) due to low amounts of tumor tissue. Our results show a very high duplicate rate in the WGA samples compared to samples from unamplified DNA. For instance, the average depth of coverage after duplicate removal was >150x for the unamplified samples (n=2) but only 15–20x for the WGA samples (n=6). Here are some other facts about the experiment:
- WGA was performed using the Qiagen REPLI-g FFPE kit
- Target capture was performed using a custom Agilent SureSelect assay covering 500 genes, targeting an average coverage of 200x across the panel.
- Samples were run on an Illumina HiSeq 2000 with paired-end reads (2 x 100 bp).
- Duplicate reads were flagged with Picard and removed post-alignment; coverage was compared before and after duplicate removal (a rough sketch of how the duplicate fraction can be quantified is shown below this list).
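For anyone who wants to reproduce the comparison, here is a minimal sketch (not our exact pipeline) of how the duplicate fraction can be tallied from a BAM after Picard MarkDuplicates has set the 0x400 duplicate flag. The filename `sample.marked.bam` is just a placeholder; in practice Picard's MarkDuplicates metrics file already reports PERCENT_DUPLICATION per library, so this is mainly useful as a spot check on an arbitrary BAM.

```python
# Minimal sketch: count reads flagged as duplicates in a BAM file.
# Assumes Picard MarkDuplicates has already been run (duplicate flag set).
import pysam

def duplicate_fraction(bam_path):
    """Return (duplicates, usable, fraction) over primary mapped reads."""
    total = 0
    dups = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            # Skip secondary/supplementary alignments and unmapped reads so
            # the denominator is restricted to primary mapped reads.
            if read.is_secondary or read.is_supplementary or read.is_unmapped:
                continue
            total += 1
            if read.is_duplicate:  # the 0x400 flag set by MarkDuplicates
                dups += 1
    return dups, total, (dups / total if total else 0.0)

dups, total, frac = duplicate_fraction("sample.marked.bam")  # placeholder path
print(f"{dups}/{total} primary mapped reads flagged as duplicates ({frac:.1%})")
```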
So my question is: all things being equal, why does WGA generate so many duplicate reads compared to unamplified DNA? Is there anything we can do in our library construction or WGA prep to reduce the high duplicate rate? If anyone has experience with WGA and high duplication, please share your insights! Thanks!