I am having trouble getting my head round the issue of duplicate reads, and in particular how this works with data derived from an amplicon-based (PCR-generated) library. I appreciate that the higher error rates in NGS (in my case Illumina PE) could lead to generally non-identical reads from the same amplicon - but are reads expected to be so different that, if two are identical, you assume they are duplicates?
I apologise for asking what is probably a basic question, but I have looked everywhere for an explanation and have not been able to find one.
Q1) When in the process (both PCR/amplicon-based and capture-based) are these duplicates generated: at the PCR steps or at cluster generation?
Q2) I don't understand the idea that identical reads are just duplicates - for an individual sample, the reads from a PCR/amplicon library can only be of two sorts (i.e. one for each of the two alleles the sample has), so wouldn't you expect just two types of read from the PCR-based approach?
Q3) How do the duplicates from capture and PCR/amplicon libraries differ? Are duplicates not simply an inherent consequence of the PCR approach? And how are they detected and treated with a PCR approach?