Hello all,
We are currently analysing a series of exomes prepared using the Illumina TruSeq exome selection kit, which targets 62 Mb of the genome. Each exome has between 35 and 75 million 100 bp read pairs, giving us coverages on the order of 50-150x.
Following alignment with BWA (with a mapping rate typically ~95-97%), local indel realignment with GATK and duplicate marking with Picard, we are seeing extremely high rates of PCR duplicates (between 25% and 90%, but most commonly on the order of 40-60%). Obviously this means we are losing a huge amount of data if we persist with removing these duplicates.
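To put a rough number on that loss, here is a minimal back-of-the-envelope sketch of mean depth before and after discarding duplicates. It is only an approximation: it ignores the mapping rate and uneven capture, and the `on_target_rate` parameter is a hypothetical knob that is left at 1.0 because we have not measured it here.

```python
def mean_depth(read_pairs, read_length, target_bp,
               duplicate_rate=0.0, on_target_rate=1.0):
    """Rough mean depth over the target region.

    Assumes every retained base lands on the target (scaled by the
    hypothetical on_target_rate) and that coverage is uniform.
    """
    sequenced_bases = read_pairs * 2 * read_length  # two reads per pair
    usable_bases = sequenced_bases * (1.0 - duplicate_rate) * on_target_rate
    return usable_bases / target_bp


TARGET_BP = 62_000_000  # TruSeq exome target size quoted above

if __name__ == "__main__":
    for pairs in (35_000_000, 75_000_000):
        for dup in (0.0, 0.4, 0.6):
            depth = mean_depth(pairs, 100, TARGET_BP, duplicate_rate=dup)
            print(f"{pairs / 1e6:.0f}M pairs, {dup:.0%} duplicates: ~{depth:.0f}x")
```

Setting `on_target_rate` to the fraction of bases that actually fall within the capture targets would give a more realistic figure.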
The arguments about whether to mark duplicates or not have obviously been done to death here, but since these are the first exomes we've handled, I'm trying to understand if these results are 'normal' in exome sequencing. Clearly there is a much greater chance of identical reads occurring given the comparatively small target region, but is it normal to see such high rates of duplicate reads, or does this look more likely to be a wet-lab issue (over-amplification)?
Thanks in advance for any advice anyone can give,
James