Recently I have been working on developing new methods for NGS paired-end (PE) library preparation, and I am confused about PCR duplicates. It is clear that duplicates arise during PCR amplification, especially with low DNA input. However, in PCR-free library preparation, each double-stranded DNA molecule has a Y adapter ligated to each end. This results in 2 different single-stranded products:
1. 5'-P5 - plus strand insert ssDNA -P7'-3'
2. 3'-P7'- minus strand insert ssDNA -P5-5'
These 2 single-stranded products look like duplicates to me, since the two insert strands are fully complementary to each other and both could bind to the flow cell. In theory both strands of one dsDNA molecule could be sequenced, which would mean that even for a PCR-free library 50% of the reads are duplicates. However, when I examined the FASTQ files after sequencing, the percentage of duplicates was much lower than 50%, and for the PCR-free library there were nearly no duplicates. Can anybody help me understand this?
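To make the premise concrete, here is a minimal Python sketch of the two products described above. The insert sequence, the read length of 6, and the helper names are made up for illustration, and it assumes (as a simplification) that read 1 is simply the first bases from the 5' end of whichever strand is sequenced. It only shows how the two read 1 sequences would compare as text; it does not settle whether both strands actually get sequenced or how any particular duplicate-marking tool counts them.

```python
# Toy model of the two single-stranded products from one dsDNA insert.
# All sequences and names here are hypothetical, for illustration only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def read1(strand_5_to_3: str, read_length: int) -> str:
    """Simplifying assumption: read 1 is the first bases from the strand's 5' end."""
    return strand_5_to_3[:read_length]


def canonical(seq: str) -> str:
    """Strand-independent key: the smaller of a sequence and its reverse complement."""
    return min(seq, reverse_complement(seq))


plus_insert = "ATGGCGTACCTTGAGA"                # product 1, written 5'->3'
minus_insert = reverse_complement(plus_insert)  # product 2, written 5'->3'

r1_plus = read1(plus_insert, 6)
r1_minus = read1(minus_insert, 6)

print("read 1 from plus strand: ", r1_plus)
print("read 1 from minus strand:", r1_minus)
print("identical as text:", r1_plus == r1_minus)
print("same insert after canonicalizing:",
      canonical(plus_insert) == canonical(minus_insert))
```

In this toy model the two read 1 sequences start from opposite ends of the insert, so a purely text-based comparison of FASTQ records would not see them as the same sequence, even though they come from one original molecule.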