Dear all,
I am performing QC on FASTQ files of Illumina 76 bp single-end RNA-seq reads. I keep getting indications that there are many PCR duplicates: the FastQC report shows a sequence duplication level of 74.1%, yet it lists no overrepresented sequences. When I ran SAMtools duplicate removal (rmdup), 74.9% of the sequences were found to be duplicates. Comparing the mapping results before and after duplicate removal, I see that the highly expressed genes have the highest fraction of duplicates (which were removed). This is not the first Illumina single-end short-read RNA-seq experiment in which I have seen this phenomenon.
I keep wondering whether this is an experimental artifact (in which case, should we repeat the experiment?) or a possibly valid result (in which case I must believe that several different inserts, generated from different copies of the same RNA transcript, were cleaved at the same base, giving inserts with identical 5’ ends that were then sequenced).
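To put a rough number on the second scenario, here is a minimal back-of-the-envelope sketch of how many reads would share a start position purely by chance, with no PCR duplication at all. It assumes fragment start sites are uniformly distributed along a single transcript and that a single-end read counts as a duplicate whenever it shares its 5’ start with an earlier read (strand is ignored). The function name `expected_duplicate_fraction` and the example numbers are hypothetical and not taken from my data.

```python
def expected_duplicate_fraction(n_reads: int, transcript_len: int, read_len: int = 76) -> float:
    """Expected fraction of reads that share a start position with another read.

    Assumption (illustrative only): read starts are drawn uniformly at random
    from the positions where a full-length read fits on the transcript.
    """
    # Number of positions at which a read of length read_len can start.
    positions = transcript_len - read_len + 1
    if positions <= 0 or n_reads <= 0:
        raise ValueError("need n_reads > 0 and transcript_len >= read_len")

    # Occupancy formula: expected number of distinct start positions
    # that are hit at least once after n_reads uniform draws.
    expected_distinct = positions * (1.0 - (1.0 - 1.0 / positions) ** n_reads)

    # Every read beyond the first at a given position counts as a duplicate.
    return 1.0 - expected_distinct / n_reads


# Hypothetical highly expressed gene: 2000 nt transcript, 100,000 reads.
print(f"{expected_duplicate_fraction(100_000, 2000):.3f}")
# Hypothetical weakly expressed gene: same length, 50 reads.
print(f"{expected_duplicate_fraction(50, 2000):.3f}")
```

Under these assumptions, a 2000 nt transcript offers only 1925 possible start positions for a 76 bp read, so 100,000 reads on it would be roughly 98% "duplicates" by position alone, while a gene with only 50 reads would show almost none. Real fragmentation is not uniform, so this is only an order-of-magnitude illustration.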
I would be glad to know your opinion on this matter.
Many thanks
Inbar