Hi,
I am working with NGS data: paired-end reads in FASTQ format. I clipped the adapters (allowing at most 1 mismatch), quality-trimmed the reads, and then analyzed both the original and the clipped/quality-trimmed files with FastQC. About 20% of the sequences are present more than 10 times, in both the original and the clipped/quality-trimmed reads. Using the ShortRead Bioconductor package in R, I confirmed that some of these reads do occur 300+ times. In FastQC the graph looks fine up to a point: the number of sequences that occur 1, 2, etc. up to 9 times gradually decreases to about 0, and then for 10+ repeats it rises to about 20%.
Is this something to worry about? Or rather, how would one characterize this behavior? My understanding was that adapter-containing sequences were the main source of these repeats, so after removing the adapters (allowing 1 mismatch) I expected the repeats to disappear or at least drop significantly.
Thank you!
PS: Just to be clear, by "sequence repeats" I mean the number of times the exact same sequence is found.
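In case it helps to pin down what I am counting, here is a minimal Python sketch of the tally. It is only an illustration under some assumptions: the function names are made up, the input is assumed to be plain uncompressed FASTQ with exactly four lines per record, the percentages are taken relative to the number of distinct sequences, and the example reads are invented. It is not how FastQC or ShortRead compute their numbers, so the values need not match theirs exactly.

```python
from collections import Counter
from itertools import islice


def count_sequences(fastq_lines):
    """Count how many times each exact read sequence occurs.

    Assumes four-line FASTQ records: header, sequence, '+', quality.
    """
    # The sequence is the second line of every four-line record.
    return Counter(line.strip() for line in islice(fastq_lines, 1, None, 4))


def duplication_histogram(counts, cap=10):
    """Percentage of distinct sequences seen 1, 2, ..., cap-1 and cap-or-more times."""
    # Everything seen `cap` times or more is pooled into the last bin.
    bins = Counter(min(n, cap) for n in counts.values())
    total = sum(bins.values())
    if total == 0:
        return {level: 0.0 for level in range(1, cap + 1)}
    return {level: 100.0 * bins.get(level, 0) / total for level in range(1, cap + 1)}


# Invented toy records, not real data: ACGT occurs twice, TTTT once.
toy_fastq = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "IIII",
    "@read3", "TTTT", "+", "IIII",
]
toy_counts = count_sequences(toy_fastq)
print(toy_counts.most_common(1))
print(duplication_histogram(toy_counts, cap=3))
```

For a real file, an open file handle can be passed in place of the list, for example `with open("reads_1.fastq") as fh: counts = count_sequences(fh)`, where the file name is just a placeholder.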