Hi,
I've just received Illumina NextSeq genomic DNA sequencing data, and several samples contain overrepresented sequences, long runs of G's and A's, in the middle of the reads. This also causes problems in the Per base sequence content and Per sequence GC content modules of the FastQC report. The affected reads make up only a small fraction of the total (~2-3%), but I'd like to know how best to filter them out so I can make good use of the remaining sequence data. Is this a common NextSeq error? Would it make sense to use cutadapt to filter out any reads containing long runs of a repeated nucleotide?
Thank you!
Overrepresented sequences table:
Sequence Count Percentage Possible Source
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAA 8524 0.458029623619369 No Hit
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAA 8153 0.4380942657635753 No Hit
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAA 7458 0.4007490536078431 No Hit
GGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAA 5428 0.2916687936421791 No Hit
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG 4282 0.23008949417387825 No Hit
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAA 3288 0.17667778067344972 No Hit
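To make the question concrete, the kind of filter I have in mind could be sketched in Python roughly as follows. This is only an illustration, not a tested pipeline: the function names `has_long_homopolymer` and `filter_fastq_records` and the 20-base threshold are my own assumptions, and it assumes a plain, uncompressed FASTQ with exactly four lines per record.

```python
import re


def has_long_homopolymer(seq, min_run=20):
    """Return True if seq contains min_run or more identical bases in a row.

    min_run=20 is an arbitrary, assumed threshold, not a recommended value.
    """
    # One base followed by at least (min_run - 1) copies of the same base.
    pattern = re.compile(r"([ACGTN])\1{%d,}" % (min_run - 1))
    return pattern.search(seq.upper()) is not None


def filter_fastq_records(lines, min_run=20):
    """Yield 4-line FASTQ records whose sequence has no long homopolymer run.

    `lines` is any iterable of FASTQ lines (for example an open file handle).
    """
    it = iter(lines)
    # Pull four lines at a time: header, sequence, separator, qualities.
    for header, seq, plus, qual in zip(it, it, it, it):
        if not has_long_homopolymer(seq.strip(), min_run):
            yield (header, seq, plus, qual)


if __name__ == "__main__":
    example = [
        "@read1\n", "G" * 38 + "A" * 12 + "\n", "+\n", "I" * 50 + "\n",
        "@read2\n", "ACGT" * 12 + "\n", "+\n", "I" * 48 + "\n",
    ]
    for record in filter_fastq_records(example):
        print("".join(record), end="")
```

With the two example records, only `@read2` would be kept, since `@read1` matches the first sequence in the table above. For paired-end data the same decision would need to be applied to both mates so the files stay in sync, which this sketch does not handle.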