Hello,
The first 20 bases of my MiSeq reads show abnormal proportions of A, T, G and C, as shown in the 'per base sequence content' tab of the FastQC report (see the attached PNG). The per base GC content is similarly skewed, but the quality of these bases is good.
The issue can easily be rectified by removing the first 20 bp of each read, but can anyone enlighten me as to what is causing it? I have run both CutAdapt and TagDust on these reads to remove adapter sequences. I thought it might be the Illumina barcodes, except that the barcode sequence is usually contained within the FASTQ header, thus:
Code:
@MVM-RI-I124161:11:000000000-A3985:1:1101:18249:1757 1:N:0:TAAGGCGANAGATCGC
And searching for this sequence (i.e. TAAGGCGANAGATCGC as above) doesn't show it at the start of the reads.
What is it? And what's the best way of dealing with it: simply chop the first 20 bp off my reads, or is it something that requires a bit more QC?
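For anyone wanting to reproduce the two steps mentioned above (looking for the header barcode at the start of a read, and chopping off the first 20 bp), here is a minimal Python sketch. It is only an illustration under stated assumptions: the helper names (`parse_fastq`, `index_from_header`, `starts_with_index`, `trim_start`) are hypothetical, the demo read sequence is made up, and `N` in the barcode is treated as matching any base.

```python
import re

def parse_fastq(lines):
    """Yield (header, sequence, quality) from FASTQ lines, 4 lines per record."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual

def index_from_header(header):
    """Return the barcode field, i.e. the text after the last ':' in the header."""
    return header.rsplit(":", 1)[-1]

def starts_with_index(seq, index, window=20):
    """True if the barcode occurs within the first `window` bases of the read.

    Assumption: an 'N' in the barcode matches any base.
    """
    pattern = re.compile(index.replace("N", "[ACGTN]"))
    return pattern.search(seq[:window]) is not None

def trim_start(seq, qual, n=20):
    """Drop the first n bases and their quality scores."""
    return seq[n:], qual[n:]

# Demo record: the header is the one quoted above; the sequence is made up.
demo_lines = [
    "@MVM-RI-I124161:11:000000000-A3985:1:1101:18249:1757 1:N:0:TAAGGCGANAGATCGC",
    "ACGTACGTACGTACGTACGTACGTACGTAC",
    "+",
    "I" * 30,
]

for header, seq, qual in parse_fastq(demo_lines):
    index = index_from_header(header)
    print("barcode:", index)
    print("barcode near read start:", starts_with_index(seq, index))
    print("trimmed read:", trim_start(seq, qual)[0])
```

On a real file the same loop could be run over an open FASTQ file handle instead of `demo_lines`; for large datasets a dedicated trimmer is likely the more practical choice.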
Thanks!