I've been using a combination of FastQC and cutadapt to QC some mate-pair libraries we have.
This is a TruSeq LT mate-pair library.
I'm only interested in pairs that show evidence of the junction adapter, so I ran:
Code:
cutadapt -m 50 -B CTGTCTCTTATACACATCT -B AGATGTGTATAAGAGACAG -b CTGTCTCTTATACACATCT -b AGATGTGTATAAGAGACAG --trimmed-only -o r1.fastq -p r2.fastq A2YH4_R1.fastq A2YH4_R2.fastq
This produced two separate files with the junction removed... perfect.
Now when I run FastQC, I expected to see some adapter matches (because I removed the junctions first).
I was surprised to see such a high k-mer bias instead; none of the k-mers quite seem to match an Illumina adapter.
The question is: what explains this k-mer bias? Should I just trim the first 45 bases? Is it because I'm dealing with a repetitive plant genome?
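One way to narrow this down before trimming anything is to tally the most common k-mers in the first 45 bases of the trimmed reads and check whether any of them are fragments of the junction adapter (the two sequences passed to cutadapt are reverse complements of each other). The following is only a rough sketch: the function names, the k-mer length of 7, and the assumption of plain four-line FASTQ records are illustrative choices, not anything FastQC or cutadapt does internally.

```python
from collections import Counter

# Junction adapter and its reverse complement, as passed to cutadapt above.
JUNCTION = "CTGTCTCTTATACACATCT"
JUNCTION_RC = "AGATGTGTATAAGAGACAG"


def read_sequences(fastq_path):
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    with open(fastq_path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


def top_kmers(sequences, k=7, window=45, n=10):
    """Return the n most common k-mers found in the first `window` bases."""
    counts = Counter()
    for seq in sequences:
        head = seq[:window]
        for i in range(len(head) - k + 1):
            counts[head[i:i + k]] += 1
    return counts.most_common(n)


def adapter_like(kmer):
    """True if the k-mer is an exact substring of the junction or its reverse complement."""
    return kmer in JUNCTION or kmer in JUNCTION_RC
```

For example, `top_kmers(read_sequences("r1.fastq"))` lists the dominant k-mers in the first 45 bases of the trimmed R1 file, and `adapter_like` flags the ones that look like leftover junction fragments. If none are flagged, the bias more likely comes from the genome or the library prep than from residual adapter, though exact substring matching will miss fragments containing sequencing errors.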