We are analyzing a paired-end Illumina library with 150 bp reads, and FastQC reports unusual k-mer content.
Several heptamers are over-represented, and they appear at the same positions in the R1 and R2 files.
We used Trimmomatic to remove adapter sequences and low-quality bases, but this did not fix the issue. We went ahead with a de novo assembly using ABySS, which produced a highly fragmented assembly with more than 60 million short contigs.
We know something is wrong, but we can't explain it.
Is the fragmented assembly a consequence of this k-mer problem? If so, what can we do about it? And can anyone tell us what is causing it?
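
To make the observation concrete, here is a minimal sketch of how the per-position heptamer counts could be cross-checked outside FastQC and compared between R1 and R2. It is not FastQC's own method (FastQC tests k-mers for positional enrichment on a subsample of reads); it just tallies raw counts. The function names, the `max_reads` cap and the `min_count` threshold are all hypothetical choices for illustration.

```python
from collections import Counter
import io


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def positional_kmer_counts(sequences, k=7, max_reads=100000):
    """Count how often each k-mer starts at each 0-based read position.

    max_reads is an assumed cap to keep memory use modest.
    """
    counts = Counter()
    for n, seq in enumerate(sequences):
        if n >= max_reads:
            break
        for pos in range(len(seq) - k + 1):
            counts[(pos, seq[pos:pos + k])] += 1
    return counts


def shared_positional_kmers(r1_counts, r2_counts, min_count=2):
    """Return (position, k-mer) pairs seen at least min_count times in both files."""
    return {
        key
        for key, c in r1_counts.items()
        if c >= min_count and r2_counts.get(key, 0) >= min_count
    }


# Tiny in-memory demonstration with made-up reads (not real data).
demo_r1 = "@a\nACGTACGTAA\n+\nIIIIIIIIII\n@b\nACGTACGTCC\n+\nIIIIIIIIII\n"
demo_r2 = "@a\nACGTACGGGG\n+\nIIIIIIIIII\n@b\nACGTACGTTT\n+\nIIIIIIIIII\n"
r1 = positional_kmer_counts(read_fastq_sequences(io.StringIO(demo_r1)))
r2 = positional_kmer_counts(read_fastq_sequences(io.StringIO(demo_r2)))
shared = shared_positional_kmers(r1, r2)
```

On real data the same functions would be fed an open FASTQ handle (for gzipped files, `gzip.open(path, "rt")`), and `min_count` would need to be raised to something meaningful for the read depth.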