Good day,
I need some advice on the Kmer content of my de novo project. I've sequenced the genome of a lovebird (parrot) species. Here are some details:
- We sequenced the offspring at 100x coverage and its parents at 30x coverage on an Illumina HiSeq 2500
- The offspring had 3 PE libraries of 300, 550 and 750 bp; the parents had 2 PE libraries of 300 and 550 bp
- The offspring had 2 LJD MP libraries of 3 and 8 kb
- The read lengths were 125 bp, but after trimming by the service provider they were 30-125 bp long
- The genome has a GC content of around 43%
- Overall the FastQC files look good and the only problem is the Kmer content
Here is the problem... It seems that there is a Kmer bias around positions 42-54 bp in all three samples.
It looks as if it is part of the Illumina TruSeq adapter, but it isn't reported as an overrepresented sequence. The adapter sequence is:
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCAC-NNNNNN-ATCTCGTATGCCGTCTTCTGCTTG 3'
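For anyone who wants to check this on their own reads, here is a minimal Python sketch that tallies the read position at which the start of the adapter appears. The function names and the 12-base seed length are my own assumptions for illustration; this is not how FastQC computes its Kmer Content module, and it only looks for exact matches to the first part of the adapter quoted above.

```python
from collections import Counter

# First part of the TruSeq adapter quoted above (the portion before the index).
ADAPTER = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"


def adapter_start_positions(reads, adapter=ADAPTER, seed_len=12):
    """Count, per 1-based read position, where the adapter's first
    seed_len bases begin (exact match, first occurrence per read)."""
    seed = adapter[:seed_len]
    positions = Counter()
    for read in reads:
        idx = read.upper().find(seed)
        if idx != -1:
            positions[idx + 1] += 1
    return positions


def read_fastq_sequences(path):
    """Yield the sequence line of each record in an uncompressed,
    4-lines-per-record FASTQ file (hypothetical helper)."""
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:
                yield line.strip()


if __name__ == "__main__":
    # Toy reads, made up for the demonstration: two carry the adapter start.
    demo_reads = [
        "ACGT" * 10 + ADAPTER[:20],
        "TTGCA" * 9 + ADAPTER[:15],
        "ACGTACGTAC" * 6,
    ]
    for pos, count in sorted(adapter_start_positions(demo_reads).items()):
        print(pos, count)
```

On real data one could pass `read_fastq_sequences("reads.fastq")` (a placeholder file name) to `adapter_start_positions`. A pile-up of start positions in the same range where the Kmer bias shows up would suggest adapter read-through that the trimming did not fully remove.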
I have attached two screenshots of the Kmer content plots here. Most of the FastQC reports look like this, for all three birds.
We have discussed it with the service provider, but they feel we don't have to worry at all.
Has anybody experienced anything like this before? Can you offer some help please?
Thank you in advance!
Henriette