Hi,
I am hoping to gain some insight into my k-mer spectra, as many of them lack the usual bimodal distribution I have seen before: a large number of low-frequency k-mers (presumed erroneous), followed by a main k-mer coverage peak.
I was trying to use Quake to filter the reads, with Jellyfish to count k-mers. When I plot the distribution of k-mer frequencies, I get results similar to those in the attached PDF file.
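In case it helps with interpreting the plots, here is a minimal Python sketch of how the error trough and the main coverage peak could be located in a two-column histogram (multiplicity, then number of distinct k-mers) such as the one `jellyfish histo` writes. The function names and the simple "first rise" heuristic are illustrative assumptions, not part of Quake or Jellyfish, and the heuristic may not be robust on noisy spectra.

```python
def parse_histo(text):
    """Parse 'multiplicity count' lines into a list of (multiplicity, count) tuples."""
    pairs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        pairs.append((int(parts[0]), int(parts[1])))
    return sorted(pairs)


def find_trough_and_peak(pairs):
    """Return (trough_multiplicity, peak_multiplicity), or (None, None).

    Heuristic (an assumption, not Quake's method): the trough is the first
    multiplicity after which the count starts rising again; the peak is the
    multiplicity with the largest count at or beyond the trough.
    """
    trough_idx = None
    for i in range(len(pairs) - 1):
        if pairs[i + 1][1] > pairs[i][1]:
            trough_idx = i
            break
    if trough_idx is None:
        # Counts never rise: no separate coverage peak was found.
        return None, None
    peak = max(pairs[trough_idx:], key=lambda p: p[1])
    return pairs[trough_idx][0], peak[0]


# Made-up numbers, for illustration only.
example = "1 1000\n2 200\n3 50\n4 80\n5 300\n6 500\n7 400\n8 100\n"
trough, peak = find_trough_and_peak(parse_histo(example))
print(trough, peak)
```

A spectrum for which this returns `(None, None)` has counts that never rise again after the low-frequency k-mers, which is one way of describing the non-bimodal cases.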
For further information, these are different clinical E. coli strains (genome size ~5 Mb), many harbouring more than one plasmid, often on the order of 100 kb. They were sequenced on a MiSeq with 150 bp paired-end reads.
[Aside: I have had some issues assembling them with Velvet, notably the sample in the top left-hand corner. There, I discovered that fragments of a previously sequenced 93 kb plasmid from this sample had been incorporated into longer 'chromosomal' contigs. Any advice on avoiding this issue that could also be applied to the other samples, where I have no information about the plasmid content, would be much appreciated.]
Thank you very much in advance for advice and comments!
Regards,
Heidi