View Single Post
Old 07-12-2017, 06:15 AM   #7
Location: Fargo

Join Date: Dec 2011
Posts: 64

Interesting. The genome size as estimated by flow cytometry was 2.1 Gb. The organism has been reasonably well studied cytogenetically (it is one of the worst invasive weeds in the great plains). It is most definitely a hexaploid - though the data suggests that the duplication of one of the genomes was very recent and the third genome is from a very closely related species. I ran a SNP analyses, and granted that although I was being very conservative, I only identified about 150K SNPs after mapping the fragments back to the assembly. Thus my first thought was that there was almost no heterozygosity, and the three genomes were so closely related that the kmers were collapsing the sequence. The Jellyfish method as I understand it is described here:

Essentially you just sum the non-error kmers and divide by the coverage.

The Jellyfish run was done with a kmer of 21, and the BBMAP with the default of 31(?). However, I am about to rerun the BBMAP using the same Kmer size (and with the ploidy level specified). It doesn't take long, so I will also run at a number of different Kmer sizes just to see what that does to the curve.

I found that the genome size estimates from BBMAP were between 840 Mb (K=21, ploidy=6) to 1.1Gb (default).

I guess my primary question is: Which method ((kmer count)/coverage ) or BBMAP) is the best for estimating genome size and why?
horvathdp is offline   Reply With Quote