Dear all,
I am running Jellyfish (jellyfish-2.1.1) for the first time to estimate genome size. Although I followed the manual, I am a bit confused about how to estimate the genome size. Below are my steps for k-mer size 27. Did I get the correct genome size estimate? If I want to try different k-mer sizes to find the best k and genome size, how do I do the plotting? If anybody has a script to plot the results for different k-mer sizes and find the best k and genome size, please share it with me.
less stats.txt
Unique: 659211049
Distinct: 2297173537
Total: 31359408599
Max_count: 16054234
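For reference, the four fields reported by `jellyfish stats` can be recomputed from a full histogram. Unique is the number of k-mers seen exactly once (659211049 here, the same as row 1 of the histogram below). Distinct is the number of different k-mers. Total is the number of k-mer occurrences, so every k-mer is counted with its multiplicity. Below is a minimal Python sketch of that relationship. It uses a made-up toy histogram, and the function name `stats_from_histogram` is hypothetical. The sketch only reproduces the real stats if the histogram covers the whole count range. With a Max_count this high, you would probably need to raise the upper bound of `jellyfish histo` with `-h`.

```python
def stats_from_histogram(hist):
    """Recompute jellyfish-style Unique/Distinct/Total from a histogram.

    hist maps a k-mer count i -> number of distinct k-mers seen exactly i times.
    """
    unique = hist.get(1, 0)                              # k-mers seen exactly once
    distinct = sum(n for i, n in hist.items() if i > 0)  # number of different k-mers
    total = sum(i * n for i, n in hist.items())          # all k-mer occurrences
    return {"Unique": unique, "Distinct": distinct, "Total": total}


# Toy histogram (hypothetical numbers, not from the post)
toy = {1: 5, 2: 3, 3: 2}
print(stats_from_histogram(toy))  # prints {'Unique': 5, 'Distinct': 10, 'Total': 17}
```

Because Total counts occurrences and Distinct counts each k-mer once, Total minus Distinct is the number of occurrences beyond the first for each k-mer. It is not a genome size.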
less histogram.txt (first 10 rows; columns are k-mer count and number of distinct k-mers with that count)
0 0
1 659211049
2 94535838
3 109738065
4 125218564
5 126564348
6 117188987
7 103591231
8 90823407
9 80950377
10 74112334
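The usual way to read this histogram is to find the valley that separates error k-mers (very low counts) from genomic k-mers, and then the coverage peak after it. In the first 10 rows, the frequencies fall from count 1 to count 2, rise again from count 3, and peak at count 5. That suggests a k-mer depth of about 5x, assuming no higher peak appears further down the full histogram. Below is a minimal Python sketch that locates both points. The function names are hypothetical. The simple "first rise" rule is easily fooled by noisy or heterozygous spectra, so check the result against a plot.

```python
def read_histogram(lines):
    """Parse jellyfish histo output: one 'count frequency' pair per line."""
    hist = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            hist[int(parts[0])] = int(parts[1])
    return hist


def find_valley_and_peak(hist):
    """Return (valley, peak): the first local minimum starting from count 1,
    and the count with the highest frequency beyond that minimum."""
    counts = sorted(c for c in hist if c >= 1)
    valley = counts[0]
    for prev, cur in zip(counts, counts[1:]):
        if hist[cur] > hist[prev]:
            break              # frequencies start rising: prev is the valley
        valley = cur
    beyond = [c for c in counts if c > valley]
    if not beyond:
        raise ValueError("no coverage peak found after the error valley")
    peak = max(beyond, key=lambda c: hist[c])
    return valley, peak


rows = ["0 0", "1 659211049", "2 94535838", "3 109738065", "4 125218564",
        "5 126564348", "6 117188987", "7 103591231", "8 90823407",
        "9 80950377", "10 74112334"]
print(find_valley_and_peak(read_histogram(rows)))  # prints (2, 5)
```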
Genome size estimate = total number of k-mers - number of distinct (error) k-mers
Genome size estimate = 31359408599 - 2297173537 = 29062235062
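The estimate commonly used with Jellyfish histograms divides the number of genomic k-mer occurrences by the depth at the coverage peak: G ≈ (Total - k-mer occurrences in the error region) / peak depth. Subtracting Distinct from Total does not give a genome size, because Total counts every occurrence of every k-mer while Distinct counts each k-mer once. Below is a minimal Python sketch. It assumes the valley is at count 2 and the peak at count 5, as read off the first 10 rows above. Under those assumptions, the error region holds 1 × 659211049 + 2 × 94535838 = 848282725 occurrences, and the post's numbers give (31359408599 - 848282725) / 5 ≈ 6.1e9 bp. With a peak depth this low, error and genomic k-mers overlap heavily, so treat that figure as rough.

```python
def estimate_genome_size(total_kmers, hist, valley, peak):
    """k-mer spectrum estimate: (total - error occurrences) / peak depth.

    total_kmers : 'Total' from jellyfish stats (all k-mer occurrences)
    hist        : histogram dict, count -> number of distinct k-mers
    valley      : highest count still treated as sequencing error
    peak        : count at the main coverage peak (k-mer depth)
    """
    # occurrences contributed by low-count (likely erroneous) k-mers
    error_kmers = sum(i * n for i, n in hist.items() if 1 <= i <= valley)
    return (total_kmers - error_kmers) / peak


hist = {1: 659211049, 2: 94535838, 3: 109738065, 4: 125218564, 5: 126564348}
size = estimate_genome_size(31359408599, hist, valley=2, peak=5)
print(f"{size:.3e} bp")  # prints 6.102e+09 bp
```

Taking Total from stats.txt, rather than summing the histogram, keeps the estimate independent of whether the histogram covers the highest counts.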
Commands used:
jellyfish count -m 27 -s 100M -t 10 -C sample.filtered.fastq
jellyfish histo -f mer_counts.jf > histogram.txt
jellyfish stats -v -o stats.txt mer_counts.jf
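To repeat the run for several k-mer sizes, the three commands above can be wrapped in a loop. Below is a minimal Python sketch that builds the same jellyfish calls (same -s, -t and -C options) for each k. The per-k output names such as `k27.jf` are hypothetical. The `-o` option is added only to give each k its own database instead of overwriting `mer_counts.jf`, and the k values in the example comment are placeholders. The sketch assumes `jellyfish` is on your PATH.

```python
import subprocess


def jellyfish_commands(k, reads, threads=10, hash_size="100M"):
    """Build the count/histo/stats calls from the post for one k.
    The output file names (k{k}.jf, k{k}.stats.txt) are hypothetical."""
    db = f"k{k}.jf"
    return [
        ["jellyfish", "count", "-m", str(k), "-s", hash_size, "-t", str(threads),
         "-C", "-o", db, reads],
        ["jellyfish", "histo", "-f", db],          # stdout goes to the histo file
        ["jellyfish", "stats", "-v", "-o", f"k{k}.stats.txt", db],
    ]


def run_for_k(k, reads):
    """Run the three commands for one k, stopping on the first failure."""
    count, histo, stats = jellyfish_commands(k, reads)
    subprocess.run(count, check=True)
    with open(f"k{k}.histo.txt", "w") as out:
        subprocess.run(histo, check=True, stdout=out)
    subprocess.run(stats, check=True)


# Example (requires jellyfish; k values are placeholders):
# for k in (21, 25, 27, 31):
#     run_for_k(k, "sample.filtered.fastq")
```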
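For plotting and choosing k, one option is to overlay the spectra for all k values on a single log-scale plot. You can then compare how many distinct k-mers lie above each error valley. Picking the k with the most such "solid" k-mers is a heuristic similar in spirit to what KmerGenie does; Jellyfish itself does not report a best k. Below is a minimal Python sketch with the following assumptions:

- Each histogram file has already been loaded into a dict mapping count to number of distinct k-mers.
- The valley for each k has already been chosen, by eye or otherwise.
- Plotting requires matplotlib to be installed.

The function names and output file name are hypothetical.

```python
def plot_spectra(histograms, out_png="kmer_spectra.png", max_count=100):
    """Overlay k-mer spectra; histograms maps k -> {count: distinct k-mers}."""
    import matplotlib
    matplotlib.use("Agg")              # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for k, hist in sorted(histograms.items()):
        xs = [c for c in sorted(hist) if 1 <= c <= max_count]
        ax.plot(xs, [hist[c] for c in xs], label=f"k={k}")
    ax.set_xlabel("k-mer count (depth)")
    ax.set_ylabel("number of distinct k-mers")
    ax.set_yscale("log")               # keeps the count-1 spike from hiding the peak
    ax.legend()
    fig.savefig(out_png, dpi=150)
    plt.close(fig)


def solid_kmers(hist, valley):
    """Distinct k-mers seen more often than the error valley."""
    return sum(n for c, n in hist.items() if c > valley)


def pick_k(histograms, valleys):
    """Heuristic: return the k with the most distinct k-mers above its valley."""
    return max(histograms, key=lambda k: solid_kmers(histograms[k], valleys[k]))
```

For example, `plot_spectra({21: h21, 27: h27})` writes kmer_spectra.png, where `h21` and `h27` stand for your loaded histograms. `pick_k` then compares them given a valley per k.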