11-06-2015, 08:12 AM   #1
syintel87
Genome size estimation using BBMAP

Hi all,
I have paired-end genome sequencing data (21,005,534 x 2 reads).
To estimate the genome size, I ran kmercountexact.sh from BBMAP.
Could anyone help me interpret the output?

The command:
Code:
./kmercountexact.sh in1=read1.fastq in2=read2.fastq khist=khist.txt peaks=peaks.txt
The 'peaks.txt' output:
Quote:
#k 31
#unique_kmers 1432946339
#main_peak 31
#genome_size 285812914
#haploid_genome_size 95270971
#fold_coverage 31
#haploid_fold_coverage 112
#ploidy 3
#het_rate 0.03413
#percent_repeat 6.291
#start center stop max volume
13 31 58 6211343 100786874
58 65 71 73206 913749
71 112 170 1196232 41304813
170 224 285 19062 1199939
285 324 325 3981 138959
325 329 396 4084 201426
396 414 443 1692 75732
443 447 525 1481 92625
525 526 535 833 7869
535 539 541 822 4724
541 543 557 797 12164
557 565 1413 725 181119
Q1.
According to this post (http://seqanswers.com/forums/archive...p/t-48375.html),
the genome size is calculated from the volume column as 100786874 + 913749/2 + 41304813/3 + ... + 12164/11 + 181119/12 = 144,919,993.
However, the 4th line of the output (#genome_size) reports 285,812,914.
Which of these is the real genome size?
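
The arithmetic in Q1 can be checked with a minimal Python sketch. It only uses the volume column copied from the peaks.txt output above; the variable names are illustrative, and it does not claim to reproduce how BBMAP itself derives #genome_size. Run as written, the plain sum of the volumes is 144,919,993, while the sum with the i-th volume divided by i comes to about 115.4 million, so the total quoted in Q1 matches the undivided sum rather than the expression with the divisions.

```python
# Volume column copied from the peaks.txt output above, in row order.
volumes = [
    100786874, 913749, 41304813, 1199939, 138959, 201426,
    75732, 92625, 7869, 4724, 12164, 181119,
]

# Plain sum of the volume column, with no division applied.
plain_sum = sum(volumes)

# Sum with the i-th peak's volume divided by i (1, 2, 3, ..., 12),
# i.e. the expression written out in Q1.
divided_sum = sum(v / i for i, v in enumerate(volumes, start=1))

print("plain sum of volumes:  ", plain_sum)
print("sum of volume_i / i:   ", round(divided_sum))
print("reported #genome_size: ", 285812914)
```

Neither sum equals the reported #genome_size of 285,812,914, so the question of which figure to trust remains open.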

Q2.
Also, what does each line of the output mean (e.g. unique_kmers, haploid_genome_size, haploid_fold_coverage, ploidy, het_rate, percent_repeat)?
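
As a starting point for Q2, here is a small Python sketch that parses the peaks.txt text above and checks a few relationships between its fields. The helper name `parse_peaks` is hypothetical, and the relationships it tests are only observations about this particular output, not documented definitions from BBMAP.

```python
# peaks.txt content copied from the post above.
peaks_txt = """\
#k 31
#unique_kmers 1432946339
#main_peak 31
#genome_size 285812914
#haploid_genome_size 95270971
#fold_coverage 31
#haploid_fold_coverage 112
#ploidy 3
#het_rate 0.03413
#percent_repeat 6.291
#start center stop max volume
13 31 58 6211343 100786874
58 65 71 73206 913749
71 112 170 1196232 41304813
170 224 285 19062 1199939
285 324 325 3981 138959
325 329 396 4084 201426
396 414 443 1692 75732
443 447 525 1481 92625
525 526 535 833 7869
535 539 541 822 4724
541 543 557 797 12164
557 565 1413 725 181119
"""


def parse_peaks(text):
    """Split peaks.txt into a dict of '#key value' lines and a list of peak rows."""
    header, columns, rows = {}, [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            fields = line[1:].split()
            if fields[0] == "start":
                # Column names for the peak table that follows.
                columns = fields
            else:
                value = fields[1]
                header[fields[0]] = float(value) if "." in value else int(value)
        else:
            rows.append(dict(zip(columns, map(int, line.split()))))
    return header, rows


header, rows = parse_peaks(peaks_txt)
centers = [row["center"] for row in rows]

# Observation 1: genome_size divided by ploidy (rounded down) gives haploid_genome_size.
print(header["genome_size"] // header["ploidy"] == header["haploid_genome_size"])

# Observation 2: main_peak and haploid_fold_coverage both coincide with peak centers.
print(header["main_peak"] in centers, header["haploid_fold_coverage"] in centers)
```

In this output, #haploid_genome_size equals #genome_size divided by #ploidy, #main_peak is the center of the first peak (31), and #haploid_fold_coverage is the center of the third peak (112). What the remaining fields mean still needs an answer from someone familiar with the tool.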

Any advice would be appreciated.