Hi all,
I have 21,005,534 x 2 paired-end reads of genome sequencing data.
To estimate the genome size, I used kmercountexact.sh from BBMap.
Could anyone help me interpret the output?
The command:
Code:
./kmercountexact.sh in1=read1.fastq in2=read2.fastq khist=khist.txt peaks=peaks.txt
The 'peaks.txt' output:
Code:
#k 31
#unique_kmers 1432946339
#main_peak 31
#genome_size 285812914
#haploid_genome_size 95270971
#fold_coverage 31
#haploid_fold_coverage 112
#ploidy 3
#het_rate 0.03413
#percent_repeat 6.291
#start center stop max volume
13 31 58 6211343 100786874
58 65 71 73206 913749
71 112 170 1196232 41304813
170 224 285 19062 1199939
285 324 325 3981 138959
325 329 396 4084 201426
396 414 443 1692 75732
443 447 525 1481 92625
525 526 535 833 7869
535 539 541 822 4724
541 543 557 797 12164
557 565 1413 725 181119
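Q1.
According to this post (http://seqanswers.com/forums/archive...p/t-48375.html), the genome size is calculated from the peak volumes as 100786874 + 913749/2 + 41304813/3 + ... + 12164/11 + 181119/12.
My total was 144,919,993, although that figure is actually the plain sum of the volume column; with the divisors applied, the total is about 115,413,310.
However, the 4th line of the output (#genome_size) reports 285,812,914.
Which is the real genome size?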
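To double-check the arithmetic in Q1, here is a small Python sketch that recomputes the totals from the peaks table above. The variable names are my own. The third total (each volume multiplied by its peak center divided by main_peak, rounded to the nearest integer) is only my guess at a weighting: it happens to reproduce the reported genome_size for this table, but I have not confirmed that this is what kmercountexact.sh does internally.

```python
# Peaks copied from peaks.txt: (start, center, stop, max, volume)
PEAKS = [
    (13, 31, 58, 6211343, 100786874),
    (58, 65, 71, 73206, 913749),
    (71, 112, 170, 1196232, 41304813),
    (170, 224, 285, 19062, 1199939),
    (285, 324, 325, 3981, 138959),
    (325, 329, 396, 4084, 201426),
    (396, 414, 443, 1692, 75732),
    (443, 447, 525, 1481, 92625),
    (525, 526, 535, 833, 7869),
    (535, 539, 541, 822, 4724),
    (541, 543, 557, 797, 12164),
    (557, 565, 1413, 725, 181119),
]
MAIN_PEAK = 31  # from the "#main_peak" header line

# 1) Plain sum of the volume column (no divisors).
total_plain = sum(volume for *_, volume in PEAKS)

# 2) The formula as written in Q1: volume of the i-th peak divided by i.
total_divided = sum(
    volume / i for i, (*_, volume) in enumerate(PEAKS, start=1)
)

# 3) ASSUMPTION (my guess, not confirmed against the BBMap source):
#    weight each volume by an integer copy number, round(center / main_peak).
total_weighted = sum(
    volume * round(center / MAIN_PEAK)
    for _, center, _, _, volume in PEAKS
)

print("plain sum of volumes:      ", total_plain)
print("volume / peak index:       ", round(total_divided))
print("volume * round(center/31): ", total_weighted)
```

For this table, the plain sum is 144,919,993 and the copy-number-weighted sum is 285,812,914, which equals the reported #genome_size. Dividing 285,812,914 by the reported ploidy of 3 gives about 95,270,971, which matches #haploid_genome_size, though that may be a coincidence of this data set.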
Q2.
Also, what does each line of the output mean (e.g. unique_kmers, haploid_genome_size, haploid_fold_coverage, ploidy, het_rate, percent_repeat)?
Any advice would be appreciated.