
Similar Threads
Thread Thread Starter Forum Replies Last Post
Comparing Kmer distribution between samples jgibbons1 Genomic Resequencing 2 05-01-2014 12:13 PM
Plotting distribution of allelic frequency under different coverage values toots Bioinformatics 2 10-22-2012 11:57 AM
Unexpected kmer distribution pytheus Illumina/Solexa 0 05-28-2012 02:26 PM
Kmer Distribution Problem cyyuan Illumina/Solexa 3 05-05-2012 11:13 PM
frequency distribution plot alessandra85 Bioinformatics 5 01-19-2011 07:11 AM

04-23-2013, 11:17 PM   #1
Location: US

Join Date: Sep 2010
Posts: 14
Estimating heterozygosity from kmer frequency distribution

Is there a program that can estimate the heterozygosity of a sample from the kmer frequency distribution of its raw reads? I have whole-genome Illumina data (100 bp paired-end reads from 300 bp fragments). The kmer frequency plot has a clear bimodal distribution, so I can get a rough estimate by eyeballing the areas under the two peaks. I am hoping to find a more robust and more automated method, since I have over 100 samples.
MeganS
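To automate the eyeball estimate across many samples, the areas under the two peaks can be turned into a per-base heterozygosity. The sketch below assumes a diploid genome, a histogram held as a Python list indexed by depth, and a homozygous-peak depth that the caller already knows. The window boundaries (0.75 and 1.5 times that depth) and the function names are illustrative choices, not taken from any published tool. The reasoning: every k-mer position that overlaps at least one heterozygous site produces two distinct half-depth k-mers (one per haplotype), so a fraction (1 - h)^k of positions should fall in the full-depth peak.

```python
def first_valley(hist):
    """Depth of the first local minimum, taken as the end of the error k-mer tail."""
    for d in range(1, len(hist) - 1):
        if hist[d] <= hist[d - 1] and hist[d] < hist[d + 1]:
            return d
    return 1


def estimate_heterozygosity(hist, k, hom_depth):
    """Rough per-base heterozygosity of a diploid genome from a k-mer histogram.

    hist[d]:   number of distinct k-mers seen exactly d times (hist[0] is ignored)
    k:         k-mer length used for counting
    hom_depth: depth of the homozygous (right-hand) peak, supplied by the caller
    """
    lo = first_valley(hist)
    split = round(0.75 * hom_depth)  # midway between het (C/2) and hom (C) peaks
    hi = round(1.5 * hom_depth)      # drop repeat k-mers around 2C and above
    het = sum(hist[lo:split])        # area under the half-depth peak
    hom = sum(hist[split:hi + 1])    # area under the full-depth peak
    # Each position covering >= 1 het site contributes two distinct half-depth
    # k-mers, so there are hom + het/2 positions in total, and a fraction
    # (1 - h)^k of them contain no het site.
    clean_fraction = hom / (hom + het / 2)
    return 1.0 - clean_fraction ** (1.0 / k)


# Toy histogram built from an assumed h = 0.001 (not real data).
k, C, G, h_true = 21, 40, 1_000_000, 0.001
hom = round(G * (1 - h_true) ** k)
het = 2 * (G - hom)
hist = [0] * 100
hist[1] = 50_000   # sequencing-error k-mers
hist[C // 2] = het
hist[C] = hom
print(estimate_heterozygosity(hist, k, C))
```

On real data the peaks are broad and overlap, so summing fixed windows is crude; fitting a mixture model to the histogram is the more robust route, but the arithmetic above is what the area comparison amounts to.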
09-06-2013, 12:21 AM   #2
Location: Switzerland

Join Date: Aug 2013
Posts: 41
push

I am afraid I have no answer either.
I am asking myself the same question and wondered whether you were able to solve it.
ebioman
09-08-2013, 10:15 AM   #3
Senior Member
Location: Switzerland

Join Date: Aug 2008
Posts: 124

Perhaps you want to look into Ka/Ks estimation.
Melissa
09-03-2015, 09:25 AM   #4
Junior Member
Location: USA

Join Date: Jan 2013
Posts: 1

I just came across this paper on arXiv: "Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects".
I have not tried their tool, though. It is available at
WormSeeq
09-03-2015, 10:15 AM   #5
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Hmmm, I wrote a program that does this. Well, two, actually. Their usage is about the same:

in=reads.fq khist=khist.txt peaks=peaks.txt
or
in=reads.fq khist=khist.txt peaks=peaks.txt

The first uses approximate counts, while the second uses exact counts (and thus potentially more memory). The header of the peaks file contains estimates of genome size and heterozygosity. For diploid organisms you can also add the flag "ploidy=2", so that the program does not need to autodetect the ploidy (and thus cannot get it wrong).

These are both distributed with BBTools.
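To make the approximate-versus-exact distinction concrete, here is a minimal pure-Python sketch of both ways of building a k-mer histogram (depth versus number of distinct k-mers). It is not how BBTools implements either program; the function names, the count-min sketch parameters, and the second-pass 1/c weighting used to turn sketch estimates into a histogram are illustrative assumptions. The exact version stores every distinct canonical k-mer, so its memory grows with genome size and error rate; the sketch uses a fixed-size table and can only overestimate counts when k-mers collide.

```python
import hashlib
from collections import Counter, defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def iter_kmers(read, k):
    """Yield canonical k-mers of a read, skipping any that contain non-ACGT bases."""
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield canonical(kmer)


def exact_histogram(reads, k):
    """Exact counts: memory grows with the number of distinct k-mers."""
    counts = Counter(km for read in reads for km in iter_kmers(read, k))
    return Counter(counts.values())  # depth -> number of distinct k-mers


class CountMinSketch:
    """Fixed-memory counter that may overestimate, but never underestimates."""

    def __init__(self, width, depth):
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent hash per row, derived by prefixing the row number.
        for row in range(len(self.table)):
            digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item):
        for row, col in self._cells(item):
            self.table[row][col] += 1

    def estimate(self, item):
        return min(self.table[row][col] for row, col in self._cells(item))


def approximate_histogram(reads, k, width=1 << 16, depth=4):
    """Two passes over the reads: fill the sketch, then histogram its estimates.

    A k-mer seen c times is met c times in the second pass, so adding 1/c per
    occurrence contributes about one distinct k-mer to bin c.
    """
    reads = list(reads)  # a real tool would stream the input file twice instead
    sketch = CountMinSketch(width, depth)
    for read in reads:
        for km in iter_kmers(read, k):
            sketch.add(km)
    hist = defaultdict(float)
    for read in reads:
        for km in iter_kmers(read, k):
            c = sketch.estimate(km)
            hist[c] += 1.0 / c
    return dict(hist)
```

With a wide table the two histograms agree; shrinking `width` saves memory at the cost of collisions that push k-mers into higher-depth bins, which is the kind of error an approximate counter trades for its fixed footprint.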
