SEQanswers

Old 04-23-2013, 10:17 PM   #1
MeganS
Member
 
Location: US

Join Date: Sep 2010
Posts: 14
Estimating heterozygosity from kmer frequency distribution

Is there a program that can estimate the heterozygosity of a sample from the k-mer frequency distribution of its raw reads? I have whole-genome Illumina data (100 bp PE reads from 300 bp fragments). The k-mer frequency plot is clearly bimodal, so I can get a rough estimate by eyeballing the areas under the two peaks. I am hoping to find a more robust and more automated method, since I have over 100 samples.
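
[Editor's sketch] The "areas under the two peaks" estimate described above can be written down roughly as follows. This is only an illustration under stated assumptions, not the method of any tool mentioned in this thread: it assumes a diploid sample, that heterozygous sites are isolated SNPs (so each site contributes about 2k distinct half-depth k-mers), and that the approximate depths of the two peaks are already known. The function name, its arguments, and the dict-based histogram layout are all hypothetical.

```python
def estimate_heterozygosity(hist, k, het_peak, hom_peak):
    """Rough per-base heterozygosity from a bimodal k-mer histogram.

    hist     -- dict mapping depth -> number of distinct k-mers at that depth
    k        -- k-mer length used to build the histogram
    het_peak -- approximate depth of the heterozygous (half-coverage) peak
    hom_peak -- approximate depth of the homozygous (full-coverage) peak

    Assumptions (not taken from the thread): diploid genome, isolated SNPs.
    """
    depths = sorted(hist)
    # Error trough: the lowest point of the histogram below the het peak.
    low = min((d for d in depths if d < het_peak), key=lambda d: hist[d])
    # Valley: the lowest point between the two peaks.
    valley = min((d for d in depths if het_peak < d < hom_peak),
                 key=lambda d: hist[d])
    # "Area" of each peak = number of distinct k-mers assigned to it.
    het = sum(c for d, c in hist.items() if low < d < valley)
    hom = sum(c for d, c in hist.items() if d >= valley)
    # Each isolated SNP yields k k-mers per haplotype, i.e. 2k in total.
    het_sites = het / (2 * k)
    # Haploid genome positions: homozygous k-mers plus one haplotype's share.
    genome = hom + het / 2
    return het_sites / genome
```

Hard cut-offs at the trough and valley ignore the overlap between the two peaks, so this is at best as good as the eyeballed estimate; fitting a mixture model to the histogram would be the more robust route.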
Old 09-05-2013, 11:21 PM   #2
ebioman
Member
 
Location: Switzerland

Join Date: Aug 2013
Posts: 41
push

I'm afraid I don't have an answer either.
I have been asking myself the same question and wondered whether you were able to solve it?
Old 09-08-2013, 09:15 AM   #3
Melissa
Senior Member
 
Location: Switzerland

Join Date: Aug 2008
Posts: 124

Perhaps you want to look into Ka/Ks estimation.
Old 09-03-2015, 08:25 AM   #4
WormSeeq
Junior Member
 
Location: USA

Join Date: Jan 2013
Posts: 1

I just came across this paper on arXiv: "Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects"
http://arxiv.org/abs/1308.2012
I have not tried their tool, though. It is available at ftp://ftp.genomics.org.cn/pub/gce/
Best,
~wormSeeq.
Old 09-03-2015, 09:15 AM   #5
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Hmmm, I wrote a program that does this. Well, two, actually. Their usage is about the same.

khist.sh in=reads.fq khist=khist.txt peaks=peaks.txt
or
kmercountexact.sh in=reads.fq khist=khist.txt peaks=peaks.txt

The first uses approximate counts, while the second uses exact counts (and thus potentially more memory). The peaks file header contains estimates of genome size and heterozygosity. For diploid organisms you can also add the flag "ploidy=2", so that the program does not need to autodetect the ploidy (and thus potentially make a mistake).

These are both distributed with BBTools.
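
[Editor's sketch] For the 100+ samples mentioned in the original question, the commands above could be driven from a small script along these lines. Only the tool names and the in=, khist=, peaks= and ploidy=2 parameters come from the post; the directory layout, the output file naming, one reads file per sample, and the helper names khist_command and run_all are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def khist_command(reads, out_dir, tool="khist.sh", ploidy=2):
    """Build the argument list for one sample (nothing is executed here)."""
    reads = Path(reads)
    out_dir = Path(out_dir)
    # Assumed naming: sample name is everything before the first dot.
    sample = reads.name.split(".")[0]
    return [
        tool,
        f"in={reads}",
        f"khist={out_dir / (sample + '.khist.txt')}",
        f"peaks={out_dir / (sample + '.peaks.txt')}",
        f"ploidy={ploidy}",
    ]


def run_all(read_dir, out_dir, pattern="*.fq", tool="khist.sh"):
    """Run the chosen tool once per reads file matching the pattern."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for reads in sorted(Path(read_dir).glob(pattern)):
        # check=True stops the batch at the first failing sample.
        subprocess.run(khist_command(reads, out_dir, tool=tool), check=True)
```

Calling run_all("reads", "kmer_out") would process every .fq file in a (hypothetical) reads directory; passing tool="kmercountexact.sh" switches to the exact counter. The estimates then have to be read from the header of each peaks file.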