#1
Member
Location: US Join Date: Sep 2010
Posts: 14
Is there a program that can estimate the heterozygosity of a sample from the k-mer frequency distribution of the raw reads? I have whole-genome Illumina data (100 bp PE reads from 300 bp fragments). The k-mer frequency plot has a clear bimodal distribution, so I can get a rough estimate by eyeballing the areas under the two peaks. I am hoping to find a more robust, more automated method, since I have over 100 samples.
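For reference, the eyeballing approach can be sketched roughly in Python as below. This is only a minimal illustration, not an established tool: the names `split_peaks` and `het_rate_isolated_snps` are made up for this example, the histogram is assumed to be already loaded (and ideally smoothed) as a list indexed by depth, the error cutoff has to be chosen by hand, and k-mers above 1.5x the homozygous peak depth are dropped as probable repeats, which is an arbitrary choice. The rate formula further assumes a diploid genome in which every heterozygous site is an isolated SNP, so that each site puts 2k distinct k-mers into the lower peak.

```python
from typing import NamedTuple, Sequence


class PeakSplit(NamedTuple):
    het_peak: int  # depth of the lower-depth (heterozygous) peak
    hom_peak: int  # depth of the higher-depth (homozygous) peak
    valley: int    # depth of the minimum between the two peaks
    het_area: int  # distinct k-mers assigned to the heterozygous peak
    hom_area: int  # distinct k-mers assigned to the homozygous peak


def split_peaks(hist: Sequence[int], error_cutoff: int) -> PeakSplit:
    """Split a bimodal k-mer histogram at the valley between its two peaks.

    hist[d] is the number of distinct k-mers observed exactly d times.
    Depths below error_cutoff are ignored as likely sequencing errors.
    """
    # Local maxima above the error cutoff (assumes a reasonably smooth curve).
    maxima = [d for d in range(error_cutoff + 1, len(hist) - 1)
              if hist[d] > hist[d - 1] and hist[d] >= hist[d + 1]]
    if len(maxima) < 2:
        raise ValueError("fewer than two peaks found above the error cutoff")

    # Keep the two tallest maxima; the lower depth is taken as the het peak.
    het_peak, hom_peak = sorted(sorted(maxima, key=lambda d: hist[d])[-2:])

    # The valley is the lowest point between the two peaks.
    valley = min(range(het_peak, hom_peak + 1), key=lambda d: hist[d])

    # Assumption: depths above 1.5x the homozygous peak are mostly repeats.
    upper = int(1.5 * hom_peak)

    het_area = sum(hist[error_cutoff:valley])
    hom_area = sum(hist[valley:upper + 1])
    return PeakSplit(het_peak, hom_peak, valley, het_area, hom_area)


def het_rate_isolated_snps(split: PeakSplit, k: int) -> float:
    """Rough per-base heterozygosity under the isolated-SNP assumption.

    With H heterozygous sites in a haploid genome of G bases:
      het_area ~ 2*k*H  and  hom_area ~ G - k*H,
    which rearranges to H/G = het_area / (k * (2*hom_area + het_area)).
    """
    a, b = split.het_area, split.hom_area
    return a / (k * (2 * b + a))
```

Calling `split_peaks(hist, error_cutoff=3)` and then `het_rate_isolated_snps(result, k=21)` would give one number per sample, but noisy histograms, indels, and clustered variants all break the assumptions above, so I would treat the output as a sanity check at best.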
#2
Member
Location: Switzerland Join Date: Aug 2013
Posts: 41
I don't have an answer either, I'm afraid.
I am asking myself the same question and wondered whether you were able to solve it?
#3
Senior Member
Location: Switzerland Join Date: Aug 2008
Posts: 124
Perhaps you want to look into Ka/Ks estimation.
#4
Junior Member
Location: USA Join Date: Jan 2013
Posts: 1
I just came across this paper on arXiv: "Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects"
http://arxiv.org/abs/1308.2012
I have not tried their tool, though! It is available at ftp://ftp.genomics.org.cn/pub/gce/
Best, ~wormSeeq.
#5
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
Hmmm, I wrote a program that does this. Well, two, actually. Their usage is about the same:
khist.sh in=reads.fq khist=khist.txt peaks=peaks.txt
or
kmercountexact.sh in=reads.fq khist=khist.txt peaks=peaks.txt
The first uses approximate counts, while the second uses exact counts (and thus potentially more memory). The peaks file header contains estimates of genome size and heterozygosity. You can also add the flag "ploidy=2" for diploid organisms, so that the program won't need to autodetect the ploidy (and thus potentially make a mistake). Both are distributed with BBTools.
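For the 100-plus samples in the original question, a batch run could be sketched in Python along these lines. This is just an illustration under some assumptions: the helper names `build_commands` and `run_all`, the one-reads-file-per-sample layout, and the output file naming are all made up for the example; only the `in=`, `khist=`, `peaks=`, and `ploidy=` arguments come from the usage shown above.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def build_commands(fastq_paths: Iterable[str], out_dir: str,
                   tool: str = "khist.sh", ploidy: int = 2) -> List[List[str]]:
    """Build one command line per sample, mirroring the usage shown above.

    Assumption: each sample is a single reads file, and the sample name is
    the file name up to its first dot (hypothetical naming scheme).
    """
    out = Path(out_dir)
    commands = []
    for fq in fastq_paths:
        sample = Path(fq).name.split(".")[0]
        commands.append([
            tool,
            f"in={fq}",
            f"khist={out / (sample + '.khist.txt')}",
            f"peaks={out / (sample + '.peaks.txt')}",
            f"ploidy={ploidy}",
        ])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Something like `run_all(build_commands(["data/s1.fq", "data/s2.fq"], "out"))` would then leave one peaks file per sample in `out/` (the directory has to exist already). Passing `tool="kmercountexact.sh"` switches to the exact-count version.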