#1
Member
Location: Sweden Join Date: Jan 2012
Posts: 45
Hi,
How can I estimate the size of a GC-rich bacterial genome when there is no closely related reference genome? First I tried Jellyfish and generated histogram plots (for all the available k-mer sizes), identified what I believe is the true peak and calculated the size from it, but I get less than half of the genome size compared with the assembly produced by SOAPdenovo2. Then I tried KmerGenie (for all the different k-mer sizes) and again I do not get a proper estimate.
* Illumina HiSeq: paired-end data, read length 100 bp
* GC content: 63% (Read_1)
* Duplicates in FASTQ: >48% (Read_1)
* Read_1: 10,128,605 reads (from FastQC)
Any suggestions would be really appreciated. Thank you very much.
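A minimal sketch of the peak method described above, assuming the input is the two-column output of `jellyfish histo` (k-mer multiplicity, number of distinct k-mers with that multiplicity) from a count made with canonical k-mers (`jellyfish count -C`). It skips the error-dominated left tail by taking the first valley of the histogram, then divides the total number of k-mers to the right of that valley by the depth of the main peak. The valley heuristic and the function names are illustrative assumptions, not Jellyfish or KmerGenie behavior; a noisy histogram may need the valley chosen by hand.

```python
import sys


def read_histo(lines):
    """Parse `jellyfish histo` output: one 'multiplicity count' pair per line."""
    histo = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        histo[int(parts[0])] = int(parts[1])
    return histo


def find_valley(histo):
    """Return the multiplicity where the error tail stops falling (first local minimum)."""
    depths = sorted(histo)
    for prev, cur in zip(depths, depths[1:]):
        if histo[cur] > histo[prev]:
            return prev
    return depths[-1]  # no valley found: histogram falls monotonically


def estimate_genome_size(histo):
    """Genome size ~ (total k-mers above the valley) / (depth of the main peak)."""
    valley = find_valley(histo)
    solid = {d: n for d, n in histo.items() if d > valley}
    if not solid:
        raise ValueError("no peak found beyond the error valley")
    peak = max(solid, key=solid.get)
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak, peak, valley


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        size, peak, valley = estimate_genome_size(read_histo(fh))
    print(f"valley={valley} peak={peak} estimated size={size / 1e6:.2f} Mb")
```

Printing the valley and peak alongside the estimate makes it easy to check by eye that the chosen peak is the main one and not a repeat or error shoulder.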
__________________
Krishna
Last edited by Krish_143; 06-05-2013 at 03:30 AM.
#2
Senior Member
Location: Boston area Join Date: Nov 2007
Posts: 747
#distinct k-mers / 2 should be the genome size (a k-mer and its reverse complement are counted separately), with a few important caveats:
1) Including erroneous k-mers will inflate the count, so typically you would count only those k-mers with a count of >=2.
2) Repeat regions will be collapsed, underestimating the size.
3) Regions that just don't show up will be missed, again underestimating. With a high G+C genome, there may be regions simply missing from the Illumina data or with very low coverage.
Ray produces the k-mer statistics in a form that is easy to parse and generate these estimates from. Assemblies are often a bit too large due to missed overlaps. If you convert these histograms to genome size estimates, how big a range is covered? Even without a reference, the taxonomy of the bug may suggest a range, though you could well have something outside that range.
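The rule of thumb above can be sketched in a few lines, assuming a histogram held as a dict mapping multiplicity to the number of distinct k-mers with that multiplicity. The `canonical` switch is a hypothetical parameter reflecting our reading that the /2 compensates for each k-mer and its reverse complement being counted separately; with canonical counting (e.g. Jellyfish `-C`) no halving should be needed. Caveats 2 and 3 (collapsed repeats, missing regions) are not corrected here.

```python
def distinct_kmer_estimate(histo, min_count=2, canonical=False):
    """Genome size ~ number of distinct k-mers seen at least `min_count` times.

    Dropping k-mers below `min_count` removes most sequencing-error k-mers.
    Without canonical counting every genomic k-mer also appears as its reverse
    complement, so the distinct count is halved; with canonical counting it is not.
    """
    distinct = sum(n for depth, n in histo.items() if depth >= min_count)
    return distinct if canonical else distinct / 2


# Toy histogram: multiplicity -> number of distinct k-mers with that multiplicity.
toy = {1: 1000, 2: 200, 3: 50, 4: 80, 5: 150, 6: 300, 7: 150, 8: 80}
print(distinct_kmer_estimate(toy))                  # 505.0
print(distinct_kmer_estimate(toy, canonical=True))  # 1010
```

At high coverage, raising `min_count` above 2 may be needed, since error k-mers can then recur more than once.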
#3
Member
Location: Sweden Join Date: Jan 2012
Posts: 45
Hi krobison,
When I estimated the genome size using k-mer information (histogram, k-mer peaks):
* Estimated genome size: 2.8 Mb (k = 31)
* Assembled genome size using SOAPdenovo: 5.7 Mb (draft)
I will check with Ray. Thanks very much, krobison, for the quick response.
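As a rough consistency check (not something proposed in the thread), the read numbers from the first post can be turned into the coverage each candidate size would imply. This sketch assumes Read_2 contains as many reads as Read_1, ignores the >48% duplicates and any trimming, and uses the standard relation C_k = C * (L - k + 1) / L between base coverage C and k-mer coverage C_k for read length L.

```python
READ_LENGTH = 100            # bp, from the first post
READS_PER_FILE = 10_128_605  # Read_1 count reported by FastQC
K = 31                       # k-mer size used for the 2.8 Mb estimate


def base_coverage(n_reads, read_len, genome_size):
    """Average sequencing depth: total sequenced bases / genome size."""
    return n_reads * read_len / genome_size


def kmer_coverage(base_cov, read_len, k):
    """Each read of length L yields L - k + 1 k-mers, so C_k = C * (L - k + 1) / L."""
    return base_cov * (read_len - k + 1) / read_len


# Assumption: Read_2 has as many reads as Read_1; duplicates are not removed.
total_reads = 2 * READS_PER_FILE
for label, size in [("k-mer estimate", 2.8e6), ("SOAPdenovo draft", 5.7e6)]:
    c = base_coverage(total_reads, READ_LENGTH, size)
    ck = kmer_coverage(c, READ_LENGTH, K)
    print(f"{label} ({size / 1e6:.1f} Mb): ~{c:.0f}x base coverage, "
          f"k-mer peak expected near depth {ck:.0f}")
```

With these inputs the 5.7 Mb draft would imply roughly 355x base coverage and a 31-mer peak near depth 249, while 2.8 Mb would imply a peak near depth 506. Comparing the observed Jellyfish peak position against these two values may indicate which size the data support.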
__________________
Krishna
Last edited by Krish_143; 06-02-2013 at 02:06 AM.
#4
Member
Location: France Join Date: Jan 2013
Posts: 13
I sometimes observe that SOAPdenovo contigs (not scaffolds) add up to more than the genome size. Did you run a Velvet assembly, and if so, what was the assembly size?
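To compare assembly sizes on equal terms across assemblers, one option is to total the sequence lengths in each assembly's FASTA output, optionally ignoring short sequences and reporting gap (N) bases separately, since scaffolds typically contain N-filled gaps while contigs should not. This is a plain-Python sketch; the `min_len=200` cutoff is an arbitrary illustrative value, and the file passed on the command line is whatever contig or scaffold FASTA your assembler wrote.

```python
import sys


def assembly_size(fasta_lines, min_len=0):
    """Return (number of sequences, total length, number of N bases) for a FASTA.

    Sequences shorter than `min_len` are ignored. N bases are counted separately
    because scaffold gaps are usually filled with Ns.
    """
    records = []  # (length, n_count) per sequence
    length = n_count = 0
    in_record = False
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if in_record:
                records.append((length, n_count))
            length = n_count = 0
            in_record = True
        elif in_record:
            length += len(line)
            n_count += line.upper().count("N")
    if in_record:
        records.append((length, n_count))
    kept = [(ln, n) for ln, n in records if ln >= min_len]
    return len(kept), sum(ln for ln, _ in kept), sum(n for _, n in kept)


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        n_seqs, total, n_bases = assembly_size(fh, min_len=200)
    print(f"{n_seqs} sequences, {total / 1e6:.2f} Mb total, {n_bases} N bases")
```

Running it on both the contig and scaffold files of each assembler shows whether the excess size comes from many short contigs, which are more likely to be redundant fragments from missed overlaps.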