
  • Estimating the bacterial genome size using Kmer frequency

    Hi,

    How can I estimate the size of a GC-rich bacterial genome when no close reference genome is available?

    First I ran Jellyfish and generated histogram plots for all the available k-mer sizes. I identified what I believe is the main peak in each and calculated the genome size from it, but the estimate is less than half the size of the assembly produced by SOAPdenovo2.

    Then I tried KmerGenie (again across the different k-mer sizes), and likewise did not get a plausible estimate.

    * Illumina HiSeq, paired-end data, read length 100 bp
    * GC percent: 63% (Read_1)
    * Duplicates in FASTQ: ->48% (Read_1)
    * Read_1 read count: 10128605 (data from FastQC)


    Any suggestions would be greatly appreciated.

    Thank you very much.
    Last edited by Krish_143; 06-05-2013, 02:30 AM.
    Krishna

  • #2
    The number of distinct k-mers divided by 2 (each genomic k-mer is counted on both strands) should approximate the genome size, with a few important caveats:

    1) Including erroneous k-mers will inflate the count, so one would typically count only k-mers that occur at least twice.

    2) Repeat regions will be collapsed, which underestimates the size.

    3) Regions that just don't show up will be missed, again underestimating. With a high-G+C genome, some regions may be simply missing from the Illumina data or have very low coverage.
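The distinct k-mer estimate described above can be sketched in a few lines of Python. This is only an illustration, not a tested pipeline: it assumes a two-column text histogram (multiplicity, then number of distinct k-mers with that multiplicity), which is the layout `jellyfish histo` prints. The function names and the `strands` parameter are hypothetical. If the counter already merges each k-mer with its reverse complement (canonical counting), set `strands=1` instead of dividing by 2.

```python
from typing import Iterable, List, Tuple

Histogram = List[Tuple[int, int]]  # (multiplicity, number of distinct k-mers)


def parse_histogram(lines: Iterable[str]) -> Histogram:
    """Parse 'multiplicity count' lines; skip anything that is not two integers."""
    hist: Histogram = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            hist.append((int(parts[0]), int(parts[1])))
        except ValueError:
            continue
    return hist


def genome_size_from_distinct(hist: Histogram, min_count: int = 2, strands: int = 2) -> int:
    """Distinct k-mers seen at least `min_count` times, divided by `strands`.

    min_count=2 drops singletons, which are mostly sequencing errors (caveat 1).
    Collapsed repeats and uncovered regions (caveats 2 and 3) are NOT corrected,
    so the result is best read as a lower bound.
    """
    distinct = sum(n for mult, n in hist if mult >= min_count)
    return distinct // strands


# Toy histogram, made up for illustration only.
toy = parse_histogram(["1 1000", "2 50", "30 4000", "31 6000"])
print(genome_size_from_distinct(toy))             # both strands counted separately
print(genome_size_from_distinct(toy, strands=1))  # canonical k-mers
```

Running the same function over the histograms for several k values gives the range of estimates asked about in this reply.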

    Ray produces the k-mer statistics in a form that is easy to parse and turn into these estimates.

    Assemblies are often a bit too large due to missed overlaps. If you convert these histograms to genome size estimates, how big a range is covered?

    Even without a reference, the taxonomy of the bug may suggest a range -- though you could well have something outside that range.



    • #3
      Hi krobison,

      When I estimated the genome size from the k-mer information (histogram and k-mer peaks), I got:
      Estimated genome size: 2.8 Mb (at k = 31)
      Assembled genome size using SOAPdenovo: 5.7 Mb (draft)
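For comparison, a peak-based estimate of the kind reported here is usually computed as the total number of non-error k-mers divided by the depth at the main histogram peak. The sketch below is a minimal illustration under stated assumptions, not the exact calculation used in this thread: the histogram is a list of (multiplicity, distinct k-mer count) pairs, the error k-mers are everything to the left of the first valley, and k-mers were counted canonically (with per-strand counting the result would need halving). All function names are hypothetical.

```python
from typing import List, Tuple

Histogram = List[Tuple[int, int]]  # (multiplicity, number of distinct k-mers)


def first_valley(hist: Histogram) -> int:
    """Multiplicity at which the counts stop falling (end of the error tail)."""
    for i in range(1, len(hist)):
        if hist[i][1] > hist[i - 1][1]:
            return hist[i - 1][0]
    return hist[0][0]


def peak_based_estimate(hist: Histogram) -> Tuple[int, int]:
    """Return (estimated genome size, peak multiplicity).

    Estimate = total k-mer occurrences at or above the valley / peak depth.
    """
    hist = sorted(hist)
    valley = first_valley(hist)
    solid = [(m, n) for m, n in hist if m >= valley]
    peak_mult = max(solid, key=lambda p: p[1])[0]
    total_kmers = sum(m * n for m, n in solid)
    return total_kmers // peak_mult, peak_mult


# Toy histogram, made up for illustration only.
toy = [(1, 5000), (2, 800), (3, 100), (4, 300),
       (5, 900), (6, 1500), (7, 900), (8, 300)]
size, peak = peak_based_estimate(toy)
print(size, peak)
```

The result is very sensitive to which peak is chosen: picking a peak at twice the true depth (for example a repeat peak) halves the estimate, so it is worth checking the chosen peak against the full histogram plot.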

      I will try Ray. Thank you very much, krobison, for the quick response.
      Last edited by Krish_143; 06-02-2013, 01:06 AM.
      Krishna



      • #4
        I sometimes observe that SOAPdenovo contigs (not scaffolds) add up to more than the genome size. Did you run a Velvet assembly, and if so, what was the assembly size?
