
  • Contig length, k-mer coverage, and differential expression

    I'm working with some data where I have a read count and k-mer coverage (Ck) for a set of contigs and scaffolds across different conditions. I've recently heard and read a few very confusing explanations of k-mer coverage, so I would appreciate some clarification. From what I gather, Ck is directly related to base coverage. But can the size of a contig be determined if I know the Ck value, read length, and read number for that specific contig? Or would this calculation not work for a de novo transcriptome, where read coverage varies greatly between contigs and scaffolds?

    For example, here are my numbers for contig A:

    Read length = 75 b
    Read count = 185,600 reads
    Ck = 63
    hash length = 31

    When I plug all this into Ck = C*(rL - k + 1)/rL, where C = base coverage (read length * reads / contig length (cL)), rL = read length and k = hash length, I get a value for cL of about 127 kb. However, when I go back to the raw data and look at that contig's sequence, I find it to be only 0.823 kb. I'm not sure how the total reads for the run figure into this, but I have ~40 million reads for this condition.
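A minimal sketch of the formula above, solved in both directions. It assumes the 185,600 reads are the reads assigned to contig A alone and that rL, k and Ck are exactly as listed; the function names are hypothetical. With these inputs the implied length comes out on the order of 130 kb, and conversely an 823 bp contig carrying that many reads would imply a Ck in the thousands, so under this formula the read count and Ck = 63 cannot both describe an 823 bp contig.

```python
def implied_contig_length(kmer_cov: float, read_len: int, k: int, n_reads: int) -> float:
    """Solve Ck = C * (rL - k + 1) / rL, with C = rL * reads / cL, for cL (in bp)."""
    base_cov = kmer_cov * read_len / (read_len - k + 1)  # C, base coverage
    return read_len * n_reads / base_cov                 # cL = rL * reads / C


def implied_kmer_cov(contig_len: float, read_len: int, k: int, n_reads: int) -> float:
    """Forward direction: the Ck implied by a contig of known length and read count."""
    base_cov = read_len * n_reads / contig_len           # C
    return base_cov * (read_len - k + 1) / read_len      # Ck


if __name__ == "__main__":
    # Numbers for contig A (reads assumed to belong to this contig only).
    cl = implied_contig_length(kmer_cov=63, read_len=75, k=31, n_reads=185_600)
    ck = implied_kmer_cov(contig_len=823, read_len=75, k=31, n_reads=185_600)
    print(f"implied contig length: {cl / 1000:.1f} kb")
    print(f"Ck implied by an 823 bp contig: {ck:.0f}")
```

Running the forward function on the output of the inverse one returns the original Ck, which is a quick check that the algebra is consistent.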

    Because C depends on the read count, my best guess is that contigs and scaffolds that have relatively high or low expression over the mean will have Ck values unrepresentative of the contig length. But I feel clueless, and my partner appears to be only acting as if he knows. I have a feeling I'm misunderstanding something completely obvious.

    Any help on this matter would be greatly appreciated.

  • #2
    Hi, all
    I am new to de novo genome assembly. I have FASTQ sequence data that I have to assemble using velvet. I ran the VelvetOptimiser script with hash lengths from 27 to 41, and it predicted 37 as the best. The output file contigs.fa contains 260 contigs, whereas the log file reports 283 nodes; where did the rest go? Is the length given in contigs.fa in k-mers? If so, how do I calculate its actual nucleotide length in bp? How do I tell whether the assembly is good or bad? Final stats given after the script ran:
    Final graph has 283 nodes and n50 of 347, max 2336, total 68614, using 19064/50000 reads
    Why is the number of used reads so low?
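A minimal sketch of the k-mer to bp conversion. It assumes velvet-style contigs.fa headers of the form NODE_<id>_length_<L>_cov_<C>, where L is a node length in k-mers; because consecutive k-mers overlap by k - 1 bases, a node of L k-mers spans L + k - 1 nucleotides. The regex and function names are hypothetical, and treating the log's "max 2336" as a k-mer count is likewise an assumption.

```python
import re

# Assumed velvet-style header, e.g. ">NODE_12_length_347_cov_8.51".
HEADER_RE = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def length_in_bp(length_in_kmers: int, k: int) -> int:
    """L overlapping k-mers span L + k - 1 nucleotides."""
    return length_in_kmers + k - 1


def header_length_bp(header: str, k: int) -> int:
    """Parse the k-mer length from a contigs.fa header and convert it to bp."""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"unrecognised header: {header!r}")
    return length_in_bp(int(m.group(2)), k)


if __name__ == "__main__":
    k = 37  # hash length chosen by VelvetOptimiser in the question
    print(header_length_bp(">NODE_12_length_347_cov_8.51", k))  # 383
    print(length_in_bp(2336, k))  # "max 2336" from the log, in bp: 2372
```

Since the conversion adds k - 1 per node, a summed figure such as the log's "total" cannot be converted with a single k - 1; it needs the per-node lengths.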



    • #3

      I'm pretty sure velvet has a cutoff value for the length of the contigs listed in the contigs.fa file, although I don't remember off the top of my head what that is. So the missing contigs are probably the very short ones.
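A minimal sketch of how a length cutoff would account for nodes missing from contigs.fa. It assumes the cutoff defaults to 2 * k, following the velvetg option -min_contig_lgth (documented default: hash length * 2); whether velvet compares against bp or k-mer length, and whether the boundary is inclusive, are assumptions here. The function name and the example lengths are hypothetical.

```python
def split_by_min_length(lengths_bp, k, min_len=None):
    """Split contig lengths (bp) into kept and dropped lists.

    If min_len is None the cutoff is 2 * k (assumed to mirror velvetg's
    -min_contig_lgth default); lengths >= cutoff are kept (assumed inclusive).
    """
    cutoff = 2 * k if min_len is None else min_len
    kept = [n for n in lengths_bp if n >= cutoff]
    dropped = [n for n in lengths_bp if n < cutoff]
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical node lengths in bp, with k = 37 (cutoff 74 bp).
    lengths = [40, 60, 73, 74, 120, 383, 2372]
    kept, dropped = split_by_min_length(lengths, k=37)
    print(len(kept), "kept;", len(dropped), "dropped:", dropped)
```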

      The formula for calculating k-mer coverage from base coverage is given in the velvet manual.



      As to whether the assembly is good, have a look at the Nature Methods article entitled "De novo genome assembly: what every biologist should know".
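N50, which velvet prints in its final log line, is a standard contiguity statistic. A minimal sketch of the usual definition: sort contig lengths from longest to shortest and return the length at which the running total first reaches half the assembly size. The function name is hypothetical, and N50 measures contiguity only, not correctness.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8: 10 + 9 + 8 = 27 = half of 54
```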



      • #4
        What is the twin node as specified in velvet? It says "the reverse of reverse complement k-mers". How are contigs actually generated from a paired-end assembly with velvet? Can someone show this with an example?
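One way to read the twin-node description, assuming a node is modelled simply as an ordered list of overlapping k-mers (velvet's own data structures differ, and the helper names here are hypothetical): the twin takes the node's k-mers in reverse order and reverse-complements each one. That yields exactly the k-mers of the opposite strand, so a node and its twin describe the same stretch of DNA read from opposite strands.

```python
# Complement table for DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> list:
    """Consecutive overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def twin_kmers(node_kmers: list) -> list:
    """Twin of a node: reverse-complement each k-mer, taken in reverse order."""
    return [revcomp(km) for km in reversed(node_kmers)]


if __name__ == "__main__":
    seq, k = "ACGGTCA", 3
    node = kmers(seq, k)
    print(node)              # ['ACG', 'CGG', 'GGT', 'GTC', 'TCA']
    print(twin_kmers(node))  # ['TGA', 'GAC', 'ACC', 'CCG', 'CGT']
    # The twin equals the k-mers of the reverse-complement strand.
    print(twin_kmers(node) == kmers(revcomp(seq), k))  # True
```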

