  • VelvetOptimiser failed assembly

    My first assembly run on a mitochondrial genome sequenced with MiSeq (2 x 250) gave a pretty poor assembly with k-mer sizes 25 through 31 (results shown for k=31, the best of the four). For comparison, Newbler gave two scaffolds which nearly covered the whole genome.

    ********************************************************
    Assembly id: 4
    Assembly score: 98
    Velveth timestamp: Jul 5 2015 01:30:15
    Velvetg timestamp: Jul 5 2015 01:46:28
    Velveth version: 1.2.02
    Velvetg version: 1.2.02
    Readfile(s): -longPaired -fastq SSFL14_mitochondrion/SSFL14-3_mitochondrial_reads.fastq
    Velveth parameter string: auto_data_31 31 -longPaired -fastq SSFL14_mitochondrion/SSFL14-3_mitochondrial_reads.fastq
    Velvetg parameter string: auto_data_31 -clean yes
    Assembly directory: /home/farman/auto_data_31
    Velvet hash value: 31
    Roadmap file size: 33431342
    Total number of contigs: 16825
    n50: 98
    length of longest contig: 275
    Total bases in contigs: 1568075
    Number of contigs > 1k: 0
    Total bases in contigs > 1k: 0
    **********************************************************

    Second round optimization with K31 failed to generate an assembly at all:

    Final optimised assembly details:
    ********************************************************
    Assembly id: 4
    Velveth timestamp: Jul 5 2015 01:30:15
    Velvetg timestamp: Jul 5 2015 02:09:33
    Velveth version: 1.2.02
    Velvetg version: 1.2.02
    Readfile(s): -longPaired -fastq SSFL14_mitochondrion/SSFL14-3_mitochondrial_reads.fastq
    Velveth parameter string: auto_data_31 31 -longPaired -fastq SSFL14_mitochondrion/SSFL14-3_mitochondrial_reads.fastq
    Velvetg parameter string: auto_data_31 -clean yes -exp_cov 3 -cov_cutoff 0.3502176
    Assembly directory: /home/farman/auto_data_31
    Velvet hash value: 31
    Roadmap file size: 33431342
    Total number of contigs: 0
    n50: 0
    length of longest contig: 0
    Total bases in contigs: 0
    Number of contigs > 1k: 0
    Total bases in contigs > 1k: 0
    Paired Library insert stats:
    **********************************************************

    Anyone know what's going on here? As an aside, I'm planning to increase the maximum k-mer size and run an optimization with k=75 through k=125, but I'm concerned that the assembly isn't working as it should to start off with. Note that I need to switch to a de Bruijn graph assembler because most of our assemblies are too large for Newbler to handle, and I'm using the mitochondrial data to optimize settings.
    Last edited by drdna; 07-05-2015, 05:36 AM.

  • #2
    It could be that the kmer size is too small.

    When you use a shorter k-mer size, the k-mer coverage is higher, and
    Velvet doesn't do as well if the k-mer coverage is too high.

    See the Velvet manual, and also velvetk, to calculate
    a k-mer length that would give you a k-mer coverage in
    the optimal range.
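
The relationship described above can be sketched with the formula given in the Velvet manual, Ck = C * (L - k + 1) / L, where C is the nucleotide coverage, L the read length and k the hash length. This is only a rough illustration: the function name is hypothetical, the 250 bp read length comes from the 2 x 250 run in this thread, and the 100x nucleotide coverage is an assumed placeholder, not a value from the data.

```python
def kmer_coverage(nucleotide_coverage, read_length, k):
    """K-mer coverage Ck = C * (L - k + 1) / L (formula from the Velvet manual)."""
    if not 1 <= k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return nucleotide_coverage * (read_length - k + 1) / read_length


# Assumed values for illustration: 100x nucleotide coverage, 250 bp reads.
for k in (31, 75, 131):
    print(k, kmer_coverage(100.0, 250, k))
```

For a fixed read length, raising k lowers the k-mer coverage, which is why a larger hash length is the usual lever when coverage is very deep.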







    • #3
      After adjusting maxkmerlength, VelvetOptimiser selected a k-mer size of 131. Problem is, this was decided based on an assembly that had only two nodes, of 195 and 285 bp, which are supposed to represent a 34 Mb mitochondrial genome. This is with the exact same dataset that Newbler assembled into 2 scaffolds (8 contigs) spanning 32 Mb.



      • #4
        I don't know how well it works, but I heard about a k-mer optimizer that can be used with Velvet. It is called KmerGenie.



        • #5
          I am a seasoned bioinformatician with over 15 years' experience, and yet running these programs is ridiculously frustrating. KmerGenie works fine for k-mer discovery, but the suggested cov_cutoff causes the assembly to fail (presumably a velvetg error):

          farman@imac:~$ velvetg Bm88324 -exp_cov 80 -cov_cutoff 3 -read_trkg yes -amos_file yes
          [0.000001] Reading graph file Bm88324/Graph2
          [0.000066] Graph has 187 nodes and 14000 sequences
          [0.023470] Reading read set file Bm88324/Sequences;
          [0.027264] 14000 sequences found
          [0.043557] Done
          [0.075096] Removing contigs with coverage < 3.000000...
          [0.078486] Concatenation...
          [0.078500] Renumbering nodes
          [0.078505] Initial node count 187
          [0.078512] Removed 187 null nodes
          [0.078517] Concatenation over!
          [0.078522] Concatenation...
          [0.078526] Renumbering nodes
          [0.078530] Initial node count 0
          [0.078534] Removed 0 null nodes
          [0.078539] Concatenation over!
          [0.078546] Clipping short tips off graph, drastic
          [0.078550] Concatenation...
          [0.078555] Renumbering nodes
          [0.078559] Initial node count 0
          [0.078564] Removed 0 null nodes
          [0.078574] Concatenation over!
          [0.078578] 0 nodes left
          [0.078584] Read coherency...
          [0.078588] Identifying unique nodes
          [0.078592] Done, 0 unique nodes counted
          [0.078603] Trimming read tips
          [0.078608] Confronted to 0 multiple hits and 0 null over 0
          [0.078612] Read coherency over!
          [0.078693] Concatenation...
          [0.078708] Renumbering nodes
          [0.078712] Initial node count 0
          [0.078717] Removed 0 null nodes
          [0.078721] Concatenation over!
          [0.078732] Removing reference contigs with coverage < 3.000000...
          [0.078738] Concatenation...
          [0.078748] Renumbering nodes
          [0.078752] Initial node count 0
          [0.078756] Removed 0 null nodes
          [0.078760] Concatenation over!
          [0.078886] Writing contigs into Bm88324/contigs.fa...
          [0.078922] Writing into stats file Bm88324/stats.txt...
          [0.097408] Writing into graph file Bm88324/LastGraph...
          [0.097510] Writing into AMOS file Bm88324/velvet_asm.afg...
          [0.151006] EMPTY GRAPH
          Final graph has 0 nodes and n50 of 0, max 0, total 0, using 0/14000 reads

          Yet if I set -cov_cutoff to auto, it works just fine, although the assembly is nowhere close to being as good as what I can achieve with Newbler: 46 Velvet nodes versus one Newbler contig spanning the entire mitochondrial genome.
          Last edited by drdna; 07-06-2015, 05:24 AM.
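
In the log above, the "Removing contigs with coverage < 3.000000" step is where all 187 nodes disappear. One way to check a fixed cutoff before running velvetg with it might be to count how many nodes in stats.txt meet it. The sketch below assumes stats.txt is a tab-separated table with a header row and per-node coverage columns such as short1_cov and long_cov; those column names are an assumption to verify against your own file, and the function name and the three-node table are hypothetical.

```python
import csv
import io


def count_surviving_nodes(stats_text, cutoff, column="short1_cov"):
    """Return (kept, total): nodes whose `column` coverage is >= cutoff."""
    reader = csv.DictReader(io.StringIO(stats_text), delimiter="\t")
    kept = 0
    total = 0
    for row in reader:
        total += 1
        if float(row[column]) >= cutoff:
            kept += 1
    return kept, total


# Hypothetical three-node table, made up for illustration only.
example = (
    "ID\tlgth\tout\tin\tlong_cov\tshort1_cov\n"
    "1\t100\t1\t1\t42.0\t0.0\n"
    "2\t200\t1\t1\t55.5\t0.0\n"
    "3\t300\t0\t1\t1.2\t0.0\n"
)
print(count_surviving_nodes(example, 3, "short1_cov"))
print(count_surviving_nodes(example, 3, "long_cov"))
```

Comparing the counts across columns may also show where the coverage is actually recorded when reads are loaded in a long category such as -longPaired, which could be worth checking before settling on a cutoff.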
