  • mandar.bobade60
    Member
    • Jun 2013
    • 14

    De novo assembly using Velvet and AMOS

    Hi all,

    I am a newbie in NGS data analysis, and my first task is a de novo assembly. I have a plant mitochondrial genome to assemble, for which I have almost 24GB of data each for R1 and R2.
    I am through with the QC analysis and have the Velvet output: contigs.fa files for multiple k-mers (55, 95, 10). The read length is 101.

    Using QUAST, I got the following statistics for the k-mer 85 contigs.fa:

    Assembly                      contigs
    # contigs (>= 0 bp)           2933
    # contigs (>= 1000 bp)        274
    Total length (>= 0 bp)        2145071
    Total length (>= 1000 bp)     1433182
    # contigs                     441
    Largest contig                62822
    Total length                  1548880
    GC (%)                        45.35
    N50                           7528
    N75                           3479
    L50                           51
    L75                           129
    # N's per 100 kbp             0.00
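
For readers unsure how to read the N50/L50 rows above, here is a minimal sketch of the usual definition: N50 is the length of the contig at which the running total, taken from the longest contig down, first reaches half of the total assembly length, and L50 is how many contigs that took. The function name `nx_lx` and the toy lengths are hypothetical and are not taken from this assembly.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    fraction=0.5 gives (N50, L50); fraction=0.75 gives (N75, L75).
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            # `length` is Nx, `count` is Lx
            return length, count
    raise ValueError("no contigs given")


# Hypothetical toy contig lengths, for illustration only.
toy_lengths = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy_lengths)
n75, l75 = nx_lx(toy_lengths, fraction=0.75)
print(n50, l50, n75, l75)
```

A higher N50 alone does not show that an assembly is correct; it only describes contiguity.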

    But I am stuck here now, since I don't know how to judge whether this assembly is good enough to proceed with or whether I should try something else. It would also be a great help if anyone could suggest further steps to take to arrive at a well-assembled genome.

    Regards,
    Mandar
  • WhatsOEver
    Senior Member
    • Apr 2012
    • 215

    #2
    At first glance, the mitochondrion seems enormous, with a really low GC content.
    I would therefore assume that you have whole-genome sequencing data and that you didn't filter your reads in any way, did you?
    Can you tell us which organism this is from?
    Does "24GB of data" mean you have 2x12GB FASTQ read files, or 24Gbp of sequence information?

    The following paper on mitochondrial genome assembly from WGS might also be of interest to you:

    • mandar.bobade60
      Member
      • Jun 2013
      • 14

      #3
      Thank you, WhatsOEver, for the paper link.

      It's only mitochondrial data, with 24GB for each end, so 48GB collectively. The coverage is huge, which is why there is so much data. The only filtering was done using FastQC and FastUniq.

      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        I highly recommend subsampling that data; you have way too much to get a good assembly. It's hard to say how much you need, since mitochondrial genomes vary in size. I'd start by subsampling by a factor of 200 and assembling again to get a better idea of how big the genome is (or you could estimate the size from a k-mer frequency plot). Then, if you want to assemble with Velvet, subsample again or normalize to around 40x coverage.

        You can subsample paired reads with my reformat tool, which will keep the pairing intact.
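
To illustrate what "keeping the pairing intact" means, here is a minimal Python sketch. It is not the reformat tool mentioned above, only an assumption-laden illustration: it assumes simple 4-line FASTQ records with R1 and R2 in the same order, and the names `read_fastq` and `subsample_pairs` are hypothetical. The key point is that one random decision is made per pair, so a read is never kept without its mate.

```python
import random
from itertools import islice


def read_fastq(lines):
    """Yield 4-line FASTQ records as tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1_lines, r2_lines, rate, seed=0):
    """Keep each read pair with probability `rate`.

    A single random draw decides the fate of both mates, so R1 and R2
    stay synchronized. Assumes both inputs list the pairs in the same
    order; zip() stops silently if one input is shorter.
    """
    rng = random.Random(seed)
    for rec1, rec2 in zip(read_fastq(r1_lines), read_fastq(r2_lines)):
        if rng.random() < rate:
            yield rec1, rec2
```

With real data, the two iterables could be open file handles for the R1 and R2 files, and a factor-of-200 subsample would correspond to `rate=1/200`.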

        • mandar.bobade60
          Member
          • Jun 2013
          • 14

          #5
          Subsampling

          Dear Brian Bushnell,
          I subsampled the data, and after subsampling the N50 value increased substantially.
          I have 101300000 reads with an expected mitochondrial genome size of 715000 base pairs.
          But a problem persists: even after picking the assembly with fewer contigs (around 90-100) and a good N50, aligning the raw reads to its contig file gives horrible results (almost 91% fail to align).

          Can anyone suggest further processing steps? Since the genome is mitochondrial, I don't have many options for multiple sequence alignment with related FASTA files.

          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            You still have ~14000x coverage, which is way too high. Like I said, you need to target closer to 40x coverage, or at least no more than 100x.
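
The ~14000x figure can be reproduced from the numbers in post #5. This is a rough sketch that assumes the 101300000 figure counts individual reads of 101 bp each and that the 715000 bp genome size estimate is right; the function names are hypothetical.

```python
def coverage(num_reads, read_length, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def sampling_rate(current_coverage, target_coverage):
    """Fraction of reads to keep to reach the target depth (capped at 1)."""
    return min(1.0, target_coverage / current_coverage)


reads = 101_300_000   # read count from post #5
read_len = 101        # read length from post #1
genome = 715_000      # expected genome size from post #5

cov = coverage(reads, read_len, genome)
rate = sampling_rate(cov, 40)
print(f"coverage ~{cov:.0f}x, keep fraction {rate:.5f} for 40x")
```

Under those assumptions the depth comes out at roughly 14,300x, so reaching 40x means keeping only about 0.28% of the reads.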

            BLAST your contigs to see what they are, and BLAST a few unaligned reads to see what those are. You could have massive contamination. And anyway, it seems unlikely that you have 24GB of data on a mitochondrion. Why would anyone do that? It's a very wasteful experimental design.
