  • De novo assembly using Velvet and AMOS

    Hi all,

    I am a newbie in NGS data analysis, and the first task that has come to me is de novo assembly. I have a plant mitochondrial genome to assemble, for which I have almost 24 GB of data for each of R1 and R2.
    So far, I have finished the QC analysis and run Velvet. The Velvet output I have is a contigs.fa file for each of several k-mer sizes, such as 55, 95, 10. The read length is 101 bp.

    I got the following statistics for the k-mer 85 contigs.fa using QUAST:

    Assembly                    contigs
    # contigs (>= 0 bp)         2933
    # contigs (>= 1000 bp)      274
    Total length (>= 0 bp)      2145071
    Total length (>= 1000 bp)   1433182
    # contigs                   441
    Largest contig              62822
    Total length                1548880
    GC (%)                      45.35
    N50                         7528
    N75                         3479
    L50                         51
    L75                         129
    # N's per 100 kbp           0.00
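As a reading aid for the QUAST numbers: N50 is the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length, and L50 is how many contigs that takes (N75/L75 use 75%). As far as I know, QUAST computes these on contigs of at least 500 bp by default, which is why "# contigs" (441) sits between the >= 0 bp and >= 1000 bp counts. The minimal Python sketch below recomputes the metrics from a FASTA file; the function names are hypothetical, and the 500 bp cutoff is an assumption mirroring that default.

```python
def contig_lengths(fasta_path, min_len=500):
    """Read sequence lengths from a FASTA file, keeping contigs >= min_len bp."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # A new header closes the previous record.
                if current is not None and current >= min_len:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line.strip())  # sequence may be wrapped over many lines
    if current is not None and current >= min_len:
        lengths.append(current)
    return lengths


def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx); fraction=0.5 gives N50/L50, 0.75 gives N75/L75."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("no contigs to summarise")
```

For example, nx_lx(contig_lengths("contigs.fa"), 0.5) should land close to QUAST's N50/L50 pair for the same file. A larger N50 alone does not prove the assembly is correct; it only says the contigs are longer.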

    But I am stuck now, since I don't know how to judge whether this result is good enough to proceed with or whether I should try something else. It would also be a great help if anyone could suggest further steps to arrive at a well-assembled genome.

    Regards,
    Mandar

  • #2
    At first glance, the mitochondrial genome seems quite enormous in size, with a really low GC content.
    I would therefore assume that you have whole-genome sequencing data and that you didn't filter your reads in any way. Is that right?
    Can you tell us which organism this is from?
    Does "24GB of data" mean you have 2 x 12 GB FASTQ files, or that you have 24 Gbp of sequence?
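File size and sequence amount differ because each FASTQ record stores a header, the bases, a "+" line and one quality character per base, so an uncompressed file holds fewer than half as many bases as it has bytes. A minimal Python sketch for answering the question directly, assuming uncompressed FASTQ with exactly four lines per record (the function name fastq_summary is hypothetical):

```python
import os
from itertools import islice


def fastq_summary(path):
    """Count reads and bases in an uncompressed, four-line-per-record FASTQ file."""
    reads = bases = 0
    with open(path) as handle:
        while True:
            record = list(islice(handle, 4))  # header, sequence, '+', qualities
            if not record:
                break
            if len(record) < 4:
                raise ValueError("truncated FASTQ record")
            reads += 1
            bases += len(record[1].strip())  # line 2 of each record is the sequence
    size = os.path.getsize(path)
    return {
        "reads": reads,
        "bases": bases,
        "bytes": size,
        "bases_per_byte": bases / size if size else 0.0,
    }
```

Running it on R1 and R2 and adding the "bases" values gives the total in bp, which is the number that matters for coverage.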

    The following paper on mitochondrial genome assembly from WGS data might also be of interest to you:

    • #3
      Thank you, WhatsOEver, for the paper link.

      It's only mitochondrial data, with 24 GB for each end, so 48 GB collectively. The coverage is huge, which is why there is so much data. The only filtering was done using FastQC and FastUniq.

      • #4
        I highly recommend subsampling that data; you have way too much to get a good assembly. It's hard to say how much you need, since mitochondrial genomes vary in size. I'd start by subsampling by a factor of 200 and assembling again to get a better idea of how big the genome is (or you could estimate the size from a k-mer frequency plot). Then, if you want to assemble with Velvet, subsample again or normalize to around 40x coverage.
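The k-mer frequency route works because, in a histogram of how often each k-mer occurs, most genomic k-mers pile up around one depth (the main peak), while sequencing errors create a tail of low-depth k-mers. Genome size is then roughly the total count of non-error k-mers divided by the peak depth. The Python sketch below illustrates that idea only; the function names are hypothetical, it counts k-mers as-is rather than as canonical forward/reverse-complement pairs, it keeps everything in memory, and the "first rise after the error tail" rule is a simple assumption. For 24 GB of reads a dedicated k-mer counter such as Jellyfish would be needed.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map depth -> number of distinct k-mers seen that many times.

    Simplification: k-mers are not canonicalised (reverse complements are
    counted separately) and all counts are held in memory.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram):
    """Return (estimated size, peak depth) from a depth -> count histogram."""
    depths = sorted(histogram)
    # Skip the error tail: counts fall from depth 1 until the first rise.
    valley = None
    for lower, higher in zip(depths, depths[1:]):
        if histogram[higher] > histogram[lower]:
            valley = lower
            break
    solid = depths if valley is None else [d for d in depths if d > valley]
    peak = max(solid, key=lambda d: histogram[d])  # depth of the main peak
    total = sum(d * histogram[d] for d in solid)   # all non-error k-mer occurrences
    return total / peak, peak
```

A high-copy contaminant or a mixture of organisms would show up as extra peaks, which is itself a useful diagnostic before assembling.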

        You can subsample paired reads with my reformat tool, which will keep the pairing intact.
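Keeping the pairing intact comes down to making one keep/discard decision per read pair and applying it to both files. The Python sketch below is a hypothetical stand-in to show that idea, not reformat's own code; it assumes uncompressed four-line FASTQ files with R1 and R2 in the same order. A fraction of 1/200 corresponds to the factor-of-200 suggestion above.

```python
import random
from itertools import islice, zip_longest


def fastq_records(handle):
    """Yield four-line FASTQ records as lists of lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1_in, r2_in, r1_out, r2_out, fraction, seed=0):
    """Keep each read pair with probability `fraction`; return the pairs kept."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    kept = 0
    with open(r1_in) as f1, open(r2_in) as f2, \
            open(r1_out, "w") as o1, open(r2_out, "w") as o2:
        for rec1, rec2 in zip_longest(fastq_records(f1), fastq_records(f2)):
            if rec1 is None or rec2 is None:
                raise ValueError("R1 and R2 have different numbers of reads")
            if rng.random() < fraction:  # one decision shared by both mates
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept
```

Sampling the two files independently would break most pairs, which is why a pair-aware tool matters here.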

        • #5
          Subsampling

          Dear Brian Bushnell,
          I subsampled the data, and after subsampling the N50 increased substantially.
          I have 101,300,000 reads, and the expected mitochondrial genome size is 715,000 bp.
          But a problem persists even after picking the assembly with fewer contigs (around 90-100) and a good N50: when I align the raw reads back to its contigs, the result is horrible (almost 91% of the reads fail to align).

          Can anyone suggest further processing steps? Since the genome is mitochondrial, I also don't have many options for multiple sequence alignment with related FASTA files.

          • #6
            You still have ~14000x coverage, which is way too high. Like I said, you need to target closer to 40x coverage, or at least no more than 100x.
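The ~14000x figure follows from mean depth = number of reads x read length / genome size. The small Python sketch below (function names are hypothetical) does that arithmetic and works out how many reads to keep for a target depth. It assumes the 101,300,000 reads count both mates and that all reads are mitochondrial; if there is contamination, the true mitochondrial depth will differ.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Mean depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest read count that reaches at least target_coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def keep_fraction(current_coverage, target_coverage):
    """Sampling fraction to go from the current to the target depth (capped at 1)."""
    return min(1.0, target_coverage / current_coverage)
```

With the numbers from post #5, coverage(101_300_000, 101, 715_000) is about 14,310x; reads_needed(40, 101, 715_000) is 283,169 reads (roughly 141,600 pairs), i.e. keeping about 0.28% of the reads, and 100x would need about 708,000 reads.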

            BLAST your contigs to see what they are, and BLAST a few unaligned reads to see what those are. You could have massive contamination. In any case, it seems unlikely that you have 24 GB of data on a mitochondrion. Why would anyone do that? It's a very wasteful experimental design.
