  • Independent assemblies from NextSeq FASTQs

    Haven't found a thread answering this question yet, apologies if it already exists somewhere.

    I'm assembling some very large metagenomes in SPAdes from NextSeq data. I understand that the four FASTQ files from the flowcell lanes are typically concatenated to make a single file. However, SPAdes is running out of memory on my server mid-assembly.

    My question is this: is there any technical reason for concatenating the FASTQ files prior to analysis, rather than doing four assemblies and merging the scaffolds later? Doing the latter would save me memory, but I don't want to do it if it's bad form.

    Still learning this stuff so any pointers welcome...

    Cheers,
    Nathan

  • #2
    Try assembling less data first... Use MiSeq 2x250 or 2x300...

    First I would try assembling less data and see which species are the most abundant in the datasets... Then filter those out and repeat with more data...

    Also, I would use 4-channel Illumina sequencing in 2x250 bp mode (MiSeq or HiSeq 2500), which has 3-4 times fewer raw read errors than 2-channel NextSeq.

    The amount of RAM/CPU used by most de novo assemblers can grow exponentially with increased raw read error rates... Also, high-coverage noisy data is much more resource-demanding than low-coverage, good-quality data.
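
    To make the link between read errors and assembler memory concrete, here is a minimal simulation sketch. It is not taken from any assembler: the genome, read counts, error rates, and the function names (`distinct_kmers`, `simulate_reads`, `kmer_growth`) are all hypothetical choices for illustration. The idea is that a single substitution error can create up to k k-mers that do not exist in the genome, and a de Bruijn graph assembler has to hold those spurious k-mers too.

```python
import random


def distinct_kmers(reads, k):
    """Count the distinct k-mers observed across all reads."""
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            seen.add(read[i:i + k])
    return len(seen)


def simulate_reads(genome, n_reads, read_len, error_rate, rng):
    """Sample reads uniformly and apply independent substitution errors."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        bases = list(genome[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Replace the base with one of the three other bases.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(bases))
    return reads


def kmer_growth(error_rates, genome_len=5000, n_reads=2000,
                read_len=100, k=21, seed=1):
    """Return {error_rate: distinct k-mer count} for one random toy genome."""
    rng = random.Random(seed)
    genome = "".join(rng.choice("ACGT") for _ in range(genome_len))
    results = {}
    for rate in error_rates:
        reads = simulate_reads(genome, n_reads, read_len, rate,
                               random.Random(seed))
        results[rate] = distinct_kmers(reads, k)
    return results
```

    Calling `kmer_growth([0.0, 0.005, 0.02])` on this toy setup should show the error-free reads staying at or below the number of k-mers in the genome, while the noisy reads accumulate many more distinct k-mers at the same coverage. Real data also contains indels and position-dependent quality, which this sketch ignores.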

    NextSeq should be used as a REsequencing platform, not as a de novo sequencing one...

    While the data from the above platforms is more expensive than NextSeq on a £/Gbp basis, the extra sequencing cost of a good-quality input dataset is usually far less than the cost of the scientists' time, experiments, and reagents wasted on analysing bad assembly results...

    • #3
      Thanks for your thoughts. Unfortunately our sequencing centre has seen fit to swap their HiSeq 2500 for a NextSeq, so I am stuck with it. Funnily enough, I had no problems when I was working with HiSeq data...

      • #4
        Spurious kmers increase memory consumption; you can get rid of a lot of these via preprocessing: adapter-trimming, error-correction, discarding reads with singleton kmers, normalization, overlap-based read merging, and so forth. If SPAdes still runs out of memory, you can try Megahit instead. Don't assemble the lanes independently and try to merge them; that won't be beneficial.

        NextSeq has a much higher error rate than HiSeq 2500. You may want to try FilterByTile to get rid of the lowest-quality reads by flowcell position.
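
        As an illustration of one of the preprocessing steps listed above, discarding reads with singleton kmers, here is a minimal in-memory sketch. It is not how any particular tool implements the step; the function names and the default `k=21` are assumptions for the example, and a real dataset would need a memory-efficient k-mer counter rather than a Python `Counter`.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield each k-mer of seq as the smaller of itself and its reverse complement."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        revcomp = kmer.translate(_COMPLEMENT)[::-1]
        yield min(kmer, revcomp)


def drop_reads_with_singleton_kmers(reads, k=21):
    """Keep only reads in which every k-mer occurs more than once in the dataset."""
    # First pass: count canonical k-mers over all reads.
    counts = Counter()
    for read in reads:
        counts.update(canonical_kmers(read, k))
    # Second pass: drop reads that are too short or contain a k-mer seen only once.
    kept = []
    for read in reads:
        if len(read) >= k and all(counts[km] > 1 for km in canonical_kmers(read, k)):
            kept.append(read)
    return kept
```

        A k-mer seen exactly once across a deep dataset is more likely to be a sequencing error than real sequence, which is the rationale for the filter. In a metagenome, though, the same rule also removes reads from genuinely rare organisms, so it trades sensitivity for memory.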

        • #5
          Thanks Brian, that's very helpful. I was actually about to try normalizing with BBNorm to see if that improved things.

          • #6
            Even though NextSeq has 4 "lanes" that are optically distinct, they share the same fluidic path. If you are going to normalize the data, do it on all 4 "lanes" at the same time.
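
            Combining the four lane files before normalization can be sketched as below. This is one possible approach, not a prescribed one: the function name `concatenate_lanes` and any file names are hypothetical. It relies on the fact that the gzip format allows several compressed members back to back in one file, so `.fastq.gz` files can be joined byte for byte without decompressing them.

```python
import shutil
from pathlib import Path


def concatenate_lanes(lane_files, out_path):
    """Append the lane files, in the order given, into one output file.

    Works for plain FASTQ and for gzip-compressed FASTQ, because
    concatenated gzip members form a valid gzip stream.
    """
    with open(out_path, "wb") as out:
        for lane in lane_files:
            with open(lane, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(out_path)
```

            For paired-end data, call it once for the R1 files and once for the R2 files, passing the lanes in the same order both times so that read pairs stay in sync between the two outputs.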
