  • cyanoevo
    Member
    • Jan 2015
    • 16

    Independent assemblies from NextSeq FASTQs

    Haven't found a thread answering this question yet, apologies if it already exists somewhere.

    I'm assembling some very large metagenomes in SPAdes from NextSeq data. I understand that the four FASTQ files from the flowcell lanes are typically concatenated into a single file. However, SPAdes is running out of memory on my server mid-assembly.

    My question is this: Is there any technical reason for concatenating the FASTQ files prior to analysis, rather than doing four assemblies and merging the scaffolds later? Doing the latter would save me memory, but I don't want to do it if it's bad form.

    Still learning this stuff so any pointers welcome...

    Cheers,
    Nathan
  • Markiyan
    Senior Member
    • Sep 2010
    • 126

    #2
    Try assembling less data first... Use MiSeq 2x250 or 2x300...

    First I would try assembling less data, and see which species are the most abundant in the datasets... Then filter them out and repeat with more data...

    Also, I would use 4-channel Illumina sequencing in 2x250 bp mode (MiSeq or HiSeq 2500), which has 3-4 times fewer raw read errors than 2-channel NextSeq.

    The amount of RAM/CPU used by most de novo assemblers can grow exponentially with the raw read error rate... Also, high-coverage noisy data is much more resource-demanding than low-coverage good-quality data.

    NextSeq should be used as a REsequencing platform, not as a de novo sequencing one...

    While data from the above platforms is more expensive than NextSeq on a £/Gbp basis, the extra sequencing cost of a good-quality input dataset is usually far less than the cost of the scientists' time, experiments, and reagents wasted analysing bad assembly results...


    • cyanoevo
      Member
      • Jan 2015
      • 16

      #3
      Thanks for your thoughts. Unfortunately our sequencing centre has seen fit to swap their HiSeq 2500 for a NextSeq, so I am stuck with it. Funnily enough, I had no problems when I was working with HiSeq data...


      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        Spurious kmers increase memory consumption; you can get rid of a lot of these via preprocessing: adapter-trimming, error-correction, discarding reads with singleton kmers, normalization, overlap-based read merging, and so forth. If SPAdes still runs out of memory, you can try Megahit instead. Don't assemble the lanes independently and try to merge them; that won't be beneficial.

        NextSeq has a much higher error rate than HiSeq 2500. You may want to try FilterByTile to get rid of the lowest-quality reads by flowcell position.
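
To illustrate the point about spurious kmers, here is a minimal toy simulation, not a model of any real assembler. It samples reads from a random "genome" and compares the number of distinct kmers with and without substitution errors. The genome size, read length, k=21, and the 1% error rate are all assumptions picked for illustration, and the helper names are hypothetical.

```python
import random


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def simulate_reads(genome, read_len, n_reads, error_rate, rng):
    """Sample reads uniformly; substitute each base with probability error_rate."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Replace the base with a different one (a substitution error).
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads


def distinct_kmers(reads, k):
    """Count distinct kmers across all reads (a rough proxy for graph size)."""
    seen = set()
    for read in reads:
        seen |= kmers(read, k)
    return len(seen)


# Toy parameters (assumptions, not taken from the thread).
K = 21
genome = "".join(random.Random(0).choice("ACGT") for _ in range(5000))
clean_reads = simulate_reads(genome, 100, 2000, 0.0, random.Random(1))
noisy_reads = simulate_reads(genome, 100, 2000, 0.01, random.Random(1))

clean_count = distinct_kmers(clean_reads, K)
noisy_count = distinct_kmers(noisy_reads, K)
print(f"distinct {K}-mers, error-free reads: {clean_count}")
print(f"distinct {K}-mers, 1% error reads:   {noisy_count}")
```

Error-free reads can never contain more distinct kmers than the genome itself, whereas a single substitution can create up to k new kmers that appear nowhere else. Those are the kmers that error-correction and low-depth filtering aim to remove before assembly.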


        • cyanoevo
          Member
          • Jan 2015
          • 16

          #5
          Thanks Brian, that's very helpful. Was actually about to try normalizing with bbnorm to see if that improved things.


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Even though NextSeq has 4 "lanes" that are optically distinct, they share the same fluidic path. If you are going to normalize the data, then do it on all 4 "lanes" at the same time.
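
A minimal sketch of pooling the four lane files before normalization or assembly. It assumes gzip-compressed FASTQs named in the usual per-lane pattern (for example `Sample_S1_L001_R1_001.fastq.gz`), which may not match your files. The function names, the sample prefix, and the output paths are hypothetical. It relies on the fact that concatenated gzip members form a valid gzip file, so nothing has to be decompressed.

```python
import shutil
from pathlib import Path


def pool_lanes(lane_files, out_path):
    """Concatenate gzip-compressed FASTQ files byte-for-byte.

    Several gzip members written back to back are still a valid gzip
    stream, so the inputs do not need to be decompressed first.
    """
    with open(out_path, "wb") as out:
        for lane in lane_files:
            with open(lane, "rb") as src:
                shutil.copyfileobj(src, out)


def pool_sample(fastq_dir, sample, out_dir):
    """Pool R1 and R2 separately, in the same sorted lane order.

    Using an identical lane order for both mates keeps read pairs in
    sync. The file name pattern is an assumption about the naming scheme.
    """
    fastq_dir, out_dir = Path(fastq_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pooled = {}
    for mate in ("R1", "R2"):
        lanes = sorted(fastq_dir.glob(f"{sample}_L00?_{mate}_001.fastq.gz"))
        if not lanes:
            raise FileNotFoundError(f"no {mate} lane files found for {sample}")
        out_path = out_dir / f"{sample}_{mate}.fastq.gz"
        pool_lanes(lanes, out_path)
        pooled[mate] = out_path
    return pooled


# Example call (hypothetical paths and sample prefix):
# pooled = pool_sample("fastq", "Sample_S1", "pooled")
```

The pooled R1 and R2 files can then go through normalization or assembly as a single library.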
