
**Markiyan** · 07-13-2017, 05:21 AM

Try assembling less data first... Use MiSeq 2x250 or 2x300...

First I would try assembling less data, and see what the most abundant species in the datasets are... Then filter them out and repeat with more data...
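One way to "assemble less data first" is to subsample the reads at random. The sketch below is only an illustration in plain Python, not a tool mentioned in this thread: the function names `read_fastq` and `subsample` are made up for the example, and it assumes uncompressed, well-formed 4-line FASTQ records with R1 and R2 in the same order.

```python
import random
from itertools import islice

def read_fastq(handle):
    """Yield FASTQ records as 4-tuples of lines (header, sequence, '+', quality)."""
    handle = iter(handle)  # works for open files and for lists of lines
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record

def subsample(records, fraction, seed=1):
    """Keep each record with probability `fraction`.

    One random draw is made per record, in file order, so running this
    with the same seed on the R1 and R2 files keeps mates paired.
    """
    rng = random.Random(seed)
    for record in records:
        if rng.random() < fraction:
            yield record

# Hypothetical usage (file names are placeholders):
# with open("sample_R1.fastq") as src, open("sub_R1.fastq", "w") as dst:
#     for record in subsample(read_fastq(src), fraction=0.1, seed=42):
#         dst.writelines(record)
```

Run it once per mate file with the same `seed` and `fraction`; a small fraction is usually enough to see which species dominate before committing to a full assembly.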

Also I would use 4-channel Illumina sequences in 2x250bp mode (MiSeq or HiSeq 2500), which have 3-4 times fewer raw read errors than 2-channel NextSeq.

The amount of RAM/CPU used by most de novo assemblers can grow exponentially with increasing raw read error rates... Also, high-coverage noisy data is much more resource-demanding than low-coverage good-quality data.

NextSeq should be used as a REsequencing platform, not as a de novo sequencing one...

Data from the above platforms is more expensive than NextSeq data on a £/Gbp basis, but the extra sequencing cost of a good-quality input dataset is usually far less than the cost of scientists' time, experiments and reagents wasted on analysing bad assembly results...

**cyanoevo** · 07-13-2017, 06:38 AM

Thanks for your thoughts. Unfortunately our sequencing centre has seen fit to swap their HiSeq 2500 for a NextSeq, so I am stuck with it. Funnily enough I had no problems when I was working with HiSeq data....

**Brian Bushnell** · 07-13-2017, 08:01 AM

Spurious kmers increase memory consumption; you can get rid of a lot of these via preprocessing: adapter-trimming, error-correction, discarding reads with singleton kmers, normalization, overlap-based read merging, and so forth. If SPAdes still runs out of memory, you can try Megahit instead. Don't assemble the lanes independently and try to merge them; that won't be beneficial.
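To make the "spurious kmers" point concrete, here is a toy sketch in plain Python. It is not how SPAdes or BBTools work internally; the helper names and the 20-base `genome` string are invented for the illustration. It shows that a single substitution in one read adds up to k new distinct k-mers, and that a naive "discard reads containing singleton k-mers" filter removes that read.

```python
from collections import Counter

def kmers(seq, k):
    """Return every k-mer in seq, in order (duplicates included)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def count_kmers(reads, k):
    """Count k-mer occurrences across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

def drop_reads_with_singletons(reads, k):
    """Keep only reads whose k-mers are all seen more than once overall."""
    counts = count_kmers(reads, k)
    return [r for r in reads if all(counts[km] > 1 for km in kmers(r, k))]

# Toy data: 10 error-free copies of a made-up sequence plus one read
# carrying a single substitution (A -> C at position 10).
genome = "ACGTACCGTTAGGCATCGAT"
clean_reads = [genome] * 10
bad_read = genome[:10] + "C" + genome[11:]

k = 5
clean = count_kmers(clean_reads, k)
noisy = count_kmers(clean_reads + [bad_read], k)
print(len(clean), len(noisy))  # 16 21: one error added k = 5 distinct k-mers
print(len(drop_reads_with_singletons(clean_reads + [bad_read], k)))  # 10
```

An assembler has to store every distinct k-mer, so each erroneous read inflates the graph even though it adds no real sequence; that is why the preprocessing steps listed above reduce memory use.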

NextSeq has a much higher error rate than HiSeq 2500. You may want to try FilterByTile to get rid of the lowest-quality reads by flowcell position.

**cyanoevo** · 07-13-2017, 08:08 AM

Thanks Brian, that's very helpful. Was actually about to try normalizing with bbnorm to see if that improved things.

**GenoMax** · 07-13-2017, 08:12 AM

Even though NextSeq has 4 "lanes" that are optically distinct, they share the same fluidic path. If you are going to normalize the data, then do it on all 4 "lanes" at the same time.
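Pooling the four "lanes" before normalization can be as simple as concatenating the per-lane files. The sketch below assumes gzip-compressed FASTQs named in the usual `_L001_` to `_L004_` style; the function name `pool_lanes` and the file names in the usage comment are placeholders, not something given in this thread. Concatenated gzip files are themselves valid gzip, so no decompression is needed.

```python
import shutil
from pathlib import Path

def pool_lanes(lane_paths, out_path):
    """Concatenate per-lane FASTQ files into one file, in the order given.

    Works byte-for-byte, so it handles plain or gzip-compressed input
    as long as all inputs use the same format.
    """
    with open(out_path, "wb") as out:
        for path in lane_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(out_path)

# Hypothetical usage (file names are placeholders):
# r1 = sorted(Path(".").glob("sample_L00[1-4]_R1_001.fastq.gz"))
# r2 = sorted(Path(".").glob("sample_L00[1-4]_R2_001.fastq.gz"))
# pool_lanes(r1, "sample_pooled_R1.fastq.gz")
# pool_lanes(r2, "sample_pooled_R2.fastq.gz")
```

Pool R1 and R2 in the same lane order so that mates stay in sync, then run normalization once on the pooled pair.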


Independent assemblies from NextSeq FASTQs
