**Brian Bushnell** · 07-13-2016, 02:54 PM

You will not get an optimal assembly if you split reads into multiple subsets and assemble them independently. In fact, you'll get a mess. If you run out of memory, you need to use a computer that has more memory, or a different algorithm.

**ronaldrcutler** · 07-13-2016, 06:29 PM

Okay, thanks for the important info.

Do you know whether Newbler will recognize two paired-end files as a pair if I pass in both and use the paired-end option on each? Or is a merged file of the two paired-end reads a better approach?

**Brian Bushnell** · 07-13-2016, 06:44 PM

Sorry, I have never used Newbler, so I don't know its idiosyncrasies... but hopefully someone else does!

Typically, if you have overlapping reads, an OLC assembler will perform best with merged reads. FLASH did not perform well in my tests, though. Bearing in mind that I am biased, being the developer, I recommend BBMerge for joining paired reads prior to assembly.

What was your merge rate? The best procedure depends on that... if the insert size was too long to merge a substantial fraction of the reads, it's better to skip merging.

**ronaldrcutler** · 07-14-2016, 03:27 AM

The max read length is 250 bp, which I used as the maxOverlap parameter in FLASH. The results of this merge:

Code:

[FLASH] Read combination statistics:
[FLASH]  Total pairs: 8576138
[FLASH]  Combined pairs:  6056207
[FLASH]  Uncombined pairs: 2519931
[FLASH]  Percent combined: 70.62%

Note that when I set maxOverlap to 225, I got a warning that a high proportion of pairs overlapped by more than 225 bp, which is why I stuck with 250. Could this still be a poor choice, given that my max read length is also 250 bp?
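For anyone who wants to recompute the merge rate from a FLASH log instead of reading it off by eye, here is a minimal Python sketch. It assumes the log lines look exactly like the ones quoted above; the function name `parse_flash_stats` is hypothetical and not part of FLASH.

```python
import re


def parse_flash_stats(log_text):
    """Collect the pair counts from FLASH 'Read combination statistics' lines.

    Assumes lines shaped like "[FLASH]  Combined pairs:  6056207",
    as in the log quoted in this thread.
    """
    stats = {}
    for line in log_text.splitlines():
        match = re.match(
            r"\[FLASH\]\s+(Total|Combined|Uncombined) pairs:\s+(\d+)", line
        )
        if match:
            stats[match.group(1).lower()] = int(match.group(2))
    return stats


log = """\
[FLASH] Read combination statistics:
[FLASH]  Total pairs: 8576138
[FLASH]  Combined pairs:  6056207
[FLASH]  Uncombined pairs: 2519931
[FLASH]  Percent combined: 70.62%
"""

stats = parse_flash_stats(log)
# Merge rate = combined pairs as a percentage of all pairs
merge_rate = 100 * stats["combined"] / stats["total"]
print(f"merge rate: {merge_rate:.2f}%")
```

The combined and uncombined counts add up to the total, and the recomputed rate agrees with the 70.62% that FLASH reports.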

I found the max read length by running this command, which prints every read with its length, and then looking through the output:

Code:

awk '{if(NR%4==2) print NR"\t"$0"\t"length($0)}' <read> > <output.txt>
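The awk command above prints every read, so the maximum still has to be picked out by hand. A rough Python sketch that tallies read lengths directly is shown below. It assumes a plain, uncompressed FASTQ file with exactly four lines per record; the function name `read_length_counts` and the tiny example reads are made up for illustration.

```python
from collections import Counter
import io


def read_length_counts(fastq_handle):
    """Tally read lengths, assuming 4 lines per FASTQ record."""
    counts = Counter()
    for line_number, line in enumerate(fastq_handle, start=1):
        # Line 2 of each record is the sequence (same test as NR%4==2 in awk)
        if line_number % 4 == 2:
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


# Two made-up reads of length 4 and 6, standing in for a real file
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n")
counts = read_length_counts(example)
print("max read length:", max(counts))
print("length distribution:", sorted(counts.items()))
```

With real data, an opened file handle can be passed in place of the `io.StringIO` object. The full distribution is often more informative than the maximum alone, since trimmed reads can be much shorter than the longest read.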

**GenoMax** · 07-14-2016, 03:47 AM

Is this 454 data? Is that the reason for using newbler?

**ronaldrcutler** · 07-14-2016, 03:51 AM

No, this is FASTQ data.

**GenoMax** · 07-14-2016, 03:58 AM

From which platform? How big is the genome expected to be? What is the read length?

**ronaldrcutler** · 07-14-2016, 07:47 AM

Illumina, I believe.

The merged paired-end file (using flash) has 6056207 sequences, 1451352720 bp
The mate1 paired-end file has 8576138 sequences, 1798377920 bp

The read lengths are 250 bp
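As a quick sanity check on these totals, the mean read length follows from dividing bases by sequences. The sketch below uses only the numbers reported in this post; the helper name `mean_read_length` is hypothetical.

```python
def mean_read_length(total_bases, read_count):
    """Mean length in bp, given total bases and number of sequences."""
    return total_bases / read_count


# Totals as reported above
merged_mean = mean_read_length(1451352720, 6056207)
mate1_mean = mean_read_length(1798377920, 8576138)

print(f"merged reads: {merged_mean:.1f} bp on average")
print(f"mate1 reads:  {mate1_mean:.1f} bp on average")
```

Both means come out below 250 bp, which suggests that 250 bp is the maximum read length and not the typical one. A merged mean shorter than a single full-length read would also be consistent with the FLASH warning about many pairs overlapping almost completely, though that is an inference from the totals and not something the log states.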

Newblar (GS de novo assembler) paired end input
