SEQanswers

05-12-2015, 04:49 PM   #1
Zapages
Member
 
Location: NJ

Join Date: Oct 2012
Posts: 94
Newbler Illumina Paired End Reads - Interleave or not to interleave?

Hi Everyone,

The Newbler documentation is very bare-bones from what I have been able to gather. I have been able to successfully install Newbler with the GUI on our Ubuntu (Bio-Linux 8) workstation.

As a summary for any future researchers, use the following commands. A huge thanks to Jeff Wintersinger and dsenalik for their posts.

Code:
# Install 32-bit version of libs needed for JRE packaged with Newbler - do this as root
apt-get install libxi6:i386 libxtst6:i386

# Extract assembler archive downloaded from 454
tar xvzf DataAnalysis_2.8_All_20120731_2108.tgz
cd DataAnalysis_2.8_All/packages/

# Extract RPMs - Do not do this as root.
for foo in *.rpm; do rpm2cpio "$foo" | cpio -idmv; done
cd opt/454/apps

# Run assembler
assembly/bin/gsAssembler

# Optional: if you have trouble importing your FASTQ, SFF, or FASTA files into the Newbler GUI
cd /opt/454/apps/assembly/config
for file in ../../gsSeqTools/config/* ; do sudo ln -s "$file" ; done
Credit: http://jeff.wintersinger.org/posts/2...n-ubuntu-1204/ and http://seqanswers.com/forums/showpos...1&postcount=22

I am working with three types of data sets: two sets of Illumina paired-end reads (275 bp), two Ion PGM data sets (both SFF and FASTQ; the longest read is about 600 bp), and a Sanger data set in FASTA format.

In the future, I hope to do a hybrid assembly with Newbler.

I was wondering: do I have to interleave the FASTQ files for the Illumina data sets before adding them to the Newbler GUI, or do I leave them as they are? I have been interleaving the files for the Ray and Velvet assemblers (via the command line).

I know the FASTQ format is based on Sanger quality (Illumina 1.8+). Also, do I have to modify the FASTQ files to make them more acceptable for Newbler?

Should I play around with the settings? Should I leave default settings for the minimum overlap length (40) and minimum overlap identity (90)? Also any suggestions for the all contig threshold and longest contig threshold for bacterial and viral genomes? Does enabling low end coverage help? I was thinking of 50 for all contigs and 65K for the largest contig.

Thank you in advance.

-Zapages

Last edited by Zapages; 05-12-2015 at 04:51 PM.
05-13-2015, 05:22 AM   #2
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Quote:
Originally Posted by Zapages View Post
I was wondering: do I have to interleave the FASTQ files for the Illumina data sets before adding them to the Newbler GUI, or do I leave them as they are? I have been interleaving the files for the Ray and Velvet assemblers (via the command line).
Newbler determines pairs based on the sequence IDs, so I think it does not matter whether the reads are interleaved or not. But you need to check the 454PairStatus file(s) to be sure.
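
Since pairing depends on the sequence IDs, it can help to confirm that the two FASTQ files actually contain matching IDs before loading them. The following is a minimal sketch, not Newbler's own logic: it assumes mates share an ID up to the first whitespace, optionally followed by a trailing /1 or /2, and the function names are made up for illustration.

```python
import re


def read_ids(fastq_lines):
    """Yield the read ID from each FASTQ record (assumes 4 lines per record)."""
    for i, line in enumerate(fastq_lines):
        # The header is the first line of every 4-line record.
        if i % 4 == 0 and line.startswith("@"):
            # Drop the leading '@' and anything after the first whitespace.
            yield line[1:].split()[0]


def pair_key(read_id):
    """Strip a trailing /1 or /2 so both mates map to the same key (assumed convention)."""
    return re.sub(r"/[12]$", "", read_id)


def unpaired_ids(r1_lines, r2_lines):
    """Return the keys that appear in only one of the two inputs."""
    keys1 = {pair_key(rid) for rid in read_ids(r1_lines)}
    keys2 = {pair_key(rid) for rid in read_ids(r2_lines)}
    return keys1 ^ keys2
```

Both arguments can be open file handles, for example the forward and reverse FASTQ files. An empty result means every read has a mate under the assumed naming scheme; the 454PairStatus output is still the authoritative check of what Newbler itself paired.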

Quote:
Originally Posted by Zapages View Post
I know the FASTQ format is based on Sanger quality (Illumina 1.8+). Also, do I have to modify the FASTQ files to make them more acceptable for Newbler?
Maybe. Get them into the standard Sanger format.
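
To check whether a file is already in Sanger (Phred+33) encoding, one common heuristic is to look at the lowest quality character. The sketch below is only that heuristic, with illustrative function names: characters below ASCII 59 cannot occur in the older +64 encodings, while a file whose lowest character is 64 or above is probably Phred+64 (a small sample of very high-quality Phred+33 reads could fool it).

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from FASTQ quality strings.

    Returns None when the input is empty or the range is ambiguous.
    """
    lowest = min(
        (ord(ch) for line in quality_lines for ch in line.strip()),
        default=None,
    )
    if lowest is None:
        return None
    if lowest < 59:
        return 33  # this low a character cannot occur in a +64 encoding
    if lowest >= 64:
        return 64  # heuristic only: no low characters were seen
    return None  # 59-63: possibly the old Solexa scale, left undecided


def to_sanger(quality, offset):
    """Re-encode one quality string as Sanger (Phred+33)."""
    return "".join(chr(ord(ch) - offset + 33) for ch in quality)
```

Illumina 1.8+ files should come back as 33 and need no conversion; `to_sanger` is only relevant for older Phred+64 data.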

Quote:
Originally Posted by Zapages View Post
Should I play around with the settings? Should I leave default settings for the minimum overlap length (40) and minimum overlap identity (90)? Also any suggestions for the all contig threshold and longest contig threshold for bacterial and viral genomes? Does enabling low end coverage help? I was thinking of 50 for all contigs and 65K for the largest contig.
With enough coverage (>20x) the default settings should be OK.
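
A rough way to see whether a data set clears the 20x mark is total sequenced bases divided by the expected genome size. This is a back-of-the-envelope sketch; the function names and the genome size in the usage note are assumptions for illustration, not values from this thread.

```python
def fastq_read_lengths(fastq_lines):
    """Yield the length of each read (assumes 4 lines per FASTQ record)."""
    for i, line in enumerate(fastq_lines):
        # The sequence is the second line of every 4-line record.
        if i % 4 == 1:
            yield len(line.strip())


def estimated_coverage(read_lengths, genome_size):
    """Average depth = total bases / genome size (same units, e.g. bp)."""
    return sum(read_lengths) / genome_size
```

For example, 400,000 reads of 275 bp over a hypothetical 5 Mb bacterial genome give `estimated_coverage([275] * 400000, 5_000_000)`, which is 22.0, just above the threshold.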

Note, though, that Newbler does not really understand the Ion Torrent error model, although it is very close to the 454 one.

Good luck!