Newbler Illumina Paired End Reads - Interleave or not to interleave?

Zapages

Member

Join Date: Oct 2012

Posts: 98
#1

05-12-2015, 04:49 PM

Hi Everyone,

The Newbler documentation is very bare-bones, from what I have been able to gather. I have successfully installed Newbler with its GUI on our Ubuntu (Bio-Linux 8) workstation.

As a summary for any future researchers, the following commands worked for me. A huge thanks to Jeff Wintersinger and dsenalik for their posts.

Code:

# Install 32-bit version of libs needed for JRE packaged with Newbler - do this as root
apt-get install libxi6:i386 libxtst6:i386

# Extract assembler archive downloaded from 454
tar xvzf DataAnalysis_2.8_All_20120731_2108.tgz
cd DataAnalysis_2.8_All/packages/

# Extract RPMs - Do not do this as root.
for foo in *.rpm; do rpm2cpio $foo | cpio -idmv; done
cd opt/454/apps

# Run assembler
assembly/bin/gsAssembler

# Optional, if you have trouble with importing your FASTQ, SFF, or FASTA files into the GUI of Newbler
cd /opt/454/apps/assembly/config
for file in ../../gsSeqTools/config/* ; do sudo ln -s $file ; done

Credit: http://jeff.wintersinger.org/posts/2...n-ubuntu-1204/ and http://seqanswers.com/forums/showpos...1&postcount=22

I am working with three types of data sets:

I have two sets of Illumina paired-end reads (275 bp). On top of this, I have two Ion PGM data sets (both SFF and FASTQ; the longest read is about 600 bp). Finally, I have a FASTA (Sanger) data set.

In the future, I hope to do a hybrid assembly with Newbler.

I was wondering: do I have to interleave the FASTQ files for the Illumina data sets before adding them to the Newbler GUI, or do I leave them as they are? I have been interleaving the files for the Ray and Velvet assemblers (via the command line).
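
For reference, interleaving here means alternating one record from the forward file with the matching record from the reverse file. A minimal sketch in POSIX shell and awk, assuming both files are plain 4-line-per-record FASTQ with mates in the same order; the function name `interleave_fastq` and the file names are made up for illustration:

```shell
# Interleave two FASTQ files: after every 4-line record from the forward
# file ($1), emit the next 4-line record from the reverse file ($2).
# Assumes unwrapped FASTQ (exactly 4 lines per read) and matching read order.
interleave_fastq() {
    awk -v mate="$2" '
        { print }                      # pass the forward record through
        NR % 4 == 0 {                  # end of a forward record reached
            for (i = 0; i < 4; i++) {
                if ((getline line < mate) > 0) print line
            }
        }
    ' "$1"
}
```

Hypothetical usage: `interleave_fastq reads_R1.fastq reads_R2.fastq > reads_interleaved.fastq`.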

I know the FASTQ files use Sanger quality encoding (Illumina 1.8+). Also, do I have to modify the FASTQ files to make them more acceptable for Newbler?

Should I play around with the settings? Should I leave the default settings for the minimum overlap length (40) and minimum overlap identity (90)? Also, any suggestions for the all contig threshold and longest contig threshold for bacterial and viral genomes? Does enabling low end coverage help? I was thinking of 50 for all contigs and 65K for the largest contig.

Thank you in advance.

-Zapages

Last edited by Zapages; 05-12-2015, 04:51 PM.
flxlex

Moderator

Join Date: Nov 2008

Posts: 414
#2

05-13-2015, 05:22 AM

Originally posted by Zapages

I was wondering: do I have to interleave the FASTQ files for the Illumina data sets before adding them to the Newbler GUI, or do I leave them as they are? I have been interleaving the files for the Ray and Velvet assemblers (via the command line).

Newbler determines pairs based on the sequence IDs, so I don't think it matters whether the reads are interleaved or not. But you need to check the 454PairStatus file(s) to be sure.
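
Since pairing goes by sequence ID, one sanity check before loading the data is to confirm that the two FASTQ files carry the same IDs in the same order. The sketch below is only that: it assumes 4-line FASTQ records and that mates share the header text before the first space, optionally followed by `/1` or `/2`. That naming rule is an assumption about common Illumina headers, not a statement of how Newbler itself parses IDs, and `pairs_match` is a made-up helper name.

```shell
# Exit status 0 if the forward ($1) and reverse ($2) FASTQ files list the
# same read IDs in the same order, non-zero otherwise.
# Assumption: an ID is the header up to the first whitespace, without the
# leading "@" and without a trailing /1 or /2.
pairs_match() {
    awk -v mate="$2" '
        function read_id(h) {
            sub(/^@/, "", h)           # drop the leading @
            sub(/[ \t].*$/, "", h)     # drop any comment after the ID
            sub(/\/[12]$/, "", h)      # drop an old-style /1 or /2 suffix
            return h
        }
        NR % 4 == 1 {                  # header line of a forward record
            if ((getline other < mate) <= 0) { bad = 1; exit }
            if (read_id($0) != read_id(other)) { bad = 1; exit }
            for (i = 0; i < 3; i++) getline skip < mate   # rest of mate record
        }
        END {
            # Leftover lines in the mate file mean the read counts differ.
            if (!bad && (getline extra < mate) > 0) bad = 1
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

Hypothetical usage: `pairs_match reads_R1.fastq reads_R2.fastq && echo "IDs line up"`. The 454PairStatus output remains the real confirmation of what the assembler did with the pairs.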

Originally posted by Zapages

I know the FASTQ files use Sanger quality encoding (Illumina 1.8+). Also, do I have to modify the FASTQ files to make them more acceptable for Newbler?

Maybe. Get them into the standard Sanger format.
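
A quick way to see whether a file already looks like Sanger (Phred+33) encoding is to find the lowest quality character in it. The sketch below is a heuristic only: a character below ASCII 59 can only occur with a +33 offset, while a file whose lowest character is 64 or higher is probably +64 but could also be a +33 file containing nothing but high-quality bases. The function name `guess_quality_offset` and its output labels are invented for this example, and 4-line FASTQ records are assumed.

```shell
# Guess the quality offset of a FASTQ file ($1) from its lowest quality
# character. Prints "phred33", "phred64", or "ambiguous".
guess_quality_offset() {
    awk '
        BEGIN {
            # POSIX awk has no ord(), so build a character-to-code table.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {                  # quality line of each record
            n = length($0)
            for (j = 1; j <= n; j++) {
                ch = substr($0, j, 1)
                if ((ch in ord) && ord[ch] < min) min = ord[ch]
            }
        }
        END {
            if (min < 59) print "phred33"
            else if (min >= 64 && min < 127) print "phred64"
            else print "ambiguous"     # 59-63, or no quality data seen
        }
    ' "$1"
}
```

Hypothetical usage: `guess_quality_offset reads_R1.fastq`. Illumina 1.8+ output should report `phred33`.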

Originally posted by Zapages

Should I play around with the settings? Should I leave the default settings for the minimum overlap length (40) and minimum overlap identity (90)? Also, any suggestions for the all contig threshold and longest contig threshold for bacterial and viral genomes? Does enabling low end coverage help? I was thinking of 50 for all contigs and 65K for the largest contig.

With enough coverage (>20x) the default settings should be OK.
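
To judge whether a data set clears that 20x mark, a rough estimate is total sequenced bases divided by the expected genome size. A minimal sketch, assuming 4-line FASTQ records and a genome size you supply yourself; `estimate_coverage` is a made-up helper, and the figure ignores trimming, contamination, and uneven coverage:

```shell
# Print approximate fold coverage: total bases in FASTQ file $1 divided by
# the expected genome size in bp ($2, supplied by the user).
estimate_coverage() {
    awk -v genome="$2" '
        BEGIN { if (genome + 0 <= 0) { print "genome size must be > 0"; failed = 1; exit 1 } }
        NR % 4 == 2 { bases += length($0) }   # sequence line of each record
        END { if (!failed) printf "%.1f\n", bases / genome }
    ' "$1"
}
```

Hypothetical usage for a genome of about 5 Mb: `estimate_coverage reads_R1.fastq 5000000`. For paired-end data, add the figures for both files.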

Note, though, that Newbler does not really understand the Ion Torrent error model, although that model is very close to the 454 one.

Good luck!