Hi Everyone,
The Newbler documentation is very bare-bones from what I have been able to gather. I have successfully installed Newbler with its GUI on our Ubuntu (Bio-Linux 8) workstation.
As a summary for any future researchers, use the following script. A huge thanks to Jeff Wintersinger and dsenalik for their posts.
Credit: http://jeff.wintersinger.org/posts/2...n-ubuntu-1204/ and http://seqanswers.com/forums/showpos...1&postcount=22
Code:
# Install 32-bit version of libs needed for JRE packaged with Newbler - do this as root
apt-get install libxi6:i386 libxtst6:i386

# Extract assembler archive downloaded from 454
tar xvzf DataAnalysis_2.8_All_20120731_2108.tgz
cd DataAnalysis_2.8_All/packages/

# Extract RPMs - Do not do this as root.
for foo in *.rpm; do rpm2cpio "$foo" | cpio -idmv; done
cd opt/454/apps

# Run assembler
assembly/bin/gsAssembler

# Optional, if you have trouble with importing your FASTQ, SFF, or FASTA files into the GUI of Newbler
cd /opt/454/apps/assembly/config
for file in ../../gsSeqTools/config/* ; do sudo ln -s "$file" ; done
I am working with three types of data sets:
I have two sets of Illumina paired-end reads (275 bp). On top of this, I have two Ion PGM data sets (both SFF and FASTQ; the longest read is about 600 bp). Finally, I have a FASTA (Sanger) data set.
In the future, I hope to do a hybrid assembly with Newbler.
I was wondering: do I have to interleave the FASTQ files for the Illumina data sets before adding them to the Newbler GUI, or do I leave them as they are? I have been interleaving the files for the Ray and Velvet assemblers (via the command line).
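For anyone unsure what interleaving means here, a minimal sketch follows. It assumes two FASTQ files with exactly four lines per record and mates listed in the same order in both files. The function name and the file names in the usage line are hypothetical, and this is not necessarily the exact command used for Ray and Velvet. It also says nothing about whether Newbler itself needs interleaved input.

```shell
# interleave_fastq FORWARD.fastq REVERSE.fastq
# Prints record 1 of the forward file, record 1 of the reverse file,
# record 2 of the forward file, and so on, to stdout.
# Assumes four lines per record and identical read order in both files.
interleave_fastq() {
    awk -v r2="$2" '
        { print }                       # emit the current line of the forward file
        NR % 4 == 0 {                   # a forward record has just ended
            for (i = 0; i < 4; i++)     # copy the matching reverse record
                if ((getline line < r2) > 0) print line
        }
    ' "$1"
}
```

Usage would look like `interleave_fastq sample_R1.fastq sample_R2.fastq > sample_interleaved.fastq`.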
I know my FASTQ files use Sanger (Phred+33) quality encoding (Illumina 1.8+). Do I also have to modify the FASTQ files in any way to make them acceptable to Newbler?
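One quick way to double-check the quality encoding before importing is to look for quality characters that can only occur with the Sanger (Phred+33) offset. The sketch below assumes four-line FASTQ records, and the function name is hypothetical. Characters from `!` to `:` (ASCII 33 to 58) lie below the range used by older Phred+64 files, so finding one points to Phred+33; finding none is inconclusive.

```shell
# guess_fastq_offset FILE.fastq
# Prints "phred33" if any quality line contains a character in the ASCII
# range 33-58, which older Phred+64 files do not use; otherwise prints
# "undetermined" (the file could be Phred+64 or uniformly high-quality Phred+33).
guess_fastq_offset() {
    # Every 4th line of a four-line FASTQ record is the quality string.
    if awk 'NR % 4 == 0' "$1" | LC_ALL=C grep -q '[!-:]'; then
        echo "phred33"
    else
        echo "undetermined"
    fi
}
```

Usage would look like `guess_fastq_offset sample_R1.fastq`.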
Should I change the settings, or leave the defaults for the minimum overlap length (40) and minimum overlap identity (90)? Also, are there any suggestions for the all contig threshold and longest contig threshold for bacterial and viral genomes? Does enabling low end coverage help? I was thinking of 50 for all contigs and 65K for the largest contig.
Thank you in advance.
-Zapages