SEQanswers

05-15-2012, 08:12 AM #1
jdilts
Member
 
Location: Richmond, VA

Join Date: May 2012
Posts: 10
Ion Torrent Reference Assembly

What is the best reference assembler to use with Ion Torrent data? I can only seem to find information on Ion Torrent de novo assemblies, which is not what I'm looking for. Thanks in advance!
05-15-2012, 09:16 AM #2
jdilts
Member
 
Location: Richmond, VA

Join Date: May 2012
Posts: 10

Is anyone using Newbler, DNASTAR or MIRA?
07-05-2012, 10:52 AM #3
hengnck
Junior Member
 
Location: New Zealand

Join Date: Aug 2009
Posts: 9

Hi - I'm working de novo on a small bacterial genome (~2.0 Mbp, a new species). I got about 480 Mbp of data from an Ion 318 chip. Running it through Newbler 2.3 gave 3,000+ contigs. I set up a default MIRA assembly *six* days ago and it's still going. :-( I wouldn't use DNA* (DNASTAR); it's ridiculously expensive for what it does. Roche RefMapper is OK for some of our other, already known bacterial genomes.
07-05-2012, 11:50 AM #4
RonanC
Junior Member
 
Location: USA

Join Date: Jul 2012
Posts: 8

I just noticed that CLC bio is offering a free 6-month trial of CLC Genomics Workbench to users with a benchtop NGS instrument (i.e. 454 GS Jr, MiSeq or Ion Torrent PGM). Does anybody have any experience with the CLC software?
07-05-2012, 02:26 PM #5
IonTorrent
Member
 
Location: Guilford, CT and S.F., CA

Join Date: Jan 2010
Posts: 64

Quote:
Originally Posted by hengnck
Hi - I'm working with a small bacterial genome ~2.0 Mbp but de novo (new species). Got data from an Ion 318 chip, about 480 Mbp. Ran it through Newbler 2.3 - 3,000+ contigs. Set up default MIRA assembly *six* days ago and it's still going. :-( I wouldn't use DNA* - ridiculously expensive for what it does. Roche RefMapper is OK for some of our other known bacterial genomes.
Hi hengnck,

Are you using all 480 Mbp of data in the assembly, or are you downsampling? I ask because many software packages (like those you mention) grossly underperform with excessive coverage and are reported to work best in the 30X to 50X range; if this is DNA from a pure culture, you're at ~240X (480 Mbp / 2.0 Mbp). Are you a Torrent Suite user, and if so, are you using the MIRA plugin? The newest version (v2.2) lets you specify the amount of coverage to use (best results are typically seen at ~50X):

http://lifetech-it.hosted.jivesoftwa.../docs/DOC-2572

Some have commented that they use Newbler at around 30X coverage for de novo assembly.
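The ~240X figure is simply total sequenced bases divided by genome size, and the same arithmetic gives the fraction of reads to keep for a 30X or 50X target. A minimal Python sketch, assuming reads are removed uniformly at random so that the kept bases scale with the fraction of reads kept; the function names are hypothetical and the 480 Mbp / 2.0 Mbp inputs are the figures quoted in this thread:

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Mean depth of coverage: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def keep_fraction(total_bases: float, genome_size: float, target_depth: float) -> float:
    """Fraction of reads to keep to bring the mean depth down to target_depth.

    Assumes reads are dropped uniformly at random, so retained bases scale
    linearly with the fraction of reads kept. Capped at 1.0 (never upsample).
    """
    return min(1.0, target_depth / coverage(total_bases, genome_size))


if __name__ == "__main__":
    total = 480e6   # ~480 Mbp from one Ion 318 chip (figure from the thread)
    genome = 2.0e6  # ~2.0 Mbp bacterial genome (figure from the thread)
    print(f"current depth: {coverage(total, genome):.0f}X")
    for depth in (30, 50):
        f = keep_fraction(total, genome, depth)
        print(f"{depth}X: keep {f:.3f} of reads (~{f * total / 1e6:.0f} Mbp)")
```

With these inputs the script reports 240X and suggests keeping 12.5% of the reads (~60 Mbp) for 30X, or about 21% (~100 Mbp) for 50X.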
07-06-2012, 04:42 AM #6
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

I concur with IT's comments about excessive coverage. You really need to scale back your input to ~30X. Also, why Newbler 2.3? That is a very old version. Get version 2.6; they have made several improvements to the assembler.
07-06-2012, 12:19 PM #7
hengnck
Junior Member
 
Location: New Zealand

Join Date: Aug 2009
Posts: 9
Downsampling Ion data

Hi All,

Thanks for your comments - you can all probably see that I'm more comfortable in the Sanger era. Unfortunately in my Faculty, I'm the "bioinformatics team".

My default SOP is to use all the data (the more the merrier), but I can see now that I've got far more data than required. How do I downsample 480 Mbp of essentially random reads to 100-150 Mbp?

All advice is greatly appreciated.
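Random downsampling (as opposed to length filtering) can be sketched in a few lines of Python. This assumes unwrapped, four-line FASTQ records and keeps each read independently with a fixed probability, so the output is only approximately the requested size; the function name and seed value are illustrative, not taken from any tool mentioned here. With the thread's numbers, a fraction of about 0.21 to 0.31 would turn ~480 Mbp into roughly 100-150 Mbp.

```python
import random
from typing import Iterable, TextIO, Tuple


def subsample_fastq(lines: Iterable[str], out: TextIO, fraction: float,
                    seed: int = 1) -> Tuple[int, int]:
    """Randomly keep roughly `fraction` of the reads in a FASTQ stream.

    Assumes unwrapped FASTQ (four lines per read: @name, sequence, '+',
    quality). Each read is kept independently with probability `fraction`,
    so the output size is approximately, not exactly, fraction * input.
    An incomplete record at the very end is silently dropped by zip().
    Returns (reads_kept, bases_kept).
    """
    rng = random.Random(seed)            # fixed seed -> reproducible subset
    records = zip(*[iter(lines)] * 4)    # group consecutive lines in fours
    reads_kept = bases_kept = 0
    for record in records:
        if rng.random() < fraction:
            # normalise line endings so a missing final newline is harmless
            out.write("".join(line.rstrip("\r\n") + "\n" for line in record))
            reads_kept += 1
            bases_kept += len(record[1].strip())
    return reads_kept, bases_kept
```

For example, `with open("reads.fastq") as src, open("subset.fastq", "w") as dst: print(subsample_fastq(src, dst, 0.25))` keeps roughly a quarter of the reads (the file names are placeholders). Existing tools such as `seqtk sample` do the same job and may be more convenient for large files.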
07-06-2012, 12:23 PM #8
jdilts
Member
 
Location: Richmond, VA

Join Date: May 2012
Posts: 10

What read size do you want? You could cut out all the shorter reads (maybe 50 bp or less?). To do this you would need to write a script of some sort, in Perl or Python, that gets the length of each read and then writes the reads that "qualify" to an output file.
07-06-2012, 12:28 PM #9
hengnck
Junior Member
 
Location: New Zealand

Join Date: Aug 2009
Posts: 9
Re: Downsampling

Quote:
Originally Posted by jdilts
What size do you want the reads? You could cut out all the smaller reads. (Maybe 50bp or less?) To do this you would need to write up a script of some sort. Either perl or python to get the length of each read and then output the reads that "qualify" into an outfile.
@jdilts - OK, I understand - I need to go see a Perl or Python programmer. The max read length was 398 bp but the average was 177 bp. My collaborator who did the sequencing did not specify which kit was used, but they're all shotgun reads. I will have to play around with minimum and maximum read lengths.
07-06-2012, 12:34 PM #10
jdilts
Member
 
Location: Richmond, VA

Join Date: May 2012
Posts: 10
Not too complicated

The script isn't too complicated. It would be something similar to this. I hope this can be of some assistance.
Code:
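#!/usr/bin/perl
# Keep only reads of at least $min_len bases.
# Assumes a FASTA-like file with exactly two lines per read:
# a name line followed by a single (unwrapped) sequence line.

use strict;
use warnings;

my $infile  = "readFILE";
my $outfile = "quality_readsFILE";
my $min_len = 75;

# three-argument open with lexical filehandles
open(my $in,  '<', $infile)  or die "Cannot open $infile: $!";
open(my $out, '>', $outfile) or die "Cannot open $outfile: $!";

my $j = 0;    # line counter: even = name line, odd = sequence line
my $read_name;

# stream the file line by line instead of loading it into an array
while (my $line = <$in>) {
    chomp $line;    # drop the newline so length() counts only bases
    if ($j % 2 == 1) {
        print {$out} "$read_name\n$line\n" if length($line) >= $min_len;
    }
    else {
        $read_name = $line;    # store the read name for the next line
    }
    $j++;
}

close($in);
close($out);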
07-06-2012, 12:40 PM #11
hengnck
Junior Member
 
Location: New Zealand

Join Date: Aug 2009
Posts: 9

Quote:
Originally Posted by jdilts
The script isn't too complicated. It would be something similar to this. I hope this can be of some assistance.
Thanks, jdilts. Will try things out - as soon as I kill the MIRA assembly (7 days and counting).
07-06-2012, 12:42 PM #12
jdilts
Member
 
Location: Richmond, VA

Join Date: May 2012
Posts: 10
in 7 days

In the future, let me know if you have any programming issues. I'd be glad to help you out.
07-08-2012, 07:14 AM #13
genseq
Member
 
Location: Russia

Join Date: Dec 2007
Posts: 88

http://flxlexblog.wordpress.com/2012...substr-mg1655/

Ion Torrent Mate Pairs and a single scaffold for E coli K12 substr. MG1655
The de novo assembly approach Ion Torrent chose, using sff_extract, MIRA and SSPACE, seems to give quite long contigs, with almost all genes complete. However, Newbler outperforms SSPACE in scaffolding.
07-09-2012, 03:29 AM #14
BenjaminL
Junior Member
 
Location: Hartford, CT

Join Date: Sep 2010
Posts: 5

Length limiting is a great idea, jdilts.
One quick note on that quick Perl script...

The concept of length checking is a good one, but this script treats each line as a read and measures it. If your reads are in a specific file format from the sequencer, how you get at each read will depend on that format.
For example, FASTQ uses (at least) 4 lines per read: the name line (starting with @), the sequence, a separator line starting with + (which may optionally repeat the name), and the quality string. That assumes the sequence is all on one line; the format also allows the sequence and quality to wrap over several lines, in which case one read spans more than 4 lines.

To parse a specific file type (as opposed to one that has one line per read), I recommend either writing a dedicated function/method or using a prewritten library that does it. BioPerl and Biopython both have packages that read many file types, FASTQ being just one of them.
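Parsing FASTQ by record rather than by line, combined with the minimum/maximum length window discussed above, might look like the Python sketch below. It assumes unwrapped four-line records; the names and the 50 bp default are illustrative only. For real work, Biopython's `SeqIO.parse(handle, "fastq")` handles the format more thoroughly.

```python
from typing import Iterable, Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Parse FASTQ one record (four lines) at a time.

    Line 1: '@' + name; line 2: sequence; line 3: '+' (optionally followed by
    the name again); line 4: quality string. Assumes sequence and quality are
    not wrapped over several lines.
    """
    while True:
        header = handle.readline()
        if not header:
            return                      # end of file
        if not header.strip():
            continue                    # tolerate stray blank lines
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header.strip()!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


def filter_by_length(records: Iterable[FastqRecord], min_len: int = 50,
                     max_len: int = 10**9) -> Iterator[FastqRecord]:
    """Yield only reads whose length lies within [min_len, max_len]."""
    for rec in records:
        if min_len <= len(rec.seq) <= max_len:
            yield rec


def write_fastq(records: Iterable[FastqRecord], out: TextIO) -> int:
    """Write records back out as four-line FASTQ; return how many were written."""
    n = 0
    for rec in records:
        out.write(f"@{rec.name}\n{rec.seq}\n+\n{rec.qual}\n")
        n += 1
    return n
```

Because each step is a generator, `write_fastq(filter_by_length(parse_fastq(src), 75, 400), dst)` streams the file without loading it all into memory.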

-Benjamin-
07-09-2012, 06:59 AM #15
jonathanjacobs
Member
 
Location: Rockville, MD

Join Date: Apr 2011
Posts: 23

Quote:
Originally Posted by RonanC
I just noticed that CLC bio are offering a free 6 month trial of their CLC genomics workbench to users with a benchtop NGS (i.e. 454 GS Jr, MiSeq or IonTorrent PGM). Anybody have any experience with the CLC software?
@Ronan: We use Galaxy and CLC bio Genomics Workbench in our shop and have a MiSeq and an Ion Torrent in house. We're mainly doing microbial and viral resequencing, but de novo assembly keeps creeping in as well. In any case, CLC bio is --extremely-- fast: read mapping at 30x coverage of a 5 Mb genome takes minutes. An added benefit is that it also does true hybrid assembly of both paired-end and single-read NGS data from MiSeq and PGM at the same time. The quality/accuracy has also been as good as, if not better than, some of the open-source solutions we run (with Galaxy). It's pricey (~$5K/license), but the time savings in both setup and running are worth it. Don't get me wrong though: Galaxy is also very, very good, but it took me a while to get some of the tools we use to install "right" and work with Galaxy.

POSTEDIT: The original post mentioned "reference assembly" - perhaps I've crossed some wires; I was thinking of "read mapping." For de novo assembly, CLC bio is also very fast and accurate. We routinely get below 100 contigs from a single Ion 318 or MiSeq PE run for a 5 Mb genome (N50 averages around 190 kb).
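N50, quoted above, is a standard contiguity statistic: sort the contigs longest first and report the length of the contig at which the running total first reaches half of the assembly size. A short Python sketch of that usual definition (tools differ slightly in edge-case handling; the function name is illustrative):

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return N50: walking contigs from longest to shortest, the length of the
    contig at which the running total first reaches half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    if total <= 0:
        raise ValueError("need at least one contig with positive length")
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:   # integer comparison avoids float rounding
            return length
    return lengths[-1]  # not reached when total > 0
```

For made-up contig lengths of 100, 80, 50, 30 and 20 (280 bases in total), the running sums are 100 and then 180, which passes 140, so the N50 is 80.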

Last edited by jonathanjacobs; 07-09-2012 at 08:19 AM.
07-09-2012, 12:11 PM #16
slm1816
Junior Member
 
Location: Baltimore

Join Date: Jul 2012
Posts: 6

I am also curious about the CLC software. I'm not exactly sure how to use it.
07-09-2012, 10:58 PM #17
BenjaminL
Junior Member
 
Location: Hartford, CT

Join Date: Sep 2010
Posts: 5

For a quick start with CLC bio, check their docs.
Their support page is at http://www.clcbio.com/index.php?id=615 and has links to FAQs, tutorials, screencasts, etc.
Open up the app, see what it looks like and how much of it is intuitive. There are many more features than any one sequencing lab or bioinformatics group will use, so my usual approach is to investigate the app, check the docs, investigate again, check the docs, ask on a forum, investigate, repeat.

This CLC-specific topic (if continued) probably belongs in a new thread.

-B-
__________________
Benjamin
Jackson Laboratory for Genomic Medicine
07-27-2012, 11:52 AM #18
byou678
Member
 
Location: Maryland

Join Date: Aug 2011
Posts: 52

Thanks for your info and advice.

Quote:
Originally Posted by BenjaminL
Quick start for clcbio is to check their docs.
09-26-2012, 11:30 AM #19
DNASTAR
Registered Vendor
 
Location: Madison, WI

Join Date: Aug 2010
Posts: 48

Hi All,

DNASTAR software fully supports reference-guided assemblies (as well as de novo) for Ion Torrent data. If you check out the Ion Torrent page on our website, you can see videos of many of the Ion Torrent project types we handle (Ion AmpliSeq Cancer Panel, Paired-End Assembly with a Reference, etc.), as well as benchmarks and other resources.

Also, feel free to download a fully-functional free trial of Lasergene Genomics Suite to try it out for yourself.

If you have questions, just give us a call or send us an email.
866-511-5090
support@dnastar.com

Thanks,
Anne
03-01-2013, 06:51 AM #20
aguffanti
Member
 
Location: Milano, Italy

Join Date: Dec 2008
Posts: 29
Referenced assembly with Ion Torrent - Mosaik from fastq files

Hi - has anybody tried reference-guided or de novo assembly from the .sam (or .fastq) files of Ion Torrent datasets? I downloaded the following data sets from the Ion Community:

B7-143
B7-295
C19-543

I have both .bam (.sam) and .fastq files.

Velvet de novo assembly technically worked, but left me with many thousands of small contigs, which makes the result useless.

(*) Velvet

All my attempts at reference-guided assembly, with the Columbus extension to Velvet (using sorted SAM files) and with Mosaik (MosaikBuild), failed, apparently because of serious inconsistencies in the data set or data format incompatibility:
[0.000000] Reading FastA file NC_010473.C19-543.genome.fasta;
[59.568244] 1 sequences found
[59.568247] Done
[59.568619] Reading SAM file C19-543.sorted.sam
[355.404283] 6906611 reads found.
[355.404285] Done
[355.404286] Reference mapping counters
[355.404287] Name Read mappings
[355.404288] gi|170079663|ref|NC_010473.1| 19688393
[356.782277] Reading read set file C19-543/Sequences;
[363.732809] 6906612 sequences found
[363.733517] Read 1 of length 32794, longer than limit 32767
[363.733519] You should modify recompile with the LONGSEQUENCES option (cf. manual)

(*) MosaikBuild

MosaikBuild -q B7-295.fastq.gz -out B7-295.reads.dat -st 454
------------------------------------------------------------------------------
MosaikBuild 2.1.73 2012-11-08
Michael Stromberg & Wan-Ping Lee Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- setting read group ID to: ZKON5B26EGE
- setting sample name to: unknown
- setting sequencing technology to: 454

- parsing FASTQ file:
reads: 2,281,414 \ERROR: The number of qualities (45) do not match the number of bases (385) in 9IKNG:01351:01857.

Is everybody using Newbler? Do you see the same data inconsistencies in Ion Torrent FASTQ files or in converted BAM/SAM files?
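MosaikBuild's message says one record (9IKNG:01351:01857) had 45 quality values for 385 bases. A plausible but unconfirmed explanation is a truncated or damaged record, for example from an interrupted download or format conversion. The Python sketch below (a hypothetical helper, assuming unwrapped four-line FASTQ) only scans a file and reports such records so they can be inspected or removed before running MosaikBuild again:

```python
from typing import Iterator, TextIO, Tuple


def find_fastq_problems(handle: TextIO) -> Iterator[Tuple[int, str, str]]:
    """Scan a four-line-per-record FASTQ and yield (record_no, name, problem).

    Flags records whose quality string length differs from the sequence
    length (the condition MosaikBuild reported), plus basic structural
    errors. Assumes unwrapped records; once one record is truncated, later
    records will be misaligned, so the first reported problem matters most.
    """
    record_no = 0
    while True:
        header = handle.readline()
        if not header:
            return                              # end of file
        record_no += 1
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        name = header.strip().lstrip("@")
        if not header.startswith("@"):
            yield record_no, name, "header line does not start with '@'"
        if not plus.startswith("+"):
            yield record_no, name, "third line does not start with '+'"
        if len(seq) != len(qual):
            yield record_no, name, f"{len(qual)} qualities for {len(seq)} bases"
```

Printing the first few results of `find_fastq_problems(open("B7-295.fastq"))` (after decompressing the .gz) would show whether the bad record is isolated or the start of a misaligned stretch.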

Keep in touch, and thanks in advance!

Alessandro
All times are GMT -8.

