#1 |
Member
Location: Richmond, VA Join Date: May 2012
Posts: 10
What is the best reference assembler to use with ion torrent data? I can only seem to find information on Ion Torrent de novo assemblies, which is not what I'm looking for. Thanks in advance!
#2 |
Member
Location: Richmond, VA Join Date: May 2012
Posts: 10
Is anyone using Newbler, DNASTAR, MIRA?
#3 |
Junior Member
Location: New Zealand Join Date: Aug 2009
Posts: 9
Hi - I'm working with a small bacterial genome ~2.0 Mbp but de novo (new species). Got data from an Ion 318 chip, about 480 Mbp. Ran it through Newbler 2.3 - 3,000+ contigs. Set up default MIRA assembly *six* days ago and it's still going. :-( I wouldn't use DNA* - ridiculously expensive for what it does. Roche RefMapper is OK for some of our other known bacterial genomes.
#4 |
Junior Member
Location: USA Join Date: Jul 2012
Posts: 8
I just noticed that CLC bio are offering a free 6 month trial of their CLC genomics workbench to users with a benchtop NGS (i.e. 454 GS Jr, MiSeq or IonTorrent PGM). Anybody have any experience with the CLC software?
#5 | |
Member
Location: Guilford, CT and S.F., CA Join Date: Jan 2010
Posts: 64
Quote:
Are you using all 480 Mbp of data in the assembly, or are you downsampling? I ask because many software packages (like those you mention) will grossly underperform with excessive coverage and are reported to work best in the 30X to 50X range (and if this is DNA from a pure culture, you're at ~240X).
Are you a Torrent Suite user, and if so, are you using the MIRA plugin? The newest version (v2.2) allows you to specify the amount of coverage to use (best results are typically seen at ~50X): http://lifetech-it.hosted.jivesoftwa.../docs/DOC-2572
Some have commented that they use Newbler at around 30X coverage for de novo assembly.
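For anyone wanting to check the coverage arithmetic above, here is a minimal Python sketch. It assumes average coverage is simply total sequenced bases divided by genome size, using the figures quoted in this thread (about 480 Mbp of reads, about 2.0 Mbp genome). The function names `coverage` and `keep_fraction` are made up for illustration and are not part of any assembler or of Torrent Suite.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage: sequenced bases divided by genome length."""
    return total_bases / genome_size


def keep_fraction(total_bases: float, genome_size: float, target_cov: float) -> float:
    """Fraction of reads to keep so the expected coverage is about target_cov."""
    current = coverage(total_bases, genome_size)
    # Never ask for more than all of the reads.
    return min(1.0, target_cov / current)


# Figures from this thread: ~480 Mbp of reads, ~2.0 Mbp genome.
total, genome = 480e6, 2.0e6
print(f"current coverage: {coverage(total, genome):.0f}X")
for target in (30, 50):
    frac = keep_fraction(total, genome, target)
    print(f"{target}X -> keep {frac:.1%} of reads (~{frac * total / 1e6:.0f} Mbp)")
```

With these numbers, 30X corresponds to roughly 60 Mbp of reads and 50X to roughly 100 Mbp.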
#6 |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
I concur with IT's comments about excessive coverage. You really need to scale back your input to ~30X. Also why Newbler 2.3? That is a very old version. Get version 2.6, they have made several improvements in the assembler.
#7 |
Junior Member
Location: New Zealand Join Date: Aug 2009
Posts: 9
Hi All,
Thanks for your comments - you can all probably see that I'm more comfortable in the Sanger era. My default SOP is to use all the data - the more the merrier - but I can see now that I have far more data than required. How do I downsample 480 Mbp of essentially random reads to 100-150 Mbp? All advice is greatly appreciated.
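One way to approach the downsampling question is to keep each read at random with a fixed probability. The Python sketch below assumes the reads are in 4-line FASTQ (no line-wrapped sequences) and that the reads are in essentially random order. The names `fastq_records`, `subsample` and `subsample_file` are hypothetical helpers written for this example, not functions from any existing package.

```python
import random
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into 4-line FASTQ records (assumes no line-wrapped sequences)."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield record
            record = []


def subsample(lines: Iterable[str], fraction: float, seed: int = 42) -> Iterator[List[str]]:
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    for record in fastq_records(lines):
        if rng.random() < fraction:
            yield record


def subsample_file(in_path: str, out_path: str, fraction: float, seed: int = 42) -> int:
    """Write the subsampled reads to out_path and return how many were kept."""
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for record in subsample(src, fraction, seed):
            dst.write("\n".join(record) + "\n")
            kept += 1
    return kept
```

For example, `subsample_file("reads.fastq", "reads.sub.fastq", 0.25)` (file names are placeholders) would keep about a quarter of the reads, i.e. roughly 120 Mbp out of 480 Mbp. The exact number of reads kept varies slightly because each read is sampled independently.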
#8 |
Member
Location: Richmond, VA Join Date: May 2012
Posts: 10
What size do you want the reads? You could cut out all the smaller reads (maybe 50 bp or less?). To do this, you would need to write a script of some sort, in either Perl or Python, that gets the length of each read and then writes the reads that "qualify" to an outfile.
#9 | |
Junior Member
Location: New Zealand Join Date: Aug 2009
Posts: 9
Quote:
#10 |
Member
Location: Richmond, VA Join Date: May 2012
Posts: 10
The script isn't too complicated. It would be something similar to this. I hope this can be of some assistance.
Code:
#!/usr/bin/perl
use strict;
use warnings;

my $infile  = "readFILE";
my $outfile = "quality_readsFILE";

# Open the file of reads (two lines per read: name, then sequence)
open(my $in,  '<', $infile)  or die "Cannot open $infile: $!";
# Open the outgoing file
open(my $out, '>', $outfile) or die "Cannot open $outfile: $!";

my $j = 0;        # line counter
my $read_name;    # most recent read name

while (my $line = <$in>) {
    chomp $line;
    if ($j % 2 == 1) {
        # Odd-numbered lines hold the sequence; keep reads of 75 bp or longer
        print $out "$read_name\n$line\n" if length($line) >= 75;
    }
    else {
        $read_name = $line;    # even-numbered lines hold the read name
    }
    $j++;
}

close($in);
close($out);
#11 | |
Junior Member
Location: New Zealand Join Date: Aug 2009
Posts: 9
Quote:
#12 |
Member
Location: Richmond, VA Join Date: May 2012
Posts: 10
In the future, let me know if you have any programming issues. I'd be glad to help you out.
#13 |
Member
Location: Russia Join Date: Dec 2007
Posts: 88
http://flxlexblog.wordpress.com/2012...substr-mg1655/
Ion Torrent Mate Pairs and a single scaffold for E coli K12 substr. MG1655
The de novo assembly approach Ion Torrent chose, using sff_extract, MIRA and SSPACE, seems to be giving quite long contigs, with almost all genes complete. However, Newbler outperforms SSPACE in scaffolding.
#14 |
Junior Member
Location: Hartford, CT Join Date: Sep 2010
Posts: 5
Length limiting is a great idea, jdilts.
One quick note on that quick Perl script: the concept of length checking is a good one, but the script fetches and measures each line as a read. If you are working with a specific file type for the sequencer's reads, then the parsing will depend on the format. For example, FASTQ uses (at least) 4 lines for each record, including name, sequence, quality and one optional line. That assumes the sequence is all on one line; a complete read may also run longer than one line of the file.
To parse a specific file type (as opposed to one that has one line per read), I recommend that you either write a new function/method or use a prewritten library that does it. I know that BioPerl and Biopython have packages that read many file types, FASTQ being just one of them.
-Benjamin-
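To illustrate the point about record-aware parsing, here is a minimal Python sketch that applies the same 75 bp cut-off per FASTQ record rather than per line. It uses only the standard library and assumes strict 4-line records (sequence and quality not wrapped). The names `read_fastq` and `filter_by_length` are made up for this example. Biopython's `SeqIO.parse` is the prewritten route if wrapped or mixed formats are a concern.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str, str]  # header, sequence, separator, qualities


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records; assumes sequence and quality are not wrapped."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip("\n")
        sep = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError(f"not a 4-line FASTQ record near {header!r}")
        yield header, seq, sep, qual


def filter_by_length(records: Iterable[FastqRecord], min_len: int = 75) -> Iterator[FastqRecord]:
    """Keep reads whose sequence is at least min_len bases long."""
    for record in records:
        if len(record[1]) >= min_len:
            yield record
```

The parser raises an error instead of guessing when the 4-line layout does not hold, which is a deliberate choice: silently mis-grouped lines would corrupt the filtered output.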
#15 | |
Member
Location: Rockville, MD Join Date: Apr 2011
Posts: 23
Quote:
POSTEDIT: The original post mentioned "reference assembly" - perhaps I've crossed some wires; I was thinking of "read mapping." For de novo assembly, CLCbio is also very fast and accurate. We routinely get down to sub-100 contigs with single Ion 318 or MiSeq PE runs for a 5 Mb genome (N50 is on average around 190K).
Last edited by jonathanjacobs; 07-09-2012 at 09:19 AM.
#16 |
Junior Member
Location: Baltimore Join Date: Jul 2012
Posts: 6
I am also curious about the CLC software... I'm not exactly sure how to use it.
#17 |
Junior Member
Location: Hartford, CT Join Date: Sep 2010
Posts: 5
Quick start for clcbio is to check their docs.
Their support page is at: http://www.clcbio.com/index.php?id=615 It has links to FAQ, tutorials, screencasts, etc.
Open up the app, see what it looks like, and see how much is intuitive. There are many, many more features than any one sequencing lab or bioinformatics group will use, so my usual way is to investigate the app, then check docs, then investigate, check docs, ask the forum, investigate, repeat.
This CLC-specific topic (if continued) is probably better as a new thread.
-B-
__________________
Benjamin Jackson Laboratory for Genomic Medicine
#18 | |
Member
Location: Maryland Join Date: Aug 2011
Posts: 52
Thanks for your info and advice.
Quote:
#19 |
Registered Vendor
Location: Madison, WI Join Date: Aug 2010
Posts: 48
Hi All,
DNASTAR software fully supports reference-guided assemblies (as well as de novo) for Ion Torrent data. If you check out the Ion Torrent page on our website, you can see videos of many of the Ion Torrent project types we handle (Ion AmpliSeq Cancer Panel, Paired-End Assembly with a Reference, etc.), as well as benchmarks and other resources. Also, feel free to download a fully functional free trial of Lasergene Genomics Suite to try it out for yourself.
If you have questions, just give us a call or send us an email.
866-511-5090
support@dnastar.com
Thanks,
Anne
#20 |
Member
Location: Milano, Italy Join Date: Dec 2008
Posts: 29
Hi - has anybody tried reference-guided or de novo assembly from the .sam (or .fastq) files of Ion Torrent datasets? I downloaded the following data sets from the Ion Community:
B7-143
B7-295
C19-543
and I have .bam (.sam) and .fastq files for them.
Velvet de novo assembly technically worked, but left me with many thousands of small contigs, so it is useless.
All trials of reference-guided assembly with the Columbus extension to Velvet (using sorted SAM files) and with Mosaik Assembler (MosaikBuild) failed, apparently because of serious inconsistencies in the data set or a data format incompatibility:
(*) Velvet
[0.000000] Reading FastA file NC_010473.C19-543.genome.fasta;
[59.568244] 1 sequences found
[59.568247] Done
[59.568619] Reading SAM file C19-543.sorted.sam
[355.404283] 6906611 reads found.
[355.404285] Done
[355.404286] Reference mapping counters
[355.404287] Name Read mappings
[355.404288] gi|170079663|ref|NC_010473.1| 19688393
[356.782277] Reading read set file C19-543/Sequences;
[363.732809] 6906612 sequences found
[363.733517] Read 1 of length 32794, longer than limit 32767
[363.733519] You should modify recompile with the LONGSEQUENCES option (cf. manual)
(*) MosaikBuild
MosaikBuild -q B7-295.fastq.gz -out B7-295.reads.dat -st 454
------------------------------------------------------------------------------
MosaikBuild 2.1.73 2012-11-08
Michael Stromberg & Wan-Ping Lee Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------
- setting read group ID to: ZKON5B26EGE
- setting sample name to: unknown
- setting sequencing technology to: 454
- parsing FASTQ file: reads: 2,281,414
\ERROR: The number of qualities (45) do not match the number of bases (385) in 9IKNG:01351:01857.
Is everybody using Newbler? Do you find the same problems of data inconsistency in the Ion Torrent FASTQ or BAM/SAM converted formats?
Keep in touch and thanks in advance!
Alessandro
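Before blaming the assembler, it may help to scan the FASTQ for the two symptoms reported in the logs above: records where the number of qualities differs from the number of bases, and reads longer than the 32767 limit Velvet printed. The Python sketch below is only a diagnostic under the assumption of strict 4-line FASTQ records. `check_fastq` and `VELVET_DEFAULT_MAX_LEN` are names invented here, and a mismatch may simply mean the file has wrapped lines rather than corrupt data, which is not confirmed for these data sets.

```python
from typing import Iterable, List, Tuple

VELVET_DEFAULT_MAX_LEN = 32767  # limit quoted in the Velvet log above


def check_fastq(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (read_name, problem) pairs for suspicious 4-line FASTQ records."""
    problems: List[Tuple[str, str]] = []
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) < 4:
            continue
        header, seq, _sep, qual = record
        record = []
        name = header[1:] if header.startswith("@") else header
        if len(seq) != len(qual):
            # Same symptom MosaikBuild reported for read 9IKNG:01351:01857
            problems.append((name, f"{len(qual)} qualities vs {len(seq)} bases"))
        if len(seq) > VELVET_DEFAULT_MAX_LEN:
            problems.append((name, f"read length {len(seq)} exceeds {VELVET_DEFAULT_MAX_LEN}"))
    return problems
```

Passing an open file handle (for a gzipped file, one opened in text mode with the standard `gzip` module) streams the records without loading the whole data set into memory. Only the list of problems is kept.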
Tags |
assembly, ion, reference, torrent |