  • jdilts
    Member
    • May 2012
    • 10

    Ion Torrent Reference Assembly

    What is the best reference assembler to use with ion torrent data? I can only seem to find information on Ion Torrent de novo assemblies, which is not what I'm looking for. Thanks in advance!
  • jdilts
    Member
    • May 2012
    • 10

    #2
    Is anyone using Newbler, DNASTAR, MIRA?


    • hengnck
      Junior Member
      • Aug 2009
      • 9

      #3
      Hi - I'm working with a small bacterial genome ~2.0 Mbp but de novo (new species). Got data from an Ion 318 chip, about 480 Mbp. Ran it through Newbler 2.3 - 3,000+ contigs. Set up default MIRA assembly *six* days ago and it's still going. :-( I wouldn't use DNA* - ridiculously expensive for what it does. Roche RefMapper is OK for some of our other known bacterial genomes.


      • RonanC
        Junior Member
        • Jul 2012
        • 8

        #4
        I just noticed that CLC bio are offering a free 6 month trial of their CLC genomics workbench to users with a benchtop NGS (i.e. 454 GS Jr, MiSeq or IonTorrent PGM). Anybody have any experience with the CLC software?


        • IonTorrent
          Member
          • Jan 2010
          • 64

          #5
          Originally posted by hengnck
          Hi - I'm working with a small bacterial genome ~2.0 Mbp but de novo (new species). Got data from an Ion 318 chip, about 480 Mbp. Ran it through Newbler 2.3 - 3,000+ contigs. Set up default MIRA assembly *six* days ago and it's still going. :-( I wouldn't use DNA* - ridiculously expensive for what it does. Roche RefMapper is OK for some of our other known bacterial genomes.
          Hi hengnck,

          Are you using all 480 Mbp of data in the assembly, or are you downsampling? I ask because many software packages (like those you mention) will grossly underperform with excessive coverage, and are reported to work best in the 30X to 50X range (if this is DNA from a pure culture, you're at ~240X). Are you a Torrent Suite user, and if so, are you using the MIRA plugin? The newest version (v2.2) allows you to specify the amount of coverage to use (best results are typically seen at ~50X).



          Some have commented that they use Newbler at around 30X coverage for de novo assembly.
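The ~240X figure is just total sequenced bases divided by genome size. A minimal Python sketch of that arithmetic, plus the fraction of reads you would keep to reach 30X or 50X, is below. It assumes every sequenced base counts toward coverage and that ~2.0 Mbp is a fair genome size estimate; the function names are made up for illustration.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Mean depth of coverage = total sequenced bases / genome size."""
    return total_bases / genome_size


def keep_fraction(target_depth: float, current_depth: float) -> float:
    """Fraction of reads to keep to reach target_depth (never above 1.0)."""
    return min(1.0, target_depth / current_depth)


# Figures from this thread: ~480 Mbp of reads, ~2.0 Mbp genome.
total_mbp, genome_mbp = 480, 2.0
depth = coverage(total_mbp * 1e6, genome_mbp * 1e6)
print(f"current depth: {depth:.0f}X")
for target in (30, 50):
    frac = keep_fraction(target, depth)
    print(f"{target}X: keep {frac:.3f} of the reads, about {frac * total_mbp:.0f} Mbp")
```

At 240X, a 50X target means keeping roughly a fifth of the reads (about 100 Mbp), and 30X means keeping an eighth (about 60 Mbp).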


          • kmcarr
            Senior Member
            • May 2008
            • 1181

            #6
            I concur with IT's comments about excessive coverage. You really need to scale back your input to ~30X. Also, why Newbler 2.3? That is a very old version. Get version 2.6; they have made several improvements to the assembler.


            • hengnck
              Junior Member
              • Aug 2009
              • 9

              #7
              Downsampling ion data

              Hi All,

              Thanks for your comments - you can all probably see that I'm more comfortable in the Sanger era. Unfortunately in my Faculty, I'm the "bioinformatics team".

              My default SOP is to use all the data (the more the merrier), but I can see now that I've got far more data than required. How do I downsample 480 Mbp of essentially random reads down to 100-150 Mbp?

              All advice is greatly appreciated.
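One common way to downsample shotgun reads is to keep each read at random with a fixed probability, rather than filtering on some property of the read. The Python sketch below does that on any stream of (name, sequence, quality) records; it is an illustration, not something proposed elsewhere in this thread, and the `subsample` name and the 0.25 fraction are hypothetical choices (0.25 of 480 Mbp is about 120 Mbp). Dedicated tools such as seqtk also provide a `sample` subcommand for this.

```python
import random
from typing import Iterable, Iterator, Tuple

# One FASTQ record as (name, sequence, quality); any tuple works here.
Record = Tuple[str, str, str]


def subsample(records: Iterable[Record], fraction: float,
              seed: int = 42) -> Iterator[Record]:
    """Keep each record independently with probability `fraction`.

    Streams the input, so memory use does not grow with file size.
    A fixed seed makes the subsample reproducible between runs.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    for rec in records:
        if rng.random() < fraction:
            yield rec


# Hypothetical demo: 10,000 dummy reads, keep roughly a quarter of them.
reads = [(f"read{i}", "ACGT", "IIII") for i in range(10_000)]
kept = list(subsample(reads, 0.25))
print(f"kept {len(kept)} of {len(reads)} reads")
```

Because each read is kept independently, the output size is only approximately `fraction` times the input, which is usually close enough for choosing an assembly depth.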


              • jdilts
                Member
                • May 2012
                • 10

                #8
                What size do you want the reads to be? You could cut out all the shorter reads (maybe 50 bp or less?). To do this you would need to write a script of some sort, in Perl or Python, that gets the length of each read and writes the reads that "qualify" to an output file.
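A minimal Python sketch of that length filter is below, working on (name, sequence) pairs and allowing both a minimum and an optional maximum length. The 50 bp floor comes from this post and the 398 bp ceiling from the maximum read length reported later in the thread; both are just example values. Note that this removes reads by length, so unlike random sampling it keeps a non-random subset of the data.

```python
from typing import Iterable, Iterator, Optional, Tuple

Read = Tuple[str, str]  # (read name, sequence)


def filter_by_length(reads: Iterable[Read], min_len: int = 50,
                     max_len: Optional[int] = None) -> Iterator[Read]:
    """Yield only reads whose length lies within [min_len, max_len]."""
    for name, seq in reads:
        n = len(seq)
        if n >= min_len and (max_len is None or n <= max_len):
            yield name, seq


# Hypothetical reads of 40, 120 and 400 bp.
reads = [("r1", "A" * 40), ("r2", "C" * 120), ("r3", "G" * 400)]
kept = list(filter_by_length(reads, min_len=50, max_len=398))
print([name for name, _ in kept])  # ['r2']
```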


                • hengnck
                  Junior Member
                  • Aug 2009
                  • 9

                  #9
                  Re: Downsampling

                  Originally posted by jdilts
                  What size do you want the reads? You could cut out all the smaller reads. (Maybe 50bp or less?) To do this you would need to write up a script of some sort. Either perl or python to get the length of each read and then output the reads that "qualify" into an outfile.
                  @jdilts - OK, I understand - I need to go see a Perl or Python programmer. The max read length was 398 and the average was 177. My collaborator who did the sequencing did not specify the type of kit used, but they're all shotgun reads. I will have to play around with min and max read lengths.
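When trying different min/max cutoffs, it helps to see how many bases each choice leaves. The Python sketch below summarises a list of read lengths and reports the bases kept for a given window. The function names and the example lengths are hypothetical; in practice you would collect `len(seq)` for every read in the file.

```python
from statistics import mean
from typing import Dict, Iterable, Optional


def length_summary(lengths: Iterable[int]) -> Dict[str, float]:
    """Read count, total bases, mean and max read length."""
    ls = list(lengths)
    if not ls:
        raise ValueError("no read lengths given")
    return {"reads": len(ls), "total_bases": sum(ls),
            "mean": mean(ls), "max": max(ls)}


def bases_kept(lengths: Iterable[int], min_len: int,
               max_len: Optional[int] = None) -> int:
    """Total bases left after dropping reads outside [min_len, max_len]."""
    return sum(n for n in lengths
               if n >= min_len and (max_len is None or n <= max_len))


# Hypothetical read lengths, for illustration only.
lengths = [40, 120, 177, 250, 398]
print(length_summary(lengths))
for cutoff in (50, 100, 200):
    print(cutoff, bases_kept(lengths, cutoff))
```

Dividing the bases kept by the genome size gives the resulting coverage for each cutoff.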


                  • jdilts
                    Member
                    • May 2012
                    • 10

                    #10
                    Not too complicated

                    The script isn't too complicated. It would be something similar to this. I hope this can be of some assistance.
                    Code:
                    #!/usr/bin/perl
                    # Keep reads of at least 75 bp from a FASTA file. Assumes each sequence
                    # is on a single line directly below its ">name" header line.
                    use strict;
                    use warnings;
                    
                    my $infile  = "readFILE";
                    my $outfile = "quality_readsFILE";
                    my $min_len = 75;
                    
                    open(my $in,  '<', $infile)  or die "Cannot open $infile: $!";
                    open(my $out, '>', $outfile) or die "Cannot open $outfile: $!";
                    
                    my $read_name;
                    # read one line at a time instead of loading the whole file into an array
                    while (my $line = <$in>) {
                    	chomp $line;    # drop the newline so length() counts bases only
                    	if ($line =~ /^>/) {
                    		$read_name = $line;    # header line: remember the read name
                    	}
                    	elsif (defined $read_name && length($line) >= $min_len) {
                    		print $out "$read_name\n$line\n";    # long enough: keep the read
                    	}
                    }
                    close($in);
                    close($out);


                    • hengnck
                      Junior Member
                      • Aug 2009
                      • 9

                      #11
                      Thanks, jdilts. Will try things out - as soon as I kill the MIRA assembly (7 days and counting).


                      • jdilts
                        Member
                        • May 2012
                        • 10

                        #12
                        in 7 days

                        In the future, let me know if you have any programming issues. I'd be glad to help you out.


                        • BenjaminL
                          Junior Member
                          • Sep 2010
                          • 5

                          #13
                          Length limiting is a great idea, jdilts.
                          One quick note on that quick Perl script...

                          The concept of length checking is a good one, but this script fetches and measures each line as if it were a read. If your reads are in a sequencer-specific file format, how you parse them will depend on that format.
                          e.g. FASTQ uses (at least) 4 lines for each record: the name (the @ line), the sequence, a separator (the + line, which may optionally repeat the name) and the quality string. This assumes that the sequence is all on one line; some files wrap the sequence and quality over several lines, so a single read can span more than four lines.

                          To parse a specific file type (as opposed to one that has one line per read), I recommend that you either write a dedicated function/method or use a prewritten library that does it for you. BioPerl and Biopython both have packages that read many file types, FASTQ being just one of them.
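For the simple case of unwrapped FASTQ, a record-aware reader only needs to pull four lines at a time. The Python sketch below does that and checks the `@` and `+` markers; it assumes one sequence line and one quality line per record, so for wrapped files a library parser such as Biopython's `SeqIO.parse(handle, "fastq")` is the safer route. The `read_fastq` name is hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ.

    Assumes sequence and quality each sit on a single line (no wrapping).
    """
    while True:
        header = handle.readline()
        if not header:
            return                      # end of file
        if not header.strip():
            continue                    # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"length mismatch in {header.strip()!r}")
        yield header[1:].rstrip("\r\n"), seq, qual


# Tiny in-memory example; with a real file, pass open("reads.fastq") instead.
example = "@r1\nACGT\n+\nIIII\n@r2\nGGCCA\n+\nIIIII\n"
for name, seq, qual in read_fastq(io.StringIO(example)):
    print(name, len(seq))
```

Each yielded record can then go through a length check or random subsampling without ever confusing a quality line for a read.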

                          -Benjamin-
                          Jackson Laboratory for Genomic Medicine


                          • jonathanjacobs
                            Member
                            • Apr 2011
                            • 23

                            #14
                            Originally posted by RonanC
                            I just noticed that CLC bio are offering a free 6 month trial of their CLC genomics workbench to users with a benchtop NGS (i.e. 454 GS Jr, MiSeq or IonTorrent PGM). Anybody have any experience with the CLC software?
                            @Ronan: We use GALAXY and CLCbio Genomics Workbench in our shop and have a MiSeq and an IonTorrent in house. We're mainly doing microbial and viral resequencing, but de novo assembly seems to keep creeping in as well. In any case, CLCbio is extremely fast: read mapping at 30x coverage of a 5 Mb genome takes minutes. An added benefit is that it also does true hybrid assembly of both PE and single-read NGS data from MiSeq and PGM at the same time. The quality/accuracy has also been as good as, if not better than, some of the open source solutions we are running (with GALAXY). It's pricey (~$5K/license), but the time savings in both setup and running are worth it. Don't get me wrong though: GALAXY is also very, very good, but it took a while for me to get some of the tools we're using to install "right" to work with GALAXY.

                            POSTEDIT: The original post mentioned "reference assembly"; perhaps I've crossed some wires. I was thinking "read mapping." For de novo assembly, CLCbio is also very fast and accurate. We routinely get below 100 contigs with a single Ion 318 or MiSeq PE run for a 5 Mb genome (N50 is on average around 190 kb).
                            Last edited by jonathanjacobs; 07-09-2012, 08:19 AM.
                            @bioinformer
                            http://www.linkedin.com/in/jonathanjacobs
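For anyone comparing assemblies by the N50 figure quoted above, here is a small Python sketch of the usual definition: sort contigs longest first and report the length at which the running total first reaches half of all assembled bases. The contig lengths in the example are made up, not taken from any assembly in this thread.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """N50: the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: running total reaches the full sum")


print(n50([100, 200, 300, 400]))  # 300
```

Fewer contigs and a higher N50 generally indicate a more contiguous assembly, though N50 says nothing about whether the contigs are correct.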


                            • slm1816
                              Junior Member
                              • Jul 2012
                              • 6

                              #15
                              I am also curious about the CLC software. I'm not exactly sure how to use it.
