  • Would PacBio RII reads be useful at 1X?

    Hi All, I'm planning on sequencing a highly repetitive vertebrate genome (~1.6 Gb genome, ~42% repetitive) using mostly Illumina reads (50X short insert and 50X 3kb insert, PE 150bpx2 reads). I was looking into PacBio reads; it would cost about $4k to do the genome at 1X, assuming ~200 Mb of good reads per SMRT cell. I know that it's recommended to have 10-15X PacBio coverage for gap filling and scaffolding, but the cost is a bit too high for me to do that. So, would it even be worth it to get a PacBio run done at 1X coverage? Thanks for any thoughts,

    jwag
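
The cost question above comes down to simple arithmetic. Here is a minimal Python sketch using only the figures quoted in the post (~1.6 Gb genome, ~200 Mb of usable reads per SMRT cell, ~$4k for 1X). The per-cell price is inferred from those numbers rather than quoted by anyone, and the function name is purely illustrative.

```python
import math

def smrt_cells_needed(genome_bp, coverage, yield_per_cell_bp):
    """Number of SMRT cells needed to reach the requested coverage."""
    return math.ceil(genome_bp * coverage / yield_per_cell_bp)

GENOME_BP = 1_600_000_000     # ~1.6 Gb genome, from the post
YIELD_PER_CELL = 200_000_000  # ~200 Mb of good reads per SMRT cell, from the post
COST_1X = 4000                # ~$4k quoted for 1X

cells_1x = smrt_cells_needed(GENOME_BP, 1, YIELD_PER_CELL)
cost_per_cell = COST_1X / cells_1x  # implied price: an inference, not a quote

for cov in (1, 10, 15):
    cells = smrt_cells_needed(GENOME_BP, cov, YIELD_PER_CELL)
    print(f"{cov}X: {cells} SMRT cells, ~${cells * cost_per_cell:,.0f}")
```

With these inputs, 1X works out to 8 SMRT cells and the recommended 10-15X to 80-120 cells, roughly $40k-60k at the same implied price.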

  • #2
    First, you’ll get more for your money if you split that 50x of 3kb insert into something like 15x 3kb, 15x 6kb, and 15x 10kb. It’s more library preps, but at 50x you’re going to have a ton of duplicates in the 3kb library (unless you make 3-4 libraries anyway). You should aim to cover each mate pair library to 10-20x, and having a distribution of sizes is preferable. Also, if you’re doing 150x2 reads, you should aim to have a lot of them overlapping. So make 30x coverage of, say, a 250bp library, then 20-30x coverage of a 500-600bp library.
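
To turn the library plan above into sequencing orders, it can help to convert coverage into read pairs. This is a rough sketch only. It assumes coverage means sequenced bases divided by genome size (not physical or span coverage), takes the ~1.6 Gb genome size from the original post, uses 25x as the midpoint of the suggested 20-30x, and ignores duplicates and filtering losses. The function names are illustrative.

```python
import math

def read_pairs_needed(genome_bp, coverage, read_len=150):
    """Read pairs required for a given sequence coverage with paired reads."""
    return math.ceil(genome_bp * coverage / (2 * read_len))

def overlap_bp(insert_size, read_len=150):
    """Expected overlap between mates; 0 if the fragment is too long to overlap."""
    return max(0, 2 * read_len - insert_size)

GENOME_BP = 1_600_000_000  # ~1.6 Gb, from the original post

# (library label, target coverage), following the suggestion above
plan = [
    ("250 bp PE (overlapping)", 30),
    ("500-600 bp PE", 25),  # midpoint of the suggested 20-30x (assumption)
    ("3 kb mate pair", 15),
    ("6 kb mate pair", 15),
    ("10 kb mate pair", 15),
]

for label, cov in plan:
    pairs = read_pairs_needed(GENOME_BP, cov)
    print(f"{label}: {cov}x -> {pairs / 1e6:.0f} M read pairs")

print("mate overlap at 250 bp insert:", overlap_bp(250), "bp")
```

On these assumptions a 250 bp fragment read with 2x150 leaves about 50 bp of overlap between mates, while a 500-600 bp fragment leaves none.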

    Personally, I wouldn’t bother with PacBio at such low coverage. The pipelines to integrate it into an Illumina assembly aren’t great. You’d first want to correct the reads using the Illumina data, then do gap filling. That might help some, but scaffolding with PacBio on top of high-coverage PE and mate pair Illumina data isn’t well worked out yet.

    As for how to spend a little extra money, I’d do a first round of sequencing with just the proposed Illumina libraries, then run a few draft assemblies to see what you need. If your contigs aren’t up to par, create another PE library and sequence another 30x (should be about $4K). If your scaffolds suck, make another couple of mate pair libraries and get another 20x out of them. If you have a lot of gaps, look into 454 instead of PacBio. Gap filling with 454 is just better supported than with PacBio, and most gaps in an Illumina assembly are going to be <1000bp anyway.

    Now, if everything is looking pretty good, consider finding someone to pay for a fosmid 60kbp library and sequence it to about 1x. Maybe you could try what’s been done here: http://genome.cshlp.org/content/22/11/2241.long, or ask that group to do it, paying them and offering authorship.

    If you don’t want to go that route, are you doing any RNA-seq? A genome assembly is relatively worthless if you can’t accurately find genes, and RNA-seq will help with that. Plus, some RNA-seq may help to address a more interesting biological question than the genome alone. There is also RNA-seq scaffolding, which can make substantial improvements in N50 if you’re comfortable with some of the issues it can create (i.e. having little idea of how big the gaps are, or whether the sequence missing from a gap sits somewhere else in your assembly).
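
Since N50 is the yardstick mentioned above for judging contigs and scaffolds between rounds, here is a small sketch of how it is usually computed. The lengths in the example are made-up toy numbers, not data from this thread.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths given")

# Toy example: five contigs, then the same bases after two scaffold joins
contigs = [100, 80, 60, 40, 20]
scaffolds = [180, 100, 20]

print("contig N50:", n50(contigs))
print("scaffold N50:", n50(scaffolds))
```

Note that joining contigs across a gap of unknown size raises N50 without adding any sequence, which is why the caveat about RNA-seq scaffolding matters.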

    • #3
      Hi Wallysb01, thanks for your reply. We are planning on using Allpaths LG to assemble our genome, which has specific library requirements. From what I can tell, there doesn't seem to be an assembler that can handle complex, repetitive vertebrate genomes as well as Allpaths LG. It requires at least 1 small-insert library (which for us will be 50X of ~270bp insert 150bpx2 PE Illumina) and 1 jumping mate-pair library at 50X (average 3kb insert but a larger standard deviation is OK). Unfortunately, Allpaths LG doesn't support 454 or IonTorrent reads. Though I'm just going off the manual for these requirements.

      I think you are correct that getting more of a variety of insert sizes in mate-pair libraries might be a better route than PacBio. I was just thinking that, after kmer correction, high coverage of PacBio might not be necessary, and might help more accurately resolve gaps using the longer split reads.

      We do have a bunch of RNA-seq data, but I'm still trying to figure out how useful it would be for aiding in assembly: because of genomic introns, there might be quite a bit of dependency between the transcriptome and genome.

      Thanks for your insight. I'm getting the notion that PacBio technology isn't quite ready for large vertebrate genomes.

      • #4
        Originally posted by jwag:
        "I'm getting the notion that PacBio technology isn't quite ready for large vertebrate genomes."

        It is the only option available to sequence large genes as single molecules: http://www.nature.com/nbt/journal/v3.../nbt.2705.html

        Ultimately the compromise you will end up making, in terms of choice of sequencing methods/libraries, will depend on how "finished" you want the genome to be.

        • #5
          For a highly repetitive genome, where it is difficult to connect the islands of unique sequence, one option is a high-density genetic map. Markers in the map that align to your contigs would allow the contigs to be ordered along the chromosome without physically connecting them by assembly. See http://www.genetics.org/content/188/4/799.full.pdf
          Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
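
The map-based ordering described in this post can be sketched roughly as follows. This is an illustration of the general idea only, not the method of the linked paper. The input structures (marker to map position, marker to contig) and all marker and contig names are hypothetical.

```python
from collections import defaultdict
from statistics import median

def order_contigs(marker_positions, marker_hits):
    """
    marker_positions: marker -> (linkage_group, cM position)
    marker_hits:      marker -> contig the marker aligns to
    Returns linkage_group -> contigs sorted by median marker position.
    """
    by_contig = defaultdict(list)
    for marker, contig in marker_hits.items():
        if marker in marker_positions:
            by_contig[contig].append(marker_positions[marker])

    ordered = defaultdict(list)
    for contig, hits in by_contig.items():
        groups = {group for group, _ in hits}
        if len(groups) != 1:
            continue  # markers disagree on linkage group: possible misassembly, skip
        group = groups.pop()
        ordered[group].append((median(cm for _, cm in hits), contig))

    return {g: [c for _, c in sorted(v)] for g, v in ordered.items()}

# Hypothetical toy data
positions = {"m1": ("LG1", 0.0), "m2": ("LG1", 2.5), "m3": ("LG1", 10.0),
             "m4": ("LG1", 30.0), "m5": ("LG2", 5.0)}
hits = {"m1": "contigB", "m2": "contigB", "m3": "contigA",
        "m4": "contigC", "m5": "contigD"}

print(order_contigs(positions, hits))
```

Contigs whose markers fall on more than one linkage group are skipped here rather than placed, since that pattern may point to a chimeric contig. A real pipeline would also need to handle contig orientation, which this sketch ignores.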

          • #6
            You may also want to look into optical mapping & related -- Nabsys has not yet launched their system, but OpGen and BioNano Genomics do have optical mapping solutions that you can access as a service (either from the company or a core lab/service provider). Optical maps may allow you to tie together your sequence scaffolds into much longer superscaffolds.

            Or just wait; even by the end of this year it is a reasonable bet that PacBio efficiencies will be better & there are likely to be more technologies in play.

            Ion Torrent really isn't useful for genome assembly, especially large ones. 454 has been pretty well blown away by PacBio & MiSeq -- price it out, but I'd be shocked if you can't get essentially the same data cheaper on those two platforms (using PacBio CCS as equivalent to 454).

            Moleculo is another on-the-horizon option (you may be able to get it from Illumina as a service) which you should compare to 5kb mate pairs in terms of price -- a 5kb Moleculo library will give you much more information.

            You should also ping BGI to see if they offer their LFR technology for non-human genomes; this is another long-range information technology that is promising.

            There are some clever techniques recently published for using DNA packing in the eukaryotic nucleus to get information -- see http://www.ncbi.nlm.nih.gov/pubmed/24270850 and http://www.ncbi.nlm.nih.gov/pubmed/24185095

            • #7
              Hey, I'm new to this, but I'm also working on assembling Illumina reads. My problem is that I got my files in FASTA format, and I don't know whether that's correct or not.

              I also don't understand why people, including you, are talking about insert size. Why is it used?
              I am using Velvet and Geneious, and I guess all I need is: reads -> trim ends -> perform de novo assembly.

              Please correct me if I am wrong.
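
On the file-format question, a quick check of what you actually received can be sketched like this. It is a crude heuristic that assumes an uncompressed text file and looks only at the first non-empty line. FASTA records start with `>` and carry no per-base quality scores, while FASTQ records start with `@` and do, which matters if the "trim ends" step is meant to be quality-based. The function name is illustrative.

```python
def sniff_read_format(path):
    """Guess FASTA vs FASTQ from the first non-empty line of a plain-text file."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                if line.startswith(">"):
                    return "fasta"   # header line, sequence only
                if line.startswith("@"):
                    return "fastq"   # header line, sequence plus qualities
                return "unknown"
    return "empty"
```

Calling `sniff_read_format("reads.fa")` on your own file (the file name here is just a placeholder) tells you which case you are in before you set up trimming.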

              • #8
                I am planning to sequence a 3 Gb mammalian genome. My current plan is to generate 1X (3 Gb) of PacBio data with XL chemistry, plus Illumina 20X standard PE, 10X with a 5 kb insert size, and 10X with a 10 kb insert size. Will 1X of PacBio help the assembly?
                Note that it is a wild relative of a species with an available reference genome. Please suggest the most accurate and least expensive way to assemble it de novo and then compare it with the reference.
