SEQanswers

SEQanswers > Sequencing Technologies/Companies > Pacific Biosciences



02-04-2014, 01:02 PM   #1
jwag
Member
 
Location: USA

Join Date: Apr 2013
Posts: 42
Would PacBio RS II reads be useful at 1X?

Hi All, I'm planning on sequencing a highly repetitive vertebrate genome (~1.6 Gb, ~42% repetitive) using mostly Illumina reads (50X short-insert and 50X 3kb-insert libraries, 2x150bp PE reads). I was looking into PacBio reads: it would cost about $4k to cover the genome at 1X, assuming ~200 Mb of good reads per SMRT cell. I know that 10-15X PacBio coverage is recommended for gap filling and scaffolding, but that cost is a bit too high for me. So, would a PacBio run with only 1X coverage even be worth it? Thanks for any thoughts,

jwag
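The cost figures in the question follow from simple coverage arithmetic. The Python sketch below takes the poster's numbers as assumptions (~1.6 Gb genome, ~200 Mb of usable reads per SMRT cell, ~$4k for 1X) and additionally assumes that cost scales linearly with the number of SMRT cells, which a real quote may not.

```python
import math

# Assumed inputs, taken from the question above (not measured values).
GENOME_BP = 1_600_000_000        # ~1.6 Gb genome
YIELD_PER_CELL_BP = 200_000_000  # ~200 Mb of good reads per SMRT cell
QUOTED_COST_1X = 4_000           # ~$4k quoted for 1X coverage


def smrt_cells_needed(target_coverage: float) -> int:
    """Number of SMRT cells needed to reach target_coverage, rounded up."""
    bases_needed = GENOME_BP * target_coverage
    return math.ceil(bases_needed / YIELD_PER_CELL_BP)


# Implied price per SMRT cell; linear pricing is an assumption.
COST_PER_CELL = QUOTED_COST_1X / smrt_cells_needed(1)

for cov in (1, 10, 15):
    cells = smrt_cells_needed(cov)
    print(f"{cov:>2}X: {cells:>3} SMRT cells, ~${cells * COST_PER_CELL:,.0f}")
```

Under these assumptions 1X is 8 SMRT cells, and the recommended 10-15X would be 80-120 cells, roughly $40-60k at the same per-cell rate.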
02-04-2014, 02:43 PM   #2
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

First, you’ll get more for your money if you spread that 50x of 3kb inserts across several libraries, more like 15x 3kb, 15x 6kb and 15x 10kb. It's more library preps, but at 50x you’re going to have a ton of duplicates in the 3kb library (unless you make 3-4 libraries anyway). You should aim to cover each mate-pair library to 10-20x, and a distribution of insert sizes is preferable. Also, if you’re doing 150x2 reads, you should aim to have a lot of them overlapping. So make 30x coverage of, say, a 250bp library, then 20-30x coverage of a 500-600bp library.
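One way to see why spreading sequence over several mate-pair sizes pays off is to compare sequence coverage with physical (clone) coverage, i.e. how many read pairs span a given position. A minimal sketch, assuming the ~1.6 Gb genome from the question and 2x150bp reads, and ignoring junction trimming, duplicates and insert-size spread:

```python
# Assumed inputs (from the thread): ~1.6 Gb genome, 2x150bp reads.
GENOME_BP = 1_600_000_000
READ_LEN = 150


def read_pairs_for(seq_coverage: float) -> float:
    """Read pairs needed for a given sequence coverage (each pair = 2 reads)."""
    return seq_coverage * GENOME_BP / (2 * READ_LEN)


def physical_coverage(seq_coverage: float, insert_bp: int) -> float:
    """Average number of pairs spanning a position: pairs * insert / genome size."""
    return read_pairs_for(seq_coverage) * insert_bp / GENOME_BP


for insert_bp, seq_cov in [(3_000, 50), (3_000, 15), (6_000, 15), (10_000, 15)]:
    print(f"{seq_cov}x of {insert_bp} bp inserts -> "
          f"{physical_coverage(seq_cov, insert_bp):.0f}x physical coverage")
```

Under these assumptions, 15x of sequence from a 3kb library already spans each position about 150 times, and 50x spans it about 500 times, so the extra reads mostly add depth (and duplicates) rather than new long-range information; 15x from 6kb and 10kb libraries gives about 300x and 500x spanning coverage at longer distances.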

Personally, I wouldn’t bother with PacBio at such low coverage. The pipelines to integrate it into an Illumina assembly aren’t great. You’d first want to correct it with the Illumina data, then do gap filling. That might help some, but scaffolding with it alongside high-coverage PE and mate-pair Illumina data isn’t well worked out yet.
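A rough way to quantify "such low coverage" is the idealized Lander-Waterman (Poisson) model, under which the expected fraction of bases covered by at least k reads depends only on the mean coverage c: P(depth >= k) = 1 - sum over i < k of e^(-c) c^i / i!. The sketch assumes uniform, independent read placement, which real PacBio data only approximates.

```python
import math


def frac_covered_at_least(k: int, coverage: float) -> float:
    """Expected fraction of bases covered by >= k reads under a Poisson model."""
    p_fewer = sum(
        math.exp(-coverage) * coverage**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - p_fewer


for cov in (1, 10, 15):
    print(f"{cov:>2}X: >=1 read {frac_covered_at_least(1, cov):.3f}, "
          f">=3 reads {frac_covered_at_least(3, cov):.3f}")
```

Under this model, at 1X roughly 37% of the genome would have no PacBio read at all and only about 8% would have three or more overlapping reads; at 10X both fractions are close to 1.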

As for how to spend a little extra money, I’d do a first round of sequencing with just the proposed Illumina libraries, then run a few draft assemblies to see what you need. If your contigs aren’t up to par, create another PE library and sequence another 30x (should be about $4K). If your scaffolds suck, make another couple of mate-pair libraries and get another 20x out of them. If you have a lot of gaps, look into 454 instead of PacBio. Gap filling with 454 is just better supported than with PacBio, and most gaps in an Illumina assembly are going to be <1000bp anyway.
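Gap sizes like the "<1000bp" above are typically estimated from read pairs that span a gap: if the insert size is known, whatever part of the insert is not accounted for by sequence on the two flanking contigs is the gap. A minimal sketch of that arithmetic; the coordinate convention and example numbers are hypothetical, and orientation handling, outlier filtering and insert-size spread are ignored.

```python
from statistics import median


def estimate_gap(pairs, contig_a_len: int, mean_insert: int):
    """Estimate the gap between contig A and contig B from spanning read pairs.

    Hypothetical convention: A precedes B, both in forward orientation, and
    pairs are forward-reverse as in a standard PE library. Read 1 maps forward
    on A starting at a_start (0-based); read 2 maps reverse on B ending at
    b_end (0-based, exclusive). Then
        insert = (contig_a_len - a_start) + gap + b_end,
    which is solved for gap. A negative result suggests the contigs overlap.
    """
    estimates = [mean_insert - (contig_a_len - a_start) - b_end
                 for a_start, b_end in pairs]
    return median(estimates)  # median is less sensitive to odd pairs than the mean


# Three hypothetical pairs from a ~3kb library spanning one gap.
spanning_pairs = [(9_000, 1_500), (8_800, 1_650), (9_200, 1_300)]
print(estimate_gap(spanning_pairs, contig_a_len=10_000, mean_insert=3_000))
```

Here the three pairs give individual estimates of 500, 150 and 900 bp, and the median of 500 bp is reported.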

Now, if everything is looking pretty good, consider finding someone to pay for a 60kbp fosmid library and sequence it to about 1x. Maybe you could try what’s been done here http://genome.cshlp.org/content/22/11/2241.long, or ask them to do it if you pay them and offer authorship.

If you don’t want to go that route, are you doing any RNA-seq? A genome assembly is relatively worthless if you can’t accurately find genes, and RNA-seq will help with that. Plus, some RNA-seq may help address a more interesting biological question than the genome alone. There is also RNA-seq scaffolding, which can substantially improve N50 if you’re comfortable with some of the issues it can create (i.e. having little idea how big the gaps are, or whether sequence that belongs in a gap is sitting somewhere else in your assembly).
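The RNA-seq scaffolding idea can be illustrated with a toy: if one transcript aligns partly to one contig and partly to another, those contigs are probably neighbours, but the transcript says nothing about how much genomic sequence (intron or gap) lies between them. The sketch assumes hypothetical alignment tuples (transcript_id, t_start, t_end, contig_id) already produced by some spliced aligner; it is not the algorithm of any particular RNA-seq scaffolding tool, and it ignores strand and contig orientation.

```python
from collections import Counter, defaultdict


def rna_links(alignments):
    """Count contig adjacencies supported by transcripts spanning two contigs.

    alignments: iterable of (transcript_id, t_start, t_end, contig_id), where
    t_start/t_end are coordinates along the transcript (hypothetical format).
    """
    by_transcript = defaultdict(list)
    for tx, t_start, t_end, contig in alignments:
        by_transcript[tx].append((t_start, t_end, contig))

    links = Counter()
    for hits in by_transcript.values():
        hits.sort()  # walk each transcript in its own coordinate order
        for (_, _, left), (_, _, right) in zip(hits, hits[1:]):
            if left != right:  # consecutive pieces on different contigs -> a link
                links[tuple(sorted((left, right)))] += 1
    return links


example = [
    ("tx1", 0, 400, "ctg7"), ("tx1", 410, 900, "ctg2"),
    ("tx2", 0, 300, "ctg7"), ("tx2", 300, 800, "ctg2"),
    ("tx3", 0, 250, "ctg5"), ("tx3", 260, 700, "ctg5"),  # two exons, same contig
]
print(rna_links(example))
```

Two transcripts support joining ctg7 and ctg2, while tx3 (two exons on one contig) adds no link; the size of the join is left unknown, which is exactly the caveat raised above.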
02-04-2014, 05:48 PM   #3
jwag
Member
 
Location: USA

Join Date: Apr 2013
Posts: 42

Hi Wallysb01, thanks for your reply. We are planning on using ALLPATHS-LG to assemble our genome, which has specific library requirements. From what I can tell, no other assembler seems to handle complex, repetitive vertebrate genomes as well as ALLPATHS-LG. It requires at least one small-insert library (for us, 50X of ~270bp-insert 2x150bp PE Illumina reads) and one jumping (mate-pair) library at 50X (average 3kb insert, though a larger standard deviation is OK). Unfortunately, ALLPATHS-LG doesn't support 454 or Ion Torrent reads. That said, I'm just going off the manual for these requirements.
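The ~270bp short-insert design is chosen so that the two 150bp reads of each pair overlap, since 2 x 150 = 300 bp is longer than the fragment. The sketch below shows that arithmetic, plus a toy exact-match merge of an overlapping pair into one longer read to show why the overlap is useful. The merge function and the 14 bp example are purely illustrative assumptions: this is not ALLPATHS-LG's own procedure, and real merging tools tolerate sequencing errors.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def mate_overlap(insert_bp: int, read_len: int) -> int:
    """Bases shared by the two reads of a pair (negative = unsequenced middle)."""
    return 2 * read_len - insert_bp


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10):
    """Toy merge: join read1 with the reverse complement of read2 using the
    longest exact suffix/prefix overlap of at least min_overlap (>= 1) bases."""
    read2_rc = revcomp(read2)
    for ov in range(min(len(read1), len(read2_rc)), max(min_overlap, 1) - 1, -1):
        if read1[-ov:] == read2_rc[:ov]:
            return read1 + read2_rc[ov:]
    return None  # no acceptable overlap found


for insert in (250, 270, 300, 500):
    print(insert, "bp insert ->", mate_overlap(insert, 150), "bp overlap")

fragment = "GATTACAGGCATGC"                    # 14 bp toy fragment
r1, r2 = fragment[:10], revcomp(fragment[4:])  # 10 bp mates, 6 bp overlap
print(merge_pair(r1, r2, min_overlap=5))       # reconstructs the fragment
```

With 2x150bp reads, a 270bp insert leaves a 30bp overlap, while a 500bp insert leaves 200bp between the mates unsequenced.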

I think you are correct that a greater variety of insert sizes in the mate-pair libraries might be a better route than PacBio. I was just thinking that, after k-mer correction, high PacBio coverage might not be necessary, and the longer split reads might help resolve gaps more accurately.
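The idea behind k-mer-based hybrid correction is that k-mers seen repeatedly in accurate short reads are probably real genome sequence, so stretches of a long read made of unseen k-mers likely contain errors. A toy sketch of that detection step only, with a made-up 12 bp "genome", k = 4 and a solidity threshold of 2 (all assumptions); real hybrid-correction pipelines also repair the errors and choose k and thresholds from the data.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a collection of (short) reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_starts(long_read, solid, k):
    """Start positions of k-mers in long_read that are not in the solid set."""
    return [i for i in range(len(long_read) - k + 1)
            if long_read[i:i + k] not in solid]


K = 4
MIN_COUNT = 2  # k-mers seen at least this often count as "solid" (assumed)

short_reads = ["ACGTTGCATGCA", "ACGTTGCATGCA"]  # toy accurate short reads
solid = {kmer for kmer, n in kmer_counts(short_reads, K).items() if n >= MIN_COUNT}

long_read = "ACGTTGAATGCA"  # same sequence with a C->A error at position 6
print(weak_kmer_starts(long_read, solid, K))
```

The flagged k-mers start at positions 3 to 6, i.e. exactly the four 4-mers that overlap the erroneous base.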

We do have a bunch of RNA-seq data, but I'm still trying to figure out how useful it would be for aiding the assembly, since genomic introns mean there might be quite a bit of discordance between the transcriptome and the genome.

Thanks for your insight. I'm getting the notion that PacBio technology isn't quite ready for large vertebrate genomes.
02-04-2014, 06:47 PM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,994

Quote:
Originally Posted by jwag
I'm getting the notion that PacBio technology isn't quite ready for large vertebrate genomes.
It is the only option available to sequence large genes as single molecules: http://www.nature.com/nbt/journal/v3.../nbt.2705.html

Ultimately the compromise you will end up making, in terms of choice of sequencing methods/libraries, will depend on how "finished" you want the genome to be.
02-04-2014, 08:30 PM   #5
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 512

For a highly repetitive genome, where it is difficult to connect the islands of unique sequence, one option is a high-density genetic map. Markers in the map that align to your contigs would allow the contigs to be ordered along the chromosome without physically connecting them by assembly. See http://www.genetics.org/content/188/4/799.full.pdf
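The ordering step can be sketched as follows: each map marker that aligns to a contig contributes its map position, and the contigs on each linkage group are sorted by the median position of their markers. Marker names, contig names and positions are hypothetical; the sketch ignores contig orientation (which needs at least two markers at distinct map positions) and contigs whose markers fall on more than one linkage group. It is a simplified illustration, not the procedure of the linked paper.

```python
from collections import defaultdict
from statistics import median


def order_contigs(marker_map, marker_hits):
    """Order contigs along each linkage group using genetic-map markers.

    marker_map:  marker -> (linkage_group, position_cM)
    marker_hits: marker -> contig the marker sequence aligns to
    Returns linkage_group -> contigs sorted by median marker position.
    """
    positions = defaultdict(lambda: defaultdict(list))  # lg -> contig -> [cM]
    for marker, contig in marker_hits.items():
        if marker in marker_map:  # skip markers that are not on the map
            lg, cm = marker_map[marker]
            positions[lg][contig].append(cm)
    return {
        lg: sorted(contigs, key=lambda c: median(contigs[c]))
        for lg, contigs in positions.items()
    }


marker_map = {"m1": ("LG1", 0.0), "m2": ("LG1", 5.2), "m3": ("LG1", 12.0),
              "m4": ("LG1", 20.5), "m5": ("LG2", 3.0)}
marker_hits = {"m1": "ctg9", "m2": "ctg3", "m3": "ctg3",
               "m4": "ctg1", "m5": "ctg4"}
print(order_contigs(marker_map, marker_hits))
```

Here ctg3 carries two markers (5.2 and 12.0 cM, median 8.6), so it is placed between ctg9 (0.0 cM) and ctg1 (20.5 cM) on LG1, without any sequence physically connecting them.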
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
02-05-2014, 04:58 AM   #6
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

You may also want to look into optical mapping and related technologies. Nabsys has not yet launched its system, but OpGen and BioNano Genomics do have optical mapping solutions that you can access as a service (either from the company or from a core lab/service provider). Optical maps may allow you to tie your sequence scaffolds together into much longer superscaffolds.

Or just wait: even by the end of this year it is a reasonable bet that PacBio efficiencies will be better and that more technologies will be in play.

Ion Torrent really isn't useful for genome assembly, especially large ones. 454 has been pretty well blown away by PacBio & MiSeq -- price it out, but I'd be shocked if you can't get essentially the same data cheaper on those two platforms (using PacBio CCS as equivalent to 454).

Moleculo is another on-the-horizon option (you may be able to get it from Illumina as a service), which you should compare on price with 5kb mate pairs: a 5kb Moleculo library will give you much more information.

You should also ping BGI to see if they offer their LFR technology for non-human genomes; this is another long-range information technology that is promising.

There are also some clever, recently published techniques that use DNA packing in the eukaryotic nucleus to get long-range information; see http://www.ncbi.nlm.nih.gov/pubmed/24270850 and http://www.ncbi.nlm.nih.gov/pubmed/24185095
02-05-2014, 05:49 AM   #7
paa6
Member
 
Location: south korea

Join Date: Feb 2014
Posts: 68

Hey, I'm new at this, but I'm also working on an Illumina read assembly. My problem is that I got my files in FASTA format, and I don't know whether that's correct or not.

Also, I don't understand why people, including you, are talking about insert size. Why is it used?
I am using Velvet and Geneious, and I assume all I need is: reads -> trim ends -> de novo assembly.

Please correct me if I'm wrong.
04-21-2014, 07:11 AM   #8
sarwar
Member
 
Location: delhi , india

Join Date: Apr 2010
Posts: 14

I am planning to sequence a 3 Gb mammalian genome. My current approach is to generate 1X (3 Gb) of PacBio sequence with XL chemistry, plus Illumina 20X standard PE, 10X with a 5 kb insert size and 10X with a 10 kb insert size. Will 1X PacBio help the assembly?
Note that it is a wild relative of a species with an available reference genome. Please suggest the most accurate and least expensive way to assemble it de novo and then compare it with the reference.




All times are GMT -8.

