SEQanswers

SEQanswers > Sequencing Technologies/Companies > Pacific Biosciences




02-14-2013, 06:32 AM   #1
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130
Pacbio scaffolding

I am surprised that I can't find any scaffolders for PacBio data. I am looking for something that does what SSPACE does (I was surprised to see that SSPACE doesn't accept PacBio reads as input). I have a bunch of contigs generated by Velvet, and now I want to link these contigs using LONG reads, which are error-corrected PacBio reads.

Honestly, I tried reading up on how Bambus works; they say it accepts input from any assembler, but they have made it so complicated......
02-14-2013, 10:03 AM   #2
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

Would PBJelly work?

http://sourceforge.net/p/pb-jelly/wiki/Home/

"PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in fasta format) to high-confidence draft assembles. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes. Each step in PBJelly’s workflow can be run on a cluster, thus parallelizing the gap filling process for rapid turn around, even for very large eukaryotic genomes."
02-14-2013, 10:16 AM   #3
mchaisso
Member
 
Location: Seattle, WA

Join Date: Apr 2008
Posts: 84

Also: A hybrid approach for the automated finishing of bacterial genomes
http://www.nature.com/nbt/journal/v3.../nbt.2288.html

PBJelly likely does a better job of getting the correct sequence into the gaps, but the hybrid assembler was designed to identify tricky repeat regions and avoid misassembling them. Its utility may be limited to bacterial-sized genomes, though.

A snippet from the paper:
To produce the hybrid assembly, we first generated a consensus CDC contig set. Given the clonal nature of the CDC isolates (Supplementary Results), we split contigs from the minimal CDC assembly that were inconsistent with the remaining two isolates. If the split resulted in a subcontig of <1 kb in length, the subcontig was eliminated. We input the resulting 97 contigs in this set, along with 94,526 single-molecule reads from the PacBio RS with an average accuracy of 82.9% (Supplementary Fig. 2), into our hybrid assembly pipeline (Supplementary Fig. 3).
02-14-2013, 11:58 AM   #4
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130

The way I understand it, PBJelly closes gaps that already exist ("NNNNN"), but I am interested in building new contigs, i.e. reducing their number.
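For context, a "captured gap" in a draft scaffold is a run of N characters between two contigs, and that is what a gap filler such as PBJelly works on; a contig set with no N runs gives it nothing to fill. A minimal Python sketch that lists the gap coordinates in one sequence (`find_gaps` and `min_len` are hypothetical names for illustration, not part of PBJelly):

```python
import re

def find_gaps(seq, min_len=1):
    """Return 0-based, end-exclusive (start, end) coordinates of N runs."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", seq)  # maximal runs of N/n
            if m.end() - m.start() >= min_len]

scaffold = "ACGTACGTNNNNNGGCCTTAANNNACGT"
print(find_gaps(scaffold))  # [(8, 13), (21, 24)]
```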
02-15-2013, 04:38 AM   #5
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

"AHA (A Hybrid Assembler) uses PacBio's exceptionally long reads to improve existing assemblies and fill in gaps." http://www.pacificbiosciences.com/pr...re/algorithms/. I guess part of smrtpipe software...
02-16-2013, 09:10 AM   #6
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130

This is part of SMRT-pipe. Does anyone have any idea how to download that package? Never doing PacBio again... closed source...
02-17-2013, 01:51 PM   #7
jbingham
Member
 
Location: Silicon Valley

Join Date: Jul 2011
Posts: 24

PacBio's software is all open source, BSD license. See pacbiodevnet.com for downloads and links to GitHub projects.
02-17-2013, 02:15 PM   #8
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130

Quote:
Originally Posted by jbingham
PacBio's software is all open source, BSD license. See pacbiodevnet.com for downloads and links to GitHub projects.
Seems you are right. I gave it a shot, and oh my god... why in the world are they doing this? I mean, seriously? To use one of their tools I need to download a 1 GB file and go through the extensive installation instructions outlined here:

http://pacb.com/devnet/files/softwar...8v1.4.0%29.pdf

?

Can anyone please tell me why they don't just provide generic executables for the individual tools in that pipeline? Why do I have to spend a day installing this? This should be simpler. Sorry for my rant, I just don't get it.

I don't want all their fancy tools; I don't need to log in via a web interface to see what's up... oh well...
02-17-2013, 02:32 PM   #9
jbingham
Member
 
Location: Silicon Valley

Join Date: Jul 2011
Posts: 24

Maybe the Amazon image is what you need. Nothing to install, just boot up a VM.

Agreed that it's a big download to get everything. The aligner and variant caller (blasr and quiver), which are what you asked for, are separate installs from GitHub. See pacbiodevnet.com for links on the Compatible Software page.
02-17-2013, 02:41 PM   #10
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130

Nah dude, this is what I want:

AHA: a hybrid assembler to scaffold existing contigs and fill gaps. Available only in SMRT Analysis. Since v1.0

I want to link my contigs with long reads, which sometimes have only 1x coverage.
02-17-2013, 02:43 PM   #11
jbingham
Member
 
Location: Silicon Valley

Join Date: Jul 2011
Posts: 24

In that case, you will need either the Amazon VM or the full install. Sorry!
02-17-2013, 02:44 PM   #12
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130

Quote:
Originally Posted by jbingham
In that case, you will need either the Amazon VM or the full install. Sorry!
I will try the Amazon VM, thank you very much for your help!
02-18-2013, 03:47 AM   #13
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245

Quote:
Originally Posted by AdrianP
I am surprised to not be able to find any scaffolders for pacbio data. I am looking for something that SSPACE does (surprised to see that SSPACE doesn't accept pacbio as input). I have a bunch of contigs generated by velvet, and now I want to link these contigs by using LONG reads, which are pacbio corrected for error.
We at BaseClear (developers of SSPACE) have developed a modified version of SSPACE which accepts PacBio long reads. The method gives very nice results, but at the moment we offer it only as an internal service, since the algorithm is still in a testing phase. An official release might follow later this year, but that has not been decided yet. If you are interested in BaseClear's assembly service, please write to info@baseclear.com

Kind Regards,
Boetsie
02-18-2013, 05:21 AM   #14
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130

Quote:
Originally Posted by boetsie
We at BaseClear (developers of SSPACE) have developed a modified version of SSPACE which accepts PacBio long reads. The method gives very nice results, but at this moment we offer this only as an internal service since the algorithm itself is still in a testing phase. Official release might follow later this year, but has not been decided yet. If you are interested in BaseClear's assembly-service please write to info@baseclear.com

Kind Regards,
Boetsie
I am aware of SSPACE; I used it on a different genome project that is still in progress, and I like how it works. I was surprised that it accepts mate pairs as input but not PacBio reads. PacBio reads are similar in the sense that they provide long-range information, but unlike mate pairs they have a definite length, so the gap would not necessarily have to be filled with NNNN.
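To make the contrast concrete: a mate pair only gives an estimated distance between two contigs, whereas a long read that spans the junction contains the actual bases in between. A minimal Python sketch, assuming both contigs align to the read's forward strand and that you already have their coordinates on the read (the `Hit` class and `gap_from_read` are hypothetical helpers, not part of SSPACE or any PacBio tool):

```python
from dataclasses import dataclass

@dataclass
class Hit:
    contig: str
    read_start: int  # 0-based, inclusive, on the read
    read_end: int    # 0-based, exclusive, on the read

def gap_from_read(read_seq, left, right):
    """Estimate the gap between two contigs spanned by one long read.

    Assumes `left` covers its contig's 3' end and `right` its contig's 5' end,
    both on the read's forward strand. Returns (gap_length, fill_sequence);
    a negative gap means the contig ends overlap, so there is nothing to fill.
    """
    gap = right.read_start - left.read_end
    fill = read_seq[left.read_end:right.read_start] if gap > 0 else ""
    return gap, fill

read = "AAAACCCCGGTTTT"
a = Hit("ctgA", 0, 8)    # read[0:8] is the end of ctgA
b = Hit("ctgB", 10, 14)  # read[10:14] is the start of ctgB
print(gap_from_read(read, a, b))  # (2, 'GG')
```

With an uncorrected read the fill sequence carries the read's errors, which is one reason error-corrected reads (or a consensus over several spanning reads) are preferable for filling.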
02-18-2013, 07:53 AM   #15
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245

Quote:
Originally Posted by AdrianP
I am aware of SSPACE and I used it on a different genome project that is still in process and I like how it works. I was surprised that it accepts matepairs as input data but not pacbio reads. pacbio reads are similar in the sense that they are long range information but as opposed to mate pairs they have a definite length and would not necessarily fill the gap with NNNN.
Well, in general they are the same, but the type of data is rather different. You should be aware that PacBio reads have a high error rate, which makes alignment difficult because it leads to false-positive alignments. This can, of course, result in erroneous scaffolds.
In addition, since the alignment is based on the whole PacBio read, a single PacBio read can span multiple contigs, while a mate pair spans at most two contigs. Because of this, the SSPACE algorithm would have to be changed, and that's why adding support for PacBio reads is not as simple as you think.
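The point about one read touching several contigs can be sketched in Python: ordering a read's contig hits by their position on the read turns n hits into n - 1 adjacent-contig links, where a mate pair could give at most one. This assumes the hits have already been filtered for the false-positive alignments mentioned above; `links_from_read` and the tuple layout are hypothetical, not SSPACE internals:

```python
def links_from_read(hits):
    """Turn the contig hits on one read into adjacent-contig links.

    hits: list of (contig, read_start, read_end, strand) tuples, strand '+' or '-',
    with 0-based, end-exclusive coordinates on the read.
    Returns (contigA, strandA, contigB, strandB, gap_estimate) tuples.
    """
    ordered = sorted(hits, key=lambda h: h[1])  # order contigs along the read
    links = []
    for (c1, _, end1, s1), (c2, start2, _, s2) in zip(ordered, ordered[1:]):
        if c1 == c2:  # two hits to the same contig are not a join
            continue
        links.append((c1, s1, c2, s2, start2 - end1))
    return links

hits = [("ctg7", 5200, 9000, "+"), ("ctg2", 0, 3000, "-"), ("ctg9", 3100, 5100, "+")]
for link in links_from_read(hits):
    print(link)
# ('ctg2', '-', 'ctg9', '+', 100)
# ('ctg9', '+', 'ctg7', '+', 100)
```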

For now, you can of course make 'fake' paired reads from the PacBio reads and put these into SSPACE.
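One possible way to build such 'fake' pairs, sketched in Python. This is not BaseClear's method, just an illustration under assumptions: a fixed insert size is used because a paired library is normally described by a single insert size; mates are in FR orientation (left mate forward, right mate reverse-complemented), as with ordinary paired-end reads; and the `insert_size`, `mate_len` and `step` values are arbitrary placeholders. Check the SSPACE manual for how to declare the library (insert size, allowed error, orientation) and which read formats it accepts.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T/N, any case)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))

def fake_pairs(read_id, seq, insert_size=5000, mate_len=100, step=1000):
    """Cut one long read into simulated FR pairs with a fixed insert size.

    For each window of length insert_size (windows start every `step` bases),
    the left mate is the window's first mate_len bases and the right mate is
    the reverse complement of its last mate_len bases. Reads shorter than
    insert_size yield no pairs.
    """
    pairs = []
    for start in range(0, len(seq) - insert_size + 1, step):
        window = seq[start:start + insert_size]
        pairs.append((f"{read_id}_{start}", window[:mate_len], revcomp(window[-mate_len:])))
    return pairs

def write_fasta_pair(pairs, path1, path2):
    """Write mates to two FASTA files, one record per mate, in matching order."""
    with open(path1, "w") as f1, open(path2, "w") as f2:
        for name, left, right in pairs:
            f1.write(f">{name}/1\n{left}\n")
            f2.write(f">{name}/2\n{right}\n")

pairs = fake_pairs("read1", "ACGT" * 3000)  # a 12 kb toy read
print(len(pairs))  # 8
```

Overlapping windows from the same read are not independent evidence, so several pairs from one read spanning the same junction may inflate the link count; a large `step` (or one pair per read) avoids that.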

Regards,
Boetsie
02-18-2013, 07:55 AM   #16
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130

The PacBio reads that I have were filtered through a pipeline with Illumina reads. Most PacBio reads were "junked", but the rest were corrected into high-quality (HQ) reads, so their error rate should be much lower.
02-19-2013, 01:45 AM   #17
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245

Quote:
Originally Posted by AdrianP
The pacbio that I have are filtered through a pipeline with illumina reads. Most pacbio reads were "junked", but the rest were corrected to be HQ reads, so it should be much better in error rate and so on.
At the moment, you can simply align the PacBio reads to the contigs with a tool like MUMmer, and either build the scaffolds yourself or feed the pairing information to SSPACE or Bambus.
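If you go the "build scaffolds yourself" route after aligning reads to contigs with MUMmer, the first step is counting how many reads support each contig-to-contig join. A minimal Python sketch, assuming the MUMmer alignments (e.g. from nucmer followed by show-coords) have already been parsed into `(read_id, contig, read_start, read_end, strand)` tuples; the parsing is not shown because the column layout depends on the options used, and `scaffold_links` and `min_support` are hypothetical names. With coverage as low as the 1x mentioned earlier, many true joins will be supported by a single read, so `min_support=1` may be needed at a higher risk of false joins. Ordering the joins into scaffolds and resolving conflicts is the part SSPACE or Bambus would handle and is not shown.

```python
from collections import Counter, defaultdict

FLIP = {"+": "-", "-": "+"}

def scaffold_links(hits, min_support=2):
    """Count contig-to-contig joins supported by long reads.

    hits: iterable of (read_id, contig, read_start, read_end, strand) tuples,
    coordinates on the read. Returns {(contigA, strandA, contigB, strandB): n_reads}
    for joins seen in at least min_support distinct reads.
    """
    by_read = defaultdict(list)
    for read_id, contig, start, end, strand in hits:
        by_read[read_id].append((start, end, contig, strand))

    support = Counter()
    for read_hits in by_read.values():
        read_hits.sort()  # order hits along the read by start coordinate
        seen = set()      # count each join at most once per read
        for (_, _, c1, s1), (_, _, c2, s2) in zip(read_hits, read_hits[1:]):
            if c1 == c2:
                continue
            join = (c1, s1, c2, s2)
            # A read from the opposite strand sees the same join reversed with
            # flipped orientations; keep one canonical form so both count together.
            mirror = (c2, FLIP[s2], c1, FLIP[s1])
            seen.add(min(join, mirror))
        support.update(seen)
    return {j: n for j, n in support.items() if n >= min_support}

hits = [
    ("r1", "ctgA", 0, 4000, "+"), ("r1", "ctgB", 4100, 9000, "+"),
    ("r2", "ctgB", 0, 3000, "-"), ("r2", "ctgA", 3050, 7000, "-"),
    ("r3", "ctgA", 0, 2000, "+"), ("r3", "ctgC", 2100, 5000, "+"),
]
print(scaffold_links(hits))  # {('ctgA', '+', 'ctgB', '+'): 2}
```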
03-25-2015, 04:26 AM   #18
manjari.deshmukh
Member
 
Location: India

Join Date: Mar 2015
Posts: 11

Hi all,
I need help. I ran a PacBio assembly with HGAP in SMRT Portal (SMRT 2.3.0) using 10 SMRT cells and got 245 contigs, which is very high. How can I reduce this number to 1 or 2?
03-27-2015, 02:02 PM   #19
gconcepcion
Member
 
Location: Menlo Park

Join Date: Dec 2010
Posts: 68

Quote:
Originally Posted by manjari.deshmukh
Hi all,
Need help. when doing pacbio assembly with SMRT 2.3.0 portal with 10 SMRT cells using HGAP got 245 contigs which is very high. I want to know how to reduce this number to 1 or 2.
With the amount of information that you've provided so far, the best help I can give you is: "You need to tweak some parameters."

If you would like help with an assembly, you would be better off posting a new thread (rather than continuing a two-year-old stale thread) with much more information about what you've tried so far.

What is the organism?
What is the expected genome size?
Is it diploid or haploid?
Approximately how much coverage of the genome did 10 SMRT cells get you?
How was the library prepared?
What sequencing chemistry did you use?
Which protocols have you tried running so far?
With what parameters?
03-29-2015, 09:11 PM   #20
manjari.deshmukh
Member
 
Location: India

Join Date: Mar 2015
Posts: 11

Hi,
Yes, I should have posted this in a new thread.
Anyway, I am working on a bacterium whose genome size is approximately 6 Mb. The coverage provided by 10 SMRT cells is 64X. I am using HGAP 3 with mostly default parameters, only changing the genome size and fiddling with the subread length.
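For reference, the numbers in this post imply roughly the following amount of sequence, assuming the 64X figure is the total read bases from all 10 SMRT cells divided by a 6 Mb genome size (coverage $C$, genome size $G$):

```latex
C = \frac{\text{total sequenced bases}}{G}
\quad\Rightarrow\quad
\text{total bases} = C \cdot G = 64 \times 6\,\text{Mb} = 384\,\text{Mb},
\qquad
\frac{384\,\text{Mb}}{10\ \text{SMRT cells}} \approx 38.4\,\text{Mb per cell}
```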


Thanks and regards,

Manjari

All times are GMT -8.