
**krobison** · 02-14-2013, 10:03 AM

Would PBJelly work?

http://sourceforge.net/p/pb-jelly/wiki/Home/

"PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in fasta format) to high-confidence draft assembles. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes. Each step in PBJelly’s workflow can be run on a cluster, thus parallelizing the gap filling process for rapid turn around, even for very large eukaryotic genomes."

**mchaisso** · 02-14-2013, 10:16 AM

Also: A hybrid approach for the automated finishing of bacterial genomes

http://www.nature.com/nbt/journal/v30/n7/full/nbt.2288.html

PBJelly likely does a better job at getting correct sequences in the gaps, but the hybrid assembler was designed to identify tricky repeat regions and avoid misassembling them. Its utility may be limited to bacterial-sized genomes, though.

A snippet from the paper:
To produce the hybrid assembly, we first generated a consensus CDC contig set. Given the clonal nature of the CDC isolates (Supplementary Results), we split contigs from the minimal CDC assembly that were inconsistent with the remaining two isolates. If the split resulted in a subcontig of <1 kb in length, the subcontig was eliminated. We input the resulting 97 contigs in this set, along with 94,526 single-molecule reads from the PacBio RS with an average accuracy of 82.9% (Supplementary Fig. 2), into our hybrid assembly pipeline (Supplementary Fig. 3).
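The contig-splitting step in that snippet can be sketched in a few lines of Python. This is only an illustration of the "split, then drop subcontigs under 1 kb" rule as quoted; the paper excerpt does not say how inconsistent positions were found, so the `breakpoints` argument and the function name `split_contig` are hypothetical.

```python
MIN_SUBCONTIG_LEN = 1000  # the 1 kb cutoff mentioned in the quoted snippet


def split_contig(sequence, breakpoints, min_len=MIN_SUBCONTIG_LEN):
    """Cut a contig at each breakpoint and keep pieces of at least min_len bases.

    breakpoints is a hypothetical list of 0-based cut positions; how they are
    derived (e.g. from disagreement with other isolates) is not shown here.
    """
    inner = sorted({b for b in breakpoints if 0 < b < len(sequence)})
    cuts = [0] + inner + [len(sequence)]
    pieces = [sequence[start:end] for start, end in zip(cuts, cuts[1:])]
    # Subcontigs shorter than the cutoff are eliminated.
    return [piece for piece in pieces if len(piece) >= min_len]


# Toy contig: 1500 bp + 400 bp + 2000 bp, cut at the two junctions.
contig = "A" * 1500 + "C" * 400 + "G" * 2000
kept = split_contig(contig, [1500, 1900])
print([len(piece) for piece in kept])
```

The 400 bp middle piece falls under the cutoff and is discarded, leaving the two flanking subcontigs.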

**AdrianP** · 02-14-2013, 11:58 AM

The way I understand it, PBJelly closes gaps that already exist in the assembly (runs of "NNNNN"), but I am interested in joining contigs into new, longer ones and so reducing their number.
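To make the distinction concrete: a gap filler has something to work on only where the draft already contains runs of N. A minimal sketch for listing those runs in one sequence is below; `find_gaps` is a hypothetical helper, not part of PBJelly, and it says nothing about how PBJelly itself locates gaps.

```python
import re


def find_gaps(sequence, min_len=1):
    """Return half-open (start, end) coordinates of each run of N/n of at least min_len."""
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


scaffold = "ACGTNNNNNACGTTGCANNNACGT"
print(find_gaps(scaffold))             # every run of N
print(find_gaps(scaffold, min_len=4))  # only runs of 4 or more
```

A set of unscaffolded contigs (e.g. straight out of velvet with scaffolding off) would return an empty list for every sequence, which is why a gap filler alone does not reduce the contig count.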

**flxlex** · 02-15-2013, 04:38 AM

"AHA (A Hybrid Assembler) uses PacBio's exceptionally long reads to improve existing assemblies and fill in gaps." http://www.pacificbiosciences.com/pr...re/algorithms/. I guess it is part of the SMRT Pipe software...

**AdrianP** · 02-16-2013, 09:10 AM

This is part of SMRT-pipe. Anyone have any idea how to download that package? Never doing pacbio again..... closed source...

**jbingham** · 02-17-2013, 01:51 PM

PacBio's software is all open source, BSD license. See pacbiodevnet.com for downloads and links to GitHub projects.

**AdrianP** · 02-17-2013, 02:15 PM

> Originally posted by jbingham:
>
> PacBio's software is all open source, BSD license. See pacbiodevnet.com for downloads and links to GitHub projects.

Seems you are right. I gave it a shot, and oh my god... why in the world are they doing this? I mean, seriously? In order to use one of their tools I need to download a 1 GB file and go through the extensive installation instructions outlined here?

http://pacb.com/devnet/files/software/smrtanalysis/1.4/doc/SMRT%20Analysis%20Software%20Installation%20%28v1.4.0%29.pdf

Can anyone please tell me why they don't just provide a generic executable for some of the software that is part of that pipeline? Why do I have to spend a day installing this? This should be simpler. Sorry for my rant, I just don't get it.

I don't want all their fancy tools, I don't need to login via a web interface to see what's up... oh well...

**jbingham** · 02-17-2013, 02:32 PM

Maybe the Amazon image is what you need. Nothing to install, just boot up a VM.

Agree that it's a big download to get everything. The aligner and variant caller (blasr and quiver) are what you requested: separate installs from GitHub. See pacbiodevnet.com for links on the Compatible Software page.

**AdrianP** · 02-17-2013, 02:41 PM

Nah dude, this is what I want:

AHA: a hybrid assembler to scaffold existing contigs and fill gaps. Available only in SMRT Analysis. Since v1.0

I want to link my contigs with long reads, which sometimes give only 1x coverage.

**jbingham** · 02-17-2013, 02:43 PM

In that case, you will need either the Amazon VM or the full install. Sorry!

**AdrianP** · 02-17-2013, 02:44 PM

> Originally posted by jbingham:
>
> In that case, you will need either the Amazon VM or the full install. Sorry!

I will try the Amazon VM, thank you very much for your help!

**boetsie** · 02-18-2013, 03:47 AM

> Originally posted by AdrianP:
>
> I am surprised to not be able to find any scaffolders for pacbio data. I am looking for something that SSPACE does (surprised to see that SSPACE doesn't accept pacbio as input). I have a bunch of contigs generated by velvet, and now I want to link these contigs by using LONG reads, which are pacbio corrected for error.

We at BaseClear (developers of SSPACE) have developed a modified version of SSPACE which accepts PacBio long reads. The method gives very nice results, but at this moment we offer this only as an internal service since the algorithm itself is still in a testing phase. Official release might follow later this year, but has not been decided yet. If you are interested in BaseClear's assembly-service please write to [email protected]

Kind Regards,
Boetsie

**AdrianP** · 02-18-2013, 05:21 AM

> Originally posted by boetsie:
>
> We at BaseClear (developers of SSPACE) have developed a modified version of SSPACE which accepts PacBio long reads. The method gives very nice results, but at this moment we offer this only as an internal service since the algorithm itself is still in a testing phase. Official release might follow later this year, but has not been decided yet. If you are interested in BaseClear's assembly-service please write to [email protected]
>
> Kind Regards,
> Boetsie

I am aware of SSPACE; I used it on a different genome project that is still in progress, and I like how it works. I was surprised that it accepts mate pairs as input data but not PacBio reads. PacBio reads are similar in the sense that they carry long-range information but, unlike mate pairs, they have a definite length and would not necessarily fill the gap with NNNN.

**boetsie** · 02-18-2013, 07:53 AM

> Originally posted by AdrianP:
>
> I am aware of SSPACE; I used it on a different genome project that is still in progress, and I like how it works. I was surprised that it accepts mate pairs as input data but not PacBio reads. PacBio reads are similar in the sense that they carry long-range information but, unlike mate pairs, they have a definite length and would not necessarily fill the gap with NNNN.

Well, in general they are the same, but the type of data is rather different. As you are probably well aware, PacBio reads have a high error rate, which makes the alignment process difficult because it leads to false-positive alignments. This can of course result in erroneous scaffolds.
In addition, since the alignment is based on the whole PacBio read, a single read can span multiple contigs, while a mate pair links at most two. Because of this, the SSPACE algorithm would have to be changed, which is why adding PacBio reads is not as simple as you think.

For now, you can of course make 'fake' paired reads from the PacBio reads and feed these into SSPACE.

Regards,
Boetsie
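The 'fake' paired-read workaround could look roughly like the sketch below: slide a window of a fixed "insert size" along each (ideally error-corrected) long read and keep only the two ends of each window as a pair. Everything here is an assumption for illustration, including the function names, the 100 bp end length, the 1000 bp insert size, and the forward-reverse orientation; the insert size and orientation would need to match whatever is declared for the library in the scaffolder.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(_COMPLEMENT)[::-1]


def fake_pairs(read, end_len=100, insert_size=1000, step=500):
    """Cut artificial forward-reverse pairs out of one long read.

    Each window of insert_size bases contributes its first end_len bases as
    the left mate and the reverse complement of its last end_len bases as the
    right mate. Reads shorter than insert_size yield no pairs.
    """
    pairs = []
    for start in range(0, len(read) - insert_size + 1, step):
        fragment = read[start:start + insert_size]
        left = fragment[:end_len]
        right = reverse_complement(fragment[-end_len:])
        pairs.append((left, right))
    return pairs


# Toy 2000 bp read: windows start at 0, 500 and 1000.
long_read = "ACGT" * 500
pairs = fake_pairs(long_read)
print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))
```

Because every fake pair from a window has exactly the chosen insert size, the library can be given a tight insert-size tolerance. Note that this discards the point made above about one read spanning several contigs: each pair still links at most two.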

Pacbio scaffolding

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News