**GenoMax** · 03-14-2015, 05:05 AM

Do you have original data for the SMRTcell(s) you are trying to import?

Here is a list of minimal files you need (*.metadata.xml file and all *.bax.h5 and *.bas.h5): https://github.com/PacificBioscience...to-SMRT-Portal

BTW: See this if you are specifically looking for CeleraAssembler. http://seqanswers.com/forums/showthread.php?t=49846

**tonybert** · 03-14-2015, 03:01 PM

Yes, all the xml, .bas.h5, and .bax.h5 files are there. It's the short Illumina data that I'm trying to place, or rather figure out where to place, within the context of the SMRT Portal. I can run HGAP on the SMRT cells, but we really want to do a hybrid assembly. I can navigate to the Manage and Import section, select managed protocols, and select RS_CeleraAssembler. I then copy this and save it as RS_CeleraAssemblerModified. In the Protocol Preset Details, the "FASTQ to Correct with" field is empty. I have tried entering the name of the FASTQ file that I uploaded to /opt/smrtanalysis/common/references_dropbox, but it will not recognize this file. So I am guessing that the path it is looking for is different from this one. I'll look back at the GitHub post again. Thanks.

**GenoMax** · 03-14-2015, 07:39 PM

Here is a post from Dr. Hall who works at PacBio from the thread I linked above: http://seqanswers.com/forums/showpos...46&postcount=5

Hybrid assembly is no longer supported in SMRTportal v.2.3. You may have to install/use an older version of SMRTportal if you want to use hybrid assembly.

**rhall** · 03-16-2015, 08:07 AM

To add: the hybrid assembly support in SMRT Analysis was never great, and I would not recommend going back to a previous version. Hybrid assembly as implemented in PBcR, ECTools, or dbg2olc is going to be much more straightforward and give better results.

**tonybert** · 03-18-2015, 12:25 PM

Greetings, I have been trying to run the Celera Assembler using STAR-cluster on Amazon EC2 on some PacBio long reads (1 SMRT cell) and paired-end Illumina data (100 bp PE). The genomes we are trying to assemble are ~1.6-2.0 Mbp. I am 99% sure I have installed the assembler correctly, as I was able to perform one of the example/tutorial assemblies of a small virus (A006, 35 kbp mock genome). I have been able to produce the .frg files for the Illumina data, and I have filtered_subreads.fastq from the sequencing center. However, when I run ./pacBioToCA I keep getting errors that I believe have something to do with SGE conditions (the full command/output is attached, command_line_output):

qsub: illegal -p value
qsub: illegal -c value ""

I was under the impression that these values would be defined in the pacbio.spec file (see attached), but they are not, and I am not sure how to modify them. I'm pretty new to the Celera Assembler and to running SGE jobs on STAR, so any comments/suggestions/hints are welcome.

**gconcepcion** · 03-18-2015, 12:31 PM

Originally posted by tonybert View Post

qsub: illegal -p value
qsub: illegal -c value ""

Hi Tony,

I'm not familiar with STAR cluster or which job schedulers it is configured for. However, I can tell you that this particular error message appears because CeleraAssembler is configured to run on SGE but, for whatever reason, the binaries in your PATH are for PBS.

This is a confusing issue because SGE and PBS share some binary names (qsub, for example), which can leave the user unsure which job scheduler is actually installed.

**gconcepcion** · 03-18-2015, 12:35 PM

To illustrate what I mean, see this example:

    ### SGE is configured as default here (this should fail because I don't actually have your script on hand)
    -bash-3.2$ qsub -A assembly -pe threads 4 -cwd -N "pBcR_asm" -j y -o /home/ubuntu/wgs-8.3rc1/Linux-amd64/bin//tempec_pacbio
    Unable to read script file because of error: no input read from stdin

    ### When I add the PBS job scheduler binaries to my PATH (ahead of our SGE binaries) you see the message that you referenced:
    -bash-3.2$ export PATH=/opt/pbs/bin:$PATH
    -bash-3.2$ qsub -A assembly -pe threads 4 -cwd -N "pBcR_asm" -j y -o /home/ubuntu/wgs-8.3rc1/Linux-amd64/bin//tempec_pacbio
    qsub: illegal -p value
    qsub: illegal -c value ""
    usage: qsub [-a date_time] [-A account_string] [-b secs]
    [-c [ none | { enabled | periodic | shutdown |
    depth=<int> | dir=<path> | interval=<minutes>}... ]
    [-C directive_prefix] [-d path] [-D path]
    [-e path] [-h] [-I] [-j oe] [-k {oe}] [-l resource_list] [-m n|{abe}]
    [-M user_list] [-N jobname] [-o path] [-p priority] [-P proxy_user] [-q queue]
    [-r y|n] [-S path] [-t number_to_submit] [-T type] [-u user_list] [-w] path
    [-W additional_attributes] [-v variable_list] [-V ] [-x] [-X] [-z] [script]
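
To see which qsub the shell would actually pick up, it can help to list every qsub on the PATH in lookup order. This is a minimal sketch in plain POSIX shell; the function name `list_on_path` is made up for illustration, and the SGE_ROOT check is only a hint (SGE installs usually export it, but an unset value proves nothing).

```shell
# List every executable called "$1" found on PATH, in lookup order.
# The first line printed is the one the shell would run.
list_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        candidate="${dir:-.}/$name"   # an empty PATH entry means the current directory
        if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
    IFS=$old_ifs
}

list_on_path qsub
# Hint only: SGE environments normally set SGE_ROOT.
echo "SGE_ROOT=${SGE_ROOT:-<unset>}"
```

If the first path printed sits under a PBS directory (like /opt/pbs/bin in the session above), the PBS qsub is shadowing the SGE one.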

**rhall** · 03-18-2015, 12:42 PM

Out of interest, why are you trying to use Illumina reads for a hybrid assembly? For such a small genome, 1 SMRT Cell of data should be more than enough to assemble the PacBio data de novo.

**tonybert** · 03-18-2015, 12:44 PM

Thanks for the prompt response. So it's more of an environment issue?

**tonybert** · 03-18-2015, 12:51 PM

Hi rhall, we tried HGAP with the SMRT cell for the 3 genomes. We were expecting a single contiguous contig after HGAP assembly. This was not the case. I have to look back at my notes, but I believe we had between 6 and 9 contigs per genome. Additionally, we were under the impression that error-correcting with Illumina would lead to higher-quality, higher-accuracy draft genomes. That said, my initial HGAP assembly was run using the SMRT Portal (v1.3, I believe) a year or more ago, and I didn't really put much effort into tuning the assembly parameters.

**gconcepcion** · 03-18-2015, 12:55 PM

Originally posted by tonybert View Post

Thanks for the prompt response. So it's more of an environment issue?

For the error message that you indicated, yes, it is an environment issue: from what I can tell, you have PBS configured in your environment as the job scheduler instead of SGE.
Celera Assembler (last time I checked: http://wgs-assembler.sourceforge.net...ndex.php/RunCA) is only able to run on SGE or LSF, so PBS would pose a problem.

I would like to echo rhall's sentiment, though: if your genome size is only 1.8-2.0 Mbp, why are you even bothering with hybrid assembly? If you have 1 SMRT Cell of data, you should have PLENTY of excess coverage to run HGAP_3 and get a MUCH better result than any hybrid Illumina-PacBio strategy.

Assemble with PacBio and use the Illumina short reads to validate the assembly.

**rhall** · 03-18-2015, 12:58 PM

It is extremely unlikely that a hybrid assembly will give better results than an all-PacBio assembly, particularly if the PacBio assembly is not coverage limited (>40x per genome).
http://www.biomedcentral.com/content...-14-9-r101.pdf
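
For a rough check against that coverage bar, fold coverage is just total subread bases divided by genome size. A minimal sketch, assuming a strict 4-line-per-record FASTQ (no wrapped sequence lines); the function name `estimate_coverage` is made up for illustration and is not part of any PacBio or Celera tool.

```shell
# Estimate fold coverage = total bases in the FASTQ / genome size in bp.
# Assumes strict 4-line records, so the sequence is line 2 of every 4.
estimate_coverage() {
    fastq=$1
    genome_size=$2
    awk -v g="$genome_size" '
        NR % 4 == 2 { total += length($0) }    # add up sequence lengths
        END         { printf "%.1f\n", total / g }
    ' "$fastq"
}

# Example call, using the file name from this thread and 2 Mbp as the
# upper end of the stated genome size:
# estimate_coverage filtered_subreads.fastq 2000000
```

A result well above 40 would suggest the PacBio data alone is not coverage limited.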

**GenoMax** · 03-18-2015, 03:18 PM

I am in the "PacBio should be enough" camp, but to be fair, the PacBio data tonybert has needs to be of good quality. In our hands, the "sweet spot" for getting a good assembly appears to be library specific.

@tonybert: Can you post stats from a "RS_subreads" run for your SMRTcell?

**rhall** · 03-18-2015, 07:37 PM

I totally agree. The point I wanted to make was that if you do have plenty of PacBio coverage (likely given the size of the genome), then regardless of subread length, a hybrid approach will not improve the assembly, as it does not add any long-range information. It is the long-range information that helps complete assemblies. A hybrid approach only helps when you have limited coverage.

Importing Illumina data onto SMRT portal