
**rhall** · 08-02-2013, 03:52 PM

Also, the *.metadata.xml file is required if the data is to be used in SMRT Portal.

**sagarutturkar** · 08-07-2013, 11:22 AM

Hi rhall,

Thanks for your reply. I am able to import the data into SMRTportal.

The RS II platform data looks excellent, with an average read length of 6 kb and longest reads of 25 kb. I was able to assemble the genome into a single contig with the HGAP method. This is amazing.

I also tried the "RS_modification and motif analysis" protocol, which generated a consensus of 33 contigs. Does this protocol break the assembly at low-confidence bases? Is it necessary to run the base-modification analysis, or should I go ahead with the 1-contig assembly as the high-confidence one?

Any suggestions about downstream analysis or any SMRTportal tools which will help me?

**rhall** · 08-07-2013, 11:44 AM

Great!
The output from SMRT Portal HGAP is corrected with the Quiver algorithm; with >40x coverage you can expect a base QV above 50. For the coverage vs. QV plot, see:

SMRT Sequencing Overview

https://speakerdeck.com/pacbio/smrt-sequencing-overview
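To put a QV above 50 in perspective, here is a minimal sketch that converts a quality value into a per-base error probability. It assumes the consensus QV is Phred-scaled (QV = -10 log10 of the error probability); the function names and the 4 Mb genome length are illustrative only and do not come from SMRT Portal.

```python
import math


def qv_to_error_probability(qv: float) -> float:
    """Convert a Phred-scaled quality value into a per-base error probability."""
    return 10 ** (-qv / 10)


def error_probability_to_qv(p: float) -> float:
    """Convert a per-base error probability back into a Phred-scaled QV."""
    return -10 * math.log10(p)


def expected_errors(genome_length: int, qv: float) -> float:
    """Expected number of erroneous bases in a consensus of the given length."""
    return genome_length * qv_to_error_probability(qv)


if __name__ == "__main__":
    # Illustrative genome length (an assumption, not a value from SMRT Portal).
    genome_length = 4_000_000
    for qv in (30, 40, 50):
        p = qv_to_error_probability(qv)
        print(f"QV {qv}: error probability {p:.0e}, "
              f"expected errors {expected_errors(genome_length, qv):.1f}")
```

Under that assumption, QV 50 corresponds to roughly one error per 100,000 bases.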

My normal workflow, given a single contig:
1. Check the degenerate file from the Celera Assembler stage (downloadable from the data section of SMRT Portal). It may contain anywhere from no sequences to a few; BLASTing them is the quickest way to see whether they are junk or meaningful. I check this file because Celera Assembler assumes even coverage, so any high-copy-number elements (plasmids in particular) will end up in this file rather than in the final assembly.
2. Check the single contig for circularity using a dot plot:

Gepard 1.40 – Calculation of Dotplots even for Large Sequences – My Biosoftware – Bioinformatics Softwares Blog

http://www.mybiosoftware.com/sequence-analysis/6750

3. If the contig is circular, then circularize it. I found the easiest way to do this is to take the polished contig, arbitrarily introduce a break in the middle, then use the two contigs as input to minimus2:

AMOS

http://sourceforge.net/apps/mediawiki/amos/index.php?title=Minimus2

which should then output a single circular contig by overlapping the start of one contig with the end of the other.
4. Run the circularized contig through RAST or BASys for annotation, or upload the contig as a reference and run a Base Modification analysis on the original data in SMRT Portal.

CIQAP

http://basys.ca/

RAST Server - RAST Annotation Server

http://rast.nmpdr.org/

Base Modification Overview

https://speakerdeck.com/pacbio/base-modification-overview

5. Publish.
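The "arbitrarily introduce a break in the middle" part of step 3 can be sketched as follows. This is only a minimal illustration in plain Python: the function names and the `_part1` / `_part2` suffixes are made up for the example, and converting the resulting two-record FASTA into whatever input format minimus2 expects is left to the AMOS documentation.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (sequence id, sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA text lines into (id, sequence) records."""
    records: List[Record] = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first word of the header as the id.
            name = (line[1:].split() or ["unnamed"])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def split_in_middle(name: str, seq: str) -> List[Record]:
    """Break one contig into two halves at its midpoint."""
    mid = len(seq) // 2
    return [(f"{name}_part1", seq[:mid]), (f"{name}_part2", seq[mid:])]


def format_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Render records as FASTA text with fixed-width sequence lines."""
    out: List[str] = []
    for name, seq in records:
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    demo = [">contig1 polished", "ACGTACGTAA"]
    name, seq = read_fasta(demo)[0]
    print(format_fasta(split_in_middle(name, seq), width=5), end="")
```

Because the break is placed away from the original contig ends, the overlap between the original start and end is what lets the two halves be joined into a single circular sequence.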

**rhall** · 08-07-2013, 11:50 AM

The base modification pipeline will not break contigs. The number of contigs in the output should be the same as in the reference. If you download the single contig from the HGAP job and then upload it as a reference, you should only get results for one contig.

**sagarutturkar** · 08-26-2013, 05:42 AM

HGAP Parameters

Hi rhall,

Thanks for your help. I am working on this single-contig assembly and was able to run base_modification correctly. I was also trying to assemble another small genome from PacBio RS II data with 135x coverage. With the default HGAP method I got 3 contigs. What other HGAP parameters can be tweaked to get a better (possibly 1-contig) assembly for this genome?

Also, what are the changes in the SMRTanalysis 2.0.1 version? Are there any changes that might improve the HGAP results from version 2.0.0?

Thanks
Sagar

**rhall** · 08-26-2013, 07:09 AM

Given that you are getting only 3 contigs, tweaking HGAP parameters will likely not improve the assembly. I would try to understand what the 3 contigs are and why they are separate. Are all 3 contigs of significant length, or does one contig make up most of the expected genome length? Dot plots of the contigs against one another will indicate whether the contigs are being broken by unresolvable repeat structure or ambiguous overlaps, in which case the only way to join the contigs will likely be to produce longer reads or a new library. Looking at the actual alignment of the raw reads to the contigs can also show whether the contig ends have broken due to repeats: this is indicated by mapQV=0 reads (red in SMRT View) mapped to the ends of contigs.
Also, Celera Assembler can often be quite conservative in overlapping. If the 3 contigs do actually overlap unambiguously, minimus2 can be used to complete the assembly.
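To illustrate what a dot plot of one contig against another is looking for, here is a rough sketch that lists shared exact k-mers between two sequences and counts them per diagonal. It is forward-strand only, far too slow and memory-hungry for megabase contigs in pure Python, and not a substitute for a dedicated dot plot tool; the function names and the choice of k are assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in seq to the list of positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def dot_plot_matches(a: str, b: str, k: int = 12) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where the k-mer at a[i] equals the k-mer at b[j]."""
    index = kmer_index(b, k)
    matches: List[Tuple[int, int]] = []
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], ()):
            matches.append((i, j))
    return matches


def diagonal_counts(matches: List[Tuple[int, int]]) -> Counter:
    """Count matches per diagonal (i - j); one dominant diagonal suggests a
    single colinear overlap, many scattered diagonals suggest repeats."""
    return Counter(i - j for i, j in matches)


if __name__ == "__main__":
    # Toy sequences: the end of contig_a overlaps the start of contig_b.
    contig_a = "TTTTTACGTACGGA"
    contig_b = "ACGTACGGACCCCC"
    hits = dot_plot_matches(contig_a, contig_b, k=5)
    print(hits)
    print(diagonal_counts(hits).most_common(1))
```

A single diagonal that runs from the end of one contig into the start of another is the kind of unambiguous overlap that a merge step could use; the same k-mers hitting many different diagonals point to repeats instead.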

**rhall** · 08-26-2013, 07:12 AM

2.0.1 will likely not improve HGAP results over 2.0.0 for old data, but it is recommended if you are going to process data from the P4 enzyme.

**sagarutturkar** · 08-26-2013, 07:24 AM

The genome size is 4.3 Mb, while the longest contig is 3.5 Mb and the smallest is 12 kb. We got this with default parameters, while our external data providers were able to use the same data and get down to 1 contig with the HGAP method. I am waiting for their reply about how they tweaked the parameters. However, I thought that meanwhile I could try some tweaking that might improve the assembly. Maybe updating the mapping parameters would help?

I am working with our system admins to update to v2.0.1.

Many Thanks
Sagar

**rhall** · 08-26-2013, 07:39 AM

If your PreAssembled yield is really good and you are using an automatically calculated seed length, a slightly longer seed may give better results, provided that the preassembled #bases is still ~15x. Toggling Allow Partial Alignments will also have a significant effect; which way depends on the data. If the PreAssembled read yield is low, decreasing the RQ filter to 0.75 may improve results. The blasr mapping parameters will likely not have much of an impact in going from 3 contigs to 1.
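As a rough way to reason about seed length versus coverage, the sketch below picks the largest read-length cutoff whose reads still add up to a target coverage. It is not how SMRT Portal calculates its automatic seed length, and it works on raw read lengths, so it is only a proxy: the actual PreAssembled yield is known only after the job has run. The function names and the toy numbers are assumptions.

```python
from typing import Iterable, List


def seed_length_cutoff(read_lengths: Iterable[int],
                       genome_size: int,
                       target_coverage: float) -> int:
    """Largest cutoff such that reads at least that long total
    target_coverage * genome_size bases; 0 if the target is unreachable."""
    needed = target_coverage * genome_size
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= needed:
            return length
    return 0


def coverage_at_cutoff(read_lengths: Iterable[int],
                       genome_size: int,
                       cutoff: int) -> float:
    """Coverage contributed by reads that are at least `cutoff` long."""
    return sum(n for n in read_lengths if n >= cutoff) / genome_size


if __name__ == "__main__":
    # Toy numbers for illustration only; real inputs would be the filtered
    # subread lengths and the expected genome size.
    lengths: List[int] = [10, 8, 6, 4, 2]
    genome = 10
    cutoff = seed_length_cutoff(lengths, genome, target_coverage=2.0)
    print(cutoff, coverage_at_cutoff(lengths, genome, cutoff))
```

Comparing `coverage_at_cutoff` for the automatic seed length and for a slightly longer one gives a quick feel for how much seed data a longer cutoff would discard before committing to a full HGAP run.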

**tfahland** · 08-30-2013, 01:10 PM

Can I ask a quick, somewhat unrelated question? I have an older Linux machine running Red Hat version 5.3. The SMRT Analysis 2.0.1 build for CentOS 5.3 should work OK, right?

**rhall** · 08-30-2013, 01:23 PM

It is definitely worth trying, but I'm not an expert on the subtle differences between CentOS and Red Hat.

**GenoMax** · 08-30-2013, 01:56 PM

Originally posted by tfahland:

> Can I ask a quick, somewhat unrelated question? I have an older Linux machine running Red Hat version 5.3. The SMRT Analysis 2.0.1 build for CentOS 5.3 should work OK, right?

It does work. We are running 2.0.1 on Rocks v.5.3 and 5.4 clusters. There is a separate installer for CentOS 5.3 which you should use.

**sagarutturkar** · 09-05-2013, 07:23 AM

Version 2.0.1 generated improved assembly

Hi rhall,

Thanks for your reply. After updating to SMRTanalysis version 2.0.1, I was able to use the default HGAP method to assemble the genome as a single contig. The same genome was assembled into 3 contigs in SMRTanalysis version 2.0. There is definitely some improvement in the HGAP algorithm. I have seen improvement in assembly stats for some other genomes too. I recommend that all users update to version 2.0.1.

Initially, we tried to install SMRTanalysis on a Red Hat system, but we got a lot of Python-related errors and were never able to get it running there. I suggest installing it on a recommended OS version rather than spending time on troubleshooting later.

Thanks

**zhoufan** · 10-17-2013, 05:45 PM

Hi everyone,
I am trying to install SMRT Analysis v2.0.1 on an Ubuntu 10.04 i386 server.
When I perform a fresh installation with the "./configure_smrtanalysis.sh" command, an error is returned. The error message is:
"./configure_smrtanalysis.sh: line 29: /smrtanalysis-2.0.1/redist/python2.7/bin/python: cannot execute binary file".

Can anyone help me out? Thanks in advance!

**rhall** · 10-18-2013, 10:24 AM

I believe SMRT Analysis requires a 64-bit OS.
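A "cannot execute binary file" error is consistent with trying to run a 64-bit binary on a 32-bit (i386) system. A quick way to check what the machine reports, using the system Python rather than the bundled one, is sketched below. The set of names treated as 64-bit x86 is an assumption made for this example, not something taken from the SMRT Analysis installer.

```python
import platform
import struct

# Machine names treated as 64-bit x86 here; this set is an assumption
# for the example, not a list used by the SMRT Analysis installer.
X86_64_NAMES = {"x86_64", "amd64"}


def looks_64bit(machine: str) -> bool:
    """Return True if a machine string (as from `uname -m`) is 64-bit x86."""
    return machine.strip().lower() in X86_64_NAMES


def interpreter_bits() -> int:
    """Pointer size, in bits, of the Python interpreter running this script."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    machine = platform.machine()
    print(f"machine: {machine}, 64-bit x86: {looks_64bit(machine)}")
    print(f"this Python interpreter is {interpreter_bits()}-bit")
```

On an i386 install the machine string would typically be something like `i686`, in which case the 64-bit binaries shipped in the installer cannot run regardless of configuration.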
