and.. The *.metadata.xml is a requirement if the data is to be used in SMRT Portal.
-
Hi rhall,
Thanks for your reply. I am able to import the data into SMRT Portal.
The RS II platform data looks excellent, with an average read length of 6 kb and longest reads of 25 kb. I was able to assemble the genome into a single contig with the HGAP method. This is amazing.
I also tried the "RS_modification and motif analysis" protocol, which generated a consensus of 33 contigs. Does this protocol break the assembly at low-confidence bases? Is the base-modification step necessary, or should I go ahead and treat the 1-contig assembly as the high-confidence one?
Any suggestions about downstream analysis, or any SMRT Portal tools that would help me?
Last edited by sagarutturkar; 08-07-2013, 11:36 AM.
Comment
-
Great!
The output from SMRT Portal HGAP is corrected using the Quiver algorithm; with >40x coverage you can expect a base QV of >50. For the coverage vs QV plot see:
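For reference, a QV is a Phred-scaled quality, so QV 50 corresponds to roughly one expected error per 100,000 bases. A minimal Python sketch of the conversion follows; the function names and the 4,000,000-base example length are illustrative assumptions and not part of SMRT Portal.

```python
import math


def phred_to_error(qv):
    """Convert a Phred-scaled QV to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_to_phred(p_error):
    """Convert a per-base error probability to a Phred-scaled QV."""
    return -10 * math.log10(p_error)


def expected_errors(qv, n_bases):
    """Expected number of erroneous bases in a sequence of n_bases at a uniform QV."""
    return n_bases * phred_to_error(qv)


if __name__ == "__main__":
    # Hypothetical example: a 4,000,000-base contig at a uniform QV of 50.
    print(phred_to_error(50))
    print(expected_errors(50, 4_000_000))
```

This treats the QV as uniform along the contig, which is a simplification; real consensus quality varies with local coverage.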
My normal workflow, given a single contig:
1. Check the degenerate file from the Celera Assembler stage (downloadable from the data section of SMRT Portal). It may contain a few sequences or none at all; BLASTing them is the quickest way to see whether they are junk or meaningful. The reason I check this file is that Celera Assembler assumes even coverage, so any high copy number elements (in particular plasmids) will end up in this file rather than in the final assembly.
2. Check the single contig for circularity using a dot plot:
3. If the contig is circular, then circularize it. I found the easiest way to do this is to take the polished contig, arbitrarily introduce a break in the middle, then use the two contigs as input to minimus2:
AMOS is a collection of tools and class interfaces for the assembly of DNA reads. The package includes a robust infrastructure, modular assembly pipelines, and tools for overlapping, consensus generation, contigging, and assembly manipulation.
which should then output a single circular contig by overlapping the start of one contig with the end of the other.
4. Run the circularized contig through RAST or BaSys, or run a Base Modification analysis in SMRT Portal using the original data with the contig as the reference.
5. Publish.
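Steps 2 and 3 can be sketched in Python. This is only a toy illustration under stated assumptions: `end_overlap` uses exact string matching as a crude stand-in for a dot plot (real contig ends will differ slightly, so a dot plot or an aligner is still the right tool), and all function and record names here are hypothetical. It does not run minimus2; it only prepares the two broken pieces as FASTA text.

```python
def end_overlap(seq, seed_len=12):
    """Return the length of an exact overlap between the end and the start
    of seq, or 0 if none is found. Overlaps shorter than seed_len are ignored."""
    seed = seq[:seed_len]
    # Search only the second half for a copy of the start-of-contig seed.
    pos = seq.find(seed, len(seq) // 2)
    while pos != -1:
        tail = seq[pos:]
        if seq.startswith(tail):
            return len(tail)
        pos = seq.find(seed, pos + 1)
    return 0


def split_in_middle(name, seq):
    """Break one contig into two pieces at its midpoint."""
    mid = len(seq) // 2
    return [(name + "_left", seq[:mid]), (name + "_right", seq[mid:])]


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sequence: a short "genome" whose first 15 bases are repeated at the end.
    genome = "ACGTTGCAAGGCTTAACCGGATCGATTACGGCATGCAATCTGAGTCCA"
    contig = genome + genome[:15]
    print(end_overlap(contig))
    print(to_fasta(split_in_middle("contig1", contig)), end="")
```

Breaking the contig in the middle moves the original start and end into the interior of the two pieces' outer ends, so an overlap-based merger can join them where the sequence wraps around.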
Comment
-
The base modification pipeline will not break contigs. The number of contigs in the output should be the same as in the reference. If you use the one-contig assembly as the reference, by first downloading it from the HGAP job and then uploading it as a reference, you should only get results for one contig.
Comment
-
HGAP Parameters
Hi rhall,
Thanks for your help. I am working on this single-contig assembly and was able to run base_modification correctly. I was also trying to assemble another small genome from PacBio RS II data with 135x coverage. With the default HGAP method I got 3 contigs. What other HGAP parameters can be tweaked to get a better (possibly 1-contig) assembly for this genome?
Also, what are the changes in SMRT Analysis version 2.0.1? Are there any changes that might improve the HGAP results compared with version 2.0.0?
Thanks
Sagar
Comment
-
Given that you are getting only 3 contigs, tweaking HGAP parameters will likely not improve the assembly. I would first try to understand what the 3 contigs are and why the assembly broke there. Are all 3 contigs of significant length, or does one contig make up most of the expected genome length? Dot plots of the contigs against one another will indicate whether the contigs are broken because of unresolvable repeat structure or ambiguous overlaps, in which case the only way to join them will likely be producing longer reads or libraries. Looking at the actual alignment of the raw reads to the contigs can also show whether contig ends have broken at repeats: this is indicated by mapQV=0 reads (red in SMRT View) mapped to the ends of contigs.
Also, Celera Assembler can often be quite conservative in overlapping. If the 3 contigs do actually overlap unambiguously, minimus2 can be used to complete the assembly.
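The mapQV=0 check at contig ends can also be done numerically once the alignments are exported. The sketch below assumes the alignments have already been parsed into `(contig, start, end, mapqv)` tuples; that tuple layout, the 5,000-base window, and the function name are all illustrative assumptions, not an SMRT Portal format.

```python
from collections import defaultdict


def ambiguous_end_fraction(alignments, contig_lengths, window=5000):
    """For each contig end, return the fraction of reads overlapping the
    outermost `window` bases that were mapped with mapQV == 0."""
    counts = defaultdict(lambda: [0, 0])  # (contig, side) -> [mapQV0 reads, all reads]
    for contig, start, end, mapqv in alignments:
        length = contig_lengths[contig]
        sides = []
        if start < window:
            sides.append("left")
        if end > length - window:
            sides.append("right")
        for side in sides:
            tally = counts[(contig, side)]
            tally[1] += 1
            if mapqv == 0:
                tally[0] += 1
    return {key: zero / total for key, (zero, total) in counts.items()}


if __name__ == "__main__":
    # Made-up alignments on one hypothetical 100 kb contig.
    lengths = {"c1": 100000}
    reads = [
        ("c1", 0, 4000, 0),
        ("c1", 1000, 9000, 254),
        ("c1", 50000, 58000, 254),
        ("c1", 96000, 100000, 0),
        ("c1", 97000, 100000, 0),
    ]
    for key, fraction in sorted(ambiguous_end_fraction(reads, lengths).items()):
        print(key, fraction)
```

A contig end where most reads have mapQV=0 is consistent with a break at a repeat, though the fraction that counts as "most" is a judgment call.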
Comment
-
The genome size is 4.3 Mb, while the longest contig is 3.5 Mb and the smallest one is 12 kb. We got this with default parameters, while our external data providers were able to use the same data and get down to 1 contig with the HGAP method. I am waiting for their reply about how they tweaked the parameters. However, I thought that in the meantime I could try some tweaking of my own to improve the assembly. Maybe updating the mapping parameters would help?
I am working with our system admins to update to v2.0.1.
Many Thanks
Sagar
Comment
-
If your PreAssembled yield is really good and you are using an automatically calculated seed length, a slightly longer seed may give better results, provided that the number of preassembled bases is still ~15x. Toggling Allow Partial Alignments will also have a significant effect; which setting is better depends on the data. If the PreAssembled read yield is low, decreasing the RQ filter to 0.75 may improve results. The blasr mapping parameters will likely not have much of an impact in going from 3 contigs to 1.
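To get a feel for how the seed length trades off against coverage, here is a minimal Python sketch that picks the longest read-length cutoff at which the reads at or above it still add up to a target coverage. This is not HGAP's own calculation: the function name, inputs, and target are assumptions for illustration, and the preassembled yield after correction will be lower than the raw seed bases counted here, so the actual ~15x check still has to come from the PreAssembly report.

```python
def seed_length_cutoff(read_lengths, genome_size, target_coverage):
    """Return the largest length cutoff such that reads at or above it
    total at least target_coverage * genome_size bases, or None if the
    data set is too small to reach the target."""
    target_bases = genome_size * target_coverage
    total = 0
    # Add reads from longest to shortest until the target is reached.
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target_bases:
            return length
    return None


if __name__ == "__main__":
    # Made-up read lengths and a made-up 1,000-base "genome".
    lengths = [1000, 2000, 3000, 4000, 5000]
    print(seed_length_cutoff(lengths, genome_size=1000, target_coverage=9))
```

Raising the target coverage lowers the cutoff, and lowering it raises the cutoff, which is the same trade-off described above for choosing a slightly longer seed.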
Comment
-
Originally posted by tfahland:
Can I ask a quick, slightly unrelated question? I have an older Linux machine running Red Hat version 5.3. The SMRT Analysis 2.0.1 build for CentOS version 5.3 should work OK, right?
Comment
-
Version 2.0.1 generated improved assembly
Hi rhall,
Thanks for your reply. After updating to SMRT Analysis version 2.0.1, I was able to use the default HGAP method to assemble the genome as a single contig. The same genome was assembled into 3 contigs in SMRT Analysis version 2.0. There is definitely some improvement in the HGAP algorithm. I have seen improved assembly stats for some other genomes too. I recommend that all users update to version 2.0.1.
Initially, we tried to install SMRT Analysis on a Red Hat system, but we got a lot of Python-related errors and were never able to get it running there. I suggest installing it on the recommended OS version rather than spending time on troubleshooting later.
Thanks
Comment
-
Hi everyone,
I am trying to install SMRT Analysis v2.0.1 on an Ubuntu 10.04 i386 server.
When I perform a fresh installation with the "./configure_smrtanalysis.sh" command, an error is returned. The error message is:
"./configure_smrtanalysis.sh: line 29: /smrtanalysis-2.0.1/redist/python2.7/bin/python: cannot execute binary file".
Can anyone help me out? Thanks in advance!
Comment