
  • #31
    Also, the *.metadata.xml file is required if the data is to be used in SMRT Portal.

    Comment


    • #32
      Hi rhall,

      Thanks for your reply. I am able to import the data into SMRTportal.

      The RS II platform data looks excellent, with an average read length of 6 kb and longest reads of 25 kb. I was able to assemble the genome into a single contig with the HGAP method. This is amazing.

      I also tried the "RS_modification and motif analysis" protocol, which generated a consensus of 33 contigs. Does this protocol break the assembly at low-confidence bases? Is it necessary to do the base-modification analysis, or should I go ahead with the 1-contig assembly as the high-confidence one?

      Any suggestions about downstream analysis, or any SMRT Portal tools that would help me?
      Last edited by sagarutturkar; 08-07-2013, 11:36 AM.

      Comment


      • #33
        Great!
        The output from SMRT Portal HGAP is corrected with the Quiver algorithm; with >40x coverage you can expect a base QV of > 50. For the coverage vs QV plot see:


        My normal workflow, given a single contig:
        1. Check the degenerate file from the Celera Assembler stage (downloadable from the data section of SMRT Portal). It may contain anywhere from no sequences to a few; BLASTing them is the quickest way to see whether they are junk or meaningful. I check this file because Celera Assembler assumes even coverage, so any high-copy-number elements (plasmids in particular) will end up in this file rather than in the final assembly.
        2. Check the single contig for circularity using a dot plot:

        3. If the contig is circular, then circularize it. The easiest way I have found is to take the polished contig, arbitrarily introduce a break in the middle, then use the two resulting contigs as input to minimus2, which is part of AMOS:
        AMOS is a collection of tools and class interfaces for the assembly of DNA reads. The package includes a robust infrastructure, modular assembly pipelines, and tools for overlapping, consensus generation, contigging, and assembly manipulation.

        minimus2 should then output a single circular contig by overlapping the start of one contig with the end of the other.
        4. Run the circularized contig through RAST or BaSys, or, using the original data and the contig as a reference, run a Base Modification analysis in SMRT Portal.



        5. Publish.
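
Step 3 of the workflow above (breaking the polished contig in the middle before handing both pieces to minimus2) could be sketched roughly as follows. This is only an illustration in plain Python: the function names and the `_a` / `_b` suffixes are made up for the example, and it assumes the polished assembly is a FASTA file with exactly one record. It does not run minimus2 itself.

```python
def read_single_fasta(text):
    """Parse FASTA text that holds exactly one record; return (name, sequence)."""
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                raise ValueError("expected a single-contig FASTA")
            parts = line[1:].split()
            name = parts[0] if parts else "contig"
        else:
            chunks.append(line)
    if name is None:
        raise ValueError("no FASTA header found")
    return name, "".join(chunks)


def split_contig(name, seq):
    """Break the contig at its midpoint (the break position is arbitrary)."""
    mid = len(seq) // 2
    # Hypothetical naming scheme: <name>_a is the first half, <name>_b the second.
    return [(name + "_a", seq[:mid]), (name + "_b", seq[mid:])]


def format_fasta(records, width=60):
    """Render (name, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"
```

Typical use would be to read the polished contig with `read_single_fasta(open(path).read())`, pass the result to `split_contig`, and write `format_fasta(...)` to a new file that then serves as the two-contig input.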

        Comment


        • #34
          The base modification pipeline will not break contigs. The number of contigs output should be the same as in the reference. If you use the one-contig assembly as the reference (download it from the HGAP job, then upload it as a reference), you should only get results for one contig.

          Comment


          • #35
            HGAP Parameters

            Hi rhall,

            Thanks for your help. I am working on this single-contig assembly and was able to run base_modification correctly. I was also trying to assemble another small genome from PacBio RS II data with 135x coverage. With the default HGAP method I got 3 contigs. What other HGAP parameters can be tweaked to get a better (possibly 1-contig) assembly for this genome?

            Also, what are the changes in SMRTanalysis version 2.0.1? Are there any changes that might improve the HGAP results over version 2.0.0?

            Thanks
            Sagar

            Comment


            • #36
              Given that you are getting only 3 contigs, tweaking HGAP parameters will likely not improve the assembly. I would try to understand what the 3 contigs are and why they are separate. Are all 3 contigs of significant length, or does one contig make up most of the expected genome length? Looking at dot plots of the contigs against one another will indicate whether the contigs are being broken by unresolvable repeat structure or ambiguous overlap, in which case the only way to join the contigs will likely be to produce longer reads / a longer library. Looking at the actual alignment of the raw reads to the contigs can also indicate whether the contig ends have broken due to repeats; this shows up as mapQV=0 reads (red in SMRT View) mapped to the ends of contigs.
              Also, Celera Assembler can often be quite conservative in overlapping. If the 3 contigs do actually overlap unambiguously, minimus2 can be used to complete the assembly.
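
For a quick look at how contigs relate to one another before opening a proper dot-plot tool, the coordinates of a dot plot can be approximated by listing shared exact k-mers. The sketch below is a simplified stand-in, not what any particular dot-plot program does: `kmer_matches` is a made-up name, it only considers the forward strand, and it only finds exact matches, so it will miss diverged repeats.

```python
from collections import defaultdict


def kmer_matches(seq_a, seq_b, k=12):
    """Return (i, j) pairs where seq_a[i:i+k] equals seq_b[j:j+k] exactly.

    Plotting i against j gives a crude forward-strand dot plot.
    """
    # Index every k-mer position in seq_b.
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)

    # Look up each k-mer of seq_a in that index.
    matches = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], ()):
            matches.append((i, j))
    return matches
```

Matches clustered near the end of one contig and the start of another hint at an overlap that minimus2 might be able to join; comparing a contig against itself and finding matches between its two ends hints at circularity. On megabase-scale contigs this pure-Python version will be slow and memory-hungry, so it is best tried on the contig ends only.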

              Comment


              • #37
                2.0.1 will likely not improve HGAP results over 2.0.0 for old data, but it is recommended if you are going to process data generated with the P4 enzyme.

                Comment


                • #38
                  The genome size is 4.3 Mb, the longest contig is 3.5 Mb, and the smallest one is 12 kb. We got this with default parameters, while our external data providers were able to use the same data and get down to 1 contig with the HGAP method. I am waiting for their reply about how they tweaked the parameters. In the meantime, I thought I could try some tweaking to improve the assembly. Maybe updating the mapping parameters would help?

                  I am working with our system admins to update to v2.0.1.

                  Many Thanks
                  Sagar

                  Comment


                  • #39
                    If your PreAssembled yield is really good and you are using an automatically calculated seed length, a slightly longer seed may give better results, provided that the preassembled base count is still ~15x coverage. Toggling Allow Partial Alignments will also have a significant effect; which setting works better depends on the data. If the PreAssembled read yield is low, decreasing the RQ filter to 0.75 may improve results. The blasr mapping parameters will likely not have much of an impact in going from 3 contigs to 1.
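
The ~15x rule of thumb above is simple arithmetic: for the 4.3 Mb genome in this thread, 15x corresponds to roughly 64.5 Mb of preassembled bases. A rough sketch of that check follows. The function names are invented for illustration, and the second function works from raw read lengths, so it only gives an upper bound: the preassembled yield reported by HGAP after correction will be lower than the raw seed bases.

```python
def fold_coverage(total_bases, genome_size):
    """Coverage expressed as total bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def longest_seed_cutoff(read_lengths, genome_size, target_coverage=15.0):
    """Largest length cutoff at which reads >= cutoff still reach the target.

    Returns None if even using every read falls short of the target.
    This looks at raw read lengths only (an assumption of this sketch),
    not at the corrected, preassembled reads.
    """
    needed = target_coverage * genome_size
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running >= needed:
            return length
    return None
```

If the cutoff this suggests is well above the automatically calculated seed length, there may be room to try a slightly longer seed; the preassembled base count in the job report remains the number to check afterwards.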

                    Comment


                    • #40
                      Can I ask a quick, slightly unrelated question? I have an older Linux machine running Red Hat 5.3. The SMRT Analysis 2.0.1 build for CentOS 5.3 should work OK, right?

                      Comment


                      • #41
                        It is definitely worth trying, but I'm not an expert on the subtle differences between CentOS and Red Hat.

                        Comment


                        • #42
                          Originally posted by tfahland
                          Can I ask a quick, slightly unrelated question? I have an older Linux machine running Red Hat 5.3. The SMRT Analysis 2.0.1 build for CentOS 5.3 should work OK, right?

                          It does work. We are running 2.0.1 on Rocks v5.3 and v5.4 clusters. There is a separate installer for CentOS 5.3, which you should use.

                          Comment


                          • #43
                            Version 2.0.1 generated improved assembly

                            Hi rhall,

                            Thanks for your reply. After updating to SMRTanalysis version 2.0.1, I was able to use the default HGAP method to assemble the genome as a single contig. The same genome assembled into 3 contigs with SMRTanalysis version 2.0. There is definitely some improvement in the HGAP algorithm. I have seen improved assembly stats for some other genomes too. I recommend that all users update to version 2.0.1.

                            Initially, we tried to install SMRTanalysis on a Red Hat system, but we got a lot of Python-related errors and were never able to get it running. I suggest installing it on the recommended OS version rather than spending time on troubleshooting later.

                            Thanks

                            Comment


                            • #44
                              Hi everyone,
                              I am trying to install SMRT v2.0.1 on an Ubuntu 10.04 i386 server.
                              When I perform a fresh installation with the
                              "./configure_smrtanalysis.sh" command, an error is returned. The error message is:
                              "./configure_smrtanalysis.sh: line 29: /smrtanalysis-2.0.1/redist/python2.7/bin/python: cannot execute binary file".

                              can anyone help me out there? thanks in advance!

                              Comment


                              • #45
                                I believe SMRT Analysis requires a 64-bit OS.
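
A "cannot execute binary file" error on an i386 install is consistent with trying to run a 64-bit binary on a 32-bit system. One way to check what a host reports, using whatever Python the system already has, is sketched below. The helper names are made up for this example, and the substring test in `looks_64bit` is a heuristic rather than an exhaustive list of architectures.

```python
import platform
import struct


def pointer_bits():
    """Bit width of a pointer in the running Python interpreter (32 or 64).

    This reflects how the interpreter was built, which can be 32-bit
    even on a 64-bit OS.
    """
    return struct.calcsize("P") * 8


def looks_64bit(machine):
    """Heuristic: machine strings such as 'x86_64' contain '64'; 'i386' and 'i686' do not."""
    return "64" in machine


def describe_host():
    """Collect the machine string and interpreter width for a quick report."""
    machine = platform.machine()
    return {
        "machine": machine,
        "machine_looks_64bit": looks_64bit(machine),
        "python_bits": pointer_bits(),
    }
```

Calling `print(describe_host())` on the target server shows the reported architecture; a machine string like `i686` or `i386` would point to a 32-bit installation.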

                                Comment
