  • Installing SMRTanalysis package

    Hi,

    I am trying to install the SMRTanalysis package from PacBio on a SUSE Linux server.
    1. Edited the setup script (/opt/smrtanalysis-1.4.0/etc/setup.sh) to match our installation location.
    2. Performed a fresh installation with configure_smrtanalysis.sh

    Then I ran the command '/smartanalysis/analysis/bin/smrtpipe.py' and got the error message below:

    Traceback (most recent call last):
    File "/data1/smartanalysis/analysis/bin/smrtpipe.py", line 4, in <module>
    import pkg_resources
    File "/usr/local/python2.7/lib/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 2803, in <module>
    working_set.require(__requires__)
    File "/usr/local/python2.7/lib/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 696, in require
    needed = self.resolve(parse_requirements(requirements))
    File "/usr/local/python2.7/lib/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 594, in resolve
    raise DistributionNotFound(req)
    pkg_resources.DistributionNotFound: pbpy==0.1

    It looks like a Python-related error. The Python path being searched is (/usr/local/python2.7/), which is our local Python install. I know SMRTanalysis has Python bundled under the folder (smartanalysis/analysis/lib/python2.7), but it always searches the local Python directory and throws this error.

    I have set the path for "SEYMOUR_HOME" in setup.sh. Any suggestions regarding this?

  • #2
    The bundled python is required for things to work. Try:
    Code:
    source /opt/smrtanalysis-1.4.0/etc/setup.sh
    Then:
    Code:
    which python
    Should return:
    Code:
    /opt/smrtanalysis-1.4.0/redist/python2.7/bin/python
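
The same check can be made from inside Python. This is a minimal sketch, not part of SMRTanalysis: it assumes the install prefix used in the commands above (/opt/smrtanalysis-1.4.0), the helper name `uses_bundled_python` is made up for illustration, and it is written for a current Python 3 rather than the bundled 2.7.

```python
import os
import sys


def uses_bundled_python(executable, install_root):
    """Return True if `executable` lives somewhere under `install_root`."""
    exe = os.path.abspath(executable)
    root = os.path.abspath(install_root)
    # commonpath compares whole path components, so a sibling directory
    # such as /opt/smrtanalysis-1.4.0-old does not count as a match.
    return os.path.commonpath([exe, root]) == root


if __name__ == "__main__":
    root = "/opt/smrtanalysis-1.4.0"  # assumed install location
    print(sys.executable, uses_bundled_python(sys.executable, root))
```

If the second value printed is False after sourcing setup.sh, the shell is probably still picking up a different interpreter.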

    • #3
      Hi rhall,

      Thanks for your reply. After I sourced setup.sh, it showed the correct Python version. However, when I tried to run smrtpipe.py, I got the following errors:

      Code:
      Traceback (most recent call last):
        File "/data1/smrtanalysis-1.4.0/analysis/bin/smrtpipe.py", line 5, in <module>
          pkg_resources.run_script('pbpy==0.1', 'smrtpipe.py')
        File "build/bdist.linux-i686/egg/pkg_resources.py", line 489, in run_script
          keys.append(dist.key)
        File "build/bdist.linux-i686/egg/pkg_resources.py", line 1207, in run_script
          def __init__(self,module):
        File "/data1/smrtanalysis-1.4.0/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/EGG-INFO/scripts/smrtpipe.py", line 11, in <module>
          from pbpy.smrtpipe.SmrtPipeMain import SmrtPipeMain, _sanityCheck
        File "/data1/smartanalysis/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/pbpy/smrtpipe/SmrtPipeMain.py", line 22, in <module>
          from pbpy.smrtpipe.engine.SmrtCloud import SmrtCloudWorkflow
        File "/data1/smartanalysis/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/pbpy/smrtpipe/engine/SmrtCloud.py", line 9, in <module>
          from pbpy.smrtpipe.engine.SmrtPipeWorkflow import SmrtPipeWorkflow
        File "/data1/smartanalysis/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/pbpy/smrtpipe/engine/SmrtPipeWorkflow.py", line 35, in <module>
          from pbpy.smrtpipe.engine.SmrtDAG import SMRTDAG
        File "/data1/smartanalysis/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/pbpy/smrtpipe/engine/SmrtDAG.py", line 25, in <module>
          from pbpy.plot.PlotHelpers import makeHBarPlotPng
        File "/data1/smartanalysis/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/pbpy/plot/PlotHelpers.py", line 10, in <module>
          import matplotlib.pyplot as plt
        File "/data1/smartanalysis/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/pyplot.py", line 23, in <module>
          from matplotlib.figure import Figure, figaspect
        File "/data1/smartanalysis/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/figure.py", line 18, in <module>
          from axes import Axes, SubplotBase, subplot_class_factory
        File "/data1/smartanalysis/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/axes.py", line 14, in <module>
          import matplotlib.axis as maxis
        File "/data1/smartanalysis/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/axis.py", line 10, in <module>
          import matplotlib.font_manager as font_manager
        File "/data1/smartanalysis/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/font_manager.py", line 52, in <module>
          from matplotlib import ft2font
      ImportError: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by /data1/smartanalysis/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/ft2font.so)
      Do I need to install some extra modules or update paths or source other files?

      Thanks

      • #4
        Originally posted by sagarutturkar
        Hi rhall,

        Thanks for your reply. After I sourced setup.sh, it showed the correct Python version. However, when I tried to run smrtpipe.py, I got the following errors:

        Code:
        ImportError: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by /data1/smartanalysis/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/ft2font.so)
        Do I need to install some extra modules or update paths or source other files?

        Thanks
        Stepping out of my comfort zone (I am not a sys admin): is your LD_LIBRARY_PATH variable set correctly? Do you have multiple versions of "libstdc++.so.6" in /usr/lib?

        On a local cluster with a working install of SMRTanalysis 1.4.0 there is only one "libstdc++.so.6", and the latest version I see is GLIBCXX_3.4.8 if I do

        Code:
        strings /usr/lib/libstdc++.so.6 | grep GLIBC
        How about

        Code:
        strings /opt/smrtanalysis-1.4.0/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/ft2font.so | grep GLIBC
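
The two `strings | grep` commands above compare the version tags in the system libstdc++ with the ones that ft2font.so mentions. A rough Python equivalent is sketched here. The function names are illustrative, and scanning raw bytes for `GLIBCXX_` tags is only an approximation of reading the library's version tables, so treat the result as a hint rather than a verdict.

```python
import re

# Matches tags such as GLIBCXX_3.4 or GLIBCXX_3.4.11 inside binary data.
_GLIBCXX = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(data):
    """Return the sorted, de-duplicated GLIBCXX version tuples found in raw bytes."""
    found = set()
    for match in _GLIBCXX.finditer(data):
        parts = match.group(1).decode("ascii").split(".")
        found.add(tuple(int(p) for p in parts))
    return sorted(found)


def provides(library_bytes, required):
    """True if the library bytes mention the required version, e.g. (3, 4, 11)."""
    return required in glibcxx_versions(library_bytes)
```

To try it on a real file, read the library in binary mode, for example `provides(open("/usr/lib64/libstdc++.so.6", "rb").read(), (3, 4, 11))`.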

        • #5
          Sorry, I just noticed that you are using SUSE. SMRTanalysis is only distributed for Ubuntu 10.04 and CentOS 5.6. Although SUSE is also a Linux distribution, getting SMRTanalysis to work on it will likely prove futile given the version differences for things like glibc. Unfortunately, given the complexity of the system, building from source on other Linux distributions is not an option. Depending on what you have planned, there are three options:
          1. Install Ubuntu 10.04 natively. Not generally practical, but the best option if you intend to use SMRTanalysis a lot, or to install the web server for other people to use.
          2. Install Ubuntu 10.04 (the server version has the smallest footprint) in a virtual machine (VM) that runs on your SUSE system.
          This is probably the best option; it will give good performance, but the VM will likely take up a lot of disk space.
          For setting up SUSE as a VM host:
          http://doc.opensuse.org/documentatio.../book.kvm.html
          Or:
          http://www.oracle.com/technetwork/se...ads/index.html
          I would highly recommend VirtualBox; it is very easy to use.
          3. If you only want to try SMRTanalysis and are not going to do any heavy computation, or simply want a test to see whether it is worth the effort of installing it in a VM, then the Amazon AMI route is very easy, but it has a cost associated with it.

          • #6
            AMI howto:
            http://files.pacb.com/software/smrta...n%20Amazon.pdf
            Last edited by rhall; 04-10-2013, 12:59 PM.

            • #7
              Success

              Dear rhall and GenoMax,

              Thank you very much for your help and comments regarding the installation. We built a new Ubuntu server and installed SMRTanalysis correctly. With the help of our system admins, updating the gfortran library resolved the remaining errors.

              I want to try the AHA pipeline to improve an existing assembly with PacBio data.

              Thanks
              Sagar

              • #8
                Hello,

                More problems, I'm afraid. I was able to run the smrtpipe.py command without any errors. However, I ran into trouble when I tried to run the SMRTpipe example given in http://pacb.com/devnet/files/softwar...ce%20Guide.pdf

                The data is located at:
                Code:
                /opt/smrtanalysis/common/test/smrtpipe/lambda_resequencing/*
                I created input.xml with:
                Code:
                fofnToSmrtpipeInput.py lambda_resequencing.fofn > input.xml
                settings.xml was gathered from:
                Code:
                /opt/smrtanalysis/smartanalysis/common/protocols/lambda_RS_Resequencing.1.xml
                However, no files were generated in the results and data sub-directories.

                A few of the error lines I see in the master.log file are:
                Code:
                [DEBUG] 2013-02-19 11:21:48,983 [pbpy.io.MetaAnalysisXml load 116] No header found in input.xml. Unable to load jobId
                
                [DEBUG] 2013-02-19 11:21:48,984 [pbpy.smrtpipe.InputData loadXml 214] Skipping assignment of JobId. Unable to find header in input.xml
                
                [INFO] 2013-02-19 11:45:30,281 [pbpy.smrtpipe.SmrtPipeContext movieFiles 365] Found /data2/smart/smartanalysis/common/test/primary/lambda/Analysis_Results/m120404_104101_00114_c100318002550000001523015908241265_s1_p0.bas.h5 (81059282 bytes)
                [WARNING] 2013-02-19 11:45:30,282 [pbpy.smrtpipe.SmrtPipeMain _getBasVersions 456] Unable to correctly Parse the basH5 versions. Allowing job to proceed, but please fix the compatibility matrix under $SEYMOUR_HOME/common/etc. Error unable to create file (File accessability: Unable to open file)
                I have attached the log files for reference. Please help with this.

                Can anybody post XML files that worked? The instructions for creating appropriate XML files are way beyond the understanding of a biologist. More clarification is needed from PacBio.
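
Since the [WARNING] in the log excerpt reports a file that could not be opened, one thing that may be worth ruling out is a path in the .fofn that the current user cannot read. This is a minimal sketch, assuming the .fofn is a plain text file with one path per line; `unreadable_paths` is a hypothetical helper, not a PacBio tool.

```python
import os


def unreadable_paths(fofn_text):
    """Return the paths listed in the fofn text that are missing or not readable."""
    paths = [line.strip() for line in fofn_text.splitlines() if line.strip()]
    return [p for p in paths if not (os.path.isfile(p) and os.access(p, os.R_OK))]
```

Calling it with the contents of lambda_resequencing.fofn, for example `unreadable_paths(open("lambda_resequencing.fofn").read())`, returns an empty list when every listed file can be read.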

                Thanks
                Sagar

                • #9
                  The protocols in /opt/smartanalysis/common/protocols/ are not for use with smrtpipe.py; they are templates used with the SMRT portal web interface. The settings.xml file used with smrtpipe.py can be much simpler, but could also include lots of parameters. The simplest settings.xml for filtering, mapping, and calling consensus on lambda data (using all default parameters) is:
                  Code:
                  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
                  <smrtpipeSettings>
                      <protocol id="lambda_resequencing.1">
                          <param name="reference">
                              <value>common/references/lambda</value>
                          </param>
                      </protocol>
                      <moduleStage name="fetch" editable="true">
                          <module label="Fetch v1" id="P_Fetch">
                          </module>
                      </moduleStage>
                      <moduleStage name="filtering">
                          <module label="Filter" id="P_Filter">
                          </module>
                      </moduleStage>
                      <moduleStage name="mapping" editable="true">
                          <module label="BLASR v1" id="P_Mapping">
                          </module>
                      </moduleStage>
                      <moduleStage name="consensus" editable="true">
                          <module label="Quiver v1" id="P_GenomicConsensus">
                          </module>
                      </moduleStage>
                  </smrtpipeSettings>
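
To see at a glance which module stages a given settings.xml declares, the file can be inspected with Python's standard XML parser. This is an illustrative sketch only: `module_ids` is a made-up helper, and it relies on nothing beyond the element and attribute names visible in the file above.

```python
import xml.etree.ElementTree as ET


def module_ids(xml_text):
    """Return (stage name, module id) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [
        (stage.get("name"), module.get("id"))
        for stage in root.iter("moduleStage")
        for module in stage.iter("module")
    ]
```

When reading from disk, pass the file contents as bytes, for example `module_ids(open("settings.xml", "rb").read())`. A file that yields an empty list declares no module stages at all.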
                  While running SMRT pipe is great for writing complex custom pipelines, it is not the most user friendly. Almost everything you could ever want to do can be achieved via SMRT portal (the web interface), right down to the level of customizing workflows. As someone who works with PacBio data, and has a background in computing / bioinformatics, I still use the SMRT portal web interface for 90% of my analysis.

                  P.S. the [WARNING] in the log output is really just a warning, real errors will be tagged [ERROR], and an exit will be tagged [CRITICAL]. The reason that you got zero output but no [ERROR] or [CRITICAL] is that the xml file you used is a template so includes some of the necessary input, without including the tags that do the computation.
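
Following the convention described in the P.S., master.log can be filtered down to the lines that matter. A small sketch, assuming every log line starts with a bracketed level tag as in the excerpts in this thread; `lines_at_level` is a hypothetical helper.

```python
import re

# A log line begins with its level in square brackets, e.g. "[ERROR] 2013-02-19 ...".
LOG_LINE = re.compile(r"^\[(?P<level>[A-Z]+)\]")


def lines_at_level(log_text, levels=("ERROR", "CRITICAL")):
    """Return the stripped log lines whose level tag is one of `levels`."""
    selected = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = LOG_LINE.match(line)
        if match and match.group("level") in levels:
            selected.append(line)
    return selected
```

With the defaults, [DEBUG], [INFO] and [WARNING] lines are dropped and only real failures remain.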

                  • #10
                    I verified that rhall's minimal settings.xml file does indeed work with SMRTpipe v1.4 on the command line with the lambda test data. Thanks rhall!

                    PacBio command line is not for the faint of heart.

                    One problem with previous versions of the web interface was that the "role/account based" restrictions for SMRTcell data did not work (not good in a core environment where everyone could see/access all data). They have supposedly been fixed in SMRTanalysis v.1.4.

                    Rhall: By chance is that something you have looked at?
                    Last edited by GenoMax; 02-19-2013, 10:45 AM.

                    • #11
                      Dear rhall and GenoMax,

                      Thanks for the reply. I was able to run the analysis with rhall's minimal settings.xml. Meanwhile our system admin set up SMRTportal (pretty quickly) and added me as a user.

                      Originally posted by rhall
                      While running SMRT pipe is great for writing complex custom pipelines, it is not the most user friendly. Almost everything you could ever want to do can be achieved via SMRT portal (the web interface), right down to the level of customizing workflows. As someone who works with PacBio data, and has a background in computing / bioinformatics, I still use the SMRT portal web interface for 90% of my analysis.
                      Now I want to run the AHA pipeline to improve my current assembly.
                      1. I have imported an Illumina assembly with 48 scaffolds as a reference via the reference_dropbox folder.
                      2. I have received PacBio data (a filtered_subreads.fastq file) from our collaborator.
                      3. I also have access to the raw PacBio data, but our collaborator suggested using the filtered_subreads.fastq file.
                      4. I also have access to corrected PacBio reads (from the pacbioToCA pipeline).

                      In SMRTportal, I selected the "RS_AHA_scaffolding" protocol and "Soap_scaffolds.fasta" as the reference. Now how can I input the PacBio data and run the AHA algorithm? I guess using the error-corrected data would be best. Any suggestions?

                      Thanks
                      Sagar

                      • #12
                        Yes in 1.4 jobs and raw data can be assigned to a group, then only users in that group can access and see the data.

                        • #13
                          Originally posted by rhall
                          Yes in 1.4 jobs and raw data can be assigned to a group, then only users in that group can access and see the data.
                          Thanks. But how do I do this specifically? I tried to import raw data using the "import SMRT cells" option. However, after scanning the path it says no SMRT cells were found.

                          My raw data looks like this:
                          Code:
                          long_m130123_002504_42153_c100461682550000001523059505101395_s1_p0.bas.h5
                          m130123_002504_42153_c100461682550000001523059505101395_s1_p0-02.log
                          m130123_002504_42153_c100461682550000001523059505101395_s1_p0-03.log
                          m130123_002504_42153_c100461682550000001523059505101395_s1_p0-04.log
                          m130123_002504_42153_c100461682550000001523059505101395_s1_p0.bas.h5
                          m130123_002504_42153_c100461682550000001523059505101395_s1_p0.ccs.fasta
                          m130123_002504_42153_c100461682550000001523059505101395_s1_p0.ccs.fastq
                          m130123_002504_42153_c100461682550000001523059505101395_s1_p0.fasta
                          m130123_002504_42153_c100461682550000001523059505101395_s1_p0.fastq
                          m130123_002504_42153_c100461682550000001523059505101395_s1_p0.sts.csv
                          m130123_002504_42153_c100461682550000001523059505101395_s1_p0.sts.xml
                          strobe_m130123_002504_42153_c100461682550000001523059505101395_s1_p0.bas.h5
                          Apart from this, I have:

                          Code:
                          filtered_subreads_CF080.fastq
                          Corrected_pacbio.fasta
                          Also, I created an XML file for AHA and tried to run it from the command line. I tried this with the bas.h5 file as well as the Corrected_pacbio.fasta file. Each time I got this error:

                          Code:
                          [INFO] 2013-02-19 16:21:52,513 [pbpy.smrtpipe.SmrtPipeScope fitRefLengthToScope 95] Total length of reference is 7024.52 kbp
                          
                          [INFO] 2013-02-19 16:21:52,514 [pbpy.smrtpipe.SmrtPipeScope fitRefLengthToScope 99] Reference scope is huge
                          
                          [INFO] 2013-02-19 16:21:52,514 [pbpy.smrtpipe.modules.HybridAssembly run 452] Genome scope is large enough to potentially slow down nucmer repeat detection, so refusing to run. Not running nucmer can increase false positive scaffolds links induced by repeats. To allow nucmer execution increase DENOVO_GENOME_SCOPES in smrtpipe.rcor on command line with e.g. -DDENOVO_GENOME_SCOPES=small:1,large:1,huge:1.
                          
                          ValueError: invalid literal for int() with base 10: ''
                          [ERROR] 2013-02-19 16:21:52,550 [pbpy.smrtpipe.SmrtPipeMain exit 760] invalid literal for int() with base 10: ''
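
The final ValueError is the message Python raises when int() is given an empty string. One possible reading, offered as an assumption rather than a diagnosis, is that some numeric setting reached the pipeline empty. The sketch below reproduces the message and shows a defensive parse that names the offending setting; `parse_int_setting` is hypothetical and is not part of pbpy.

```python
def parse_int_setting(name, raw):
    """Convert a raw setting string to int, naming the setting if it is empty."""
    text = raw.strip()
    if not text:
        raise ValueError("setting %r is empty; expected an integer" % name)
    return int(text)


# Reproduce the message seen in the log: int() cannot parse an empty string.
try:
    int("")
except ValueError as exc:
    message = str(exc)
```

A check of this kind turns an anonymous failure into one that says which value was missing.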


                          Thanks
                          Sagar

                          • #14
                            sagarutturkar,
                            The basic input type into SMRT portal is PacBio raw data. You should be able to Import SMRT Cells from the 'import and manage' tab, pointing it to the directory structure of the data that comes off the machine.
                            The filtered_subreads.fastq is useful with non-PacBio software, but you should use raw data for SMRT portal workflows.
                            Once you have the data imported you should be able to run the RS_AHA_scaffolding workflow, with the reference you have imported, and the raw data.
                            I'm not aware of a method for using the corrected reads (pacbioToCA) to scaffold an assembly without going to the command line, and outside of the SMRT portal / pipe system.

                            • #15
                              Sorry,
                              Yes in 1.4 jobs and raw data can be assigned to a group, then only users in that group can access and see the data.
                              was in reply to GenoMax.
