SEQanswers

Old 02-14-2013, 12:32 PM   #1
sagarutturkar
Member
 
Location: Tennessee, USA

Join Date: Sep 2010
Posts: 61
Default Installing SMRTanalysis package

Hi,

I am trying to install the SMRTanalysis package from PacBio on a SUSE Linux server.
  1. Edited the setup script (/opt/smrtanalysis-1.4.0/etc/setup.sh) to match our installation location.
  2. Performed fresh installation with configure_smrtanalysis.sh

Then I ran the command '/smartanalysis/analysis/bin/smrtpipe.py' and got the error message below:

Traceback (most recent call last):
File "/data1/smartanalysis/analysis/bin/smrtpipe.py", line 4, in <module>
import pkg_resources
File "/usr/local/python2.7/lib/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 2803, in <module>
working_set.require(__requires__)
File "/usr/local/python2.7/lib/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 696, in require
needed = self.resolve(parse_requirements(requirements))
File "/usr/local/python2.7/lib/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 594, in resolve
raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: pbpy==0.1

It is some error related to Python. The Python path it is currently searching is /usr/local/python2.7/, which is our local Python install. I know SMRTanalysis has Python bundled under the folder smartanalysis/analysis/lib/python2.7, but it always searches the local Python directory and throws an error.

I have setup the path for "SEYMOUR_HOME" in setup.sh. Any suggestions regarding this?
Old 02-14-2013, 04:00 PM   #2
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 322
Default

The bundled python is required for things to work. Try:
Code:
source /opt/smrtanalysis-1.4.0/etc/setup.sh
Then:
Code:
which python
Should return:
Code:
/opt/smrtanalysis-1.4.0/redist/python2.7/bin/python
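The same check can be made from inside Python, which is useful because `which python` only reports what the shell would pick up. This is a minimal sketch, not part of SMRTanalysis: `uses_bundled_python` is a hypothetical helper name, and the install root is simply copied from the paths above, so adjust it to your own location.

```python
import os
import sys


def uses_bundled_python(executable, install_root):
    """Return True if `executable` lives under `install_root`."""
    exe = os.path.realpath(executable)
    root = os.path.realpath(install_root)
    # Compare whole path components so "/opt/foo-old" is not taken for "/opt/foo".
    return exe == root or exe.startswith(root + os.sep)


if __name__ == "__main__":
    # Assumed install location, taken from the setup.sh path in this thread.
    root = "/opt/smrtanalysis-1.4.0"
    print(sys.executable)
    print(uses_bundled_python(sys.executable, root))
```

If the second line printed is False after sourcing setup.sh, the script is still being started by a different interpreter (for example through its shebang line or an older PATH entry).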
Old 02-14-2013, 06:48 PM   #3
sagarutturkar

Hi rhall,

Thanks for your reply. After I sourced setup.sh, `which python` showed the correct Python. However, when I tried to run smrtpipe.py, I got the following errors:

Code:
Traceback (most recent call last):
  File "/data1/smrtanalysis-1.4.0/analysis/bin/smrtpipe.py", line 5, in <module>
    pkg_resources.run_script('pbpy==0.1', 'smrtpipe.py')
  File "build/bdist.linux-i686/egg/pkg_resources.py", line 489, in run_script
    keys.append(dist.key)
  File "build/bdist.linux-i686/egg/pkg_resources.py", line 1207, in run_script
    def __init__(self,module):
  File "/data1/smrtanalysis-1.4.0/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/EGG-INFO/scripts/smrtpipe.py", line 11, in <module>
    from pbpy.smrtpipe.SmrtPipeMain import SmrtPipeMain, _sanityCheck
  File "/data1/smartanalysis/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/pbpy/smrtpipe/SmrtPipeMain.py", line 22, in <module>
    from pbpy.smrtpipe.engine.SmrtCloud import SmrtCloudWorkflow
  File "/data1/smartanalysis/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/pbpy/smrtpipe/engine/SmrtCloud.py", line 9, in <module>
    from pbpy.smrtpipe.engine.SmrtPipeWorkflow import SmrtPipeWorkflow
  File "/data1/smartanalysis/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/pbpy/smrtpipe/engine/SmrtPipeWorkflow.py", line 35, in <module>
    from pbpy.smrtpipe.engine.SmrtDAG import SMRTDAG
  File "/data1/smartanalysis/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/pbpy/smrtpipe/engine/SmrtDAG.py", line 25, in <module>
    from pbpy.plot.PlotHelpers import makeHBarPlotPng
  File "/data1/smartanalysis/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/pbpy/plot/PlotHelpers.py", line 10, in <module>
    import matplotlib.pyplot as plt
  File "/data1/smartanalysis/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/pyplot.py", line 23, in <module>
    from matplotlib.figure import Figure, figaspect
  File "/data1/smartanalysis/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/figure.py", line 18, in <module>
    from axes import Axes, SubplotBase, subplot_class_factory
  File "/data1/smartanalysis/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/axes.py", line 14, in <module>
    import matplotlib.axis as maxis
  File "/data1/smartanalysis/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/axis.py", line 10, in <module>
    import matplotlib.font_manager as font_manager
  File "/data1/smartanalysis/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/font_manager.py", line 52, in <module>
    from matplotlib import ft2font
ImportError: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by /data1/smartanalysis/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/ft2font.so)
Do I need to install some extra modules or update paths or source other files?

Thanks
Old 02-15-2013, 04:30 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,087
Default

Quote:
Originally Posted by sagarutturkar View Post
Hi rhall,

Thanks for your reply. After I sourced setup.sh, `which python` showed the correct Python. However, when I tried to run smrtpipe.py, I got the following errors:

Code:
ImportError: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by /data1/smartanalysis/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/ft2font.so)
Do I need to install some extra modules or update paths or source other files?

Thanks
Stepping out of my comfort zone (I am not a sys admin) .. Is your LD_LIBRARY_PATH variable set correctly? Do you have multiple versions of "libstdc++.so.6" in /usr/lib?

On a local cluster with a working install of SMRTanalysis 1.4.0 there is only one "libstdc++.so.6" and the latest I see is GLIBCXX_3.4.8, if I do

Code:
strings /usr/lib/libstdc++.so.6 | grep GLIBC
How about

Code:
strings /opt/smrtanalysis-1.4.0/analysis/lib/python2.7/matplotlib-1.0.1-py2.7-linux-x86_64.egg/matplotlib/ft2font.so | grep GLIBC
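For comparing the two outputs, here is a rough Python equivalent of `strings ... | grep GLIBCXX`. It is a sketch under one assumption: that scanning the raw bytes for `GLIBCXX_x.y.z` strings is good enough, which is what the `strings` approach does too; it does not parse the ELF symbol version tables. The function names are made up for illustration. Versions are compared as numbers, so 3.4.11 sorts after 3.4.8.

```python
import re
import sys


def glibcxx_versions(data):
    """Return the sorted GLIBCXX_x.y[.z] version tuples found in raw bytes."""
    found = set()
    for match in re.finditer(rb"GLIBCXX_(\d+(?:\.\d+)+)", data):
        parts = match.group(1).decode("ascii").split(".")
        found.add(tuple(int(p) for p in parts))
    return sorted(found)


def provides(library_bytes, required):
    """True if the library bytes mention the required version tuple."""
    return required in glibcxx_versions(library_bytes)


if __name__ == "__main__":
    # Usage: python glibcxx.py /usr/lib64/libstdc++.so.6
    with open(sys.argv[1], "rb") as handle:
        for version in glibcxx_versions(handle.read()):
            print("GLIBCXX_" + ".".join(str(p) for p in version))
```

Run it once on the system libstdc++.so.6 and once on ft2font.so; if the highest version that ft2font.so mentions is missing from the library's list, you get the ImportError shown above.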
Old 02-15-2013, 08:20 AM   #5
rhall

Sorry, I notice that you are using SUSE. SMRTanalysis is only distributed for Ubuntu 10.04 and CentOS 5.6. Neither is SUSE-based (Ubuntu derives from Debian, CentOS from Red Hat Enterprise Linux), so getting SMRTanalysis to work on SUSE will likely prove futile given the version differences for things like glibc. Unfortunately, given the complexity of the system, building from source on different Linux distributions is not an option. Depending on what you have planned there are three options:
1. Install Ubuntu 10.04 - Not generally practical, but the best option if you intend using SMRTanalysis a lot, or installing the web server for other people to use.
2. Install Ubuntu 10.04 (the server version has the smallest footprint) in a virtual machine (VM) that runs on your SUSE system.
This is probably the best option, it will give good performance, but the VM will likely take up a lot of disk space.
For setting up SUSE as a VM host:
http://doc.opensuse.org/documentatio.../book.kvm.html
Or.
http://www.oracle.com/technetwork/se...ads/index.html
I would highly recommend virtualbox, it is very easy to use.
3. If you only want to try SMRTanalysis and are not going to do any heavy computation, or simply as a test to see if you want to go to the effort of installing it in a VM then the Amazon AMI route is very easy, but has a cost associated with it.
Old 02-15-2013, 08:22 AM   #6
rhall

AMI howto:
http://files.pacb.com/software/smrta...n%20Amazon.pdf

Last edited by rhall; 04-10-2013 at 01:59 PM.
Old 02-18-2013, 12:09 PM   #7
sagarutturkar
Success

Dear rhall and GenoMax,

Thank you very much for your help and comments regarding installation. We built a new Ubuntu server and installed SMRTanalysis correctly. With the help of our system admins, updating the gfortran library resolved the remaining errors.

I want to try the AHA pipeline to improve an existing assembly with PacBio data.

Thanks
Sagar
Old 02-19-2013, 09:01 AM   #8
sagarutturkar

Hello,

Some problems again. I was able to run the smrtpipe.py command without any errors. However, I could not get output when I tried to run the SMRTpipe example given in http://pacb.com/devnet/files/softwar...ce%20Guide.pdf

The data is located at:
Code:
/opt/smrtanalysis/common/test/smrtpipe/lambda_resequencing/*
Created input.xml as :
Code:
fofnToSmrtpipeInput.py lambda_resequencing.fofn > input.xml
settings.xml was gathered from:
Code:
/opt/smrtanalysis/smartanalysis/common/protocols/lambda_RS_Resequencing.1.xml
However, no files were generated in the results and data sub-directories.

A few of the lines I see in the master.log file are:
Code:
[DEBUG] 2013-02-19 11:21:48,983 [pbpy.io.MetaAnalysisXml load 116] No header found in input.xml. Unable to load jobId

[DEBUG] 2013-02-19 11:21:48,984 [pbpy.smrtpipe.InputData loadXml 214] Skipping assignment of JobId. Unable to find header in input.xml

[INFO] 2013-02-19 11:45:30,281 [pbpy.smrtpipe.SmrtPipeContext movieFiles 365] Found /data2/smart/smartanalysis/common/test/primary/lambda/Analysis_Results/m120404_104101_00114_c100318002550000001523015908241265_s1_p0.bas.h5 (81059282 bytes)
[WARNING] 2013-02-19 11:45:30,282 [pbpy.smrtpipe.SmrtPipeMain _getBasVersions 456] Unable to correctly Parse the basH5 versions. Allowing job to proceed, but please fix the compatibility matrix under $SEYMOUR_HOME/common/etc. Error unable to create file (File accessability: Unable to open file)
I have attached log files for reference. Please help with this.

Can anybody post xml files that worked? The instructions for creating appropriate xml files are way beyond the understanding of a biologist. More clarification is needed from PacBio.

Thanks
Sagar
Attached Files
File Type: zip smrtpipe.log.zip (6.6 KB, 2 views)
Old 02-19-2013, 09:51 AM   #9
rhall

The protocols in /opt/smartanalysis/common/protocols/ are not for use with smrtpipe.py; they are templates used with the SMRT portal web interface. The settings.xml file used with smrtpipe.py can be much simpler, but could also include lots of parameters. The simplest settings.xml for filtering, mapping, and calling consensus on lambda data (using all default parameters) is:
Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<smrtpipeSettings>
    <protocol id="lambda_resequencing.1">
        <param name="reference">
            <value>common/references/lambda</value>
        </param>
    </protocol>
    <moduleStage name="fetch" editable="true">
        <module label="Fetch v1" id="P_Fetch">
        </module>
    </moduleStage>
    <moduleStage name="filtering">
        <module label="Filter" id="P_Filter">
        </module>
    </moduleStage>
    <moduleStage name="mapping" editable="true">
        <module label="BLASR v1" id="P_Mapping">
        </module>
    </moduleStage>
    <moduleStage name="consensus" editable="true">
        <module label="Quiver v1" id="P_GenomicConsensus">
        </module>
    </moduleStage>
</smrtpipeSettings>
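A quick way to see whether a given settings.xml asks for any computation at all is to list its modules. The sketch below uses only the element and attribute names visible in the file above (`moduleStage`, `module`, `id`); `module_ids` is a hypothetical helper, not something shipped with SMRTanalysis, and it assumes that a file without any `<module>` entries has no work to do.

```python
import sys
import xml.etree.ElementTree as ET


def module_ids(settings_xml):
    """Return the id of every <module> nested in a <moduleStage>, in order."""
    root = ET.fromstring(settings_xml)
    return [module.get("id") for module in root.findall("./moduleStage/module")]


if __name__ == "__main__":
    # Usage: python list_modules.py settings.xml
    with open(sys.argv[1], "rb") as handle:
        ids = module_ids(handle.read())
    if ids:
        print("\n".join(ids))
    else:
        print("no modules found: nothing would be run")
```

For the file above this should list P_Fetch, P_Filter, P_Mapping and P_GenomicConsensus.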
While running SMRT pipe is great for writing complex custom pipelines, it is not the most user friendly. Almost everything you could ever want to do can be achieved via SMRT portal (the web interface), right down to the level of customizing workflows. As someone who works with PacBio data, and has a background in computing / bioinformatics, I still use the SMRT portal web interface for 90% of my analysis.

P.S. the [WARNING] in the log output is really just a warning; real errors will be tagged [ERROR], and an exit will be tagged [CRITICAL]. The reason that you got zero output but no [ERROR] or [CRITICAL] is that the xml file you used is a template, so it includes some of the necessary input without including the tags that do the computation.
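To triage a long master.log along those lines, the levels can be tallied with a few lines of Python. This is a sketch that assumes every log record starts with the bracketed level, as in the excerpts posted in this thread; `count_levels` and `failed` are illustrative names only.

```python
import re
import sys
from collections import Counter

# Matches the level tag at the start of a record, e.g. "[WARNING] 2013-02-19 ...".
LEVEL = re.compile(r"^\[(DEBUG|INFO|WARNING|ERROR|CRITICAL)\]")


def count_levels(lines):
    """Count log records per level; lines without a level tag are ignored."""
    counts = Counter()
    for line in lines:
        match = LEVEL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def failed(lines):
    """True if the log contains at least one [ERROR] or [CRITICAL] record."""
    counts = count_levels(lines)
    return counts["ERROR"] > 0 or counts["CRITICAL"] > 0


if __name__ == "__main__":
    # Usage: python loglevels.py master.log
    with open(sys.argv[1]) as handle:
        print(dict(count_levels(handle)))
```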
Old 02-19-2013, 10:42 AM   #10
GenoMax

I verified that rhall's minimal settings.xml file does indeed work with SMRTpipe v.1.4 on the command line with the lambda test data. Thanks rhall!

PacBio command line is not for the faint of heart.

One problem with previous versions of the web interface was that the "role/account based" restrictions for SMRTcell data did not work (not good in a core environment where everyone could see/access all data). This has supposedly been fixed in SMRTanalysis v.1.4.

Rhall: By chance is that something you have looked at?

Last edited by GenoMax; 02-19-2013 at 10:45 AM.
Old 02-19-2013, 12:25 PM   #11
sagarutturkar

Dear rhall and GenoMax,

Thanks for the reply. I was able to run the analysis with rhall's minimal settings.xml. Meanwhile our system admin set up the SMRTportal (pretty quickly) and added me as a user.

Quote:
Originally Posted by rhall View Post
While running SMRT pipe is great for writing complex custom pipelines, it is not the most user friendly. Almost everything you could ever want to do can be achieved via SMRT portal (the web interface), right down to the level of customizing workflows. As someone who works with PacBio data, and has a background in computing / bioinformatics, I still use the SMRT portal web interface for 90% of my analysis.
Now I want to run AHA pipeline to improve my current assembly.
  1. I have imported illumina assembly with 48 scaffolds as reference in reference_dropbox folder.
  2. I have received pacbio data (filtered_subreads.fastq) file from our collaborator.
  3. I also have access to raw pacbio data but collaborator suggested to use filtered_subreads.fastq file.
  4. I also have access to corrected pacbio reads (from pacbioToCA pipeline).


In SMRTportal, I selected the "RS_AHA_scaffolding" protocol and "Soap_scaffolds.fasta" as reference. Now how can I input the pacbio data and run the AHA algorithm? I guess using error corrected data would be best. Any suggestions?

Thanks
Sagar
Old 02-19-2013, 01:35 PM   #12
rhall

Yes, in 1.4 jobs and raw data can be assigned to a group; then only users in that group can access and see the data.
Old 02-19-2013, 01:48 PM   #13
sagarutturkar

Quote:
Originally Posted by rhall View Post
Yes in 1.4 jobs and raw data can be assigned to a group, then only users in that group can access and see the data.
Thanks. But how do I do this specifically? I tried to import raw data using the "import SMRT cells" option. However, after scanning the path it says no SMRT cells found.

My raw data looks like this:
Code:
long_m130123_002504_42153_c100461682550000001523059505101395_s1_p0.bas.h5
m130123_002504_42153_c100461682550000001523059505101395_s1_p0-02.log
m130123_002504_42153_c100461682550000001523059505101395_s1_p0-03.log
m130123_002504_42153_c100461682550000001523059505101395_s1_p0-04.log
m130123_002504_42153_c100461682550000001523059505101395_s1_p0.bas.h5
m130123_002504_42153_c100461682550000001523059505101395_s1_p0.ccs.fasta
m130123_002504_42153_c100461682550000001523059505101395_s1_p0.ccs.fastq
m130123_002504_42153_c100461682550000001523059505101395_s1_p0.fasta
m130123_002504_42153_c100461682550000001523059505101395_s1_p0.fastq
m130123_002504_42153_c100461682550000001523059505101395_s1_p0.sts.csv
m130123_002504_42153_c100461682550000001523059505101395_s1_p0.sts.xml
strobe_m130123_002504_42153_c100461682550000001523059505101395_s1_p0.bas.h5
Apart from this I have

Code:
filtered_subreads_CF080.fastq
Corrected_pacbio.fasta
Also, I created an xml file for AHA and tried to run it through the command line. I tried this with the bas.h5 file as well as the Corrected_pacbio.fasta file. Each time I got this error:

Code:
[INFO] 2013-02-19 16:21:52,513 [pbpy.smrtpipe.SmrtPipeScope fitRefLengthToScope 95] Total length of reference is 7024.52 kbp

[INFO] 2013-02-19 16:21:52,514 [pbpy.smrtpipe.SmrtPipeScope fitRefLengthToScope 99] Reference scope is huge

[INFO] 2013-02-19 16:21:52,514 [pbpy.smrtpipe.modules.HybridAssembly run 452] Genome scope is large enough to potentially slow down nucmer repeat detection, so refusing to run. Not running nucmer can increase false positive scaffolds links induced by repeats. To allow nucmer execution increase DENOVO_GENOME_SCOPES in smrtpipe.rcor on command line with e.g. -DDENOVO_GENOME_SCOPES=small:1,large:1,huge:1.

ValueError: invalid literal for int() with base 10: ''
[ERROR] 2013-02-19 16:21:52,550 [pbpy.smrtpipe.SmrtPipeMain exit 760] invalid literal for int() with base 10: ''


Thanks
Sagar
Old 02-19-2013, 01:49 PM   #14
rhall

sagarutturkar,
The basic input type into SMRT portal is PacBio raw data. You should be able to Import SMRT Cells from the 'import and manage' tab, pointing it to the directory structure of the data that comes off the machine.
The filtered_subreads.fastq is useful for using non PacBio software, but you should use raw data for SMRT portal workflows.
Once you have the data imported you should be able to run the RS_AHA_scaffolding workflow, with the reference you have imported, and the raw data.
I'm not aware of a method for using the corrected reads (pacbioToCA) to scaffold an assembly without going to the command line, and outside of the SMRT portal / pipe system.
Old 02-19-2013, 01:51 PM   #15
rhall

Sorry,
Quote:
Yes in 1.4 jobs and raw data can be assigned to a group, then only users in that group can access and see the data.
was in reply to GenoMax.
Old 02-19-2013, 01:55 PM   #16
rhall

The raw data is not complete; it should look like this:
Quote:
./
*.metadata.xml
./Analysis_Results
[the files you have]
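The layout above can be checked before attempting an import. This is a sketch only: `check_smrtcell_layout` is a hypothetical helper, and requiring at least one `*.bas.h5` under Analysis_Results is an assumption based on the file listing earlier in the thread, not a documented rule of SMRT portal.

```python
import sys
from pathlib import Path


def check_smrtcell_layout(cell_dir):
    """Return a list of layout problems; an empty list means the layout matches."""
    cell = Path(cell_dir)
    problems = []
    # The metadata file sits next to Analysis_Results, one level above the bas.h5 files.
    if not list(cell.glob("*.metadata.xml")):
        problems.append("no *.metadata.xml in " + str(cell))
    results = cell / "Analysis_Results"
    if not results.is_dir():
        problems.append("missing Analysis_Results directory")
    elif not list(results.glob("*.bas.h5")):
        problems.append("no *.bas.h5 files in Analysis_Results")
    return problems


if __name__ == "__main__":
    # Usage: python check_layout.py /path/to/smrtcell_directory
    issues = check_smrtcell_layout(sys.argv[1])
    print("\n".join(issues) if issues else "layout looks complete")
```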
Old 02-19-2013, 04:33 PM   #17
rhall

Command line AHA including howto to go from filtered subreads and corrected reads:
https://github.com/PacificBioscience...ining/wiki/AHA
Old 02-20-2013, 06:18 AM   #18
sagarutturkar

Quote:
Originally Posted by rhall View Post
Command line AHA including howto to go from filtered subreads and corrected reads:
https://github.com/PacificBioscience...ining/wiki/AHA
I followed the same guidelines and got this error:

Code:
[ERROR] 2013-02-19 16:21:52,547 [pbpy.smrtpipe.SmrtPipeMain run 648] invalid literal for int() with base 10:
I tried different input files (fasta, bas.h5) and different data, but got the same error for each of them. I have not yet been able to figure out what's going wrong.

About importing SMRT cells: some weird things are happening.

I downloaded the complete data and imported it again. It showed the message "1 SMRT cell imported". However, in design jobs I do not see any SMRT cells available. I tried to use the search option but had no luck. Does the import of SMRT cells need to be approved by an administrator? I might be making some silly mistake or missing some step.

When I tried to re-import from the same location, I got the message "No New SMRT cells found".

Have you noticed something like this before?

Thanks
Sagar
Old 02-20-2013, 07:26 AM   #19
rhall

The [ERROR] is a python exception, could you post the entire log so we can get some more context?
The SMRT cell should show up, it does not need to be approved, my only thought is that maybe you do not have a group assigned to you. If it is just a personal copy of SMRT portal, log in as administrator and check under 'admin' that you are in group 'all' (note you can place a user in multiple groups by 'ctrl' clicking), also check that your role is 'scientist' (although I'm not sure what the different roles actually mean).
SMRT Cells will only import once, any further attempt will give 'No New SMRT cells found', so it sounds like they are there somewhere.
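On the exception itself: `invalid literal for int() with base 10: ''` is what Python raises when `int()` is handed an empty string, so somewhere a value that should be a number is arriving empty. Which setting or field that is cannot be told from the single log line, hence the request for the full log. A small sketch of the behaviour, with `parse_count` as a made-up helper that reports the problem more clearly:

```python
def parse_count(text):
    """Convert text to int, naming the problem when the value is empty."""
    stripped = text.strip()
    if not stripped:
        raise ValueError("expected an integer but got an empty value")
    return int(stripped)


if __name__ == "__main__":
    try:
        int("")
    except ValueError as exc:
        print(exc)  # invalid literal for int() with base 10: ''
    print(parse_count(" 42 "))
```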
Old 02-20-2013, 09:05 AM   #20
sagarutturkar

Quote:
Originally Posted by rhall View Post
The [ERROR]
The SMRT cell should show up, it does not need to be approved, my only thought is that maybe you do not have a group assigned to you.
Hi, I am able to import the SMRT cells now. But when I ran the "RS_AHA_scaffolding.1" protocol with my assembly as reference, the job failed.

I have attached log file for the job that failed through SMRTportal (CF80_AHA.zip).

Quote:
Originally Posted by rhall View Post

The [ERROR] is a python exception, could you post the entire log so we can get some more context?
The log files for AHA job that failed through command line are also attached (smrtpipe.log.zip).

Many thanks for your help.

-Sagar
Attached Files
File Type: zip CF80_AHA.txt.zip (6.6 KB, 4 views)
File Type: zip smrtpipe.log.zip (7.9 KB, 5 views)