SEQanswers

08-16-2016, 02:27 PM   #1
rachita
Junior Member
 
Location: Boston

Join Date: Mar 2013
Posts: 7
HGAP.3 protocol in SMRT Analysis 2.3 failing

Hi,

I am trying to assemble a 124 kb BAC and I am new to PacBio data. I have installed SMRT Analysis 2.3 and started an HGAP_Assembly.3 job through SMRT Portal, but it keeps failing at the P_PreAssemblerDagcon/hgapAlignForCorrection step. The issue is that the scattered FASTA files it requires do not exist, even though the preceding step, hgapAlignForCorrection.target.Scatter, completes without any error. When I run the script manually, it creates the files fine. Is there an option missing from the parameter file?

A few other questions:
1. Is there a way to change the settings for the job queues? I tried updating the files in $SMRT_ROOT/current/analysis/etc/cluster/LSF/ but the changes are not reflected in the run.
2. Why do I need to supply a reference genome file to HGAP_Assembly when I intend to do a de novo assembly?

I am attaching the log as well as my settings.xml file for your reference.
Attached Files
File Type: txt smrtpipe.stdout.txt (9.2 KB, 2 views)
File Type: txt smrtpipe.stderr.txt (704 Bytes, 1 views)
File Type: txt settings.xml.txt (15.9 KB, 3 views)
08-16-2016, 03:02 PM   #2
gconcepcion
Member
 
Location: Menlo Park

Join Date: Dec 2010
Posts: 68

Quote:
Originally Posted by rachita View Post
Hi,

I am trying to assemble a 124 kb BAC and I am new to PacBio data. I have installed SMRT Analysis 2.3 and started an HGAP_Assembly.3 job through SMRT Portal, but it keeps failing at the P_PreAssemblerDagcon/hgapAlignForCorrection step. The issue is that the scattered FASTA files it requires do not exist, even though the preceding step, hgapAlignForCorrection.target.Scatter, completes without any error. When I run the script manually, it creates the files fine. Is there an option missing from the parameter file?
I see you have genomeSize set to 124 kb. How much input coverage do you have?

If you can zip up the rest of the logs (particularly the ones in log/P_PreAssemblerDagcon/*) it will facilitate the troubleshooting process.


Quote:
Originally Posted by rachita View Post

A few other questions:
1. Is there a way to change the settings for the job queues? I tried updating the files in $SMRT_ROOT/current/analysis/etc/cluster/LSF/ but the changes are not reflected in the run.
There is information for configuring LSF here: https://github.com/PacificBioscience...llation-v2.2.0

Granted, that's for the previous version (2.2.0, not the 2.3.0 you have installed), but the configuration should be identical.

Quote:
Originally Posted by rachita View Post
2. Why do I need to supply a reference genome file to HGAP_Assembly when I intend to do a de novo assembly?
You don't need to supply a reference. The reference mentioned in settings.xml is the final product of the assembly process: the raw reads are mapped back to this freshly generated de novo reference for error correction.
08-16-2016, 08:22 PM   #3
rachita
Junior Member
 
Location: Boston

Join Date: Mar 2013
Posts: 7

1. I set the genome size to 124,000. I can't work out the coverage because I have not aligned the data yet. The data consist of 89,252 long reads totaling 726,663,272 bases.

2. I manually edited the LSF .tmpl files (in $smrtanalysis/current/analysis/etc/cluster/LSF) to add a queue name, but the jobs are still going to the default queue.

3. When the pipeline did not work, I manually ran the scripts in "P_PreAssemblerDagcon", followed by align.plsFofn.Scatter.sh and align_003of003.sh, which gave an IOError: The input path /PHShome/ry077/bin/smrtanalysis/userdata/jobs/016/016445/reference does not exist.


I am attaching the logs.

Thanks for all the help.
Attached Files
File Type: zip P_PreAssemblerDagcon.zip (5.9 KB, 1 views)
File Type: zip smrtpipe.log.txt.zip (6.6 KB, 0 views)
File Type: txt smrtpipe.stdout_1.txt (9.2 KB, 0 views)
File Type: txt smrtpipe.stderr_1.txt (704 Bytes, 0 views)
File Type: xml settings.xml (15.7 KB, 1 views)
08-17-2016, 09:58 AM   #4
gconcepcion
Member
 
Location: Menlo Park

Join Date: Dec 2010
Posts: 68

Quote:
Originally Posted by rachita View Post
1. I set the genome size to 124,000. I can't work out the coverage because I have not aligned the data yet. The data consist of 89,252 long reads totaling 726,663,272 bases.
OK, that's plenty of coverage, so that's not the issue.
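
For reference, a rough coverage estimate does not need an alignment: it is simply total bases divided by the expected genome size. A minimal Python sketch using the numbers quoted above (the function name is made up for illustration, and the figure is raw coverage before any read filtering or vector removal):

```python
def estimate_coverage(total_bases: int, genome_size: int) -> float:
    """Return approximate fold coverage as total bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Numbers from the post above: 89,252 reads, 726,663,272 bases, 124 kb BAC.
total_bases = 726_663_272
read_count = 89_252
genome_size = 124_000

coverage = estimate_coverage(total_bases, genome_size)
mean_read_length = total_bases / read_count

print(f"coverage: ~{coverage:,.0f}x")
print(f"mean read length: ~{mean_read_length:,.0f} bases")
```

That works out to several thousand fold, far more than a 124 kb target needs, which is why coverage can be ruled out as the cause here.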

Quote:
Originally Posted by rachita View Post
2. I manually edited the LSF .tmpl files (in $smrtanalysis/current/analysis/etc/cluster/LSF) to add a queue name, but the jobs are still going to the default queue.
Who installed smrtanalysis for you? Make sure CLUSTER_MANAGER=LSF is in the smrtpipe.rc file located here:

$SMRT_ROOT/current/analysis/etc/smrtpipe.rc

If not, change it to LSF and restart smrtanalysis.
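
One way to confirm the setting, sketched in Python. This assumes smrtpipe.rc is plain KEY=VALUE text with # comments, which has not been verified against a real installation; the helper names and the sample file contents are made up for illustration.

```python
from pathlib import Path
from typing import Optional


def read_rc_value(rc_text: str, key: str) -> Optional[str]:
    """Return the last value assigned to `key` in KEY=VALUE text, or None."""
    value = None
    for raw_line in rc_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            value = rest.strip().strip('"').strip("'")
    return value


def cluster_manager(rc_path: str) -> Optional[str]:
    """Read CLUSTER_MANAGER from an smrtpipe.rc file on disk."""
    return read_rc_value(Path(rc_path).read_text(), "CLUSTER_MANAGER")


# Self-contained demonstration on made-up file contents (not a real smrtpipe.rc).
sample = "# example only\nSOME_OTHER_KEY=8\nCLUSTER_MANAGER=LSF\n"
print(read_rc_value(sample, "CLUSTER_MANAGER"))
```

On a real installation, cluster_manager() would be pointed at the smrtpipe.rc path given above; anything other than LSF would explain the queue templates being ignored.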

I'm not too familiar with LSF, but based on the error message in hgapAlignForCorrection_*.log:

# Writing stdout and stderr from Popen:
/bin/bash: /opt/lsf/conf/profile.lsf: No such file or directory
Queue only accepts interactive jobs. Job not submitted.

SMRTanalysis is unable to resolve your LSF settings properly.
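
To check every scattered log for the same symptom at once, something like the following Python sketch could help. The marker strings are copied from the excerpt above; the function names are hypothetical, and the glob pattern in the usage note is an assumption about where the job keeps its logs.

```python
import glob
from typing import Dict, List

# Messages copied from the log excerpt above; extend as needed.
LSF_FAILURE_MARKERS = (
    "profile.lsf: No such file or directory",
    "Queue only accepts interactive jobs",
    "Job not submitted",
)


def find_markers(text: str, markers=LSF_FAILURE_MARKERS) -> List[str]:
    """Return the lines of `text` that contain any of the markers."""
    return [line for line in text.splitlines()
            if any(marker in line for marker in markers)]


def scan_logs(pattern: str) -> Dict[str, List[str]]:
    """Map each log file matching `pattern` to its suspicious lines."""
    hits = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            matched = find_markers(handle.read())
        if matched:
            hits[path] = matched
    return hits


excerpt = """# Writing stdout and stderr from Popen:
/bin/bash: /opt/lsf/conf/profile.lsf: No such file or directory
Queue only accepts interactive jobs. Job not submitted."""
for line in find_markers(excerpt):
    print(line)
```

Run from the job directory, scan_logs("log/P_PreAssemblerDagcon/hgapAlignForCorrection_*.log") would list which chunks hit the LSF submission failure, assuming the logs live under that path.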



Quote:
Originally Posted by rachita View Post
3. When the pipeline did not work, I manually ran the scripts in "P_PreAssemblerDagcon", followed by align.plsFofn.Scatter.sh and align_003of003.sh, which gave an IOError: The input path /PHShome/ry077/bin/smrtanalysis/userdata/jobs/016/016445/reference does not exist.
There is an intermediate step between P_PreAssemblerDagcon and P_Mapping: P_ReferenceUploader/runUploaderUnitig, which formats the raw reads as a reference repository entry so that the raw reads can then be mapped and corrected prior to assembly.
08-17-2016, 10:02 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080

Not to hijack this thread, but is LSF now fully supported for SMRTanalysis?
08-17-2016, 10:09 AM   #6
gconcepcion
Member
 
Location: Menlo Park

Join Date: Dec 2010
Posts: 68

Quote:
Originally Posted by GenoMax View Post
Not to hijack this thread, but is LSF now fully supported for SMRTanalysis?
Sorry, I'm not sure exactly what you mean by "fully" supported.

The last time I tested and used SMRTanalysis 2.3.0 with LSF was sometime late last year to run some basic integration tests, and everything worked fine.

According to the 2.3.0 install guide, it is "supported".

http://www.pacb.com/wp-content/uploa...ion-v2.3.0.pdf

On that note, if there is a key feature of LSF that we are not supporting in SMRTAnalysis, we're probably not aware of it, and I wouldn't hold your breath waiting for support.

Development efforts are currently focused on SMRTAnalysis's successor, SMRTLink.
08-17-2016, 10:23 AM   #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080

The last time we tried to get SMRTportal working with LSF, we did not get very far (but that was 2+ years ago).

We switched to SGE/different cluster at that point.
08-22-2016, 12:46 PM   #8
rachita
Junior Member
 
Location: Boston

Join Date: Mar 2013
Posts: 7

Thanks for your reply. I was able to change the settings and run the HGAP assembly, which generated two unitigs. When I map the first contig back to the human reference, I get an 8 kb region mapping to "Cloning vector pBACe3.6, complete sequence". I followed the tutorial at "https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP-Whitelisting-Tutorial" and created a list of reads without vector. Then I added the following tags to the filtering module in my settings.xml:

Quote:
<param name="whiteList" label="Read IDs to whitelist">
<value>PATH/whitelist.txt</value>
</param>
Is this not the correct way to do it? After changing settings.xml, I still used the portal to save and run the job. Or should the job be run with the setup.py script instead?

Last edited by rachita; 08-22-2016 at 12:58 PM.
08-22-2016, 01:17 PM   #9
gconcepcion
Member
 
Location: Menlo Park

Join Date: Dec 2010
Posts: 68

Quote:
Originally Posted by rachita View Post
Thanks for your reply. I was able to change the settings and run the HGAP assembly, which generated two unitigs. When I map the first contig back to the human reference, I get an 8 kb region mapping to "Cloning vector pBACe3.6, complete sequence". I followed the tutorial at "https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP-Whitelisting-Tutorial" and created a list of reads without vector. Then I added the following tags to the filtering module in my settings.xml:



Is this not the correct way to do it? After changing settings.xml, I still used the portal to save and run the job. Or should the job be run with the setup.py script instead?

How long are the two contigs? Is one roughly 4 kb? The SMRTCell internal control is not removed by the HGAP assembly process, and may be what you are seeing. See this thread:
http://seqanswers.com/forums/showthread.php?t=40801
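
Contig lengths are easy to read straight from the assembly FASTA. A small Python sketch (the function name and the demo records are made up; no particular output filename from the HGAP job is assumed):

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {record name: sequence length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0] if len(line) > 1 else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Tiny made-up records, only to show the call.
demo = [">contig_a", "ACGTACGT", "ACGT", ">contig_b", "GGCC"]
for contig, length in fasta_lengths(demo).items():
    print(contig, length)
```

Passing an open file handle for the assembly FASTA instead of the demo list gives the real lengths; a contig near 4 kb would be a candidate for the internal control.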

Is the whitelist not working for you? You didn't post any error messages. All you need to do is add the section that you mentioned, and make sure the list has one read ID per line. I'm assuming the "PATH/whitelist.txt" you referenced is actually a fully resolved path and not literally what you copied and pasted above, as that will obviously not work.
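
Both conditions can be checked before launching the job. A rough Python sketch, with a hypothetical function name; it only checks that the path is absolute and that each line holds a single token, and makes no assumption about what a valid read ID looks like:

```python
import os
from typing import List


def whitelist_problems(path: str) -> List[str]:
    """Return a list of human-readable problems found with a whitelist file."""
    problems = []
    if not os.path.isabs(path):
        problems.append(f"path is not absolute: {path}")
    if not os.path.isfile(path):
        problems.append(f"file does not exist: {path}")
        return problems
    with open(path) as handle:
        for number, raw_line in enumerate(handle, start=1):
            entry = raw_line.strip()
            if not entry:
                problems.append(f"line {number}: empty line")
            elif len(entry.split()) != 1:
                problems.append(f"line {number}: more than one token")
    return problems


# The placeholder from the post is a relative path, so it is flagged.
for problem in whitelist_problems("PATH/whitelist.txt"):
    print(problem)
```

An empty result means the file at least has the expected shape; it does not prove the read IDs match the ones in the input data.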