SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Pacific Biosciences (http://seqanswers.com/forums/forumdisplay.php?f=39)
-   -   Can I use multiple CPUs or threads to speed up tofu_wrap.py? (http://seqanswers.com/forums/showthread.php?t=70765)

lingling huang 08-08-2016 01:51 AM

Can I use multiple CPUs or threads to speed up tofu_wrap.py?
 
Hi there. I have data from 7 SMRT Cells, and tofu_wrap.py runs very slowly on it. I want to know whether I can use multiple CPUs or threads to speed it up. Thank you for any tips!

Magdoll 08-08-2016 09:03 AM

Hi,

Short answer is yes. There are multiple ways to hack tofu_wrap.py -- will require additional monitoring of the parallel jobs. But I first need to ask what parameters did you use to call tofu_wrap.py? What are the size bins (aka what are the subdirectories in clusterOut/)? Do you have an SGE cluster? Do you have multiple nodes from which you can run parallel pbtranscript.py cluster jobs?

Also, please consider joining the Iso-Seq google group: https://groups.google.com/forum/#!forum/smrt_isoseq

Since tofu_wrap.py is on the cutting edge (it's not officially supported in SMRTAnalysis 2.x but is on the agenda for SMRTAnalysis 3.x), the google group is better suited :)

--Liz

lingling huang 08-08-2016 07:49 PM

Quote:

Originally Posted by Magdoll (Post 197639)
Hi,

Short answer is yes. There are multiple ways to hack tofu_wrap.py -- will require additional monitoring of the parallel jobs. But I first need to ask what parameters did you use to call tofu_wrap.py? What are the size bins (aka what are the subdirectories in clusterOut/)? Do you have an SGE cluster? Do you have multiple nodes from which you can run parallel pbtranscript.py cluster jobs?

Also, please consider joining the Iso-Seq google group: https://groups.google.com/forum/#!forum/smrt_isoseq

Since tofu_wrap.py is on the cutting edge (it's not officially supported in SMRTAnalysis 2.x but is on the agenda for SMRTAnalysis 3.x), the google group is better suited :)

--Liz

Code:

tofu_wrap.py --nfl_fa isoseq_nfl.fasta --ccs_fofn reads_of_insert.fofn \
    --bas_fofn input.fofn -d clusterOut --quiver \
    --bin_manual "(0,2,4,6,8,9,10,11,12,13,15,17,19,20,23)" \
    --gmap_db /zs32/data-analysis/liucy_group/llhuang/Reflib/gmapdb \
    --gmap_name gmapdb_h19 --output_seqid_prefix human \
    isoseq_flnc.fasta final.consensus.fa
The lab's server is a high-powered single-node machine and
has no SGE cluster. Can I use multiple CPUs to run the command?

Magdoll 08-09-2016 02:11 PM

Without SGE, it will be slow anyway.

But if you think that single node can handle it, one way is to run multiple instances of `pbtranscript.py cluster` on the different bins.

For example, tofu_wrap.py always creates bins such as 0to1kb_part0, 1to2kb_part0, etc.

You can terminate tofu_wrap.py and keep the bins as they are. Then, in each bin, call a separate instance of cluster:

pbtranscript.py cluster isoseq_flnc.fasta final.consensus.fa \
--nfl_fa isoseq_nfl.fasta -d cluster --ccs_fofn reads_of_insert.fofn \
--bas_fofn input.fofn --quiver --use_sge \
--max_sge_jobs 40 --unique_id 300 --blasr_nproc 24 --quiver_nproc 8

(in your case, you would remove the --use_sge and --max_sge_jobs options)

(see cluster tutorial here: https://github.com/PacificBioscience...-and-Quiver%29)

My guess is this would make it somewhat faster, but still relatively slow, since within each bin everything runs serially instead of in parallel on a cluster -- but it may be better than nothing...
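To make the per-bin approach above concrete, here is a hedged sketch of launching one `pbtranscript.py cluster` instance per size bin in the background and throttling how many run at once on a single node. The bin glob, log file names, the limit of 3 concurrent bins, and the per-bin `--unique_id` increment are all illustrative assumptions, not part of the thread's tested commands; adjust `--blasr_nproc`/`--quiver_nproc` so the total stays within your core count.

```shell
#!/bin/bash
# Sketch only: assumes tofu_wrap.py was stopped after creating size bins
# (e.g. 0to1kb_part0/, 1to2kb_part0/) under clusterOut/, each containing
# input files named as in this thread.
cd clusterOut || exit 1

id=300   # assumption: give each instance a distinct --unique_id
for bin in *kb_part*; do
    (
        cd "$bin" || exit 1
        pbtranscript.py cluster isoseq_flnc.fasta final.consensus.fa \
            --nfl_fa isoseq_nfl.fasta -d cluster \
            --ccs_fofn reads_of_insert.fofn --bas_fofn input.fofn \
            --quiver --unique_id "$id" \
            --blasr_nproc 8 --quiver_nproc 4
    ) > "$bin.log" 2>&1 &
    id=$((id + 1))

    # Throttle: wait while 3 or more bins are still running.
    while [ "$(jobs -r | wc -l)" -ge 3 ]; do
        sleep 30
    done
done
wait   # block until the last bins finish
```

The concurrency limit trades off against the per-instance `--blasr_nproc`/`--quiver_nproc` settings: 3 bins at 8 BLASR threads each already occupies 24 cores, so scale one down if you raise the other.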

