SEQanswers

07-12-2012, 12:23 AM #1
satp
Member

Location: GuangZhou

Join Date: Jun 2008
Posts: 13
Run cuffdiff in a computer cluster

Hi all,

I tried to use cuffdiff to process a large number of assembled RNA-Seq datasets, but it failed because of insufficient memory. I ran cuffdiff on a cluster, but someone told me that it can only use the memory of a single node, even though the total memory of the cluster would be sufficient. I'm not familiar with parallel computing, so I would like to know: is there any way to make cuffdiff use the memory of several nodes in a cluster?

Thanks for any help.
Jian-You Liao
07-12-2012, 12:20 PM #2
mbblack
Senior Member

Location: Research Triangle Park, NC

Join Date: Aug 2009
Posts: 245

It cannot be run across multiple nodes, for the simple reason that the only way to get accurate FDR values is to analyze all the genes at once. If you split the genes across multiple nodes, the multiple-testing correction in each run would only see part of the tests, so you would never get accurate estimates of significance.
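To see why splitting matters, here is a small sketch of a Benjamini-Hochberg style FDR correction (the standard procedure for q-values; this is a generic illustration in bash/awk, not cuffdiff's own code). The p-values and the function names `bh_adjust` and `pvals` are hypothetical. The adjusted value of each test depends on m, the total number of tests in the run, so the same gene can get a different q-value when it is corrected together with only part of the genes.

```shell
#!/usr/bin/env bash
# Sketch only: Benjamini-Hochberg adjusted p-values (q-values), not cuffdiff's code.
# Reads one p-value per line on stdin and prints "p q" pairs sorted by p.
bh_adjust() {
  LC_ALL=C sort -g | awk '
    { p[NR] = $1 }                    # sorted p-values; rank = NR
    END {
      m = NR                          # number of tests in THIS run
      prev = 1
      for (i = m; i >= 1; i--) {      # walk from the largest p down
        q = p[i] * m / i              # raw BH value: p * m / rank
        if (q > prev) q = prev        # keep q monotone (and <= 1)
        adj[i] = q
        prev = q
      }
      for (i = 1; i <= m; i++) printf "%s %.4f\n", p[i], adj[i]
    }'
}

# Hypothetical p-values for six genes.
pvals() { printf '%s\n' 0.001 0.008 0.039 0.041 0.042 0.060; }

echo "all six genes in one run:"
pvals | bh_adjust
echo "first three genes corrected on their own:"
pvals | head -n 3 | bh_adjust
```

With all six tests corrected together, the gene with p = 0.039 gets q = 0.0504 (not significant at 0.05); corrected together with only the first three genes, it gets q = 0.0390. The answer changes just because the split changed m.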

The only thing you can do is increase the memory available on that one node (and use the -p option to enable multithreading, if that node has multiple cores).
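On a cluster with a batch scheduler, this amounts to requesting a single node with as much memory as the job needs and passing that node's cores to cuffdiff via -p. Below is a hypothetical sketch that assumes SLURM (the thread does not say which scheduler is in use). The memory and core values and the file names (merged.gtf, ctrl_rep1.bam, and so on) are placeholders. The script only prints the command when cuffdiff is not on the PATH.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=cuffdiff
#SBATCH --nodes=1             # cuffdiff is one process, so it cannot span nodes
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8     # placeholder: cores on that node for cuffdiff -p
#SBATCH --mem=120G            # placeholder: as much of the node's RAM as needed

# Hypothetical inputs: merged assembly plus two conditions, two replicates each.
threads="${SLURM_CPUS_PER_TASK:-8}"
args=(-p "$threads" -o cuffdiff_out -L ctrl,treat
      merged.gtf
      ctrl_rep1.bam,ctrl_rep2.bam
      treat_rep1.bam,treat_rep2.bam)

if command -v cuffdiff >/dev/null 2>&1; then
  cuffdiff "${args[@]}"
else
  # cuffdiff not available here (e.g. on a login node): just show the command.
  echo "cuffdiff ${args[*]}"
fi
```

Submitted with `sbatch`, the scheduler will place this job on one node that has the requested memory free, or leave it queued until such a node is available.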

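One way to do this in Linux is to use a temporary disk file to expand the swap space. You need root privileges to do this, so you may need your cluster admin's help. Bear in mind that swap lives on disk, so cuffdiff will slow down considerably if it has to page heavily.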

----example commands (as root) to create a 24 GB file and add it to swap----------
# dd if=/dev/zero of=/directory_name/tmpswap_file_name bs=1024 count=25165824
# chmod 600 /directory_name/tmpswap_file_name
# mkswap -f /directory_name/tmpswap_file_name
# swapon /directory_name/tmpswap_file_name

once it is no longer needed:

# swapoff /directory_name/tmpswap_file_name
# rm /directory_name/tmpswap_file_name
-------------------------------------------------------------------------
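Before starting cuffdiff, it can be worth confirming that the new swap file is actually active. This is a small sketch that assumes a Linux node, where the kernel lists active swap areas in /proc/swaps (a header line, then one area per line with its path in the first column). The helper name `swap_is_active` is hypothetical, and the path is the same placeholder used in the commands above. Interactively, `swapon --show` or `free -h` give the same information.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if FILE is listed as an active swap area.
# The optional second argument is the swap table to read (default /proc/swaps).
swap_is_active() {
  local file="$1" table="${2:-/proc/swaps}"
  # Skip the header line, then look for an exact match on the path column.
  awk -v f="$file" 'NR > 1 && $1 == f { found = 1 } END { exit (found ? 0 : 1) }' "$table"
}

swapfile=/directory_name/tmpswap_file_name   # placeholder path
if swap_is_active "$swapfile"; then
  echo "$swapfile is in use as swap"
else
  echo "$swapfile is NOT active; check the mkswap/swapon steps"
fi
```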
__________________
Michael Black, Ph.D.
ScitoVation LLC. RTP, N.C.

All times are GMT -8.