  • Running tophat on a cluster

    Hello, all.

    I'm currently having issues while running tophat on a cluster with a pbs scheduler. From what I can tell, it's taking far too long to perform the analysis (it hasn't even completed yet - I've had to keep reallocating time on the cluster since it's taking longer than I had anticipated - currently I'm sitting at 100+ hours run-time). The data is paired-end GAIIx data - each lane comes in at about 9.7GB for the fastq file.

    I guess my question is: when running tophat, do I set the -p flag to the number of nodes I have allocated, or only to the number of cores on each node? Or is it a combination of the two, nodes*cores?

    Sorry if this seems like a silly question, but google didn't return anything helpful from here (some information was close though), so I thought I'd give it a shot.

  • #2
    -p is the number of threads you want to run on each node. I think you just need to run n lanes on n nodes. However, if your nodes have multicore processors, you can set -p to the number of cores on each node to make your run faster. Otherwise, using -p is not helpful.
    Last edited by Daehwan; 12-03-2010, 09:39 AM.
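    A minimal sketch of the "n lanes on n nodes" idea above, assuming a PBS scheduler. The lane names, core count, index basename, and script name are all placeholders, not tested values:

    ```shell
    #!/bin/bash
    # Sketch: submit one tophat job per lane, each confined to a single node,
    # with -p matching the cores requested for that node (8 here is an example).
    for lane in lane1 lane2 lane3; do
        qsub -l select=1:ncpus=8 -v LANE="$lane" run_tophat.pbs
    done

    # run_tophat.pbs would then contain something like:
    #   tophat -p 8 -o "${LANE}_out" hg19_index "${LANE}_R1.fastq" "${LANE}_R2.fastq"
    ```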

    • #3
      That was what I figured, and is what I have started doing with other runs. It just still feels like it's taking too long to complete a run. But I guess this is just what happens when dealing with a massive amount of biological data.

      • #4
        Originally posted by NM_010117 View Post
        Hello, all.

        I'm currently having issues while running tophat on a cluster with a pbs scheduler. From what I can tell, it's taking far too long to perform the analysis (it hasn't even completed yet - I've had to keep reallocating time on the cluster since it's taking longer than I had anticipated - currently I'm sitting at 100+ hours run-time). The data is paired-end GAIIx data - each lane comes in at about 9.7GB for the fastq file.

        I guess my question is: when running tophat, do I set the -p flag to the number of nodes I have allocated, or only to the number of cores on each node? Or is it a combination of the two, nodes*cores?

        Sorry if this seems like a silly question, but google didn't return anything helpful from here (some information was close though), so I thought I'd give it a shot.
        I am facing a similar issue...were you able to resolve it?
        ~Thanks!

        • #5
          Originally posted by rpauly View Post
          I am facing a similar issue...were you able to resolve it?
          ~Thanks!
          It is normal for a tophat run to take a few hours depending on the amount of your input data. You just need to be patient.

          If you are running on a cluster it is important to keep the threads for an individual job confined to one physical node (depending on the scheduler your cluster uses, you would need to provide the right options).
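          As a sketch, assuming PBSpro's "select" syntax (Torque would use -l nodes=1:ppn=8 instead), a job header that keeps all tophat threads on one physical node might look like this. The core count, memory, and file names are illustrative only:

          ```shell
          #!/bin/bash
          # Sketch: request ONE node and run tophat with -p equal to the cores
          # requested on that node, so threads never span physical machines.
          #PBS -l select=1:ncpus=8:mem=16gb
          #PBS -l walltime=48:00:00

          tophat -p 8 -o tophat_out hg19_index reads_R1.fastq reads_R2.fastq
          ```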

          • #6
            Originally posted by GenoMax View Post
            It is normal for a tophat run to take a few hours depending on the amount of your input data. You just need to be patient.

            If you are running on a cluster it is important to keep the threads for an individual job confined to one physical node (depending on the scheduler your cluster uses, you would need to provide the right options).
            Thank you for the quick reply!
            I am using a PBS cluster and analyzing 101 bp paired-end Illumina RNA-seq data, with -p 12 for tophat and ncpus=24:mem=16gb on the cluster. Is there another way I could optimize the process? I gave it a walltime of 30 hours, which does not seem to be sufficient, so I am going to increase it to 72 hrs.

            • #7
              What is the size of your input data and what genome are you aligning against?

              Did the job get killed after 30 h? That should be enough unless you have a billion-read dataset; if that is the case, you may want to split it and start multiple tophat jobs.
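              For the split-and-run approach, one hedged sketch (a FASTQ record is 4 lines, so the chunk size must be a multiple of 4; the tiny demo file and chunk size below are only for illustration, and for paired-end data R1 and R2 must be split identically):

              ```shell
              #!/bin/bash
              # Sketch: split a FASTQ into fixed-size chunks so each chunk can be
              # aligned by its own tophat job.
              printf '@r%d\nACGT\n+\nIIII\n' 1 2 3 4 5 6 > reads_R1.fastq  # 6-read demo file
              split -l 8 -d reads_R1.fastq chunk_R1_   # 8 lines = 2 reads per chunk
              ls chunk_R1_*                            # chunk_R1_00 chunk_R1_01 chunk_R1_02
              ```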

              • #8
                Originally posted by GenoMax View Post
                What is the size of your input data and what genome are you aligning against?

                Did the job get killed after 30 h? That should be enough unless you have a billion-read dataset; if that is the case, you may want to split it and start multiple tophat jobs.
                The fastq files are close to 10GB; I am aligning to the older human reference, hg19.
                Yes, the job got killed after 30 hrs, and it has close to 3 million reads. I had also previously dedicated the process to 1 node. I have 10 samples, so splitting the files would be hard...maybe I should give STAR a shot.

                • #9
                  10G is not that big. Something here does not sound right. Do you have an idea of how many reads the job went through before it got killed? Did you get a partial accepted_hits.bam file?

                  If you want to post your PBS script we can take a look at how you submitted the job (remove any identifying information such as file paths/names).

                  • #10
                    Originally posted by GenoMax View Post
                    10G is not that big. Something here does not sound right. Do you have an idea of how many reads the job went through before it got killed? Did you get a partial accepted_hits.bam file?

                    If you want to post your PBS script we can take a look at how you submitted the job (remove any identifying information such as file paths/names).

                    No, I did not get a partial accepted_hits.bam file, but it did give me an error saying the walltime was exceeded. Please see my PBS script below:
                    #!/bin/bash
                    #PBS -N tophat_cms23055_2624-40399001
                    #PBS -l walltime=30:00:00
                    #PBS -l select=1:ncpus=24:mem=16gb


                    #PBS -o /home/rpauly/2624-40399001/cms23055.log
                    #PBS -e /home/rpauly/2624-40399001/cms23055.err

                    module load samtools/0.1.19
                    module load bowtie/1.0.1

                    cd /home/rpauly


                    /home/rpauly/tophat-2.1.1.Linux_x86_64/tophat --bowtie1 --fusion-search --no-coverage-search -o /scratch1/rpauly/2624-40399001/cms23055_tophat_output -p 20 -G /home/rpauly/refFlat_Oct_2016.gtf /home/rpauly/tophat-2.1.1.Linux_x86_64/bowtie2-2.2.9/genomes/hg19 /home/rpauly/2624-40399001/cms23055_S28_L006_R1_001.fastq.gz /home/rpauly/2624-40399001/cms23055_S28_L006_R2_001.fastq.gz >/home/rpauly/2624-40399001/cms23055_error

                    ~Thanks!

                    • #11
                      Did you look at the log and err files to see if they had anything related?

                      Is "/home/rpauly/tophat-2.1.1.Linux_x86_64/bowtie2-2.2.9/genomes/hg19" the basename for your bowtie1 index files?

                      • #12
                        Originally posted by GenoMax View Post
                        Did you look at the log and err files to see if they had anything related?

                        Is "/home/rpauly/tophat-2.1.1.Linux_x86_64/bowtie2-2.2.9/genomes/hg19" the basename for your bowtie1 index files?
                        There was no error in the log or err file; it just stopped running.
                        I have attached a screenshot of the bowtie1 index files.

                        Thanks again!
                        Attached Files

                        • #13
                          Those appear to be bowtie2 genome index files (if I follow the directory names at the top of the page). They will not work with bowtie1 (its index format is different), which you are specifying in your tophat command. Is there a reason you are using --bowtie1?
                          Last edited by GenoMax; 10-31-2016, 07:59 AM.
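                          To see the difference at the file level, a small sketch (the directory and dummy files below are created purely for demonstration; real indexes come from bowtie-build or bowtie2-build):

                          ```shell
                          #!/bin/bash
                          # Sketch: bowtie1 indexes end in .ebwt, bowtie2 indexes end in .bt2;
                          # tophat's --bowtie1 flag requires the former. Dummy files stand in
                          # for a real index here.
                          mkdir -p demo_genomes
                          touch demo_genomes/hg19.1.bt2 demo_genomes/hg19.rev.1.bt2  # pretend bowtie2 index
                          index_base="demo_genomes/hg19"

                          if compgen -G "${index_base}*.ebwt" > /dev/null; then
                              echo "bowtie1 index found: compatible with --bowtie1"
                          elif compgen -G "${index_base}*.bt2" > /dev/null; then
                              echo "bowtie2 index found: drop --bowtie1 or build .ebwt files"
                          fi
                          ```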

                          • #14
                            Originally posted by GenoMax View Post
                            Those appear to be bowtie2 genome index files (if I follow the directory names at the top of the page). They will not work with bowtie1 (those are different), which you are specifying in your tophat command. Is there a reason you are using --bowtie1?
                            So that was the problem? But I did not get any error messages indicating this!
                            The only reason I was using bowtie1 was because I read it does better with fusion detection than bowtie2.

                            Also, I just downloaded the hg19_c.ebwt.zip file (which I assume contains bowtie1 index files?) and added it to the same folder. I will try rerunning the process and see if that helps.
                            ~Thanks !

                            • #15
                               Hopefully getting the bowtie1 indexes (not sure where you got them from, but there should be multiple files) will do the trick. I would put them in a different directory and change the name in your TopHat command to avoid any further "issues".
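                               If the downloaded archive does not work out, building the bowtie1 indexes from a reference FASTA is another option. A sketch, assuming bowtie-build (shipped with bowtie1) is on the PATH and hg19.fa is the reference file (both assumptions; paths are placeholders):

                               ```shell
                               #!/bin/bash
                               # Sketch: build bowtie1 (.ebwt) indexes into their own directory,
                               # then point tophat at the new basename.
                               mkdir -p bowtie1_index
                               bowtie-build hg19.fa bowtie1_index/hg19
                               # produces hg19.1.ebwt .. hg19.4.ebwt plus hg19.rev.1.ebwt and hg19.rev.2.ebwt

                               # tophat --bowtie1 --fusion-search --no-coverage-search \
                               #   -o tophat_out bowtie1_index/hg19 R1.fastq.gz R2.fastq.gz
                               ```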
