**Ohad** · 05-10-2014, 02:33 AM

I would take a look at that file, right_kept_reads_seg2.
Check whether it was created at all.

**GenoMax** · 05-10-2014, 05:31 AM

One should never run regular programs as root.

Are you running bio-linux as a virtual machine or is that what is running natively on server hardware?

**CompBio** · 05-10-2014, 10:58 PM

Thanks for your fast responses. I agree that running as root isn't a great idea, but I'm grasping at straws a bit here. As it turns out, I haven't had to try running as root yet (see below). To answer Ohad's question, it did create a right_kept_reads_seg2 file.

We started the same job twice using the command:

Code:

tophat -a 10 -g 1 --microexon-search -p 1 --segment-length=33 --segment-mismatches=2 --mate-inner-dist 31 --mate-std-dev 107 -o test_dir --solexa-quals --library-type=fr-firststrand Rn5 test_dir/sample1_R1_1.fastq test_dir/sample1_R1_2.fastq

Both times it halted early at the same step, but with no indication of a problem.

Next I simply restarted the alignments using the '--restart' option and this time they ran to completion?! This was a surprise.

Unfortunately we've got about 30 more FASTQ pairs we want to run and I'm not much closer to an answer than I was before. Running tophat twice on each pair doesn't seem like a good solution. I'm concerned there is something systemic that is not set up properly and that we'll get poor alignment fidelity as a result.

**Ohad** · 05-11-2014, 02:03 AM

You need to take a look at your log file. I think it is run.log.
Also, what do you mean by "halts"? Are you sure it's not running? Does it crash at all?
You are using -p 1. Why? You have many processors...

**GenoMax** · 05-11-2014, 10:07 AM

@CompBio: Why are you using "--solexa-quals"? If your data is of recent vintage it is almost certainly in sanger fastq format.

Has this install of TopHat been validated (used) before? If this is the first time you are using the install then do yourself a favor and run the test data that is available (http://tophat.cbcb.umd.edu/tutorial.shtml: see "test the install") with Tophat to make sure there is no problem with the install.

**CompBio** · 05-11-2014, 08:58 PM

Thanks again for the quick responses. We are currently testing our pipeline, including timing. As we have 24 processors (and plenty of memory/disk space), we anticipate running 24 alignments concurrently. Hence I use '-p 1' to restrict each alignment to a single processor.
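A minimal sketch of that plan, assuming an xargs that supports -P (GNU and BSD do; it is not in POSIX): one single-threaded job per sample, capped at a fixed number of simultaneous jobs. The function name run_concurrently, the sample names, and the per-sample output and log paths in the usage comment are hypothetical; only the -p and -o flags and the Rn5 index come from the command above.

```shell
# Run one job per sample, at most max_jobs at a time.
# job_cmd is executed by sh with the sample name available as "$1".
run_concurrently() {
    max_jobs=$1
    job_cmd=$2
    shift 2
    # One sample name per input line; -I{} starts one job per line.
    printf '%s\n' "$@" | xargs -P "$max_jobs" -I{} sh -c "$job_cmd" _ {}
}

# Hypothetical usage (sample names and paths are placeholders):
# run_concurrently 24 \
#   'tophat -p 1 -o "out_$1" Rn5 "$1_1.fastq" "$1_2.fastq" > "$1.log" 2>&1' \
#   sample1_R1 sample2_R1
```

Each job gets its own -o directory and log file here, on the assumption that concurrent runs sharing one output directory would overwrite each other's intermediate files.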

As for the quality scores, our options (tophat 2.0.9) are as follows:

Code:

    --solexa-quals
    --solexa1.3-quals          (same as phred64-quals)
    --phred64-quals            (same as solexa1.3-quals)
    -Q/--quals
    --integer-quals

Our reads are Illumina 1.9 (Sanger, verified using FASTQC, as you rightly guessed). As this is a recent vintage I was under the impression that solexa1.3/phred64 was correct, but when I tried it, tophat gave me errors. We bypassed that particular error by switching to solexa-quals; unfortunately the Tophat documentation doesn't mention Sanger at all.

**GenoMax** · 05-12-2014, 04:12 AM

Sanger quality is phred+33 (http://en.wikipedia.org/wiki/FASTQ_format#Encoding). No need to explicitly specify quality since your reads are already in sanger format (phred33 is TopHat default).
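If FastQC is not at hand, the encoding can be roughly checked from the lowest quality character in the file, using the ASCII ranges on the Wikipedia page linked above. This is only a sketch: guess_phred_offset is a hypothetical helper name, it assumes plain 4-line FASTQ records, and a result at or above ASCII 64 stays ambiguous because a Sanger file containing only high-quality bases looks the same.

```shell
# Guess the quality encoding of a FASTQ file from its lowest quality character.
# Assumes 4 lines per record, so every 4th line is a quality string.
guess_phred_offset() {
    LC_ALL=C awk '
        BEGIN {
            # Lookup table: printable character -> ASCII code
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {
            n = length($0)
            for (j = 1; j <= n; j++) {
                c = ord[substr($0, j, 1)]
                if (c != "" && c < min) min = c
            }
        }
        END {
            if (min < 59)      print "phred+33"             # below any +64 range
            else if (min < 64) print "solexa+64"            # Solexa scores start at 59
            else               print "phred+64 or ambiguous"
        }' "$1"
}

# Hypothetical usage:
# guess_phred_offset test_dir/sample1_R1_1.fastq
```

Reading a few hundred thousand records (for example via head) is usually enough, since low-quality bases turn up quickly in real data.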

You have not addressed a couple of previous questions:

Have you tried the test data set to verify if the install of tophat is working as expected (or have you used it for other analysis before this)?

Are you running biolinux in a virtual machine or as the main OS on this hardware?

**CompBio** · 05-13-2014, 09:59 PM

Originally posted by GenoMax:

> Sanger quality is phred+33 (http://en.wikipedia.org/wiki/FASTQ_format#Encoding). No need to explicitly specify quality since your reads are already in sanger format (phred33 is TopHat default).
>
> You have not addressed a couple of previous questions:
>
> Have you tried the test data set to verify if the install of tophat is working as expected (or have you used it for other analysis before this)?
>
> Are you running biolinux in a virtual machine or as the main OS on this hardware?

Thanks re: tophat defaults, didn't see that in the documentation.

To the other questions:

When I say the process 'halts' I mean we use 'ps' or 'top' and see no evidence of any tophat-related scripts running. CPU and disk usage are nominal. The last modification time for tophat-related files is not within the past 2-3 hours. And as I mentioned before, the log file shows no progress after "Mapping right_kept_reads_seg2..."
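The manual checks described above could be scripted roughly as follows. This is a sketch, not a tested monitoring tool: looks_stalled, its arguments, and the default 120-minute window and "tophat" pattern are hypothetical, and it assumes a find that supports -mmin (GNU and BSD do).

```shell
# Succeed (exit 0) if a run looks stalled: no matching process is alive
# and nothing under the output directory was written recently.
looks_stalled() {
    out_dir=$1
    minutes=${2:-120}
    pattern=${3:-tophat}
    # Process check (skipped if pgrep is not installed).
    if command -v pgrep > /dev/null 2>&1 && pgrep -f "$pattern" > /dev/null; then
        return 1
    fi
    # Any regular file modified within the last $minutes minutes?
    if [ -n "$(find "$out_dir" -type f -mmin "-$minutes" 2>/dev/null | head -n 1)" ]; then
        return 1
    fi
    return 0
}

# Hypothetical usage:
# looks_stalled test_dir 120 && echo "no process and no recent writes"
```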

I have run tophat on a tiny subset of the data (100,000 records) and it worked just fine. I have also run it on another data set without problems. The key differences seem to be the data sets and the user/login name. Since 'tophat --restart' worked, I don't think the data sets are the issue.

Bio-Linux (Ubuntu 12.04) is indeed the main OS for this hardware.

Update: I did find a possible culprit: typically we run our wrapper script in the background, redirect stderr and stdout to a file, and log off. Thus I tried using 'nohup' to bypass SIGHUPs and this time it ran to completion.

I'm hopeful the solution really is that simple, but I'm not convinced because we do not set 'huponexit' for any users, either in the system-wide bashrc file or in user-specific files. Thus as I understand it, SIGHUP signals should not be an issue. Perhaps an experienced Linux sysadmin can shed some light.
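One possible explanation, going by the bash manual: 'huponexit' only covers the case where an interactive login shell exits cleanly. If the shell itself receives a SIGHUP (for example, the SSH connection drops or the terminal window is closed), an interactive bash resends it to its jobs regardless of that option. Whether that is what happened here is an assumption. A sketch of a launcher that sidesteps the question either way, with launch_detached and the wrapper script name as hypothetical placeholders:

```shell
# In bash, "shopt huponexit" prints whether the option is currently on.

# Start a command so that it survives a hangup: nohup makes it ignore SIGHUP,
# and stdin, stdout and stderr are all detached from the terminal.
launch_detached() {
    log_file=$1
    shift
    nohup "$@" > "$log_file" 2>&1 < /dev/null &
    echo $!   # PID of the background job, for later checks with ps
}

# Hypothetical usage:
# pid=$(launch_detached sample1.log ./run_tophat_wrapper.sh sample1_R1)
```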

Tophat halts for no apparent reason