![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
reason for low mapping rate?? | miaom | RNA Sequencing | 3 | 05-10-2014 08:25 AM |
Tophat fusion crashed for unknown reason | ymc | Bioinformatics | 0 | 06-21-2013 04:18 AM |
any reason to compile GATK from source? | Kotoro | Bioinformatics | 2 | 04-22-2012 04:36 PM |
Apparent duplication levels incongruence between bismark and fastqc with BS-Seq data | gcarbajosa | Bioinformatics | 2 | 12-13-2011 09:43 AM |
Getting started 454, academic center has no apparent core facility | dkoelleseattle | General | 1 | 09-28-2009 07:20 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: Bristol, UK Join Date: Aug 2009
Posts: 26
|
![]()
We're trying to align a set of stranded, paired-end reads to the rat genome with Tophat. Tophat runs through the early stages just fine, but then halts during segment search -- no errors, no warnings, nothing. This is an example of our output:
Code:
[2014-05-07 13:22:44] Preparing reads left reads: min. length=101, max. length=101, 27329923 kept reads (10347 discarded) right reads: min. length=101, max. length=101, 27195050 kept reads (145220 discarded) [2014-05-07 13:35:52] Mapping left_kept_reads to genome Rn5 with Bowtie2 [2014-05-07 14:34:36] Mapping left_kept_reads_seg1 to genome Rn5 with Bowtie2 (1/3) [2014-05-07 15:06:40] Mapping left_kept_reads_seg2 to genome Rn5 with Bowtie2 (2/3) [2014-05-07 15:43:09] Mapping left_kept_reads_seg3 to genome Rn5 with Bowtie2 (3/3) [2014-05-07 16:34:38] Mapping right_kept_reads to genome Rn5 with Bowtie2 [2014-05-07 17:36:40] Mapping right_kept_reads_seg1 to genome Rn5 with Bowtie2 (1/3) [2014-05-07 18:11:10] Mapping right_kept_reads_seg2 to genome Rn5 with Bowtie2 (2/3) I'm going to try several avenues: I'll run as root in case file permissions are an issue, I'll try running it on a tiny subset of the data to see if it's related to the reads. Otherwise I'm a bit stumped. I should mention we're using Python 2.7.3 and running under Bio-Linux (Ubuntu 12.04). Any ideas greatly appreciated! |
![]() |
![]() |
![]() |
#2 |
Member
Location: Israel TA Join Date: Jul 2013
Posts: 28
|
![]()
I would give a look at that file - right_kept_reads_seg2
Check if it was created at all. |
![]() |
![]() |
![]() |
#3 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
![]()
One should never run regular programs as root.
Are you running bio-linux as a virtual machine or is that what is running natively on server hardware? |
![]() |
![]() |
![]() |
#4 |
Member
Location: Bristol, UK Join Date: Aug 2009
Posts: 26
|
![]()
Thanks for your fast responses. I agree that running as root isn't a great idea, but I'm grasping at straws a bit here. As it turns out, I haven't had to try running as root yet (see below). To answer Ohad's question, it did create a right_kept_reads_seg2 file.
We started the same job twice using the command: Code:
tophat -a 10 -g 1 --microexon-search -p 1 --segment-length=33 --segment-mismatches=2 --mate-inner-dist 31 --mate-std-dev 107 -o test_dir --solexa-quals --library-type=fr-firststrand Rn5 test_dir/sample1_R1_1.fastq test_dir/sample1_R1_2.fastq Next I simply restarted the alignments using the '--restart' option and this time they ran to completion?! This was a surprise. Unfortunately we've got about 30 more FASTQ pairs we want to run and I'm not much closer to an answer than I was before. Running tophat twice on each pair doesn't seem like a good solution. I'm concerned there is something systemic that is not set up properly and that we'll get poor alignment fidelity as a result. |
![]() |
![]() |
![]() |
#5 |
Member
Location: Israel TA Join Date: Jul 2013
Posts: 28
|
![]()
You need to take a look at your log file. I think run.log
Also , what do you mean by "halts" ? are you sure it's not running ? does it crash at all ? You are using -p 1 , why ? you have many processors... |
![]() |
![]() |
![]() |
#6 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
![]()
@CompBio: Why are you using "--solexa-quals"? If your data is of recent vintage it is almost certainly in sanger fastq format.
Has this install of TopHat been validated (used) before? If this is the first time you are using the install then do yourself a favor and run the test data that is available (http://tophat.cbcb.umd.edu/tutorial.shtml: see "test the install") with Tophat to make sure there is no problem with the install. |
![]() |
![]() |
![]() |
#7 |
Member
Location: Bristol, UK Join Date: Aug 2009
Posts: 26
|
![]()
Thanks again for the quick responses. We are currently testing our pipline, including timing. As we have 24 processors (and plenty of memory/disk space), we anticipate running 24 alignments concurrently. Hence I use '-p 1' to restrict it to a single processor.
As for the quality scores, our options (tophat 2.0.9) are as follows: Code:
--solexa-quals --solexa1.3-quals (same as phred64-quals) --phred64-quals (same as solexa1.3-quals) -Q/--quals --integer-quals |
![]() |
![]() |
![]() |
#8 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
|
![]()
Sanger quality is phred+33 (http://en.wikipedia.org/wiki/FASTQ_format#Encoding). No need to explicitly specify quality since your reads are already in sanger format (phred33 is TopHat default).
You have not addressed a couple of previous questions: Have you tried the test data set to verify if the install of tophat is working as expected (or have you used it for other analysis before this)? Are you running biolinux in a virtual machines or as the main OS on this hardware? |
![]() |
![]() |
![]() |
#9 | |
Member
Location: Bristol, UK Join Date: Aug 2009
Posts: 26
|
![]() Quote:
To the other questions: When I say the process 'halts' I mean we use 'ps' or 'top' and see no evidence of any tophat-related scripts running. CPU and disk usage are nominal. The last modification time for tophat-related files is not within the past 2-3 hours. And as I mentioned before, the log file shows no progress after "Mapping right_kept_reads_seg2..." I have run tophat on a tiny subset of the data (100,000 records) and it worked just fine. I have also run it on another data set without problems. The key differences seem to be the data sets and the user/login name. Since 'tophat --restart' worked, I don't think the data sets are the issue. Bio-Linux (Ubuntu 12.04) is indeed the main OS for this hardware. Update: I did find a possible culprit: typically we run our wrapper script in background, redirect stderr and stdout to a file, and log off. Thus I tried using 'nohup' to bypass SIGHUPs and this time it ran to completion. I'm hopeful the solution really is that simple, but I'm not convinced because we do not set 'huponexit' for any users, either in the system-wide bashrc file or in user-specific files. Thus as I understand it, SIGHUP signals should not be an issue. Perhaps an experienced Linux sysadmin can shed some light. |
|
![]() |
![]() |
![]() |
Thread Tools | |
|
|