#1
Junior Member
Location: NYC
Join Date: Aug 2013
Posts: 8
This is what I am running:
TopHat run (v2.0.9), Bowtie version 2.1.0.0, Samtools version 0.1.19.0.

This is what I entered:
Code:
tophat2 -p 24 -G /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Annotation/Genes/genes.gtf -o PMA_0hr_TotalRNA /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Sequence/Bowtie2index/genome /Users/mascano/Sequence_Analyses/DATA/THP1_timecourse/Act_1_ATCACG_L002_R1_001.fastq

This is the log:
Code:
[2013-08-20 12:28:14] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2013-08-20 12:28:14] Checking for Bowtie
Bowtie version: 2.1.0.0
[2013-08-20 12:28:14] Checking for Samtools
Samtools version: 0.1.19.0
[2013-08-20 12:28:14] Checking for Bowtie index files (genome)..
[2013-08-20 12:28:14] Checking for reference FASTA file
[2013-08-20 12:28:14] Generating SAM header for /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Sequence/Bowtie2index/genome
format: fastq
quality scale: phred33 (default)
[2013-08-20 12:28:47] Reading known junctions from GTF file
[2013-08-20 12:28:51] Preparing reads
left reads: min. length=101, max. length=101, 26861915 kept reads (78856 discarded)
[2013-08-20 12:38:22] Building transcriptome data files..
[2013-08-20 12:39:34] Building Bowtie index from genes.fa
[2013-08-20 12:52:18] Mapping left_kept_reads to transcriptome genes with Bowtie2
[2013-08-20 13:03:00] Resuming TopHat pipeline with unmapped reads
[2013-08-20 13:03:00] Mapping left_kept_reads.m2g_um to genome genome with Bowtie2
[2013-08-20 13:36:56] Mapping left_kept_reads.m2g_um_seg1 to genome genome with Bowtie2 (1/4)
[2013-08-20 13:43:24] Mapping left_kept_reads.m2g_um_seg2 to genome genome with Bowtie2 (2/4)
[2013-08-20 13:51:47] Mapping left_kept_reads.m2g_um_seg3 to genome genome with Bowtie2 (3/4)
[2013-08-20 13:59:15] Mapping left_kept_reads.m2g_um_seg4 to genome genome with Bowtie2 (4/4)
[2013-08-20 14:10:43] Searching for junctions via segment mapping
[2013-08-20 14:15:54] Retrieving sequences for splices
[2013-08-20 14:18:21] Indexing splices
[2013-08-20 14:19:02] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/4)
[2013-08-20 14:20:37] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/4)
[2013-08-20 14:22:42] Mapping left_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/4)
[2013-08-20 14:24:24] Mapping left_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
[2013-08-20 14:26:36] Joining segment hits
[FAILED]
Error running 'long_spanning_reads': Error: cannot open PMA_0hr_TotalRNA/tmp/left_kept_reads.m2g_um.bam for reading

That file is ~1 GB, and its permissions are me: read and write; staff: read only; everyone: read only. Help is appreciated.
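One possible cause worth ruling out (this is my assumption, not something confirmed in the thread): with many threads, TopHat keeps many temporary files open at once, and a "cannot open ... for reading" failure that only appears at high thread counts can come from hitting the per-process open-file limit, which has a low default soft value on OS X. You can check and try to raise it in the shell before launching tophat2:

```shell
# Assumption: the failure may be the per-process open-file limit, which
# TopHat can exhaust at high thread counts. Check and try to raise it.
ulimit -n                 # show the current soft limit on open file descriptors
ulimit -n 4096 || true    # try to raise it for this shell before running tophat2
```

If the raised limit lets `-p 24` complete, that points at file descriptors rather than memory.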
#2
Junior Member
Location: NYC
Join Date: Aug 2013
Posts: 8
A bit of an update:
Running 16 threads instead of 24 allowed TopHat to complete the run:
Code:
tophat2 -p 16 -G /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Annotation/Genes/genes.gtf -o PMA_0hr_TotalRNA /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Sequence/Bowtie2index/genome /Users/mascano/Sequence_Analyses/DATA/THP1_timecourse/Act_1_ATCACG_L002_R1_001.fastq

I have a 2 x 2.4 GHz 6-core Xeon, i.e. 12 physical cores plus 12 virtual cores with hyperthreading, which in theory means I can assign '-p 24'. My guess is that memory usage is the problem, but that's not entirely clear; I have 64 GB of RAM (the maximum allowed until OS X Mavericks comes out). Any advice on how to assign 24 threads without TopHat2 failing? Would calling the '-mm' argument work?
#3
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 7,087
Why is using all 24 threads so important?
Have you considered that the storage subsystem on this machine may be the bottleneck? Look in Activity Monitor to see whether you are maxing out the disk throughput. Rather than leaving 24 cores in some sort of iowait round-robin state, it may be better to start with a smaller number of cores and experiment to find the optimal performance balance.
#4
Junior Member
Location: NYC
Join Date: Aug 2013
Posts: 8
Thank you for the reply and suggestion. I had not considered that HDD I/O might be the bottleneck; I imagine an SSD would improve it. However, watching disk activity during a successful run (16 threads), I haven't seen it peak anywhere near 6 Gb/s (768 MB/s), which should be the bandwidth of my HDD interface (the ICH10 bridge).
I doubt it would skyrocket to that throughput ceiling with all 24 threads, no?
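As an aside on that 768 MB/s figure (my own arithmetic, not from the thread): a SATA link's quoted rate is in gigabits of line rate, and if the link uses 8b/10b encoding, the usable payload is the line rate divided by 10 in bytes, so the realistic ceiling is lower than a straight bits-to-bytes conversion suggests:

```shell
# Unit check for a quoted 6 Gb/s link (assumes SATA-style 8b/10b encoding).
awk 'BEGIN {
    gbps = 6
    naive  = gbps * 1000 / 8    # straight bits-to-bytes conversion
    usable = gbps * 1000 / 10   # after 8b/10b line-encoding overhead
    printf "naive=%.0f MB/s  usable=%.0f MB/s\n", naive, usable
}'
# prints: naive=750 MB/s  usable=600 MB/s
```

Either way, an observed peak of tens of MB/s is far below both numbers, so the link itself is not the limit here.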
#5
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 7,087
There is theoretical throughput, and there is real-life performance. Since the TopHat suite is developed on Macs, it should be optimized for OS X.
If you are interested, you could look at application-level I/O stats by following the suggestions in this post: http://blog.yerkanian.com/2011/10/17...io-on-macos-x/

To see CPU-level performance for various processes, run the following in a terminal window (adjust parameters as needed; see the man entry for top):
Code:
$ top -n10 -u

Last edited by GenoMax; 08-21-2013 at 11:03 AM.
#6
Junior Member
Location: NYC
Join Date: Aug 2013
Posts: 8
Under -p 16 conditions:
Memory usage peaked at 4 GB for the long_spanning_reads process, but total HDD I/O did not exceed 20 MB/s (read or write). I monitored with:
Code:
sudo iotop -C 5

during these log events, which is where the run would fail with -p 24:
Code:
[2013-08-21 14:59:08] Mapping left_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
[2013-08-21 15:01:19] Joining segment hits
[2013-08-21 15:05:52] Reporting output tracks
#7
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 7,087
So we know that 16 threads work but 24 do not. OS X may need some cores free to keep essential parts of the OS running. The next thing to try would be to increment from 16 towards 24 and see at what point the process fails.
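That incremental search is easy to script. The sketch below is hypothetical: `try_run` is a stand-in for the real tophat2 invocation (with your own paths), and the hard-coded threshold of 22 exists only so the example runs; in practice the body of `try_run` would be the actual command and you would read off where it starts failing.

```shell
#!/bin/sh
# Sketch of an incremental thread-count sweep (hypothetical helper names).
try_run() {
    # In a real sweep, replace the test below with something like:
    #   tophat2 -p "$1" -G genes.gtf -o "run_p$1" bowtie2_index/genome reads.fastq
    # Here we simulate a setup where more than 22 threads fails.
    [ "$1" -le 22 ]
}

for p in 16 18 20 22 24; do
    if try_run "$p"; then
        echo "p=$p OK"
    else
        echo "p=$p FAILED"
    fi
done
# prints OK for 16, 18, 20 and 22, and FAILED for 24
```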
#8
Junior Member
Location: NYC
Join Date: Aug 2013
Posts: 8
It turns out I can go up to 22 threads, either as a single run or as parallel runs totaling 22. That said, I'm not shy about using the computer for other applications at the same time (MS Office, Chrome, Mail, etc.), so, at least for my configuration, 22 threads is more than satisfactory.
#9
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 7,087
Since you did the experiment:
How much time (if any) is saved by going from 16 to 22 threads for the same job? A rough estimate is fine if you did not time the runs.
#10
Junior Member
Location: NYC
Join Date: Aug 2013
Posts: 8
Not using the exact same FASTQ, but one of similar size (~30 million reads):
I went from ~2 hrs at 16 threads, to ~1.5 hrs at 22 threads, to ~2.5 hrs at 11 threads. When I run two shelled processes at 11 threads each, one of them consistently takes around 2.5 hrs and the other around 3.5 hrs. I think I'll just have to wing it and find the best balance between the number of parallel processes and the number of threads per process.
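For what it's worth, those timings can be turned into a rough speedup and parallel-efficiency estimate (my own arithmetic on the numbers above, using the 11-thread single run as the baseline):

```shell
# Rough speedup/efficiency from the reported timings (baseline: 11 threads, 2.5 h).
awk 'BEGIN {
    base_p = 11; base_t = 2.5
    # pairs of (threads, hours) taken from the post above
    n = split("16 2 22 1.5", a, " ")
    for (i = 1; i < n; i += 2) {
        p = a[i]; t = a[i+1]
        speedup = base_t / t            # how much faster than the baseline run
        eff = speedup / (p / base_p)    # speedup relative to added threads
        printf "p=%s  speedup=%.2fx  efficiency=%.0f%%\n", p, speedup, eff * 100
    }
}'
# prints: p=16  speedup=1.25x  efficiency=86%
#         p=22  speedup=1.67x  efficiency=83%
```

Efficiency in the mid-80s at both 16 and 22 threads suggests the scaling is still reasonable, consistent with the diminishing-but-real gains reported here.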