SEQanswers > Applications Forums > RNA Sequencing



08-20-2013, 01:52 PM   #1
mascano (Junior Member, NYC; joined Aug 2013; 8 posts)

Tophat2 long_spanning_reads error -- cannot open xx.bam for reading

These are the versions I am running:
TopHat v2.0.9, Bowtie 2.1.0.0, Samtools 0.1.19.0

This is what I entered:
Code:
tophat2 -p 24 -G /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Annotation/Genes/genes.gtf -o PMA_0hr_TotalRNA /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Sequence/Bowtie2index/genome /Users/mascano/Sequence_Analyses/DATA/THP1_timecourse/Act_1_ATCACG_L002_R1_001.fastq
And finally, here is my tophat.log
Code:
[2013-08-20 12:28:14] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2013-08-20 12:28:14] Checking for Bowtie
		  Bowtie version:	 2.1.0.0
[2013-08-20 12:28:14] Checking for Samtools
		Samtools version:	 0.1.19.0
[2013-08-20 12:28:14] Checking for Bowtie index files (genome)..
[2013-08-20 12:28:14] Checking for reference FASTA file
[2013-08-20 12:28:14] Generating SAM header for /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Sequence/Bowtie2index/genome
	format:		 fastq
	quality scale:	 phred33 (default)
[2013-08-20 12:28:47] Reading known junctions from GTF file
[2013-08-20 12:28:51] Preparing reads
	 left reads: min. length=101, max. length=101, 26861915 kept reads (78856 discarded)
[2013-08-20 12:38:22] Building transcriptome data files..
[2013-08-20 12:39:34] Building Bowtie index from genes.fa
[2013-08-20 12:52:18] Mapping left_kept_reads to transcriptome genes with Bowtie2 
[2013-08-20 13:03:00] Resuming TopHat pipeline with unmapped reads
[2013-08-20 13:03:00] Mapping left_kept_reads.m2g_um to genome genome with Bowtie2 
[2013-08-20 13:36:56] Mapping left_kept_reads.m2g_um_seg1 to genome genome with Bowtie2 (1/4)
[2013-08-20 13:43:24] Mapping left_kept_reads.m2g_um_seg2 to genome genome with Bowtie2 (2/4)
[2013-08-20 13:51:47] Mapping left_kept_reads.m2g_um_seg3 to genome genome with Bowtie2 (3/4)
[2013-08-20 13:59:15] Mapping left_kept_reads.m2g_um_seg4 to genome genome with Bowtie2 (4/4)
[2013-08-20 14:10:43] Searching for junctions via segment mapping
[2013-08-20 14:15:54] Retrieving sequences for splices
[2013-08-20 14:18:21] Indexing splices
[2013-08-20 14:19:02] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/4)
[2013-08-20 14:20:37] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/4)
[2013-08-20 14:22:42] Mapping left_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/4)
[2013-08-20 14:24:24] Mapping left_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
[2013-08-20 14:26:36] Joining segment hits
	[FAILED]
Error running 'long_spanning_reads':Error: cannot open PMA_0hr_TotalRNA/tmp/left_kept_reads.m2g_um.bam for reading
The output directory is created, as are the subdirectories. The tmp directory contains plenty of files, including "left_kept_reads.m2g_um.bam". That file is ~1 GB, and its permissions are me: read and write, staff: read only, everyone: read only.

Help is appreciated.

08-21-2013, 08:44 AM   #2
mascano

A bit of an update:

Running 16 threads instead of 24 allowed TopHat to complete the run:
Code:
tophat2 -p 16 -G /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Annotation/Genes/genes.gtf -o PMA_0hr_TotalRNA /Users/mascano/Sequence_Analyses/Reference/Homo_sapiens_NCBI_build37.2/Homo_sapiens/NCBI/build37.2/Sequence/Bowtie2index/genome /Users/mascano/Sequence_Analyses/DATA/THP1_timecourse/Act_1_ATCACG_L002_R1_001.fastq

I have 2 x 2.4 GHz 6-core Xeons, i.e. 12 physical cores plus 12 virtual cores with hyperthreading, which in theory means I can assign '-p 24'.

My guess is that memory usage is the culprit, but I'm not entirely sure. I have 64 GB of RAM (which is the maximum allowed until OS X Mavericks comes out).

Any advice on how to assign 24 threads without TopHat2 failing? Would adding the '-mm' argument help?

08-21-2013, 09:18 AM   #3
GenoMax (Senior Member, East Coast USA; joined Feb 2008; 7,087 posts)

Why is using all 24 threads so important?

Have you considered that the storage subsystem on this machine may be the bottleneck? Look in Activity Monitor to see whether you are maxing out its throughput.

Rather than having 24 cores sitting in some kind of iowait round-robin state, it may be better to start with a smaller number of cores and experiment to find the optimal performance balance.

08-21-2013, 10:28 AM   #4
mascano

Thank you for the reply and suggestion. I had not considered that HDD I/O might be the bottleneck; I imagine an SSD would help. However, watching disk activity during a successful run (16 threads), I haven't seen it peak anywhere near 6 Gb/s (roughly 768 MB/s), which should be the bandwidth of my HDD's interface (via the ICH10 bridge).

I doubt it would suddenly hit that throughput ceiling with all 24 threads, no?

08-21-2013, 10:56 AM   #5
GenoMax

There is theoretical throughput and then there is real-life performance. Since the TopHat suite is developed on Macs, it should be reasonably well optimized for OS X.

If you are interested, you could look at application-level I/O stats by following the suggestions in this post: http://blog.yerkanian.com/2011/10/17...io-on-macos-x/

To see per-process CPU usage, run the following in a terminal window (adjust the parameters as needed; see the man page for top):

Code:
$ top -n10 -u
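
For the disk side, iostat (which ships with OS X) gives a rolling view of raw disk throughput. This is only a sketch: the 5-second interval and 12-sample count are arbitrary, so check the iostat man page on your system before leaning on the exact flags.

Code:
$ iostat -d -w 5 -c 12    # disk stats only, sampled every 5 seconds, 12 samples (~1 minute)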


08-21-2013, 12:13 PM   #6
mascano

Under -p 16 conditions:

Memory usage peaked at 4 GB for the long_spanning_reads process, but HDD I/O did not exceed 20 MB/s total (read or write). I used
Code:
sudo iotop -C 5
as well as viewing memory usage and disk activity via Activity Monitor.

I was monitoring during these log events, which is where the run would fail with -p 24:
Code:
[2013-08-21 14:59:08] Mapping left_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
[2013-08-21 15:01:19] Joining segment hits
[2013-08-21 15:05:52] Reporting output tracks

08-21-2013, 12:22 PM   #7
GenoMax

So we know that 16 threads work but 24 do not. OS X may need some cores free to keep essential parts of the OS running. The next thing to try would be to step up from 16 towards 24 and see at what point the process fails; a rough sketch of such a sweep follows.
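
Something along these lines could automate that sweep (a rough sketch only: the thread counts, output directory names, and shell variables are assumptions, and the variables stand in for the full GTF/index/FASTQ paths from post #1; each run will take on the order of hours):

Code:
# Hypothetical sweep over thread counts; set these to the full paths used in post #1.
GTF=/path/to/genes.gtf
INDEX=/path/to/Bowtie2index/genome
READS=/path/to/Act_1_ATCACG_L002_R1_001.fastq

for P in 16 18 20 22 24; do
    echo "=== trying -p $P ==="
    tophat2 -p "$P" -G "$GTF" -o "PMA_0hr_TotalRNA_p${P}" "$INDEX" "$READS" \
        || { echo "run failed at -p $P"; break; }
done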

08-22-2013, 09:40 AM   #8
mascano

So I can go up to 22 threads, either as a single run or as parallel runs totalling 22. That said, I'm not shy about using the computer simultaneously for other applications (MS Office, Chrome, Mail, etc.), so for my configuration 22 threads is more than satisfactory.

08-22-2013, 09:56 AM   #9
GenoMax

Since you did the experiment...

How much time (if any) is saved by going from 16 to 22 cores for the same job? A rough estimate is fine if you did not time the runs.

08-22-2013, 10:15 AM   #10
mascano

Not with the exact same FASTQ, but with files of similar size (~30 million reads):
I went from ~2 hrs at 16 threads, to 1.5 hrs at 22 threads, to 2.5 hrs at 11 threads. However, when I run two jobs in separate shells at 11 threads each, one consistently takes around 2.5 hrs and the other around 3.5 hrs. I think I'll just have to wing it and figure out the best balance between the number of parallel processes and the number of threads per process (see the sketch below).
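
For what it's worth, launching and timing two concurrent runs from one shell could look like this (just a sketch: the sample names, output directories, and log files are hypothetical, and GTF/INDEX stand in for the reference paths from post #1, as in the sketch above):

Code:
# Two hypothetical 11-thread TopHat2 jobs in parallel; 'time' records each job's wall-clock time in its log.
( time tophat2 -p 11 -G "$GTF" -o sampleA_out "$INDEX" sampleA.fastq ) > sampleA.log 2>&1 &
( time tophat2 -p 11 -G "$GTF" -o sampleB_out "$INDEX" sampleB.fastq ) > sampleB.log 2>&1 &
wait    # returns only after both background jobs have finished

That makes it straightforward to compare the two per-job wall times against a single 22-thread run over the same data.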