**JChase** · 07-07-2012, 12:54 AM

I re-ran TopHat and got only slightly further than the previous times. This time the BAM files are all intact (no missing headers, and I can view all of them), but I got an error that the program couldn't open one of my BAM files. The segment_juncs log says this was because there were "Too many open files." Has anyone run into this problem before? I'm desperate to sort this out!

[2012-07-03 23:15:54] Beginning TopHat run (v2.0.3)
-----------------------------------------------
[2012-07-03 23:15:54] Checking for Bowtie
Bowtie version: 2.0.0.6
[2012-07-03 23:15:54] Checking for Samtools
Samtools version: 0.1.18.0
[2012-07-03 23:15:54] Checking for Bowtie index files
[2012-07-03 23:15:54] Checking for reference FASTA file
[2012-07-03 23:15:54] Generating SAM header for /work/jeremy/BowtieIndex/Mus_musculus/NCBI/build37.2/Sequence/Bowtie2Index/genome
format: fastq
quality scale: phred33 (default)
[2012-07-03 23:16:02] Preparing reads
left reads: min. length=40, max. length=100, 151335054 kept reads (1380 discarded)
right reads: min. length=40, max. length=100, 151333005 kept reads (3429 discarded)
[2012-07-04 02:19:20] Mapping left_kept_reads to genome genome with Bowtie2
[2012-07-04 18:36:05] Mapping left_kept_reads_seg1 to genome genome with Bowtie2 (1/4)
[2012-07-04 22:45:41] Mapping left_kept_reads_seg2 to genome genome with Bowtie2 (2/4)
[2012-07-05 02:56:53] Mapping left_kept_reads_seg3 to genome genome with Bowtie2 (3/4)
[2012-07-05 07:22:00] Mapping left_kept_reads_seg4 to genome genome with Bowtie2 (4/4)
[2012-07-05 12:48:52] Mapping right_kept_reads to genome genome with Bowtie2
[2012-07-06 06:24:25] Mapping right_kept_reads_seg1 to genome genome with Bowtie2 (1/4)
[2012-07-06 11:29:32] Mapping right_kept_reads_seg2 to genome genome with Bowtie2 (2/4)
[2012-07-06 16:44:32] Mapping right_kept_reads_seg3 to genome genome with Bowtie2 (3/4)
[2012-07-06 21:18:45] Mapping right_kept_reads_seg4 to genome genome with Bowtie2 (4/4)
[2012-07-07 02:37:47] Searching for junctions via segment mapping
[FAILED]
Error: segment-based junction search failed with err =1
Error opening SAM file tophatout2/tmp/right_kept_reads_seg2.bam

-bash-4.1$ less segment_juncs.log
segment_juncs v2.0.3 (3443S)
---------------------------
[samopen] SAM header is present: 22 sequences.
Loading reference sequences...
Loading 10...done
Loading 11...done
Loading 12...done
Loading 13...done
Loading 14...done
Loading 15...done
Loading 16...done
Loading 17...done
Loading 18...done
Loading 19...done
Loading 1...done
Loading 2...done
Loading 3...done
Loading 4...done
Loading 5...done
Loading 6...done
Loading 7...done
Loading 8...done
Loading 9...done
Loading MT...done
Loading X...done
Loading Y...done
Loading ...done
>> Performing segment-search:
Loading left segment hits...
done.
Loading right segment hits...
open: Too many open files
Error opening SAM file tophatout2/tmp/right_kept_reads_seg2.bam

**AsoBioInfo** · 07-07-2012, 04:11 AM

There are already many threads related to this issue (http://seqanswers.com/forums/showthread.php?t=15142, http://seqanswers.com/forums/showthread.php?t=7266)

So maybe the problem is caused by insufficient memory. Just try to run the process solely.

**JChase** · 07-07-2012, 04:31 AM

Originally posted by AsoBioInfo View Post

There are already many threads related to this issue (http://seqanswers.com/forums/showthread.php?t=15142, http://seqanswers.com/forums/showthread.php?t=7266)

So maybe the problem is arising due to less memory. Just try to run the process solely.

Hello,

I did read through those threads before I posted, but they refer to different errors (e.g., error 9). I'm running this process on a node with half a terabyte of memory, so I do not think that memory can be the issue. When you say "run the process solely", what exactly do you mean? Is it possible to restart the TopHat process without re-mapping?

**AsoBioInfo** · 07-07-2012, 05:53 AM

What command did you use?

**JChase** · 07-07-2012, 06:16 AM

Originally posted by AsoBioInfo View Post

What command did you use?

tophat -p 56 -o tophatout2 --genome-read-mismatches 4 --read-mismatches 4 /NCBI/build37.2/Sequence/Bowtie2Index/genome /C4/PChapmani4F_AdpCutTrimmed.fastq,/C4/PC5F_AdpCutTrimmed.fastq,/C4/PC6F_AdpCutTrimmed.fastq,/C4/PC12F_AdpCutTrimmed.fastq /C4/PC4R_AdpCutTrimmed.fastq,/C4/PC5R_AdpCutTrimmed.fastq,/C4/PC6R_AdpCutTrimmed.fastq,/C4/PC12R_AdpCutTrimmed.fastq

**AsoBioInfo** · 07-07-2012, 08:59 AM

Run the command without the option -p.

**JChase** · 07-07-2012, 09:22 AM

Originally posted by AsoBioInfo View Post

Run the command without the option -p.

Well, I'll give that a try... Considering it takes 4 days to do the mapping when multithreading with 54 cores, I guess it will take a while to see if this is going to throw an error with only 1 core.

**AsoBioInfo** · 07-07-2012, 09:45 AM

I want to give it a try too, as mentioned in the following link:

Tophat- Cufflinks - Bioinformatics Team (BioITeam) at the University of Texas - UT Austin Wikis

https://wikis.utexas.edu/display/bioiteam/Tophat-+Cufflinks

**JChase** · 07-08-2012, 01:55 AM

So I'm re-running TopHat on just a single core, and at this rate it should finish mapping my 300 million reads and all of their segments sometime next year. I also decided to try a few other things. I ran TopHat on just 10,000 reads, again on 54 cores, and it completed without errors; this suggests to me that multithreading in itself isn't the issue. I then tried it on a quarter of my reads (35 million forward, 35 million reverse), and the program threw the same error as above (too many open files). Will try a few more things; wish me luck.

**ians** · 07-09-2012, 08:11 AM

I've seen the "too many open files" error elsewhere when dealing with alignments, specifically when sorting very large BAM files. It ultimately happens when samtools tries to merge all the intermediate files together.
If this is the case, the only option may be to raise the limit on simultaneously open files. By default, most Linux systems set it to 1024.

Code:

ulimit -n unlimited

to remove the ceiling (on Ubuntu).

Be careful, as this only applies to the current bash session. I forget how to make it persist. In my case, I improved performance by increasing -m 10x (in samtools sort), and thus had to merge 1/10 as many files.
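A minimal sketch of checking and raising the limit for the current shell, in case it helps. The function name `raise_nofile_limit` and the target of 4096 are illustrative assumptions, not anything TopHat requires. The sketch asks for a concrete number rather than `unlimited`, because a request for `unlimited` can be refused on some systems, and it never asks for more than the hard limit.

```shell
# Raise the soft open-file limit for the current shell up to a target value,
# without exceeding the hard limit. The default of 4096 is an arbitrary example.
raise_nofile_limit() {
    target=${1:-4096}
    soft=$(ulimit -Sn)   # limit currently enforced
    hard=$(ulimit -Hn)   # ceiling a non-root user may raise the soft limit to

    # Never ask for more than the hard limit allows.
    if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$target" ]; then
        target=$hard
    fi

    # Only change the limit if it is currently lower than the target.
    if [ "$soft" != "unlimited" ] && [ "$soft" -lt "$target" ]; then
        ulimit -Sn "$target"
    fi

    echo "open-file limit: soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}

raise_nofile_limit 4096
```

The new limit applies only to this shell and the processes started from it, so the function has to be run in the same session that launches tophat. On many Linux systems a persistent per-user limit is configured through `nofile` entries in /etc/security/limits.conf, which usually needs an administrator.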

**JChase** · 07-12-2012, 11:47 PM

Hello,

I wanted to follow up with one final post that, hopefully, will help others should they run into similar problems in the future. Like ians, I suspected that the "too many files open" might have been a node-specific issue despite the fact that it shouldn't have been a problem on the node I was using. In any case, I transferred my data to another node with the same specifications (64 cores, half a terabyte of memory) that had fewer people using it. I re-ran things as before, multithreading across 54 cores, and instead of the "too many files open" issue, I got this error: "Error: ReadStream::getRead() called with out-of-order id#!" I may be incorrect in this assumption, but I think this is directly related to multithreading and the fact that Bowtie2 doesn't keep multithreaded alignments in order (unless you tell it to). In any case, I was unable to multithread Tophat on my reads on either node.

To resolve this, it was suggested that I run TopHat on a single thread. Because 300 million reads take TopHat a long time to process on a single thread, I split my reads into fourths and ran each set of ~70 million reads on a single thread. I am happy to report that this worked! So, for those of you running into these problems, I hope that removing the multi-thread option will also work for you.

Thanks again to those in the community who helped me work through this.
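For anyone wanting to try the same workaround, here is a rough sketch of one way to split a FASTQ file into a fixed number of chunks without cutting a 4-line record in half. The function name `split_fastq` and the file names in the usage note are hypothetical, and the sketch assumes plain uncompressed FASTQ with exactly four lines per read.

```shell
# Split a FASTQ file into at most N chunks, keeping each 4-line record whole.
# Usage: split_fastq <reads.fastq> <number_of_chunks> <output_prefix>
# The body runs in a subshell, so its variables do not leak into the caller.
split_fastq() (
    fastq=$1
    chunks=$2
    prefix=$3

    lines=$(wc -l < "$fastq")
    records=$(( lines / 4 ))
    if [ "$records" -le 0 ]; then
        echo "split_fastq: no complete records in $fastq" >&2
        exit 1
    fi

    # Round up so that no more than $chunks files are produced.
    per_chunk=$(( (records + chunks - 1) / chunks ))

    # split names the pieces <prefix>aa, <prefix>ab, ...
    split -l "$(( per_chunk * 4 ))" "$fastq" "$prefix"
)
```

With paired-end data, both mate files would need to be split with the same chunk count, for example `split_fastq left.fastq 4 left_part_` and `split_fastq right.fastq 4 right_part_`, so that chunk `aa` of one file still pairs with chunk `aa` of the other. This only holds if the two files contain the same reads in the same order. Each pair of chunks can then go to its own TopHat run with its own -o directory.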

**ians** · 07-13-2012, 06:50 AM

Originally posted by JChase View Post

Hello,

I re-ran things as before, multithreading across 54 cores, and instead of the "too many files open" issue, I got this error: "Error: ReadStream::getRead() called with out-of-order id#!" I may be incorrect in this assumption, but I think this is directly related to multithreading and the fact that Bowtie2 doesn't keep multithreaded alignments in order (unless you tell it to). In any case, I was unable to multithread Tophat on my reads on either node.

Is there a way to explicitly "keep multithreaded alignments in order"?

Don't give up on multithreading. If you can reserve a box, where you don't have to compete for cpu, you may find the run finishes successfully. I had the same problem:

Tophat error -segment-based junction search failled with err=1 - SEQanswers

http://seqanswers.com/forums/showthread.php?t=15142

**JChase** · 07-13-2012, 07:24 AM

Originally posted by ians View Post

Is there a way to explicitly "keep multithreaded alignments in order"?

Don't give up on multithreading. If you can reserve a box, where you don't have to compete for cpu, you may find the run finishes successfully. I had the same problem:
http://seqanswers.com/forums/showthread.php?t=15142

Well, Bowtie2 has an option to keep things in order, but I've never had luck with it. And I wouldn't know how to pass that option through TopHat to Bowtie2 anyhow.

**slockton** · 08-08-2012, 01:28 PM

Code:
ulimit -n unlimited
to remove the ceiling (for ubuntu.)

I also had the "too many open files" error in TopHat during segment mapping. Changing the maximum number of open files seems to have fixed the error (Post #11). However, I am running OSX, where the same result is achieved with a different command.

See the following website for details on how to change the max. open files limit on Linux and OSX: http://wiki.basho.com/Open-Files-Limit.html

Tophat segment junction error 1, invalid BAM binary header
