SEQanswers


Old 07-01-2012, 03:53 AM   #1
JChase
Member
 
Location: Berkeley, CA

Join Date: Jun 2012
Posts: 17
Default Tophat segment junction error 1, invalid BAM binary header

**Note: This has been resolved -- please see post #12 in this thread for the update.

Hello folks,

Nearly at my wit's end with TopHat. I've gone through this twice now, once with Bowtie and once with Bowtie2. After days of mapping reads and read segments, I get the error below when it reaches "Searching for junctions via segment mapping":

Quote:
[2012-06-27 13:47:45] Beginning TopHat run (v2.0.3)
-----------------------------------------------
[2012-06-27 13:47:45] Checking for Bowtie
Bowtie version: 2.0.0.6
[2012-06-27 13:47:45] Checking for Samtools
Samtools version: 0.1.18.0
[2012-06-27 13:47:45] Checking for Bowtie index files
[2012-06-27 13:47:45] Checking for reference FASTA file
[2012-06-27 13:47:45] Generating SAM header for /work/jeremy/BowtieIndex/Mus_musculus/NCBI/build37.2/Sequence/Bowtie2Index/genome

format: fastq
quality scale: phred33 (default)
[2012-06-27 13:48:39] Preparing reads
left reads: min. length=40, max. length=100, 151335054 kept reads (1380 discarded)
right reads: min. length=40, max. length=100, 151333005 kept reads (3429 discarded)
[2012-06-27 16:54:38] Mapping left_kept_reads to genome genome with Bowtie2
[2012-06-28 10:53:22] Mapping left_kept_reads_seg1 to genome genome with Bowtie2 (1/4)
[2012-06-28 16:06:36] Mapping left_kept_reads_seg2 to genome genome with Bowtie2 (2/4)
[2012-06-28 20:40:10] Mapping left_kept_reads_seg3 to genome genome with Bowtie2 (3/4)
[2012-06-29 02:13:31] Mapping left_kept_reads_seg4 to genome genome with Bowtie2 (4/4)
[2012-06-29 07:53:33] Mapping right_kept_reads to genome genome with Bowtie2
[2012-06-30 02:23:29] Mapping right_kept_reads_seg1 to genome genome with Bowtie2 (1/4)
[2012-06-30 07:03:59] Mapping right_kept_reads_seg2 to genome genome with Bowtie2 (2/4)
[2012-06-30 12:44:20] Mapping right_kept_reads_seg3 to genome genome with Bowtie2 (3/4)
[2012-06-30 17:33:54] Mapping right_kept_reads_seg4 to genome genome with Bowtie2 (4/4)
[2012-06-30 23:10:44] Searching for junctions via segment mapping
[FAILED]
Error: segment-based junction search failed with err =1
[bam_header_read] invalid BAM binary header (this is not a BAM file).
When I look at the segment_juncs log, I find that the problem is that a single one of my segment BAM files lacks a header:

Quote:
segment_juncs v2.0.3 (3443S)
---------------------------
[samopen] SAM header is present: 22 sequences.
Loading reference sequences...
Loading 10...done
Loading 11...done
Loading 12...done
Loading 13...done
Loading 14...done
Loading 15...done
Loading 16...done
Loading 17...done
Loading 18...done
Loading 19...done
Loading 1...done
Loading 2...done
Loading 3...done
Loading 4...done
Loading 5...done
Loading 6...done
Loading 7...done
Loading 8...done
Loading 9...done
Loading MT...done
Loading X...done
Loading Y...done
Loading ...done
>> Performing segment-search:
Loading left segment hits...
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
Error: no SAM header found for file ./tophat_out/tmp/left_kept_reads_seg2.bam
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Does anyone have any suggestions as to how to deal with this? Considering that only a single one of the BAM files is missing a header, I'm hoping there is some way to repair it using the same header found in all the other BAM files. Does anyone know if this is possible? I've tried the samtools reheader tool, but for some odd reason it sends what seems like millions of lines of weird symbols through my terminal (eventually crashing it) and doesn't end up changing the file at all.
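For reference: samtools reheader writes the re-headered BAM to stdout rather than modifying the file in place, which would explain the "weird symbols" flooding the terminal. A sketch of the usual pattern (file names here are illustrative, and samtools is assumed to be on the PATH):

```shell
# Grab the header from an intact sibling segment BAM:
samtools view -H ./tophat_out/tmp/left_kept_reads_seg1.bam > seg_header.sam

# reheader prints binary BAM to stdout -- redirect it, or the terminal
# gets flooded with raw bytes and the input file is left unchanged:
samtools reheader seg_header.sam ./tophat_out/tmp/left_kept_reads_seg2.bam \
    > left_kept_reads_seg2.reheadered.bam
```

That said, reheadering only fixes the header; the "EOF marker is absent" lines in the log above suggest the file is also truncated, which a new header cannot repair.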

Last edited by JChase; 07-13-2012 at 07:25 AM.
JChase is offline   Reply With Quote
Old 07-07-2012, 12:54 AM   #2
JChase
Member
 
Location: Berkeley, CA

Join Date: Jun 2012
Posts: 17
Default

I re-ran TopHat and got only slightly further than the previous times. This time the BAM files are all intact (no missing headers; I can view all of them), but I got an error that the program couldn't open one of my BAM files. The segment_juncs log says this was because there were "Too many open files." Has anyone run into this problem before? I'm desperate to sort this out!


Quote:
[2012-07-03 23:15:54] Beginning TopHat run (v2.0.3)
-----------------------------------------------
[2012-07-03 23:15:54] Checking for Bowtie
Bowtie version: 2.0.0.6
[2012-07-03 23:15:54] Checking for Samtools
Samtools version: 0.1.18.0
[2012-07-03 23:15:54] Checking for Bowtie index files
[2012-07-03 23:15:54] Checking for reference FASTA file
[2012-07-03 23:15:54] Generating SAM header for /work/jeremy/BowtieIndex/Mus_musculus/NCBI/build37.2/Sequence/Bowtie2Index/genome
format: fastq
quality scale: phred33 (default)
[2012-07-03 23:16:02] Preparing reads
left reads: min. length=40, max. length=100, 151335054 kept reads (1380 discarded)
right reads: min. length=40, max. length=100, 151333005 kept reads (3429 discarded)
[2012-07-04 02:19:20] Mapping left_kept_reads to genome genome with Bowtie2
[2012-07-04 18:36:05] Mapping left_kept_reads_seg1 to genome genome with Bowtie2 (1/4)
[2012-07-04 22:45:41] Mapping left_kept_reads_seg2 to genome genome with Bowtie2 (2/4)
[2012-07-05 02:56:53] Mapping left_kept_reads_seg3 to genome genome with Bowtie2 (3/4)
[2012-07-05 07:22:00] Mapping left_kept_reads_seg4 to genome genome with Bowtie2 (4/4)
[2012-07-05 12:48:52] Mapping right_kept_reads to genome genome with Bowtie2
[2012-07-06 06:24:25] Mapping right_kept_reads_seg1 to genome genome with Bowtie2 (1/4)
[2012-07-06 11:29:32] Mapping right_kept_reads_seg2 to genome genome with Bowtie2 (2/4)
[2012-07-06 16:44:32] Mapping right_kept_reads_seg3 to genome genome with Bowtie2 (3/4)
[2012-07-06 21:18:45] Mapping right_kept_reads_seg4 to genome genome with Bowtie2 (4/4)
[2012-07-07 02:37:47] Searching for junctions via segment mapping
[FAILED]
Error: segment-based junction search failed with err =1
Error opening SAM file tophatout2/tmp/right_kept_reads_seg2.bam

Quote:
-bash-4.1$ less segment_juncs.log
segment_juncs v2.0.3 (3443S)
---------------------------
[samopen] SAM header is present: 22 sequences.
Loading reference sequences...
Loading 10...done
Loading 11...done
Loading 12...done
Loading 13...done
Loading 14...done
Loading 15...done
Loading 16...done
Loading 17...done
Loading 18...done
Loading 19...done
Loading 1...done
Loading 2...done
Loading 3...done
Loading 4...done
Loading 5...done
Loading 6...done
Loading 7...done
Loading 8...done
Loading 9...done
Loading MT...done
Loading X...done
Loading Y...done
Loading ...done
>> Performing segment-search:
Loading left segment hits...
done.
Loading right segment hits...
open: Too many open files
Error opening SAM file tophatout2/tmp/right_kept_reads_seg2.bam

Last edited by JChase; 07-07-2012 at 01:00 AM.
JChase is offline   Reply With Quote
Old 07-07-2012, 04:11 AM   #3
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default

There are already many threads related to this issue (http://seqanswers.com/forums/showthread.php?t=15142, http://seqanswers.com/forums/showthread.php?t=7266)

So maybe the problem is arising from insufficient memory. Just try running the process on its own.
AsoBioInfo is offline   Reply With Quote
Old 07-07-2012, 04:31 AM   #4
JChase
Member
 
Location: Berkeley, CA

Join Date: Jun 2012
Posts: 17
Default

Quote:
Originally Posted by AsoBioInfo View Post
There are already many threads related to this issue (http://seqanswers.com/forums/showthread.php?t=15142, http://seqanswers.com/forums/showthread.php?t=7266)

So maybe the problem is arising from insufficient memory. Just try running the process on its own.
Hello,

I did read through those threads before I posted, but they refer to different errors (e.g., error 9). I'm running this process on a node with half a terabyte of memory, so I don't think memory can be the issue. When you say "run the process solely", what exactly do you mean? Is it possible to restart the TopHat process without re-mapping?
JChase is offline   Reply With Quote
Old 07-07-2012, 05:53 AM   #5
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default

What command did you use?
AsoBioInfo is offline   Reply With Quote
Old 07-07-2012, 06:16 AM   #6
JChase
Member
 
Location: Berkeley, CA

Join Date: Jun 2012
Posts: 17
Default

Quote:
Originally Posted by AsoBioInfo View Post
What command did you use?
tophat -p 56 -o tophatout2 --genome-read-mismatches 4 --read-mismatches 4 /NCBI/build37.2/Sequence/Bowtie2Index/genome /C4/PChapmani4F_AdpCutTrimmed.fastq,/C4/PC5F_AdpCutTrimmed.fastq,/C4/PC6F_AdpCutTrimmed.fastq,/C4/PC12F_AdpCutTrimmed.fastq /C4/PC4R_AdpCutTrimmed.fastq,/C4/PC5R_AdpCutTrimmed.fastq,/C4/PC6R_AdpCutTrimmed.fastq,/C4/PC12R_AdpCutTrimmed.fastq
JChase is offline   Reply With Quote
Old 07-07-2012, 08:59 AM   #7
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default

Run the command without the option -p.
AsoBioInfo is offline   Reply With Quote
Old 07-07-2012, 09:22 AM   #8
JChase
Member
 
Location: Berkeley, CA

Join Date: Jun 2012
Posts: 17
Default

Quote:
Originally Posted by AsoBioInfo View Post
Run the command without the option -p.
Well, I'll give that a try... Considering it takes 4 days to do the mapping when multithreading with 54 cores, I guess it will take a while to see if this is going to throw an error with only 1 core.
JChase is offline   Reply With Quote
Old 07-07-2012, 09:45 AM   #9
AsoBioInfo
Member
 
Location: KSA

Join Date: Dec 2011
Posts: 37
Default

I want to give it a try too, as mentioned in the following link:

https://wikis.utexas.edu/display/bio...hat-+Cufflinks
AsoBioInfo is offline   Reply With Quote
Old 07-08-2012, 01:55 AM   #10
JChase
Member
 
Location: Berkeley, CA

Join Date: Jun 2012
Posts: 17
Default

So I'm re-running TopHat on just a single core, and at this rate it should finish mapping my 300 million reads and all of their segments sometime next year. I also decided to try a few other things. I ran TopHat on just 10,000 reads, again on 54 cores, and it completed without errors; this suggests to me that multithreading in itself isn't the issue. I then tried a quarter of my reads (35 million forward, 35 million reverse), and the program threw the same error as above (too many open files). Will try a few more things; wish me luck.
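For anyone wanting to run the same kind of quick sanity check, pulling the first N reads out of a FASTQ file only takes head, since each record is exactly four lines (assuming no wrapped sequence lines; file names are illustrative):

```shell
# 10,000 reads = 40,000 lines in a standard 4-line-per-record FASTQ.
# Take the same number from both mates so the pairs stay matched.
head -n 40000 left_reads.fastq  > left_subset.fastq
head -n 40000 right_reads.fastq > right_subset.fastq
```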
JChase is offline   Reply With Quote
Old 07-09-2012, 08:11 AM   #11
ians
Member
 
Location: St. Louis, MO

Join Date: Aug 2011
Posts: 53
Default

I've seen the "too many open files" error when dealing with alignments elsewhere, e.g. when sorting very large BAM files. It ultimately happens when samtools tries to merge all the intermediate files together.
If this is the case, the only option may be to raise the limit on simultaneously open files. By default, most Linux systems set it to 1024.

Code:
ulimit -n unlimited
to remove the ceiling (on Ubuntu).

Be careful: this only applies to the current bash session; I forget how to make it persist. In my case, I improved performance by increasing -m tenfold (in samtools sort) and thus had to merge a tenth as many files.
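To see where you stand before and after: note that many kernels reject "unlimited" for the open-file limit, so raising it to a concrete number is a safer bet (4096 below is just an example):

```shell
ulimit -n        # current soft limit on open files (often 1024)
ulimit -Hn       # hard ceiling the soft limit can be raised to
ulimit -n 4096   # raise the soft limit for this shell session only
```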
ians is offline   Reply With Quote
Old 07-12-2012, 11:47 PM   #12
JChase
Member
 
Location: Berkeley, CA

Join Date: Jun 2012
Posts: 17
Default

Hello,

I wanted to follow up with one final post that will hopefully help others should they run into similar problems in the future. Like ians, I suspected that the "too many open files" error might have been a node-specific issue, even though it shouldn't have been a problem on the node I was using. In any case, I transferred my data to another node with the same specifications (64 cores, half a terabyte of memory) that had fewer people using it. I re-ran things as before, multithreading across 54 cores, and instead of the "too many open files" issue, I got this error: "Error: ReadStream::getRead() called with out-of-order id#!" I may be incorrect in this assumption, but I think this is directly related to multithreading and the fact that Bowtie2 doesn't keep multithreaded alignments in order (unless you tell it to). In any case, I was unable to multithread TopHat on my reads on either node.

To resolve this, it was suggested that I run TopHat on a single thread. Because 300 million reads take TopHat a long time to process on a single thread, I split my reads into fourths and ran each set of ~70 million reads on a single thread. I am happy to report that this worked! So, for those of you running into these problems, I hope that removing the multithreading option will also work for you.
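A sketch of one way to do that split with plain coreutils, assuming standard 4-line FASTQ records and left/right files in matching read order (file names are illustrative):

```shell
# Compute a per-chunk line count that lands on a record boundary.
total=$(wc -l < left_reads.fastq)
reads=$(( total / 4 ))                        # 4 lines per read
lines_per_chunk=$(( ((reads + 3) / 4) * 4 )) # ceil(reads/4) records, in lines

# Split both mates with the same line count so pairs stay aligned.
split -l "$lines_per_chunk" -d left_reads.fastq  left_chunk_
split -l "$lines_per_chunk" -d right_reads.fastq right_chunk_
```

Note the left and right kept-read counts above differed slightly, so in practice a split like this has to be done on the raw paired files (or the pairs re-synced afterwards) to keep mates aligned.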

Thanks again to those in the community who helped me work through this.
JChase is offline   Reply With Quote
Old 07-13-2012, 06:50 AM   #13
ians
Member
 
Location: St. Louis, MO

Join Date: Aug 2011
Posts: 53
Default

Quote:
Originally Posted by JChase View Post
Hello,

I re-ran things as before, multithreading across 54 cores, and instead of the "too many open files" issue, I got this error: "Error: ReadStream::getRead() called with out-of-order id#!" I may be incorrect in this assumption, but I think this is directly related to multithreading and the fact that Bowtie2 doesn't keep multithreaded alignments in order (unless you tell it to). In any case, I was unable to multithread TopHat on my reads on either node.
Is there a way to explicitly "keep multithreaded alignments in order"?

Don't give up on multithreading. If you can reserve a box where you don't have to compete for CPU, you may find the run finishes successfully. I had the same problem:
http://seqanswers.com/forums/showthread.php?t=15142
ians is offline   Reply With Quote
Old 07-13-2012, 07:24 AM   #14
JChase
Member
 
Location: Berkeley, CA

Join Date: Jun 2012
Posts: 17
Default

Quote:
Originally Posted by ians View Post
Is there a way to explicitly "keep multithreaded alignments in order"?

Don't give up on multithreading. If you can reserve a box where you don't have to compete for CPU, you may find the run finishes successfully. I had the same problem:
http://seqanswers.com/forums/showthread.php?t=15142
Well, Bowtie2 has an option to keep things in order, but I've never had luck with it, and I wouldn't know how to feed that option through TopHat to Bowtie2 anyhow...
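For the record, the standalone Bowtie2 flag in question appears to be --reorder, which forces SAM output back into input read order when running multithreaded, at some speed cost. Whether TopHat 2.0.3 can forward it is unclear; the paths below are illustrative:

```shell
# Standalone Bowtie2; --reorder restores input order when -p > 1.
bowtie2 -p 8 --reorder -x genome \
    -1 left_reads.fastq -2 right_reads.fastq -S ordered.sam
```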
JChase is offline   Reply With Quote
Old 08-08-2012, 01:28 PM   #15
slockton
SteveL
 
Location: San Diego

Join Date: Sep 2009
Posts: 6
Default

Quote:
Code:
ulimit -n unlimited
to remove the ceiling (for ubuntu.)
I had the "too many open files" error in TopHat during segment mapping as well, and changing the maximum number of open files seems to have fixed it (see post #11). However, I am running OS X and found that you achieve the same result with a different command.

See the following website for details on how to change the max. open files limit on Linux and OSX: http://wiki.basho.com/Open-Files-Limit.html
slockton is offline   Reply With Quote
Old 09-24-2014, 07:47 PM   #16
Adam Taranto
Junior Member
 
Location: Australia

Join Date: Sep 2014
Posts: 1
Default

Quote:
Originally Posted by JChase View Post
Does anyone have any suggestions as to how to deal with this? Considering that only a single one of the BAM files is missing a header, I'm hoping there is some way to repair it using the same header found in all the other BAM files. Does anyone know if this is possible? I've tried the samtools reheader tool, but for some odd reason it sends what seems like millions of lines of weird symbols through my terminal (eventually crashing it) and doesn't end up changing the file at all.
Hi JChase,

Did you ever figure out how to fix the missing header, or was the error resolved by re-running the step?

Adam
Adam Taranto is offline   Reply With Quote
Old 09-24-2014, 08:06 PM   #17
JChase
Member
 
Location: Berkeley, CA

Join Date: Jun 2012
Posts: 17
Default

Quote:
Originally Posted by Adam Taranto View Post
Hi JChase,

Did you ever figure out how to fix the missing header, or was the error resolved by re-running the step?

Adam
See post #12: I had to re-run without multithreading. However, I haven't tested whether multithreading has been fixed since this original post.
JChase is offline   Reply With Quote