SEQanswers

Old 04-26-2012, 11:40 PM   #1
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default tophat2 error

I ran tophat2 with bowtie1 because I am dealing with color-space reads. The command line I used was:

Code:
tophat --bowtie1 --keep-tmp -o T34_tophat2 -p 8 --color --quals --library-type=fr-secondstrand --transcriptome-index=transcriptome/hg19_Ensemble.GRCh37_65 /home/xwang/data/hg19/bowtie_index/hg19.color T34.csfasta T34.qual
But tophat ended up with an error:

Code:
[2012-04-27 10:36:10] Beginning TopHat run (v2.0.0)
-----------------------------------------------
[2012-04-27 10:36:10] Checking for Bowtie
		  Bowtie version:	 0.12.7.0
[2012-04-27 10:36:11] Checking for Samtools
		Samtools version:	 0.1.17.0
[2012-04-27 10:36:11] Checking for Bowtie index files
[2012-04-27 10:36:11] Checking for Bowtie index files
[2012-04-27 10:36:11] Checking for reference FASTA file
[2012-04-27 10:36:11] Generating SAM header for /home/xwang/data/hg19/bowtie_index/hg19.color
	format:		 fasta
[2012-04-27 10:38:10] Reading known junctions from GTF file
[2012-04-27 10:38:48] Preparing reads
	 left reads: min. length=50, count=64422218
[2012-04-27 11:43:11] Using pre-built transcriptome index..
[2012-04-27 11:43:49] Mapping left_kept_reads against transcriptome hg19_Ensemble.GRCh37_65 with Bowtie 
[2012-04-27 12:11:41] Converting left_kept_reads.m2g to genomic coordinates (map2gtf)
[2012-04-27 12:14:57] Resuming TopHat pipeline with unmapped reads
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "T34_tophat2/tmp/left_kept_reads.m2g_um.fq".
[2012-04-27 12:14:57] Reporting output tracks
-----------------------------------------------
[2012-04-27 13:08:39] Run complete: 02:32:28 elapsed
Judging from the timestamps, the "Resuming TopHat pipeline with unmapped reads" step wasn't actually executed, and the reason seemed to be that the file "left_kept_reads.m2g_um.fq" was not found. But in fact the file is there.

Any hints? Thanks.
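
To make the timestamp reading above concrete, here is a small Python sketch (not part of TopHat; the name `step_durations` is made up for illustration) that computes how long each logged step lasted. It assumes only the `[YYYY-MM-DD HH:MM:SS] message` line format visible in the log above.

```python
import re
from datetime import datetime

# Matches TopHat-style log lines such as "[2012-04-27 12:14:57] Reporting output tracks".
STEP_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s+(.*)$")


def step_durations(log_text):
    """Return (step name, seconds until the next timestamped step) pairs."""
    steps = []
    for line in log_text.splitlines():
        match = STEP_RE.match(line.strip())
        if match:  # lines without a timestamp (e.g. samtools errors) are skipped
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            steps.append((when, match.group(2).strip()))
    return [
        (name, int((next_when - when).total_seconds()))
        for (when, name), (next_when, _) in zip(steps, steps[1:])
    ]


# Excerpt copied from the log above.
log = """\
[2012-04-27 12:11:41] Converting left_kept_reads.m2g to genomic coordinates (map2gtf)
[2012-04-27 12:14:57] Resuming TopHat pipeline with unmapped reads
[bam_header_read] EOF marker is absent. The input is probably truncated.
[2012-04-27 12:14:57] Reporting output tracks
"""

for name, seconds in step_durations(log):
    print(f"{seconds:>6d} s  {name}")
```

On this excerpt the "Resuming TopHat pipeline with unmapped reads" step lasts 0 seconds, which is consistent with the step having been skipped.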
Old 04-27-2012, 07:03 AM   #2
saad0105050
Junior Member
 
Location: Storrs, Connecticut, US

Join Date: Apr 2012
Posts: 5
Unhappy Same issue

Hi,

I have a very similar issue. I used samtools to check that every BAM file could be opened without error. Were you able to solve your problem?

Thanks,
Saad
Old 04-27-2012, 07:33 AM   #3
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Quote:
Originally Posted by saad0105050 View Post
Hi,

I have a very similar issue. I used samtools to check that every BAM file could be opened without error. Were you able to solve your problem?

Thanks,
Saad
Sorry to hear that you have the same issue. I've reported this bug to the developers and hope they can find a solution soon. If anyone gets this solved, please share it with us.
Old 04-28-2012, 06:43 AM   #4
caddymob
Member
 
Location: USA

Join Date: Apr 2009
Posts: 36
Exclamation

I have the same issue with tophat2, using bowtie2. Some reads have qualities, some just have "*" in the quality field. Here are 2 examples, 1st with no quality, 2nd with quality:

Code:
HWI-ST201:229:C07HGACXX:2:1306:5066:164732:1:N:0:ATCACG	321	1	10015	0	91M	X	155260312	0	ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA	*	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:91	YT:Z:UU	NH:i:20	CC:Z:5	CP:i:10285	HI:i:18
HWI-ST201:229:C07HGACXX:2:1203:20609:127413:1:N:0:ATCACG	83	1	10129	3	51M1I6M1I6M1I25M	=	10335	298	CCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCCTAACCCTAACCCTAACCCTAACCCT	?ABA<3?<?<<3DCB@B<DDCAA9,?CBA=5DDBB>;EDEB;7HHHED=JIGFA<GIIIHF?IJIIGHFJIHCFCJIGFHFIHFFFDJHFD	AS:i:-24	XN:i:0	XM:i:0	XO:i:3	XG:i:3	NM:i:3	MD:Z:88	YT:Z:UU	NH:i:2	CC:Z:=	CP:i:10129	HI:i:0
I checked chromosome 1 to quantify this issue: 2,224,578 reads have quality=* and 24,160,938 reads have regular Phred scores.

I have also sent a report to the tophat email, but wanted to share that you're not alone!
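
For anyone who wants to reproduce this kind of count, here is a minimal Python sketch. It is an illustration only, not how the numbers above were obtained, and the function name `count_missing_quals` is hypothetical. It relies on the SAM convention that the 11th tab-separated field (QUAL) is `*` when no qualities are stored.

```python
def count_missing_quals(sam_lines):
    """Count SAM alignment records without and with base qualities.

    Returns (missing, present), where "missing" means QUAL == "*".
    """
    missing = present = 0
    for line in sam_lines:
        # Skip blank lines and header lines (@HD, @SQ, ...).
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if fields[10] == "*":
            missing += 1
        else:
            present += 1
    return missing, present


# Two shortened, made-up records: the first has no qualities, the second does.
example = [
    "@HD\tVN:1.0\n",
    "read1\t321\t1\t10015\t0\t4M\tX\t155260312\t0\tACCC\t*\tNH:i:20\n",
    "read2\t83\t1\t10129\t3\t4M\t=\t10335\t298\tCCCT\t?ABA\tNH:i:2\n",
]
print(count_missing_quals(example))
```

The function accepts any iterable of lines, so it could also be fed `sys.stdin` when piping in the text output of `samtools view`.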
Old 04-29-2012, 08:36 PM   #5
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

I did another run, mapping directly to the reference genome instead of mapping to a transcriptome first. Tophat2 ended up with a similar error:

Code:
fail to read the header from "T34_tophat2_genome/tmp/left_kept_reads_unmapped.fq".
Looking into the tmp files, I found that this issue might be related to the read IDs. Here is the head of "T34_tophat2_genome/tmp/left_kept_reads_unmapped.fq":

Code:
@39387
T13323133231032130303001010113000104313423130441340
+3_19_590_F3
AAA=A2.%(='81-5%&;%%51(.1)&',')-'5!3**!*,'+)!!)=!,
@39398
T31202110130210003323123122331321034123433032442343
+3_19_1526_F3
(A=/5/A@>(.B=9)&BA@/=B>)>3B?)'*@??!)B-!&/8:2!!A<!'
@39402
T30231202003222033021022010303030024203413010441343
+3_20_156_F3
@8(&2,9(-3731%:3*''783&6)8.1'-+)0-!408!(3%%+!!+(!3
@39403
T31130311333002111122221010023203034033432000441040
+3_20_203_F3
A7B>5A:?>BB;@4'3:A=+;6<3?51@>'<,A>!=53!,/.-/!!0=!2
The lines beginning with "@" and "+" have different read IDs.
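
A quick way to check a whole file for this pattern is sketched below in Python. This is an illustration under simple assumptions (strict 4-line FASTQ records; the function name `plus_line_mismatches` is made up). Whether the ID mismatch is what actually trips TopHat is not established here.

```python
def plus_line_mismatches(fastq_lines):
    """Return (header_id, plus_id) pairs where a non-empty "+" line differs from the "@" line.

    Assumes strict 4-line FASTQ records (no wrapped sequence or quality lines).
    """
    lines = [ln.rstrip("\n") for ln in fastq_lines if ln.strip()]
    mismatches = []
    for i in range(0, len(lines) - 3, 4):
        header, _seq, plus, _qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"record {i // 4 + 1} is not a 4-line FASTQ record")
        header_id, plus_id = header[1:].strip(), plus[1:].strip()
        # An empty "+" line is valid FASTQ and is not counted as a mismatch.
        if plus_id and plus_id != header_id:
            mismatches.append((header_id, plus_id))
    return mismatches


# First two records from the file head shown above.
records = [
    "@39387",
    "T13323133231032130303001010113000104313423130441340",
    "+3_19_590_F3",
    "AAA=A2.%(='81-5%&;%%51(.1)&',')-'5!3**!*,'+)!!)=!,",
    "@39398",
    "T31202110130210003323123122331321034123433032442343",
    "+3_19_1526_F3",
    "(A=/5/A@>(.B=9)&BA@/=B>)>3B?)'*@??!)B-!&/8:2!!A<!'",
]
for header_id, plus_id in plus_line_mismatches(records):
    print(f"@{header_id} vs +{plus_id}")
```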
Old 04-30-2012, 11:10 AM   #6
saad0105050
Junior Member
 
Location: Storrs, Connecticut, US

Join Date: Apr 2012
Posts: 5
Default Tophat tmp.samheader.sam is broken

I checked all bam/sam files in the tmp directory with samtools. It turns out that the file tmp.samheader.sam (and other sam files) cannot be opened with samtools, and it gives those exact error messages ([bam_header_read]... bad EOF etc.) that we see on screen.

I ran bowtie with the exact commands issued by Tophat (from the run.log file). Bowtie runs fine (with both sam and plain-text output), and the output is valid. But when this output is piped to fix_map_order (an internal utility of Tophat), Tophat tries to read this temp.samheader.sam file and breaks. Note: this file is created very early when you run Tophat.

Getting frustrated, I am not using Tophat for now. I have created my own splice junction library (through RSEQtools library) and intend to use bowtie (or bfast or bwa) to align my reads with both the reference genome and this splice junction library.

Last edited by saad0105050; 04-30-2012 at 11:13 AM. Reason: Typo in the tool name `RSEQtools'
Old 04-30-2012, 05:01 PM   #7
mikhmv
Registered Vendor
 
Location: MD

Join Date: Feb 2012
Posts: 18
Default Error:

+1 to all of you:
I ran this command:
Code:
$TOPHAT -o $DEST -C -Q --bowtie1 -p 60 -r 200 --mate-std-dev 30 --report-secondary-alignments --report-discordant-pair-alignments --coverage-search --microexon-search --library-type fr-secondstrand --keep-tmp -z0 $BOWTiEIndex/human_g1k_v37_decoy "$SAMPLE"_F3.csfasta "$SAMPLE"_F5.csfasta "$SAMPLE"_F3_QV.qual "$SAMPLE"_F5_QV.qual
and got these messages:
Code:
[2012-04-30 11:48:52] Beginning TopHat run (v2.0.0)
-----------------------------------------------
[2012-04-30 11:48:52] Checking for Bowtie
                  Bowtie version:        0.12.7.0
[2012-04-30 11:48:52] Checking for Samtools
                Samtools version:        0.1.18.0
[2012-04-30 11:48:52] Checking for Bowtie index files
[2012-04-30 11:48:52] Checking for reference FASTA file
[2012-04-30 11:48:52] Generating SAM header for /home/biouml/galaxy/galaxy-tools-data/genomes/Hsapiens/hg19/bowtie_color//human_g1k_v37_decoy
        format:          fasta
[2012-04-30 11:49:32] Preparing reads
         left reads: min. length=50, count=28234582
        right reads: min. length=35, count=28088955
[2012-04-30 12:19:10] Mapping left_kept_reads against human_g1k_v37_decoy with Bowtie 
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "tophat_out2/tmp/left_kept_reads_unmapped.fq".
[2012-04-30 12:32:57] Mapping right_kept_reads against human_g1k_v37_decoy with Bowtie 
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "tophat_out2/tmp/right_kept_reads_unmapped.fq".
Warning: junction database is empty!
[2012-04-30 12:45:26] Processing bowtie hits
[2012-04-30 13:06:25] Processing bowtie hits
[2012-04-30 13:23:50] Reporting output tracks
-----------------------------------------------
[2012-04-30 13:48:50] Run complete: 01:59:58 elapsed
Does anyone know a solution?
P.S. I sent all the logs to the developers and hope they will answer.
Old 04-30-2012, 05:30 PM   #8
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Quote:
Originally Posted by saad0105050 View Post
I checked all bam/sam files in the tmp directory with samtools. It turns out that the file tmp.samheader.sam (and other sam files) cannot be opened with samtools, and it gives those exact error messages ([bam_header_read]... bad EOF etc.) that we see on screen.

I ran bowtie with the exact commands issued by Tophat (from the run.log file). Bowtie runs fine (with both sam and plain-text output), and the output is valid. But when this output is piped to fix_map_order (an internal utility of Tophat), Tophat tries to read this temp.samheader.sam file and breaks. Note: this file is created very early when you run Tophat.

Getting frustrated, I am not using Tophat for now. I have created my own splice junction library (through RSEQtools library) and intend to use bowtie (or bfast or bwa) to align my reads with both the reference genome and this splice junction library.
Thanks for sharing. However, I found that "tmp.samheader.sam" is a SAM file and can be opened with `samtools view -S`. In my runs, `fix_map_order` worked properly, so "temp.samheader.sam" may not be the cause. Could you please show us your "run.log" or the error message?

My runs ended up with "left_kept_reads.m2g_um.fq", which was a FASTQ file, and I cannot understand at all why samtools tried to open a FASTQ file! It's ridiculous!
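
For reference, telling a BAM file from a FASTQ file is straightforward: a BAM file is BGZF (gzip-compatible) compressed, and its decompressed payload starts with the magic bytes `BAM\1`. The Python sketch below illustrates that check. It is not the check TopHat or samtools performs internally, and the name `looks_like_bam` is hypothetical.

```python
import gzip
import os
import tempfile
import zlib


def looks_like_bam(path):
    """Return True if `path` is gzip/BGZF-compressed and its payload starts with the BAM magic."""
    with open(path, "rb") as handle:
        if handle.read(2) != b"\x1f\x8b":  # gzip magic bytes; plain-text FASTQ fails here
            return False
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError, zlib.error):
        return False  # truncated or corrupt compressed data


# Demo with two throwaway files: a plain-text FASTQ record and a gzip stream
# that merely begins with the BAM magic (a stand-in, not a real BAM file).
workdir = tempfile.mkdtemp()
fastq_path = os.path.join(workdir, "left_kept_reads.m2g_um.fq")
with open(fastq_path, "w") as handle:
    handle.write("@39387\nT1332\n+3_19_590_F3\nAAA=\n")
fake_bam_path = os.path.join(workdir, "stand_in.bam")
with gzip.open(fake_bam_path, "wb") as handle:
    handle.write(b"BAM\x01" + b"\x00" * 8)

print(looks_like_bam(fastq_path), looks_like_bam(fake_bam_path))
```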
Old 04-30-2012, 05:35 PM   #9
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Quote:
Originally Posted by mikhmv View Post
+1 to all of you:
[...]
Thanks! That's exactly what I got when I didn't map reads to a virtual transcriptome first. It seems something is wrong with the BAM/SAM file checking. More precisely, the check is being run on the wrong files.
Old 05-01-2012, 04:30 PM   #10
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default seems this issue solved by modifying Tophat python script

Quote:
Originally Posted by Xi Wang View Post
Thanks! That's exactly what I got when I didn't map reads to a virtual transcriptome first. It seems something is wrong with the BAM/SAM file checking. More precisely, the check is being run on the wrong files.
Following this finding, I edited the Tophat python script to disable the BAM/SAM file checking, and finally got Tophat working. But one of the subsequent steps, "segment_juncs", is taking a very long time and is still running. See the Tophat messages below.

Code:
[2012-05-01 11:58:20] Beginning TopHat run (v2.0.0)
-----------------------------------------------
[2012-05-01 11:58:20] Checking for Bowtie
		  Bowtie version:	 0.12.7.0
[2012-05-01 11:58:20] Checking for Samtools
		Samtools version:	 0.1.17.0
[2012-05-01 11:58:20] Checking for Bowtie index files
[2012-05-01 11:58:20] Checking for Bowtie index files
[2012-05-01 11:58:20] Checking for reference FASTA file
[2012-05-01 11:58:20] Generating SAM header for /home/xwang/data/hg19/bowtie_index/hg19.color
	format:		 fasta
[2012-05-01 11:59:25] Reading known junctions from GTF file
[2012-05-01 12:00:03] Preparing reads
	 left reads: min. length=50, count=64422218
[2012-05-01 13:02:39] Using pre-built transcriptome index..
[2012-05-01 13:03:03] Mapping left_kept_reads against transcriptome hg19_Ensemble.GRCh37_65 with Bowtie 
[2012-05-01 13:30:59] Converting left_kept_reads.m2g to genomic coordinates (map2gtf)
[2012-05-01 13:34:20] Resuming TopHat pipeline with unmapped reads
[2012-05-01 13:34:20] Mapping left_kept_reads.m2g_um against hg19.color with Bowtie 
[2012-05-01 16:52:01] Mapping left_kept_reads.m2g_um_seg1 against hg19.color with Bowtie (1/2)
[2012-05-01 20:09:31] Mapping left_kept_reads.m2g_um_seg2 against hg19.color with Bowtie (2/2)
[2012-05-01 23:25:42] Searching for junctions via segment mapping
I checked the CPU usage, and it seems that "segment_juncs" isn't parallelised. If the developers could parallelise this sub-routine, it would save a lot of time.
Old 05-04-2012, 11:48 AM   #11
townway
Member
 
Location: Rockville

Join Date: May 2009
Posts: 40
Default

Yes, I have the same problem. It has been running for two days (48 hours), and no files in the tmp folder have been updated for the last 10 hours. It seems to have stopped.

Did you fix this problem?

Thanks

Quote:
Originally Posted by Xi Wang View Post
Following this finding, I edited the Tophat python script to disable the BAM/SAM file checking, and finally got Tophat working. But one of the subsequent steps, "segment_juncs", is taking a very long time and is still running.
[...]
Old 05-06-2012, 06:48 AM   #12
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

Quote:
Originally Posted by townway View Post
Yes, I have the same problem. It has been running for two days (48 hours), and no files in the tmp folder have been updated for the last 10 hours. It seems to have stopped.

Did you fix this problem?

Thanks
Yes, "segment_juncs" can be very slow on a large data set. You may want to look at the logs folder, where up-to-date progress is recorded. I haven't looked into this issue myself, but the developers should probably address it: either fix the bug (if it is one) or provide a new facility.
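
One rough way to tell a slow run from a stalled one is to see how long ago anything in the logs (or tmp) folder was last written. The Python sketch below does that with the standard library. It is only an illustration: the function name `newest_file` is made up, and the directory passed in (for example a `logs` folder inside the TopHat output directory) is whatever applies to your run.

```python
import os
import time


def newest_file(directory):
    """Return (path, hours since last modification) of the most recently
    modified file under `directory`, or None if it contains no files."""
    newest = None
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            mtime = os.path.getmtime(path)
            if newest is None or mtime > newest[1]:
                newest = (path, mtime)
    if newest is None:
        return None
    return newest[0], (time.time() - newest[1]) / 3600.0


# Example call on the current directory; point it at your own logs folder instead.
result = newest_file(".")
if result is not None:
    print(f"{result[0]} was last modified {result[1]:.1f} h ago")
```

A file that was touched recently suggests the step is still making progress. A long gap is only a hint, since a step may simply not write anything for a while.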
Old 05-21-2012, 05:34 AM   #13
Ender985
Member
 
Location: Spain

Join Date: Mar 2009
Posts: 12
Default

Hey guys,

I'm having the same problem here. I think it has to do with the colorspace-formatted reads, since I can run TopHat with normal Illumina fastq files without errors but not with this kind of colorspace read. It seems that for some reason bowtie1/TopHat try to read a fastq file as if it were a bam file, and everything fails from there.

My temporary workaround will be to manually convert the colorspace reads to normal .fastq reads and map them with bowtie2 and against a normal index, since that should work.
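
For illustration, a naive colorspace-to-base conversion can be sketched in a few lines of Python. This is a generic decoding of the SOLiD di-base code, not the poster's actual conversion script and not what Bowtie or TopHat do internally, and the name `decode_colorspace` is made up. Note that with this kind of decoding a single miscalled color changes every base after it, so treat the result with caution.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def decode_colorspace(read):
    """Naively translate a SOLiD colorspace read (primer base + colors) to bases.

    The leading primer base is used to start decoding but is not part of
    the output. From the first unknown color (anything other than 0-3,
    e.g. ".") onward, the remaining bases are reported as "N".
    """
    primer, colors = read[0].upper(), read[1:]
    if primer not in BASES:
        raise ValueError("read must start with a primer base (A, C, G or T)")
    current = BASES.index(primer)
    decoded = []
    for i, color in enumerate(colors):
        if color not in "0123":
            decoded.append("N" * (len(colors) - i))
            break
        # Di-base code: each color is the XOR of the 2-bit codes of adjacent bases.
        current ^= int(color)
        decoded.append(BASES[current])
    return "".join(decoded)


print(decode_colorspace("T0123"))
print(decode_colorspace("A00.1"))
```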

Here is hoping the TopHat guys will fix this downstream at some point.
Old 12-21-2012, 06:36 AM   #14
nachocab
Member
 
Location: cambridge, MA

Join Date: Dec 2012
Posts: 11
Default

In case anyone is still struggling with this issue, I was able to get rid of this error by using a newer version of tophat (2.0.6). This is the call that I used (for single-end 50bp reads):

Code:
tophat --library-type fr-secondstrand --segment-length 25 --no-coverage-search --no-novel-juncs -G gencode.v14.annotation.gtf -o my_output_dir --color --bowtie1 --quals --transcriptome-index my_transcriptome_index hg19 "/unprotected/projects/lasvchal/moss/raw_data/my.csfasta" "/unprotected/projects/lasvchal/moss/raw_data/my_QV.qual"

Tags
rna-seq, solid, tophat2
