![]() |
tophat running problem
Hi all,
I'm trying to run mRNA-seq for human by tophat (v1.0.12). I succeeded to get proper output file in the preliminary dataset (first 100K reads from each .fq file). But I failed to get proper output in the real dataset (each contains ~17M reads). I would appreciate any help you could give me with this. Thanks in advance. -Yasu ### preliminary_test ### $ tophat -r 10 -p 8 -o tophat_hg19_test hg19 s_1_1.head4000000.fq,s_6_1.head4000000.fq,s_7_1.head4000000.fq s_1_2.head4000000.fq,s_6_2.head4000000.fq,s_7_2.head4000000.fq [Mon Feb 8 13:31:03 2010] Beginning TopHat run (v1.0.12) ----------------------------------------------- [Mon Feb 8 13:31:03 2010] Preparing output location tophat_hg19_test/ [Mon Feb 8 13:31:03 2010] Checking for Bowtie index files [Mon Feb 8 13:31:03 2010] Checking for reference FASTA file [Mon Feb 8 13:31:03 2010] Checking for Bowtie Bowtie version: 0.11.3.0 [Mon Feb 8 13:31:03 2010] Checking reads seed length: 43bp format: fastq quality scale: --phred33-quals [Mon Feb 8 13:31:51 2010] Mapping reads against hg19 with Bowtie [Mon Feb 8 13:34:24 2010] Joining segment hits [Mon Feb 8 13:34:59 2010] Mapping reads against hg19 with Bowtie [Mon Feb 8 13:37:30 2010] Joining segment hits [Mon Feb 8 13:38:04 2010] Searching for junctions via segment mapping [Mon Feb 8 13:44:59 2010] Retrieving sequences for splices [Mon Feb 8 13:46:42 2010] Indexing splices [Mon Feb 8 13:47:58 2010] Mapping reads against segment_juncs with Bowtie [Mon Feb 8 13:48:47 2010] Joining segment hits [Mon Feb 8 13:49:26 2010] Mapping reads against segment_juncs with Bowtie [Mon Feb 8 13:50:15 2010] Joining segment hits [Mon Feb 8 13:50:52 2010] Reporting output tracks ----------------------------------------------- Run complete [00:33:58 elapsed] ### real_data ### tophat -r 10 -p 8 -o tophat_hg19 hg19 s_1_1.fq,s_6_1.fq,s_7_1.fq s_1_2.fq,s_6_2.fq,s_7_2.fq [Mon Feb 8 14:24:47 2010] Beginning TopHat run (v1.0.12) ----------------------------------------------- [Mon Feb 8 14:24:47 2010] Preparing output location tophat_hg19/ [Mon Feb 8 14:24:47 2010] Checking for Bowtie index files [Mon Feb 8 14:24:47 2010] Checking for reference FASTA file [Mon Feb 8 14:24:47 2010] Checking for Bowtie Bowtie version: 0.11.3.0 [Mon Feb 8 14:24:47 2010] Checking reads seed length: 43bp format: fastq quality scale: --phred33-quals [Mon Feb 8 14:39:23 2010] Mapping reads against hg19 with Bowtie [Mon Feb 8 15:24:40 2010] Joining segment hits [Mon Feb 8 15:35:38 2010] Mapping reads against hg19 with Bowtie [Mon Feb 8 16:18:44 2010] Joining segment hits [Mon Feb 8 16:18:44 2010] Searching for junctions via segment mapping Warning: junction database is empty! [Mon Feb 8 18:01:42 2010] Joining segment hits [Mon Feb 8 18:11:38 2010] Joining segment hits [Mon Feb 8 18:11:38 2010] Reporting output tracks [FAILED] Error: Report generation failed with err = 1 Traceback (most recent call last): File "/bin/tophat", line 1518, in ? sys.exit(main()) File "/bin/tophat", line 1490, in main params.gff_annotation) File "/bin/tophat", line 936, in compile_reports exit(1) TypeError: 'str' object is not callable |
I add the report.log file from real_data (failed one).
### Real_data (reports.log) ### tophat_reports v1.0.12 --------------------------------------- Error: cannot open map file for reading ##################### Comparing with the run.log files from preliminary_test (succeeded one) and from real_data (failed one), "/bin/segment_juncs" doesn't work well. Can somebody give me any help? Thanks, -Yasu |
The fact that TopHat thinks the seed length is 43bp is concerning. The default is 25, and it shouldn't be different unless you specified --segment-length, which you didn't. TopHat currently requires that FASTQ files have records where all of the nucleotides for each read appear on a single line. Same goes for the quality strings - all the quality characters need to be on one line. This is a limitation I haven't had time to fix yet. Can you verify that your FASTQ file is formatted this way?
|
Thanks for your kind help!!
My fastq file is something like this. I omitted the sequence+position id from the line after "+". Does this make the things bad? -Yasu ########### @HWI-EAS368:1:1:9:316#0/1 CTGGATGATAACATTCCAGAAGATGACTCAGGTGTCCCCACCC + BB66AB9ACBB@BCBAAA><BBBAAB?BBB@@@BA?B@BB@AB @HWI-EAS368:1:1:9:424#0/1 CTCCCTGCCAGATATCGAGGAGGTGAAAGACCAGAGCAGGAAC + BCBBB>?>@CBABBB;A877??.:<<B@;@@?=>?A>?6AAA? @HWI-EAS368:1:1:9:1060#0/1 TGGATGGTTCAGGATAATCACCTGAGCAGTGAAGCCAGCTGCT + BBBBB=?@BBB?=@A9AA@CAA><<@:5>7?=A?A@?=A???@ @HWI-EAS368:1:1:9:410#0/1 CGGAGGCGGAGGCTTGGGTGCGTTCAAGATTCAGCTTCACCCG + AA9AAA=:A7'=7=?4+366=AA@:A>999B:=2,=>1014>7 @HWI-EAS368:1:1:9:807#0/1 CGAACATTTCTGGCCCCCAAGTGTCAGCCCATTCACGTAAAAA + BBBBBBC@@<:;6BC>:@2<B=BBB@7;BB=:C@799:BBB?% @HWI-EAS368:1:1:9:405#0/1 TGTAAAGCCTGAAACAGCTGCCTGTGTGGGACTGAGATGCAGG + ?=>B=AA@AB@AAB?88>@@@BB>?B>A<=>?A81<-<@@B@@ |
I added the '--segment-length 25', but the comment is still as follows;
[Tue Feb 9 16:47:40 2010] Checking reads seed length: 43bp format: fastq quality scale: --phred33-quals Did I go some wrong way? -Yasu |
All times are GMT -8. The time now is 06:58 PM. |
Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.