![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Mixed read lengths in TopHat input file | shurjo | Bioinformatics | 13 | 11-01-2012 05:46 AM |
Add Readgroups to Tophat multiple sample input? | Pamela Mukhopadhyay | Bioinformatics | 0 | 12-14-2011 09:55 PM |
Tophat Error: No unambiguous stretches of characters in the input(?). | thinkRNA | Bioinformatics | 2 | 10-03-2011 03:38 PM |
Tophat and SOLID -> zero kb .bed files | ahmetz | Bioinformatics | 1 | 06-16-2011 02:04 PM |
tophat pair end input | damiankao | Bioinformatics | 0 | 03-23-2011 09:06 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: Singapore Join Date: Oct 2009
Posts: 44
|
![]()
Hello all,
I have now spent some time reformatting my .csfasta and .qual files to make them compatible to the new TopHat v 1.1. I would be interested to see your approaches! First I used the tools supplied with MAQ (fq_all2std.pl csfa2std and solid2fastq.pl). However these did not work, since they converted my colorspace code to nucleotide code. Then I chose to use solid2fastq.c from the BFAST package and this seems to work. I also converted the "." characters in my .csfasta to "N"s using: Code:
awk '{if ( (NR-2)%4==0) {gsub(/\./,"N",$0); print $0} else { print $0}}' Code:
sed -i 's_-1_1_g' Sample_14_Qual_minus.qual Code:
@853_8_25 T32NN232N3N10NN123NN202NN101NN010N1120NN012N0002N0N + +*""%%%"%"%%""%%%""%)&""%%%""%%+"&'%&""'(%"'))'"&" @853_8_35 T31NN012N0N01NN011NN112NN300NNNN2N2001NN1N0N1N1NN3N + ?:""=54"@"=+""A98""745"";98""""2"=@>8""<"4"<"6"";" Code:
[Thu Oct 7 09:32:30 2010] Beginning TopHat run (v1.1.0) ----------------------------------------------- [Thu Oct 7 09:32:30 2010] Preparing output location /home/schaefer/tophat/RBM20/Sample14/ [Thu Oct 7 09:32:30 2010] Checking for Bowtie index files [Thu Oct 7 09:32:30 2010] Checking for reference FASTA file [Thu Oct 7 09:32:30 2010] Checking for Bowtie Bowtie version: 0.12.3.0 [Thu Oct 7 09:32:30 2010] Checking for Samtools Samtools version: 0.1.8.0 [Thu Oct 7 09:32:37 2010] Checking reads min read length: 50bp, max read length: 50bp format: fastq quality scale: phred33 (default) [FAILED] Error: could not execute prep_reads Code:
/usr/local/tophat_1.1.0/tophat --integer-quals -o /home/schaefer/tophat/Sample14 -C rn4_c /home/schaefer/tophat/fastq/Sample14Ns.fastq Now I wonder if this is due to my input file or due to other problems mentioned here: http://seqanswers.com/forums/showthread.php?t=7141 What input files have worked for you and how did you generate them? All the best, Sebastian |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: Boston area Join Date: Nov 2007
Posts: 747
|
![]()
Check in the logs subdirectory of the directory you told Tophat to work in; there will be a prep_reads.log there -- post that & we can see if it is the same trouble I am running into or something I haven't (yet) butted heads against.
|
![]() |
![]() |
![]() |
#3 |
Member
Location: Singapore Join Date: Oct 2009
Posts: 44
|
![]()
There is a prep_reads.log file in the logs directory, but it is empty.
The run.log shows how prep_reads was started: Code:
/usr/local/tophat_1.1.0/prep_reads --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir /home/schaefer/tophat/Sample14/ --max-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --sam-header /home/schaefer/tophat/Sample14/tmp/stub_header.sam --no-microexon-search --integer-quals --color --fastq /home/schaefer/tophat/fastq/Sample14Ns.fastq |
![]() |
![]() |
![]() |
#4 |
Member
Location: New Jersey Join Date: Aug 2009
Posts: 23
|
![]()
I am running into the same problem. I converted from csfasta and qual to fastq using solid2fastq from the Bfast package.
The prep_reads.log says prep_reads v1.1.0 (1604) --------------------------- Saw ASCII character 29 but expected 33-based Phred qual. terminate called after throwing an instance of 'int' Running prep_reads directly give me the following error ~/Desktop > prep_reads test.fastq prep_reads v1.1.0 (1604) --------------------------- Error: qual length (62) differs from seq length (51) for fastq record ! Last edited by dsidote; 10-07-2010 at 07:23 AM. |
![]() |
![]() |
![]() |
#5 |
Member
Location: Singapore Join Date: Oct 2009
Posts: 44
|
![]()
Since your quality and sequence files differ in length, maybe you have not changed the negative values in your .qual file? (see above command)
|
![]() |
![]() |
![]() |
#6 |
Senior Member
Location: Boston area Join Date: Nov 2007
Posts: 747
|
![]()
Does someone (such as the Tophat team) have a small colorspace dataset which works in Tophat that they'd be willing & able to make public? Having a positive control would be awfully handy.
|
![]() |
![]() |
![]() |
#7 |
Member
Location: New Jersey Join Date: Aug 2009
Posts: 23
|
![]()
Since I have many more reads then I need I just removed all of the reads with missed colorcalls from the csfasta and qual files. It was about 1% of the total number of reads that got removed.
|
![]() |
![]() |
![]() |
#8 |
Member
Location: New Jersey Join Date: Aug 2009
Posts: 23
|
![]()
I emailed Daehwan and he said the command is
Code:
tophat --color --quals bowtie_color_index 1.csfasta 1.qual It seems to be running fine following the above. |
![]() |
![]() |
![]() |
#9 |
Junior Member
Location: Seattle, WA Join Date: Jul 2010
Posts: 4
|
![]()
Here is a patch fixing all the colorspace bugs I've encountered so far:
1. prep_reads failing on -1 qualities 2. prep_reads failing on comments 3. prep_reads converting . to N only on the last read Change a few lines of the source code seemed preferable to changing the reads. I haven't finished a run yet so I'm not sure there if there are other bugs. To apply the patch, Code:
tar xzf tophat-1.1.0.tar.gz # put tophat-1.1.0.colorspace.patch.gz in tophat-1.1.0 cd tophat-1.1.0 gunzip tophat-1.1.0.colorspace.patch.gz patch -p0 < tophat-1.1.0.colorspace.patch ./configure ; make ; make install |
![]() |
![]() |
![]() |
#10 |
Member
Location: New Jersey Join Date: Aug 2009
Posts: 23
|
![]()
Thanks dcjones. Much easier than messing with multi-GB files.
|
![]() |
![]() |
![]() |
#11 | |
Member
Location: Santa Cruz Join Date: Apr 2010
Posts: 18
|
![]() Quote:
I was finally able to process a small subset of solid data without tophat crashing! One note, comment lines still cause the tophat error Code:
[Thu Oct 7 14:02:44 2010] Beginning TopHat run (v1.1.0) ----------------------------------------------- [Thu Oct 7 14:02:44 2010] Preparing output location tophat-out/short/ [Thu Oct 7 14:02:44 2010] Checking for Bowtie index files [Thu Oct 7 14:02:44 2010] Checking for reference FASTA file [Thu Oct 7 14:02:44 2010] Checking for Bowtie Bowtie version: 0.12.7.0 [Thu Oct 7 14:02:44 2010] Checking for Samtools Samtools version: 0.1.8.0 [Thu Oct 7 14:03:09 2010] Checking reads Error: file short.csfasta does not appear to be a valid FASTA or FASTQ file Error encountered parsing file short.csfasta: Records in Fastq files should start with '@' character |
|
![]() |
![]() |
![]() |
#12 |
Member
Location: Singapore Join Date: Oct 2009
Posts: 44
|
![]()
hello and thx for all the ideas and help. I don't want to mess around with the installation so far and still try to convert my csfasta to Ns only. however, when I use this:
Code:
awk '{if ( (NR-2)%2==0) {gsub(/\./,"N",$0); print $0} else { print $0}}' Code:
sed '1,3d' to make my .csfasta: Code:
>853_8_25_F3 T32NN232N3N10NN123NN202NN101NN010N1120NN012N0002N0N >853_8_35_F3 T31NN012N0N01NN011NN112NN300NNNN2N2001NN1N0N1N1NN3N >853_8_75_F3 T32NN011N1N31NN001NN301NN120NN232N2201NN231N1202N1N >853_8_96_F3 T32NN212N2N30NN113NN201NN002NNN20N3010NN3N0N0N0NN0N >853_8_114_F3 T30NN111N1N00NN212NN133NN331NNNN1N2333NN1N3N2N2NN0N Code:
sed -i 's_-1_0_g' Code:
>853_8_25_F3 10 9 0 0 4 4 4 0 4 0 4 4 0 0 4 4 4 0 0 4 8 5 0 0 4 4 4 0 0 4 4 10 0 5 6 4 5 0 0 6 7 4 0 6 8 8 6 0 5 0 >853_8_35_F3 30 25 0 0 28 20 19 0 31 0 28 10 0 0 32 24 23 0 0 22 19 20 0 0 26 24 23 0 0 0 0 17 0 28 31 29 23 0 0 27 0 19 0 27 0 21 0 0 26 0 >853_8_75_F3 30 28 0 0 26 27 22 0 15 0 26 19 0 0 28 26 25 0 0 29 26 28 0 0 24 19 27 0 0 13 11 20 0 13 26 17 21 0 0 24 4 8 0 7 4 7 7 0 7 0 >853_8_96_F3 8 4 0 0 7 4 5 0 4 0 4 4 0 0 7 4 4 0 0 4 4 4 0 0 5 4 4 0 0 0 6 5 0 6 5 4 5 0 0 4 0 4 0 6 0 10 0 0 4 0 >853_8_114_F3 30 31 0 0 19 19 21 0 28 0 28 27 0 0 6 14 6 0 0 7 25 22 0 0 4 29 16 0 0 0 0 4 0 17 4 27 16 0 0 5 0 19 0 8 0 7 0 0 15 0 Code:
[Fri Oct 8 11:17:26 2010] Beginning TopHat run (v1.1.0) ----------------------------------------------- [Fri Oct 8 11:17:26 2010] Preparing output location /home/schaefer/tophat/RBM20/Sample14/ [Fri Oct 8 11:17:26 2010] Checking for Bowtie index files [Fri Oct 8 11:17:26 2010] Checking for reference FASTA file [Fri Oct 8 11:17:26 2010] Checking for Bowtie Bowtie version: 0.12.3.0 [Fri Oct 8 11:17:26 2010] Checking for Samtools Samtools version: 0.1.8.0 [Fri Oct 8 11:17:32 2010] Checking reads min read length: 50bp, max read length: 50bp format: fasta [FAILED] Error: could not execute prep_reads I guess my files should be alright and it must be a problem with the installation? Did you all compile tophat on your system or did you just copy pre-compiled files? |
![]() |
![]() |
![]() |
#13 |
Member
Location: New Jersey Join Date: Aug 2009
Posts: 23
|
![]()
I used the precompiled version and it worked. Our sysadmin is recompiling the code with dcjones patch, so as soon as that is done I will test it with unmodified data.
DerSeb: Did you try removing the reads with the missed colorcalls instead of converting to 'N' to see if the mixed colorspace-basespace is the issue? Last edited by dsidote; 10-08-2010 at 06:38 AM. |
![]() |
![]() |
![]() |
#14 | |
Junior Member
Location: Seattle, WA Join Date: Jul 2010
Posts: 4
|
![]() Quote:
I don't know that you can modify your reads to work around that. |
|
![]() |
![]() |
![]() |
#15 |
Senior Member
Location: Boston area Join Date: Nov 2007
Posts: 747
|
![]()
I finally have worked out a full script which converts my single-end SOLiD data from the Short Read Archive in FASTQ (aside: it would be great if the TopHat team swiped the colorspace reading code from Bowtie, which is much more flexible in the format)
Code:
#!/usr/bin/perl use strict; # reformat single-end SOLiD data from Short Read Archive # to work successfully with patched version of TopHat 1.1.0 # sample bash command line to run TopHat # tophat --color -G hg18.ref-genes.gtf -o SRR040290-tophat -p 8 --quals hg18 SRR040290.csfasta SRR040290.qual 1> tophat.out 2> tophat.err foreach my $arg(@ARGV) { my ($stem)=($arg=~/(SRR[0-9]+)/);; die "Could not identify stem in $arg\n" unless (defined $stem); open(IN,$arg); open(FASTA,">$stem.csfasta"); open(QUAL,">$stem.qual"); while (my $idLineA=<IN>) { chomp($idLineA); my ($id)=($idLineA=~/^.([^ ]+)/); my $seqLine=<IN>; my $idLineB=<IN>; my $qualLine=<IN>; chomp($qualLine); my @qualVals=(); foreach my $qualChar(split(//,$qualLine)) { my $qualVal=ord($qualChar)-33; if ($qualVal<0) { $qualVal=0; # forbid negative qual values print STDERR "Negative qual implied by >$qualChar< for $idLineB\n"; } push(@qualVals,$qualVal); } shift(@qualVals); # dump first qual val print FASTA ">$id\n"; print FASTA $seqLine; print QUAL ">$id\n"; print QUAL join(" ",@qualVals),"\n"; } } Code:
track name=junctions description="TopHat junctions" chr20 204696 205452 JUNC00000001 1 - 204696 205452 255,0,0 2 31,19 0,737 chr20 205747 205877 JUNC00000002 1 - 205747 205877 255,0,0 2 39,11 0,119 chr20 218929 219262 JUNC00000003 1 - 218929 219262 255,0,0 2 19,31 0,302 chr20 275750 277924 JUNC00000004 3 + 275750 277924 255,0,0 2 36,32 0,2142 chr20 277975 278299 JUNC00000005 1 + 277975 278299 255,0,0 2 32,18 0,306 chr20 278448 281888 JUNC00000006 2 + 278448 281888 255,0,0 2 28,35 0,3405 chr20 309778 316694 JUNC00000007 8 + 309778 316694 255,0,0 2 35,40 0,6876 chr20 320182 324880 JUNC00000008 21 + 320182 324880 255,0,0 2 41,39 0,4659 |
![]() |
![]() |
![]() |
#16 |
Junior Member
Location: San Diego, CA Join Date: Oct 2010
Posts: 1
|
![]()
I have installed the patch posted by dcjones on , but when running Tophat on SOLiD colorspace paired-end read data I seem to have uncovered another problem with convert_color_to_bp:
tophat -C -F 0.10 -p 12 --mate-inner-dist 125 --mate-std-dev 25 --microexon-search --GFF ucsc-genes-with-GRCh-IDs.gtf GRCh37-lite_c test_F3.csfasta test_F5-P2.csfasta [Fri Oct 8 15:15:19 2010] Beginning TopHat run (v1.1.0) ----------------------------------------------- [Fri Oct 8 15:15:19 2010] Preparing output location ./tophat_out/ [Fri Oct 8 15:15:19 2010] Checking for Bowtie index files [Fri Oct 8 15:15:19 2010] Checking for reference FASTA file [Fri Oct 8 15:15:19 2010] Checking for Bowtie Bowtie version: 0.12.7.0 [Fri Oct 8 15:15:19 2010] Checking for Samtools Samtools version: 0.1.8.0 [Fri Oct 8 15:15:35 2010] Checking reads min read length: 25bp, max read length: 50bp format: fasta [Fri Oct 8 17:05:10 2010] Reading known junctions from GFF file [Fri Oct 8 18:01:36 2010] Mapping reads against GRCh37-lite_c with Bowtie [Fri Oct 8 23:30:08 2010] Joining segment hits [Sat Oct 9 04:14:44 2010] Mapping reads against GRCh37-lite_c with Bowtie(1/2) [Sat Oct 9 08:28:21 2010] Mapping reads against GRCh37-lite_c with Bowtie(2/2) [Sat Oct 9 12:59:31 2010] Mapping reads against GRCh37-lite_c with Bowtie [Sun Oct 10 00:08:07 2010] Joining segment hits Traceback (most recent call last): File "/usr/local/bin/tophat", line 2174, in <module> sys.exit(main()) File "/usr/local/bin/tophat", line 2133, in main user_supplied_juncs) File "/usr/local/bin/tophat", line 1848, in spliced_alignment segment_len) File "/usr/local/bin/tophat", line 1570, in split_reads split_record(read_name, read_seq, read_quals, output_files, offsets, color) File "/usr/local/bin/tophat", line 1503, in split_record read_seq_temp = convert_color_to_bp(read_seq) File "/usr/local/bin/tophat", line 1477, in convert_color_to_bp base = decode_dic[base+ch] KeyError: '+1' make: *** [tophat_out/accepted_hits.sam] Error 1 |
![]() |
![]() |
![]() |
#17 | |
Member
Location: Singapore Join Date: Oct 2009
Posts: 44
|
![]() Quote:
![]() If this fails I will try with the patch next! |
|
![]() |
![]() |
![]() |
#18 | |
Member
Location: Singapore Join Date: Oct 2009
Posts: 44
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#19 |
Senior Member
Location: Boston area Join Date: Nov 2007
Posts: 747
|
![]()
BUT, the accepted_hits.bam file is empty! What did I do wrong this time?[/QUOTE]
Apparently I have gremlins; another run worked fine. |
![]() |
![]() |
![]() |
#20 |
Member
Location: uk Join Date: Apr 2010
Posts: 43
|
![]()
Version 1.1.1 on the main page apparently includes fixes for these bugs...
|
![]() |
![]() |
![]() |
Tags |
solid, tophat |
Thread Tools | |
|
|