Seqanswers Leaderboard Ad

**krobison** · 10-07-2010, 04:34 AM

Check in the logs subdirectory of the directory you told Tophat to work in; there will be a prep_reads.log there -- post that & we can see if it is the same trouble I am running into or something I haven't (yet) butted heads against.

**DerSeb** · 10-07-2010, 04:43 AM

There is a prep_reads.log file in the logs directory, but it is empty.

The run.log shows how prep_reads was started:

Code:

/usr/local/tophat_1.1.0/prep_reads --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir /home/schaefer/tophat/Sample14/ --max-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --sam-header /home/schaefer/tophat/Sample14/tmp/stub_header.sam --no-microexon-search --integer-quals --color --fastq /home/schaefer/tophat/fastq/Sample14Ns.fastq

**dsidote** · 10-07-2010, 06:06 AM

I am running into the same problem. I converted from csfasta and qual to fastq using solid2fastq from the Bfast package.

The prep_reads.log says

prep_reads v1.1.0 (1604)
---------------------------
Saw ASCII character 29 but expected 33-based Phred qual.
terminate called after throwing an instance of 'int'

Running prep_reads directly give me the following error

~/Desktop > prep_reads test.fastq
prep_reads v1.1.0 (1604)
---------------------------
Error: qual length (62) differs from seq length (51) for fastq record !

**DerSeb** · 10-07-2010, 06:41 AM

Since your quality and sequence files differ in length, maybe you have not changed the negative values in your .qual file? (see above command)

**krobison** · 10-07-2010, 06:52 AM

Does someone (such as the Tophat team) have a small colorspace dataset which works in Tophat that they'd be willing & able to make public? Having a positive control would be awfully handy.

**dsidote** · 10-07-2010, 06:58 AM

Originally posted by DerSeb View Post

Since your quality and sequence files differ in length, maybe you have not changed the negative values in your .qual file? (see above command)

Since I have many more reads then I need I just removed all of the reads with missed colorcalls from the csfasta and qual files. It was about 1% of the total number of reads that got removed.

**dsidote** · 10-07-2010, 09:25 AM

I emailed Daehwan and he said the command is

Code:

tophat --color --quals bowtie_color_index 1.csfasta 1.qual

You don't have to convert to fastq, but you do have to remove the header lines from the csfasta and qual files before running tophat. Also, you have to replace the '.' with N in the csfasta and replace '-1' with 0 in the qual file.

It seems to be running fine following the above.

**dcjones** · 10-07-2010, 11:11 AM

Here is a patch fixing all the colorspace bugs I've encountered so far:
1. prep_reads failing on -1 qualities
2. prep_reads failing on comments
3. prep_reads converting . to N only on the last read

Change a few lines of the source code seemed preferable to changing the reads.

I haven't finished a run yet so I'm not sure there if there are other bugs.

To apply the patch,

Code:

tar xzf tophat-1.1.0.tar.gz
# put tophat-1.1.0.colorspace.patch.gz in tophat-1.1.0
cd tophat-1.1.0
gunzip tophat-1.1.0.colorspace.patch.gz
patch -p0 < tophat-1.1.0.colorspace.patch
./configure ; make ; make install

Attached Files

tophat-1.1.0.colorspace.patch.gz (675 Bytes, 64 views)

**dsidote** · 10-07-2010, 12:11 PM

Thanks dcjones. Much easier than messing with multi-GB files.

**hyjkim** · 10-07-2010, 01:21 PM

Originally posted by dcjones View Post

Here is a patch fixing all the colorspace bugs I've encountered so far:
1. prep_reads failing on -1 qualities
2. prep_reads failing on comments
3. prep_reads converting . to N only on the last read

Change a few lines of the source code seemed preferable to changing the reads.

I haven't finished a run yet so I'm not sure there if there are other bugs.

To apply the patch,

Code:

tar xzf tophat-1.1.0.tar.gz
# put tophat-1.1.0.colorspace.patch.gz in tophat-1.1.0
cd tophat-1.1.0
gunzip tophat-1.1.0.colorspace.patch.gz
patch -p0 < tophat-1.1.0.colorspace.patch
./configure ; make ; make install

Thanks so much for doing this dcjones.

I was finally able to process a small subset of solid data without tophat crashing! One note, comment lines still cause the tophat error

Code:

[Thu Oct  7 14:02:44 2010] Beginning TopHat run (v1.1.0)
-----------------------------------------------
[Thu Oct  7 14:02:44 2010] Preparing output location tophat-out/short/
[Thu Oct  7 14:02:44 2010] Checking for Bowtie index files
[Thu Oct  7 14:02:44 2010] Checking for reference FASTA file
[Thu Oct  7 14:02:44 2010] Checking for Bowtie
        Bowtie version:                  0.12.7.0
[Thu Oct  7 14:02:44 2010] Checking for Samtools
        Samtools version:                0.1.8.0
[Thu Oct  7 14:03:09 2010] Checking reads
Error: file short.csfasta does not appear to be a valid FASTA or FASTQ file

Error encountered parsing file short.csfasta:
 Records in Fastq files should start with '@' character

**DerSeb** · 10-08-2010, 03:47 AM

hello and thx for all the ideas and help. I don't want to mess around with the installation so far and still try to convert my csfasta to Ns only. however, when I use this:

Code:

awk '{if ( (NR-2)%2==0) {gsub(/\./,"N",$0); print $0} else { print $0}}'

to change . to N and this

Code:

sed '1,3d'

to remove header lines

to make my .csfasta:

Code:

>853_8_25_F3
T32NN232N3N10NN123NN202NN101NN010N1120NN012N0002N0N
>853_8_35_F3
T31NN012N0N01NN011NN112NN300NNNN2N2001NN1N0N1N1NN3N
>853_8_75_F3
T32NN011N1N31NN001NN301NN120NN232N2201NN231N1202N1N
>853_8_96_F3
T32NN212N2N30NN113NN201NN002NNN20N3010NN3N0N0N0NN0N
>853_8_114_F3
T30NN111N1N00NN212NN133NN331NNNN1N2333NN1N3N2N2NN0N

and this

Code:

sed -i 's_-1_0_g'

to get my .qual file:

Code:

>853_8_25_F3
10 9 0 0 4 4 4 0 4 0 4 4 0 0 4 4 4 0 0 4 8 5 0 0 4 4 4 0 0 4 4 10 0 5 6 4 5 0 0 6 7 4 0 6 8 8 6 0 5 0 
>853_8_35_F3
30 25 0 0 28 20 19 0 31 0 28 10 0 0 32 24 23 0 0 22 19 20 0 0 26 24 23 0 0 0 0 17 0 28 31 29 23 0 0 27 0 19 0 27 0 21 0 0 26 0 
>853_8_75_F3
30 28 0 0 26 27 22 0 15 0 26 19 0 0 28 26 25 0 0 29 26 28 0 0 24 19 27 0 0 13 11 20 0 13 26 17 21 0 0 24 4 8 0 7 4 7 7 0 7 0 
>853_8_96_F3
8 4 0 0 7 4 5 0 4 0 4 4 0 0 7 4 4 0 0 4 4 4 0 0 5 4 4 0 0 0 6 5 0 6 5 4 5 0 0 4 0 4 0 6 0 10 0 0 4 0 
>853_8_114_F3
30 31 0 0 19 19 21 0 28 0 28 27 0 0 6 14 6 0 0 7 25 22 0 0 4 29 16 0 0 0 0 4 0 17 4 27 16 0 0 5 0 19 0 8 0 7 0 0 15 0

I still get the same error... The log file of prep_reads is empty. There is also an empty file called left_kept_reads.fq in the output directory.

Code:

[Fri Oct  8 11:17:26 2010] Beginning TopHat run (v1.1.0)
-----------------------------------------------
[Fri Oct  8 11:17:26 2010] Preparing output location /home/schaefer/tophat/RBM20/Sample14/
[Fri Oct  8 11:17:26 2010] Checking for Bowtie index files
[Fri Oct  8 11:17:26 2010] Checking for reference FASTA file
[Fri Oct  8 11:17:26 2010] Checking for Bowtie
	Bowtie version:			 0.12.3.0
[Fri Oct  8 11:17:26 2010] Checking for Samtools
	Samtools version:		 0.1.8.0
[Fri Oct  8 11:17:32 2010] Checking reads
	min read length: 50bp, max read length: 50bp
	format:		 fasta
	[FAILED]
Error: could not execute prep_reads

Also both .csfasta and .qual have the same number of lines.

I guess my files should be alright and it must be a problem with the installation?
Did you all compile tophat on your system or did you just copy pre-compiled files?

**dsidote** · 10-08-2010, 05:12 AM

I used the precompiled version and it worked. Our sysadmin is recompiling the code with dcjones patch, so as soon as that is done I will test it with unmodified data.

DerSeb: Did you try removing the reads with the missed colorcalls instead of converting to 'N' to see if the mixed colorspace-basespace is the issue?

**dcjones** · 10-08-2010, 10:58 AM

Originally posted by DerSeb View Post

hello and thx for all the ideas and help. I don't want to mess around with the installation so far and still try to convert my csfasta to Ns only. however, when I use this:

I don't thing there is a problem with the '.'s needing to be 'N's. It expects '.'s in colorspace reads. The problem is that tophat converts the '.'s to 'N's on exactly one read (the last read), and it should not.

I don't know that you can modify your reads to work around that.

**krobison** · 10-09-2010, 04:34 AM

I finally have worked out a full script which converts my single-end SOLiD data from the Short Read Archive in FASTQ (aside: it would be great if the TopHat team swiped the colorspace reading code from Bowtie, which is much more flexible in the format)

Code:

#!/usr/bin/perl
use strict;

# reformat single-end SOLiD data from Short Read Archive
# to work successfully with patched version of TopHat 1.1.0

# sample bash command line to run TopHat
# tophat --color -G hg18.ref-genes.gtf -o SRR040290-tophat -p 8 --quals hg18  SRR040290.csfasta  SRR040290.qual 1> tophat.out 2> tophat.err

foreach my $arg(@ARGV)
{
    my ($stem)=($arg=~/(SRR[0-9]+)/);;
    die "Could not identify stem in $arg\n" unless (defined $stem);
    
    open(IN,$arg);
    open(FASTA,">$stem.csfasta");
    open(QUAL,">$stem.qual");
    while (my $idLineA=<IN>)
    {
	chomp($idLineA);
	my ($id)=($idLineA=~/^.([^ ]+)/);
	my $seqLine=<IN>;
	my $idLineB=<IN>;
	my $qualLine=<IN>;
	chomp($qualLine);
	my @qualVals=();
	foreach my $qualChar(split(//,$qualLine))
	{
	    my $qualVal=ord($qualChar)-33;
	    if ($qualVal<0)
	    {
		$qualVal=0;  # forbid negative qual values
		print STDERR "Negative qual implied by >$qualChar< for $idLineB\n";
	    }
	    push(@qualVals,$qualVal);
	}
	shift(@qualVals); # dump first qual val
	print FASTA ">$id\n";
	print FASTA $seqLine;
	print QUAL ">$id\n";
	print QUAL join(" ",@qualVals),"\n";
    }
}

So now TopHat runs and exits with success. The junctions.bed file has junctions (a few shown)

Code:

track name=junctions description="TopHat junctions"
chr20   204696  205452  JUNC00000001    1       -       204696  205452  255,0,0 2       31,19   0,737
chr20   205747  205877  JUNC00000002    1       -       205747  205877  255,0,0 2       39,11   0,119
chr20   218929  219262  JUNC00000003    1       -       218929  219262  255,0,0 2       19,31   0,302
chr20   275750  277924  JUNC00000004    3       +       275750  277924  255,0,0 2       36,32   0,2142
chr20   277975  278299  JUNC00000005    1       +       277975  278299  255,0,0 2       32,18   0,306
chr20   278448  281888  JUNC00000006    2       +       278448  281888  255,0,0 2       28,35   0,3405
chr20   309778  316694  JUNC00000007    8       +       309778  316694  255,0,0 2       35,40   0,6876
chr20   320182  324880  JUNC00000008    21      +       320182  324880  255,0,0 2       41,39   0,4659

BUT, the accepted_hits.bam file is empty! What did I do wrong this time?

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 23 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 24 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 21 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Reformating SOLiD input for TopHat 1.1

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News