SEQanswers

Old 10-03-2010, 09:23 PM   #1
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
TopHat 1.1 failing on colorspace SE reads

I'm trying to analyze a single-end dataset from SRA with the brand new version of TopHat (1.1). TopHat crashes with the error message below. Judging from the message and the code, it appears that even with single-end reads it tries to run a validation check on the second set of reads.


Code:
  File "/usr/local/bin/tophat", line 2093, in main
    params.read_params = check_reads(params.read_params, left_reads_list + "," + right_reads_list)
TypeError: cannot concatenate 'str' and 'NoneType' objects

If I replace line 2093 with

Code:
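        # Only validate the reads when the check has not been disabled.
        if params.skip_check_reads == False:
            # Single-end runs have no right-hand reads, so right_reads_list is None.
            if right_reads_list is not None:
                params.read_params = check_reads(params.read_params, left_reads_list + "," + right_reads_list)
            else:
                params.read_params = check_reads(params.read_params, left_reads_list)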
Then I get farther but hit a new error

Code:
Length mismatch between sequence and quality strings for SRR040290.1 VAB_ugc_85__100_137__138_121__123_bc_Frag50_solid0032_20090715_ugc_121__1231_49_36 length=50 (51 vs 51).
I'm too worn out to puzzle out how to get past that one. My best guess is that it is related to the "extra" colorspace value that the bowtie option "--col-keepends" deals with.
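
The first error comes from joining a string with None when no second read file is given. A minimal standalone sketch of the same guard follows. Note that `join_read_lists` is a hypothetical helper written for illustration and is not a TopHat function.

```python
def join_read_lists(left_reads_list, right_reads_list=None):
    """Join comma-separated read-file lists, ignoring a missing right list.

    Hypothetical helper for illustration; not part of TopHat.
    """
    # Single-end runs have no right-hand reads, so right_reads_list is None.
    parts = [left_reads_list, right_reads_list]
    return ",".join(p for p in parts if p is not None)


# Paired-end: both lists are joined with a comma.
print(join_read_lists("a_1.fq", "a_2.fq"))
# Single-end: only the left list is returned and no TypeError is raised.
print(join_read_lists("reads.csfasta"))
```

Filtering out None before joining keeps the single-end and paired-end paths in one expression instead of two separate calls.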
Old 10-04-2010, 01:45 AM   #2
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

Yep, I just got the same error message (the first one; haven't tried to modify the code). I'm also using single-end color space reads (.csfasta + .qual)
Old 10-04-2010, 05:15 AM   #3
Telor
Junior Member
 
Location: Ieland

Join Date: Jul 2010
Posts: 6

Yes, just to confirm that you are not the only one: I'm getting this error too, but on standard single-end reads (not colorspace).
Old 10-04-2010, 09:32 AM   #4
sunnyjoy
Junior Member
 
Location: Oxford,oh

Join Date: Sep 2009
Posts: 1

A Google search for the same error message led me here. Same problem with single-end colorspace reads.
Old 10-04-2010, 09:40 AM   #5
DerSeb
Member
 
Location: Singapore

Join Date: Oct 2009
Posts: 44

Same here.
Old 10-04-2010, 10:11 AM   #6
Daehwan
Member
 
Location: College Park

Join Date: Oct 2010
Posts: 27

Hi guys,

I'm Daehwan, who introduced this bug and has now fixed it. You can grab a fixed version at http://tophat.cbcb.umd.edu/index.html

Thanks
Old 10-04-2010, 06:33 PM   #7
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

Thanks for the rapid fix!!
Old 10-05-2010, 12:41 AM   #8
DerSeb
Member
 
Location: Singapore

Join Date: Oct 2009
Posts: 44

Great, I will try it right now! Thanks for the great support!
Old 10-05-2010, 02:24 AM   #9
Telor
Junior Member
 
Location: Ieland

Join Date: Jul 2010
Posts: 6

Great. Thanks for the speedy fix. It looks like it's working fine so far (fingers crossed).
Old 10-05-2010, 09:18 AM   #10
DerSeb
Member
 
Location: Singapore

Join Date: Oct 2009
Posts: 44

Hello. I can now start successfully and get past the first error encountered here. However, I still run into the second error mentioned above:

Code:
[Tue Oct  5 16:10:32 2010] Beginning TopHat run (v1.1.0)
-----------------------------------------------
[Tue Oct  5 16:10:32 2010] Preparing output location /home/schaefer/tophat/RBM20/Sample14//
[Tue Oct  5 16:10:32 2010] Checking for Bowtie index files
[Tue Oct  5 16:10:32 2010] Checking for reference FASTA file
[Tue Oct  5 16:10:32 2010] Checking for Bowtie
	Bowtie version:			 0.12.3.0
[Tue Oct  5 16:10:32 2010] Checking for Samtools
	Samtools version:		 0.1.8.0
[Tue Oct  5 16:10:39 2010] Checking reads

Error encountered parsing file ...fastq:
 Length mismatch between sequence and quality strings for 853_8_25/1 (49 vs 49).
When I check the fastq file, everything seems fine:

Code:
@853_8_25/1
GNNGTGNTNCANNCGTNNGAGNNCACNNACANCCGANNACGNAAAGNAN
+
*""%%%"%"%%""%%%""%)&""%%%""%%+"&'%&""'(%"'))'"&"
@853_8_35/1
CNNACGNANACNNACCNNCCGNNTAANNNNGNGAACNNCNANCNCNNTN
+
:""=54"@"=+""A98""745"";98""""2"=@>8""<"4"<"6"";"
@853_8_75/1
GNNACCNCNTCNNAACNNTACNNCGANNGTGNGGACNNGTCNCGAGNCN
+
="";<7"0";4""=;:"">;=""94<"".,5".;26""9%)"(%(("("
@853_8_96/1
...
Is anyone else still seeing this error?
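
One way to rule out the file itself is to compare sequence and quality lengths independently of TopHat. The following is a rough sketch that assumes plain four-line FASTQ records with no wrapped sequence lines. `find_length_mismatches` is a hypothetical name, and the two records are made-up test data, not reads from this thread.

```python
def find_length_mismatches(lines):
    """Return (read_name, seq_len, qual_len) for records whose lengths differ.

    Assumes simple four-line FASTQ records (header, sequence, '+', quality).
    """
    mismatches = []
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        if len(seq) != len(qual):
            # Drop the leading '@' from the header to get the read name.
            mismatches.append((header[1:], len(seq), len(qual)))
    return mismatches


# Made-up records: the second one is missing a quality character.
records = [
    "@read_ok", "ACGTN", "+", "IIIII",
    "@read_bad", "ACGTN", "+", "IIII",
]
print(find_length_mismatches(records))
```

If a check like this reports nothing while TopHat still complains with equal numbers on both sides (49 vs 49), the file is probably well formed and the problem is more likely in how the reads are being interpreted.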
Old 10-05-2010, 09:23 AM   #11
Daehwan
Member
 
Location: College Park

Join Date: Oct 2010
Posts: 27

DerSeb, what's your command?
Old 10-05-2010, 10:38 AM   #12
DerSeb
Member
 
Location: Singapore

Join Date: Oct 2009
Posts: 44

This is my command:

Code:
tophat -G /data/genetics/datasets/genome-annotation/ensembl-56/Rattus_norvegicus.RGSC3.4.56.gtf -o /home/schaefer/tophat/Sample14 -C rn4_c /home/schaefer/tophat/fastq/Sample_14_Qual.fastq
Old 10-05-2010, 10:48 AM   #13
Daehwan
Member
 
Location: College Park

Join Date: Oct 2010
Posts: 27

Since you are using -C, which is for colorspace reads, you need to supply colorspace reads instead of nucleotide reads.
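
A quick heuristic can tell the two kinds of input apart before running with -C. This is only a sketch: it assumes a colorspace read is one primer base followed by the colors 0-3, with '.' for an uncalled color. `looks_like_colorspace` is a hypothetical helper and not something TopHat provides.

```python
import re

# Assumed layout of a SOLiD colorspace read: one primer base followed by
# colors 0-3, with '.' marking an uncalled color.
COLORSPACE_READ = re.compile(r"^[ACGT][0-3.]+$")


def looks_like_colorspace(read_seq):
    """Heuristic check; hypothetical helper, not a TopHat function."""
    return bool(COLORSPACE_READ.match(read_seq))


print(looks_like_colorspace("T0120.3"))    # primer base plus colors
print(looks_like_colorspace("GNNGTGNTN"))  # letters only
```

Reads made up only of letters, like the FASTQ excerpt earlier in the thread, fail this check.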
Old 10-05-2010, 11:10 AM   #14
DerSeb
Member
 
Location: Singapore

Join Date: Oct 2009
Posts: 44

I see... I converted my CS reads to fastq, using scripts supplied with MAQ (solid2fastq.pl or fq_all2std.pl csfa2std). They convert the cs values to letters mimicking a "pseudo" genetic sequence.
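
For readers unfamiliar with the "pseudo" sequence, here is a small sketch of the idea. It assumes the commonly used mapping 0→A, 1→C, 2→G, 3→T and .→N. The thread does not state the exact mapping or any trimming done by the MAQ scripts, so treat the table as an assumption and check it against your converter.

```python
# Assumed mapping between colors and letters; verify against your converter.
COLOR_TO_LETTER = str.maketrans("0123.", "ACGTN")
LETTER_TO_COLOR = str.maketrans("ACGTN", "0123.")


def colors_to_pseudo_bases(colors):
    """Rewrite a string of colors as letters (not a real base sequence)."""
    return colors.translate(COLOR_TO_LETTER)


def pseudo_bases_to_colors(letters):
    """Undo the rewrite above, giving back the string of colors."""
    return letters.translate(LETTER_TO_COLOR)


encoded = colors_to_pseudo_bases("0120.3")
print(encoded)
print(pseudo_bases_to_colors(encoded))
```

The letters are only a relabelling of the colors, so a tool that expects real nucleotides or real colorspace input will misread them.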

I will look into this and see how I can change this.

Thx!
Old 10-07-2010, 07:44 AM   #15
DerSeb
Member
 
Location: Singapore

Join Date: Oct 2009
Posts: 44

I have now started a thread dedicated to reformatting SOLiD reads for TopHat:

http://seqanswers.com/forums/showthread.php?p=26692
Old 10-07-2010, 10:46 AM   #16
hyjkim
Member
 
Location: Santa Cruz

Join Date: Apr 2010
Posts: 18

I was able to get TopHat running on colorspace reads by doing what many others have mentioned: removing the comment lines in the .csfasta and .qual files, and converting all -1 qualities to 0.
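
A rough sketch of that clean-up, assuming comment lines start with '#', read names start with '>', and quality values are separated by spaces. `clean_solid_lines` is a hypothetical helper, and the sample lines are made up.

```python
def clean_solid_lines(lines, is_qual=False):
    """Drop '#' comment lines; for .qual data, replace -1 qualities with 0."""
    cleaned = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # skip comment/header lines
        if is_qual and not line.startswith(">"):
            # Quality line: swap every -1 for 0, leave other values alone.
            values = ["0" if v == "-1" else v for v in line.split()]
            line = " ".join(values)
        cleaned.append(line)
    return cleaned


# Made-up .qual content for illustration.
qual_lines = ["# Title: example", ">1_2_3_F3", "5 -1 20 -1"]
print(clean_solid_lines(qual_lines, is_qual=True))
```

Run it once on the .csfasta lines with the default `is_qual=False` and once on the .qual lines with `is_qual=True`, so that only quality values are rewritten.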

On 5 of my 8 datasets, TopHat works great. On the remaining 3 I get the following error.

Code:
Traceback (most recent call last):
  File "/share/apps/tophat-1.1.0.Linux_x86_64/tophat", line 2174, in ?
    sys.exit(main())
  File "/share/apps/tophat-1.1.0.Linux_x86_64/tophat", line 2133, in main
    user_supplied_juncs)
  File "/share/apps/tophat-1.1.0.Linux_x86_64/tophat", line 1848, in spliced_alignment
    segment_len)
  File "/share/apps/tophat-1.1.0.Linux_x86_64/tophat", line 1570, in split_reads
    split_record(read_name, read_seq, read_quals, output_files, offsets, color)
  File "/share/apps/tophat-1.1.0.Linux_x86_64/tophat", line 1503, in split_record
    read_seq_temp = convert_color_to_bp(read_seq)
  File "/share/apps/tophat-1.1.0.Linux_x86_64/tophat", line 1477, in convert_color_to_bp
    base = decode_dic[base+ch]
KeyError: 'GN'
Has anyone else seen an error like this? Is there a fix available?
Old 10-07-2010, 12:12 PM   #17
dcjones
Junior Member
 
Location: Seattle, WA

Join Date: Jul 2010
Posts: 4

Quote:
Originally Posted by hyjkim
Has anyone else seen an error like this? Is there a fix available?
Yes, I ran into the same thing. I just posted my fix (to the source code) on this thread: http://seqanswers.com/forums/showthread.php?p=26692
Old 10-21-2010, 10:15 AM   #18
nameeta
Junior Member
 
Location: seattle, wa

Join Date: Apr 2010
Posts: 2

Quote:
Originally Posted by Daehwan
Hi guys,

I'm Daehwan, who made this bug and fixed it, you can grab a fixed version at http://tophat.cbcb.umd.edu/index.html

Thanks
Hi,
I ran tophat-1.1.1 on my single-end SOLiD data, but I encounter the following error.


Code:
Traceback (most recent call last):
  File "/mlab/software/tophat-1.1.1.Linux_x86_64/tophat", line 2166, in <module>
    sys.exit(main())
  File "/mlab/software/tophat-1.1.1.Linux_x86_64/tophat", line 2125, in main
    user_supplied_juncs)
  File "/mlab/software/tophat-1.1.1.Linux_x86_64/tophat", line 1840, in spliced_alignment
    segment_len)
  File "/mlab/software/tophat-1.1.1.Linux_x86_64/tophat", line 1562, in split_reads
    split_record(read_name, read_seq, read_quals, output_files, offsets, color)
  File "/mlab/software/tophat-1.1.1.Linux_x86_64/tophat", line 1495, in split_record
    read_seq_temp = convert_color_to_bp(read_seq)
  File "/mlab/software/tophat-1.1.1.Linux_x86_64/tophat", line 1469, in convert_color_to_bp
    base = decode_dic[base+ch]
KeyError: 'CN'
Old 10-25-2010, 09:28 PM   #19
hyjkim
Member
 
Location: Santa Cruz

Join Date: Apr 2010
Posts: 18

Hey Nameeta,

I also get the same types of errors with TopHat v1.1.1. I am able to run TopHat using v1.1.0 and dcjones' patch. I'd recommend you go that route until TopHat fixes the SOLiD-style "." wildcard errors.
Old 10-26-2010, 09:35 AM   #20
nameeta
Junior Member
 
Location: seattle, wa

Join Date: Apr 2010
Posts: 2

Thanks, I was able to fix it by changing tophat.py and then recompiling.

Code:
def convert_color_to_bp(color_seq):
    # Lookup table only, as posted. The keys ending in 'N' cover pairs such as
    # the 'CN' from the KeyError above, so they decode to 'N'.
    decode_dic = { 'A0':'A', 'A1':'C', 'A2':'G', 'A3':'T', 'A4':'N', 'A.':'N', 'AN':'N',
                   'C0':'C', 'C1':'A', 'C2':'T', 'C3':'G', 'C4':'N', 'C.':'N', 'CN':'N',
                   'G0':'G', 'G1':'T', 'G2':'A', 'G3':'C', 'G4':'N', 'G.':'N', 'GN':'N',
                   'T0':'T', 'T1':'G', 'T2':'C', 'T3':'A', 'T4':'N', 'T.':'N', 'TN':'N',
                   'N0':'N', 'N1':'N', 'N2':'N', 'N3':'N', 'N4':'N', 'N.':'N', 'NN':'N' }
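
To show how such a table is used, here is a self-contained sketch of the decoding loop. It is not TopHat's source: `color_to_bases` is a hypothetical name, and both the fallback to 'N' for any unknown pair and the choice to keep the primer base in the output are illustrative assumptions.

```python
# Valid (previous base, color) pairs, as in the table above.
DECODE = {
    "A0": "A", "A1": "C", "A2": "G", "A3": "T",
    "C0": "C", "C1": "A", "C2": "T", "C3": "G",
    "G0": "G", "G1": "T", "G2": "A", "G3": "C",
    "T0": "T", "T1": "G", "T2": "C", "T3": "A",
}


def color_to_bases(color_seq):
    """Decode a colorspace read (primer base + colors) into bases.

    Any pair missing from DECODE (for example 'GN', 'C.' or anything that
    follows an 'N') decodes to 'N' instead of raising KeyError.
    """
    base = color_seq[0]
    bases = base
    for ch in color_seq[1:]:
        base = DECODE.get(base + ch, "N")
        bases += base
    return bases


print(color_to_bases("T0120"))
print(color_to_bases("G1N2"))
```

Using `dict.get` with a default gives the same result as listing every '4', '.' and 'N' key by hand, at the cost of silently accepting characters that may be genuine input errors.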