SEQanswers

Old 12-21-2010, 11:39 AM   #1
SongLi
Member
 
Location: Durham

Join Date: Oct 2010
Posts: 19
TopHat color space

Hi All,

I am having trouble running TopHat on files downloaded from SRA.

My command is:

./tophat -C -p 5 -o ./825tophat ../bowtie-0.12.7/indexes/ath_gmc_colspace_110510 ~/SRR039825.fastq

Error encountered parsing file /home/SRR039825.fastq:
Length mismatch between sequence and quality strings for SRR039825.1 923_6_55 (36 vs 36).

The sequence is here:
@SRR039825.1 923_6_55
T00310202021210203103230203233012210
+
!;>1<998495<<3$4.40%/87-101*&3,8#%'#

I dug into the code and found that the problem is at lines 931-934 of tophat.py, where the sequence is required to be 1 character longer than the quality string.
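
To illustrate, here is a minimal Python sketch of the kind of length check described above, applied to the read shown. It is not the actual code from tophat.py; the function name `lengths_consistent` is made up for illustration.

```python
def lengths_consistent(seq: str, qual: str) -> bool:
    """Return True when the sequence is exactly one character longer
    than the quality string, as the check described above requires."""
    return len(seq) == len(qual) + 1


# The first read from SRR039825.fastq, copied from the post.
seq = "T00310202021210203103230203233012210"
qual = "!;>1<998495<<3$4.40%/87-101*&3,8#%'#"

# Both strings are 36 characters long, so the check fails.
print(len(seq), len(qual), lengths_consistent(seq, qual))  # 36 36 False
```

This matches the "(36 vs 36)" in the error message: the lengths are equal, but the check wants 36 vs 35.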

Why is this and how can I fix it?

Thanks,

Song Li
Old 12-22-2010, 07:09 AM   #2
SongLi
Member
 
Location: Durham

Join Date: Oct 2010
Posts: 19

A little update on this issue:

I wrote a script that trims off the first quality value of each read in my FASTQ file. After that, TopHat runs smoothly through the whole analysis.

I am still not sure that's the correct way of solving this problem.
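
The script itself is not posted here, so the following is only a sketch of what such a trimming step could look like in Python. It assumes plain 4-line FASTQ records (no wrapped sequence or quality lines); the function name `trim_first_quality` and the command-line handling are hypothetical.

```python
import sys


def trim_first_quality(lines):
    """Yield FASTQ lines, dropping the first quality character of each
    4-line record whose quality string is as long as its sequence."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            # Only trim when the lengths are equal, so records that
            # already have one fewer quality value are left untouched.
            if len(qual) == len(seq):
                qual = qual[1:]
            yield from (header, seq, plus, qual)
            record = []


# Hypothetical usage: python trim_qual.py input.fastq output.fastq
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for out_line in trim_first_quality(src):
            dst.write(out_line + "\n")
```

Whether discarding the first quality value is the right thing to do depends on that value belonging to the leading 'T' rather than to a real color call, which is an assumption in this sketch.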

Thanks,


Last edited by SongLi; 12-22-2010 at 07:24 AM.
Old 12-27-2010, 12:08 AM   #3
xinwu
Member
 
Location: Beijing

Join Date: Jul 2010
Posts: 33

This is due to the format used by NCBI. NCBI converts the data from all platforms to a standard FASTQ format.
TopHat uses Bowtie for read mapping, and for color-space data it expects csfasta and qual files. A sequence in a csfasta file has an additional leading 'T' (the adapter base) that has no counterpart in the qual file, so TopHat expects the sequence to be one character longer than the quality string. Just tell Bowtie that you are using FASTQ format rather than FASTA.
Old 12-28-2010, 09:27 AM   #4
ngsbioinfo
Junior Member
 
Location: SA

Join Date: Dec 2010
Posts: 1

Hi all,

I am new to the NGS analysis field. I am working on RNA-Seq data, and my aim is to identify all novel junctions and transcripts. I would appreciate it if anyone could help me with using TopHat and Cufflinks for this.

Thanks in advance