SEQanswers

08-28-2014, 11:58 AM #1
worm_picker
Junior Member
 
Location: New York

Join Date: Aug 2014
Posts: 3
Old(?) bowtie file: "missing quality values" error in tophat

Hello everyone,

I'm trying to look at some old RNA-seq data that I found on NCBI. The data is available as bowtie output, and I'm trying to use tophat2 to get transcript data.

It originally looked something like this:

HWI-EAS283:1:1:4:1142#0/1 - chr2 70362272 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*[email protected]/[email protected];[email protected]=CB7B 2 36:G>N,39:C>N
HWI-EAS283:1:1:4:1142#0/1 - chr18 50187254 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*[email protected]/[email protected];[email protected]=CB7B 2 36:G>N,39:C>N

Tophat gave the following error:

Traceback (most recent call last):
  File "/opt/local/bin/tophat", line 2346, in <module>
    sys.exit(main())
  File "/opt/local/bin/tophat", line 2251, in main
    params.read_params = check_reads(params.read_params, reads_list)
  File "/opt/local/bin/tophat", line 1063, in check_reads
    if first_line[0] in "@>":
IndexError: string index out of range
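The last frame of the traceback shows tophat guessing the reads format from the first character of the first line ('@' for FASTQ, '>' for FASTA). Because `first_line[0]` can only raise `IndexError` when that string is empty, the line tophat read there was apparently blank (a leading empty line would do it, for example). Here is a minimal sketch of that kind of check; `classify_first_line` and `sniff_reads_format` are hypothetical helpers, not tophat's actual code.

```python
def classify_first_line(first_line: str) -> str:
    """Classify a reads file by the first character of its first line.

    Hypothetical helper (not tophat's code) mirroring the traceback's
    `first_line[0] in "@>"` test, with an explicit guard for the empty
    string that makes `first_line[0]` raise IndexError.
    """
    line = first_line.rstrip("\r\n")
    if not line:
        return "empty"      # indexing line[0] here would raise IndexError
    if line[0] == "@":
        return "fastq"
    if line[0] == ">":
        return "fasta"
    return "unknown"        # e.g. a raw bowtie alignment line


def sniff_reads_format(path: str) -> str:
    """Hypothetical wrapper: read only the first line of `path` and classify it."""
    with open(path) as handle:
        return classify_first_line(handle.readline())
```

With '@' prepended to every line, a check like this passes, which fits the traceback going away and a different error appearing further along.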


So I figured the problem was the missing '@' at the start of each read name, and I used vim to add an '@' to the beginning of every line:


@HWI-EAS283:1:1:4:1142#0/1 - chr2 70362272 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*[email protected]/[email protected];[email protected]=CB7B 2 36:G>N,39:C>N
@HWI-EAS283:1:1:4:1142#0/1 - chr18 50187254 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*[email protected]/[email protected];[email protected]=CB7B 2 36:G>N,39:C>N


Now when I run this, I get the following error ('###' is the file path; sorry, I wanted to keep that private):


Error encountered parsing file /#############:
Premature end of file (missing quality values for HWI-EAS283:1:1:4:1142#0/1 - chr70362272 TANTCNTTCCAAGGCTTCTAACATGATGATACTATTTCCTCG B9%<'%<B2;?ACA*[email protected]/[email protected];[email protected]=CB7B 2 36:G>N,39:C>N)


It fails on the very first line, which suggests a format error...
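FASTQ records span four lines: an '@' header, the bases, a '+' separator, and the qualities. Prepending '@' to a one-alignment-per-line file makes the first line look like a header, but the lines after it are not bases, '+' and qualities, which is one plausible way a parser ends up reporting missing quality values for the very first read. Below is a simplified checker, assuming unwrapped four-line records (real FASTQ parsers also accept wrapped sequences); `check_fastq_records` is a hypothetical name, not tophat's parser.

```python
def check_fastq_records(lines):
    """List problems found when reading `lines` as four-line FASTQ records.

    Hypothetical checker: assumes unwrapped records, i.e. exactly one line
    each for header, bases, '+' separator and qualities.
    """
    rows = [ln.rstrip("\r\n") for ln in lines]
    problems = []
    for start in range(0, len(rows), 4):
        record = rows[start:start + 4]
        first = start + 1  # 1-based line number of this record's header
        if len(record) < 4:
            problems.append(f"record at line {first}: truncated ({len(record)} of 4 lines)")
            continue
        header, bases, plus, quals = record
        if not header.startswith("@"):
            problems.append(f"line {first}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"line {first + 2}: separator does not start with '+'")
        if len(bases) != len(quals):
            problems.append(f"line {first + 3}: {len(bases)} bases but {len(quals)} quality values")
    return problems
```

Run on the two '@'-prefixed alignment lines, it reports a single truncated record, because there is no separate bases, '+' and qualities line for the first "header".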

I looked at the bowtie manual, and my output seems to differ from the manual's in one way: column 5 should be the read sequence, followed by a '+' or '-' value rather than the quality score. That would mean my output should have another column, with a '+' or '-' value, between the read and the quality.

Am I missing something here? The file I'm downloading looks "mostly" like bowtie output, but something seems off... I tried to find out whether it's an older format, but I can't find any info on that.

Can anybody help me out?

Thanks in advance!!

-worm_picker
08-28-2014, 12:14 PM #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

I don't recognize the format; maybe it's some old bowtie-specific output. If you want to map it, you should convert it to fastq format, like this:
@1
5
+
6

...where 1 is the first field (read name), 5 is the 5th field (bases), and 6 is the 6th field (qualities).
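Brian's template can be scripted. A minimal Python sketch, assuming the bowtie 1 default columns (1 name, 2 strand, 5 bases, 6 qualities) and that no field contains whitespace (true for the lines shown above). One addition I believe holds for bowtie 1: for '-' alignments the reported bases are reverse-complemented and the qualities reversed, so the sketch flips them back to recover the read as sequenced. Reads that aligned to several loci (like the chr2/chr18 pair above) are written once. `bowtie_to_fastq` is a hypothetical name.

```python
# Complement table for the read alphabet; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def bowtie_to_fastq(lines):
    """Yield one FASTQ record (a 4-line string) per read name.

    Hypothetical converter; assumes bowtie 1 default output columns
    (1 name, 2 strand, 5 bases, 6 qualities), whitespace-separated.
    """
    seen = set()
    for line in lines:
        fields = line.split()  # works for tab- or space-separated columns
        if len(fields) < 6:
            continue  # blank or malformed line
        name, strand, seq, qual = fields[0], fields[1], fields[4], fields[5]
        if name in seen:
            continue  # same read reported at another locus
        seen.add(name)
        if strand == "-":
            # bowtie reported the reverse complement; undo it
            seq = seq.translate(COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{name}\n{seq}\n+\n{qual}\n"
```

Feed it the lines of the bowtie file (an open file handle works) and write the yielded records to a .fastq file. Keep in mind that reads bowtie could not align are not in this file at all, so the recovered FASTQ covers only the aligned subset.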
08-28-2014, 01:39 PM #3
worm_picker
Junior Member
 
Location: New York

Join Date: Aug 2014
Posts: 3

Thanks for replying, Brian.

It's the same as normal bowtie output and is already mapped, but (I think) it's just missing a column. The GEO accession page says these are reads mapped with bowtie.

According to the manual, bowtie output should be:
1. name
2. strand
3. "contig"
4. 0-based start on contig
5. read
6. read strand
7. quality
8. mismatches (if any)
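For comparison with the list above: as I read the bowtie 1 manual's description of its default output, there are eight columns, column 6 holds the qualities and column 7 is a count of other alignment instances, with no separate read-strand column. If that is right, the downloaded file may not be missing a column. The sketch below labels a line under that assumption and checks that column 6 is as long as column 5; the column names and both functions are my own hypothetical labels, not bowtie's.

```python
# Hypothetical labels for the bowtie 1 default output columns, in order.
BOWTIE1_COLUMNS = (
    "name", "strand", "reference", "offset",
    "sequence", "qualities", "other_instances", "mismatches",
)


def label_bowtie_line(line):
    """Map one bowtie 1 default-output line onto named columns (sketch)."""
    fields = line.split()
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 columns, got {len(fields)}")
    # Column 8 (mismatch descriptors) is empty when there are no mismatches.
    fields += [""] * (len(BOWTIE1_COLUMNS) - len(fields))
    record = dict(zip(BOWTIE1_COLUMNS, fields))
    record["offset"] = int(record["offset"])  # 0-based in bowtie 1 output
    return record


def looks_like_default_layout(record):
    """True if column 2 is a strand and column 6 is as long as column 5."""
    return (record["strand"] in ("+", "-")
            and len(record["qualities"]) == len(record["sequence"]))
```

If `looks_like_default_layout` is true for the downloaded lines, the file is probably complete bowtie 1 default output rather than a variant missing a column.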
08-28-2014, 02:01 PM #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Well, if you want to map it with Tophat, I think you'll have to convert it to fastq first, even if it is (almost) in an old Bowtie format. I doubt you will find any downstream RNA-seq analysis tools that accept those mappings; they generally require sam or bam.
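If you wanted to try converting the existing alignments instead of remapping, here is a rough sketch. It assumes bowtie 1 alignments are ungapped (so the CIGAR is just the read length followed by M), that SAM's 1-based POS is bowtie's 0-based offset plus 1, and that bowtie already reports '-' reads the way SAM stores them (reverse-complemented bases, reversed qualities). MAPQ is set to 255 (unavailable), repeat alignments of the same read are flagged secondary (0x100), and no optional tags are written. It emits no header; most tools will also want @SQ lines with reference names and lengths, which this file does not contain. `bowtie_to_sam` is a hypothetical name.

```python
def bowtie_to_sam(lines):
    """Yield header-less SAM alignment lines from bowtie 1 default output.

    Hypothetical sketch; assumes ungapped alignments and whitespace-free
    fields, and does not compute MAPQ or optional tags.
    """
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or malformed line
        name, strand, ref, offset, seq, qual = fields[:6]
        flag = 16 if strand == "-" else 0  # 0x10: read on reverse strand
        if name in seen:
            flag |= 256                    # 0x100: secondary alignment
        seen.add(name)
        pos = int(offset) + 1              # SAM POS is 1-based
        cigar = f"{len(seq)}M"             # assumes no gaps
        yield "\t".join([name, str(flag), ref, str(pos), "255", cigar,
                         "*", "0", "0", seq, qual])
```

Brian's point still stands: even with a header added, remapping the recovered reads may be the simpler route for tophat.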
All times are GMT -8.