SEQanswers

Old 05-07-2010, 08:21 AM   #21
Cole Trapnell
Senior Member
 
Location: Boston, MA

Join Date: Nov 2008
Posts: 212
Default

Quote:
Originally Posted by maubp View Post
I've added [ code ] tags round your FASTQ example for clarity - otherwise the forum messes things up.

The (optional repeated) identifier on the + line doesn't match the (mandatory) identifier on the @ line. Assuming nothing went wrong in the cut and paste into the forum, it looks like something is very wrong with your FASTQ file. This may be what is upsetting tophat.

Actually, the mismatch between the "@" and "+" names should be fine, at least within TopHat. In fact, the program exploits this feature of FASTQ to make analyzing paired and long reads much easier. One of the first things TopHat does is "rename" the user's reads with increasing integer IDs, moving their true names from the "@" field down to the "+" field and rewriting the FASTQ files to a temporary file. Mate pairs from the same fragment get the same ID, making intermediate results for them much easier to match back up later on.
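To make the renaming step concrete, here is a minimal sketch of the idea. It is not TopHat's actual code: the function names, the tuple layout, and the example reads are all made up for illustration, and it assumes the two mate files list fragments in the same order.

```python
from typing import Iterable, Iterator, Tuple

# (name, sequence, plus_field, quality) -- a hypothetical in-memory record layout
Record = Tuple[str, str, str, str]


def rename_with_integer_ids(records: Iterable[Record], start: int = 1) -> Iterator[Record]:
    """Replace each read name with a running integer and keep the true name on the '+' line."""
    for read_id, (name, seq, _plus, qual) in enumerate(records, start=start):
        yield (str(read_id), seq, name, qual)


def format_fastq(record: Record) -> str:
    """Render one record as four FASTQ lines."""
    name, seq, plus, qual = record
    return f"@{name}\n{seq}\n+{plus}\n{qual}\n"


# Made-up mates, listed in the same fragment order in both "files".
left = [("frag_A/1", "ACGT", "", "IIII"), ("frag_B/1", "GGCC", "", "IIII")]
right = [("frag_A/2", "TTAA", "", "IIII"), ("frag_B/2", "CCGG", "", "IIII")]

renamed_left = list(rename_with_integer_ids(left))
renamed_right = list(rename_with_integer_ids(right))
```

Because both lists are numbered in the same order, mates from one fragment end up sharing an ID, and the original name is still recoverable from the "+" field.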
Old 05-07-2010, 08:23 AM   #22
Cole Trapnell
Senior Member
 
Location: Boston, MA

Join Date: Nov 2008
Posts: 212
Default

Quote:
Originally Posted by clariet View Post
Hi, Cole_Trapnell

Does current version of Tophat support SOLiD data? Thanks

Clariet
Currently, no.
Old 05-07-2010, 08:42 AM   #23
clariet
Member
 
Location: NJ

Join Date: Mar 2010
Posts: 18
Default

Thank you. Are you planning to add this sometime soon? My colleague speaks very highly of TopHat, and I am very much looking forward to using this tool for SOLiD data, which is the only data I have access to. Bowtie, as far as I know, supports SOLiD.

Quote:
Originally Posted by Cole Trapnell View Post
Currently, no.
Old 05-12-2010, 01:33 PM   #24
RSK
Junior Member
 
Location: Illinois

Join Date: Jun 2009
Posts: 9
Default Empty (almost) junctions.bed file...

I have a related question.
I mapped Illumina 100nt reads using TopHat in default mode.
The resulting junctions.bed has only a single line: track name=junctions description="TopHat junctions"
When I used the resulting accepted_hits.sam file to identify transcripts using Cufflinks, the resulting transcripts.gtf has about 190,000 transcripts. But all the transcripts have only one exon!
All these things together make me wonder if there was a problem with mapping across intronic regions in the TopHat/Bowtie stage with these data.

Please let me know if anyone has any idea how to deal with this issue. Also, please let me know if my description needs any clarification.

Thanks!
-RSK
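One quick way to test the suspicion above is to count how many alignments in the SAM file are spliced at all. In SAM, an intron-spanning alignment carries an N operation in its CIGAR string, so a count of zero would be consistent with the empty junctions.bed. The following is only a sketch: the function name and the example lines are invented, and it reads plain-text SAM, not BAM.

```python
from typing import Iterable, Tuple


def count_spliced(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (aligned, spliced); 'spliced' means the CIGAR contains an N operation."""
    aligned = spliced = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment line
            continue
        cigar = fields[5]
        if cigar == "*":  # unaligned read
            continue
        aligned += 1
        if "N" in cigar:
            spliced += 1
    return aligned, spliced


# Made-up SAM lines: one unspliced hit, one spliced hit, one unaligned read.
example = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr1\t100\t255\t100M\t*\t0\t0\t*\t*\n",
    "r2\t0\tchr1\t200\t255\t40M1500N60M\t*\t0\t0\t*\t*\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n",
]
aligned, spliced = count_spliced(example)
```

On a real run, the same function can be fed an open file handle for accepted_hits.sam in place of the example list.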
Old 07-02-2010, 02:53 PM   #25
maximilianh
Member
 
Location: UK

Join Date: Oct 2009
Posts: 15
Default Same here

Quote:
Originally Posted by bzhang View Post
I did the same conversion and was able to run the downstream analysis with cufflinks and the result seems to be fine. I think this is a safe workaround as 'N' is a legitimate character in the fasta sequence and I assume the alignment software (bowtie) treats it intelligently.
I have used "maq sol2sanger" to convert my fastq file.

I had the same problem and applied that little gawk script (thanks guys, really saved me wasting hours).

And Cole: I am using tophat 1.0.14.
Old 07-02-2010, 03:26 PM   #26
thinkRNA
Member
 
Location: Carlsbad,CA

Join Date: Jan 2010
Posts: 94
Default

Quote:
Originally Posted by maximilianh View Post
I have used "maq sol2sanger" to convert my fastq file.

I had the same problem and applied that little gawk script (thanks guys, really saved me wasting hours).

And Cole: I am using tophat 1.0.14.
Where did you get version 1.0.14? I thought the latest that was out was TopHat 1.0.13 (BETA) release 2/5/2010
Old 07-05-2010, 08:08 AM   #27
maximilianh
Member
 
Location: UK

Join Date: Oct 2009
Posts: 15
Default TopHat 1.0.14

Quote:
Originally Posted by thinkRNA View Post
Where did you get version 1.0.14? I thought the latest that was out was TopHat 1.0.13 (BETA) release 2/5/2010
From here: http://tophat.cbcb.umd.edu/index.html
Old 07-09-2010, 04:09 AM   #28
maximilianh
Member
 
Location: UK

Join Date: Oct 2009
Posts: 15
Default

Quote:
Originally Posted by shurjo View Post
If this is Illumina data, were your reads processed with pipeline v1.3 or later? If so, you have to include the --solexa-quals option in your TopHat run.
I had the same problem when I pre-processed the Illumina file first with maq sol2sanger. I am using tophat 1.4 now and AM NOT PREPROCESSING anymore, and it works!! Just use the .txt file.

Max
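Since the question of whether to convert qualities keeps coming up, a rough check of which quality encodings a file could be using may help before choosing any option. This is a sketch of the common ASCII-range heuristic, not anything TopHat does itself: the encoding labels and function name are my own, and the check can only rule encodings out, never prove one.

```python
from typing import Iterable, List

# Lowest ASCII code each convention can produce; the ceiling is 126 for all three.
ENCODING_MIN_ORD = {
    "phred33": 33,   # Sanger and Illumina 1.8+
    "solexa64": 59,  # Solexa / early Illumina (scores down to -5)
    "phred64": 64,   # Illumina 1.3 to 1.7
}


def compatible_encodings(quality_strings: Iterable[str]) -> List[str]:
    """Return the encodings whose ASCII range covers every observed quality character."""
    lowest = min((ord(ch) for q in quality_strings for ch in q), default=None)
    if lowest is None:
        raise ValueError("no quality characters were supplied")
    return [name for name, floor in ENCODING_MIN_ORD.items() if lowest >= floor]


low_chars = compatible_encodings(["!''*5"])  # contains '!' (ASCII 33)
mid_chars = compatible_encodings([";;hh"])   # lowest is ';' (ASCII 59)
high_chars = compatible_encodings(["hhhh"])  # lowest is 'h' (ASCII 104)
```

The more quality lines you feed in, the more likely a low character shows up and narrows the list; a handful of high-quality reads alone stays ambiguous.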
Old 08-05-2010, 06:43 AM   #29
sanwen
Member
 
Location: china

Join Date: May 2010
Posts: 12
Default

I did not understand what this paragraph in the manual means; I am not a native English speaker.
"Arguments:
<ebwt_base> The basename of the index to be searched. The basename is the name of any of the five index files up to but not including the first period. bowtie first looks in the current directory for the index files, then looks in the indexes subdirectory under the directory where the currently-running bowtie executable is located, then looks in the directory specified in the BOWTIE_INDEXES environment variable. "
What does this paragraph mean? For example, I have a Bowtie index (dog.fa, dog.fa.1.ebwt, dog.fa.2.ebwt and so on) in the directory /home/index. When I try to run the software, I type "tophat -r 200 /home/index/dog.fa test.fq",
but it always shows:
checking for Bowtie index files
checking for reference FASTA file
Warning:Could not find FASTA file /home/indexdog.fa.fa
Reconstituting reference FASTA file from Bowtie index

What is the problem?
Old 08-05-2010, 07:59 AM   #30
raela
Member
 
Location: Ithaca, NY

Join Date: Apr 2010
Posts: 39
Default

You would use dog, not dog.fa. It adds the .fa for you when it searches.
Old 08-05-2010, 05:40 PM   #31
sanwen
Member
 
Location: china

Join Date: May 2010
Posts: 12
Default

Quote:
Originally Posted by raela View Post
You would use dog, not dog.fa. It adds the .fa for you when it searches.
If I use "dog", it shows "could not find Bowtie index files dog.*".

I used "dog.fa" to build the index, so it seems that using "dog.fa" as the basename should be OK, but it fails.
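The two error messages are consistent with the index basename really being dog.fa here. Bowtie 1 index files are named after the basename plus the suffixes .1.ebwt to .4.ebwt, .rev.1.ebwt and .rev.2.ebwt, so the basename is simply whatever precedes those suffixes. A small sketch that recovers it from a directory listing (the function name is made up, and the file list is the one described in the post above):

```python
import re
from typing import Iterable, Set

# Matches the trailing part of a Bowtie 1 index file name, e.g. ".1.ebwt" or ".rev.2.ebwt".
_EBWT_SUFFIX = re.compile(r"\.(?:rev\.)?\d\.ebwt$")


def index_basenames(filenames: Iterable[str]) -> Set[str]:
    """Return the distinct index basenames implied by a list of file names."""
    names = set()
    for fname in filenames:
        match = _EBWT_SUFFIX.search(fname)
        if match:
            names.add(fname[: match.start()])
    return names


files = ["dog.fa", "dog.fa.1.ebwt", "dog.fa.2.ebwt", "dog.fa.rev.1.ebwt"]
basenames = index_basenames(files)
```

With the basename dog.fa, the warning in the earlier log suggests TopHat appends .fa when looking for the reference, which is why it asks for dog.fa.fa. Note that the same log then goes on to reconstitute the FASTA from the index, so that warning on its own may not be fatal.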
Old 08-06-2010, 12:24 AM   #32
sanwen
Member
 
Location: china

Join Date: May 2010
Posts: 12
Default

Quote:
Originally Posted by Cole Trapnell View Post
Can you verify that the FASTQ file is correctly formatted? The fact that TopHat is choosing a seed length of 101bp tells me something's up with that file. The seed length ought to be 25 for 50bp reads or longer. TopHat's FASTQ parser occasionally screws up when FASTQ records are incorrectly formatted or when the read and/or quality sequences span more than one line in the file. We plan to replace the parser in an upcoming version to make it more robust to this kind of thing.
How do I set the seed length in TopHat? I used -l, but it did not work.
Old 11-15-2012, 07:03 AM   #33
wanfahmi
Member
 
Location: North Sea

Join Date: Apr 2008
Posts: 34
Default

Hey,

I have a problem running my RNA-seq data. All software was updated to the current version. Here is the error:

[2012-11-15 15:27:42] Beginning TopHat run (v2.0.4)
-----------------------------------------------
[2012-11-15 15:27:42] Checking for Bowtie
Bowtie version: 2.0.0.7
[2012-11-15 15:27:42] Checking for Samtools
Samtools version: 0.1.18.0
[2012-11-15 15:27:42] Checking for Bowtie index files
[2012-11-15 15:27:42] Checking for reference FASTA file
[2012-11-15 15:27:42] Generating SAM header for genome
format: fastq
quality scale: solexa33 (reads generated with GA pipeline version < 1.3)
[2012-11-15 15:27:46] Reading known junctions from GTF file
[2012-11-15 15:28:08] Preparing reads
[FAILED]
Error retrieving prep_reads info.

My command was:

tophat --solexa-quals -p 2 -G genes.gtf -o S1_R1_thout genome S1_R1.fq S1_R2.fq

Below is a record from my FASTQ file:

@XXXXXX-HISEQ:2151D4GACXX:4:1101:1414:2232 1:N:0:TGACCA
CCATGCAGAAGGGTACAGTTACATTAAGAACTGAAGTCTTTTAAAAAGCTTTAAACATTCTTTCTTGAACCAAAACATTCGACAAAAGATGCACATGAAA
+
CCCFFFFFGGHHG?FHJEFCHGHIJJJIGJJJIJJCCGIIJIHIJJJIGIIIGIIJJIGIJIJGGIIJGGGIJIJEEEHGAEDB>>CEDDDDDCCCDCDC




Please advise. Thank you.
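Since the command above passes two mate files, one cheap sanity check before rerunning is to confirm that both files list the same fragments in the same order. The sketch below is only that: it assumes strict 4-line FASTQ records and header names like the one shown above (fragment name before the first space, or ending in /1 and /2), and the function names are hypothetical.

```python
from itertools import islice, zip_longest
from typing import Iterable, Optional


def fragment_name(header: str) -> str:
    """Strip the leading '@', the comment after the first space, and any /1 or /2 suffix."""
    fields = header.strip().lstrip("@").split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def first_mismatch(left_lines: Iterable[str], right_lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based record number of the first mate mismatch, or None if all match."""
    left_headers = islice(left_lines, 0, None, 4)   # every 4th line, starting at line 1
    right_headers = islice(right_lines, 0, None, 4)
    pairs = zip_longest(left_headers, right_headers)
    for number, (a, b) in enumerate(pairs, start=1):
        if a is None or b is None or fragment_name(a) != fragment_name(b):
            return number
    return None


# Made-up records in the header style shown above.
left = ["@r1 1:N:0:TGACCA\n", "ACGT\n", "+\n", "IIII\n",
        "@r2 1:N:0:TGACCA\n", "ACGT\n", "+\n", "IIII\n"]
right = ["@r1 2:N:0:TGACCA\n", "TTAA\n", "+\n", "IIII\n",
         "@r2 2:N:0:TGACCA\n", "TTAA\n", "+\n", "IIII\n"]

all_paired = first_mismatch(left, right)
truncated = first_mismatch(left, right[:4])
```

A file with one mate missing, as in the truncated example, is reported at the record where the two files first disagree.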
Old 11-27-2012, 04:10 AM   #34
wanfahmi
Member
 
Location: North Sea

Join Date: Apr 2008
Posts: 34
Default

Quote:
[2012-11-27 11:14:53] Beginning TopHat run (v2.0.4)
-----------------------------------------------
[2012-11-27 11:14:53] Checking for Bowtie
Bowtie version: 2.0.0.7
[2012-11-27 11:14:53] Checking for Samtools
Samtools version: 0.1.18.0
[2012-11-27 11:14:53] Checking for Bowtie index files
[2012-11-27 11:14:53] Checking for reference FASTA file
[2012-11-27 11:14:53] Generating SAM header for genome
format: fastq
quality scale: solexa33 (reads generated with GA pipeline version < 1.3)
[2012-11-27 11:14:58] Reading known junctions from GTF file
[2012-11-27 11:15:20] Preparing reads
[FAILED]
Error running 'prep_reads'
Error: qual length (162) differs from seq length (100) for fastq record !
I tried again with the same command, but a different error appeared.
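The "qual length differs from seq length" message points at a malformed record somewhere in the input. A minimal way to locate such records, assuming strict 4-line FASTQ (wrapped sequence or quality lines would break this assumption, and TopHat's own parser may behave differently), is sketched below; the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def find_length_mismatches(fastq_lines: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Return (record_number, seq_length, qual_length) for 4-line records whose lengths differ."""
    problems = []
    it = iter(fastq_lines)
    # zip over the same iterator four times groups the lines into 4-line records;
    # an incomplete trailing record is silently dropped.
    for number, (_name, seq, _plus, qual) in enumerate(zip(it, it, it, it), start=1):
        seq_len = len(seq.rstrip("\r\n"))
        qual_len = len(qual.rstrip("\r\n"))
        if seq_len != qual_len:
            problems.append((number, seq_len, qual_len))
    return problems


# Made-up records: the second one has a quality string that is too long.
example = ["@r1\n", "ACGT\n", "+\n", "IIII\n",
           "@r2\n", "ACGT\n", "+\n", "IIIIII\n"]
bad_records = find_length_mismatches(example)
```

Passing an open file handle instead of the example list streams through a real file record by record, and the reported record numbers tell you where to look.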
Old 04-09-2013, 01:11 PM   #35
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Default

For me, the cause was using anchor lengths that were too small in TopHat-Fusion mode.

It seems it wants an integer greater than or equal to 10.


Carmen
Old 10-02-2015, 01:49 AM   #36
dhatziioanou
Junior Member
 
Location: Greece

Join Date: Aug 2012
Posts: 8
Default

Hello, I know this is a pretty old thread, but I got the same problem now in 2015, and since I don't have any good-quality data to use instead, I had a little dig. The index error is a Python error caused by the FASTQ data format having issues; in my case, at least, I know my index is fine.
My data was already perfectly paired, but I passed it through pairfq-lite version 0.14.3 anyway. I got all the data back in two new paired files (the unpaired files were 0 bytes), and now that I'm running tophat again it seems to be working; at least it's passing that stage and reading the files.

Before:
[2015-10-02 10:46:41] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2015-10-02 10:46:41] Checking for Bowtie
Bowtie version: 2.1.0.0
[2015-10-02 10:46:41] Checking for Samtools
Samtools version: 0.1.19.0
[2015-10-02 10:46:41] Checking for Bowtie index files (genome)..
[2015-10-02 10:46:41] Checking for reference FASTA file
[2015-10-02 10:46:41] Generating SAM header for GRCm38.p4.genome
Traceback (most recent call last):
  File "/usr/bin/tophat", line 4072, in <module>
    sys.exit(main())
  File "/usr/bin/tophat", line 3926, in main
    params.read_params = check_reads_format(params, reads_list)
  File "/usr/bin/tophat", line 1832, in check_reads_format
    freader=FastxReader(zf.file, params.read_params.color, zf.fname)
  File "/usr/bin/tophat", line 1577, in __init__
    while hlines>0 and self.lastline[0] not in "@>" :
IndexError: string index out of range

After:
[2015-10-02 12:14:34] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2015-10-02 12:14:34] Checking for Bowtie
Bowtie version: 2.1.0.0
[2015-10-02 12:14:34] Checking for Samtools
Samtools version: 0.1.19.0
[2015-10-02 12:14:34] Checking for Bowtie index files (genome)..
[2015-10-02 12:14:34] Checking for reference FASTA file
[2015-10-02 12:14:34] Generating SAM header for GRCm38.p4.genome
format: fastq
quality scale: phred33 (default)
[2015-10-02 12:14:36] Reading known junctions from GTF file
[2015-10-02 12:14:53] Preparing reads
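For anyone wondering why a data problem shows up as a Python IndexError: the last frame of the "Before" traceback indexes self.lastline[0], and in Python indexing an empty string raises exactly this error. So one plausible trigger (an assumption on my part, since the rest of the TopHat source is not shown here) is the reader hitting an empty line or an empty file where it expects a header. The helper names below are hypothetical and only illustrate the difference.

```python
def unsafe_is_header_line(line: str) -> bool:
    """Same indexing pattern as in the traceback: line[0] fails on an empty string."""
    return line[0] in "@>"


def is_header_line(line: str) -> bool:
    """True if the line starts with '@' (FASTQ) or '>' (FASTA); safe on empty strings."""
    return line.startswith(("@", ">"))


try:
    unsafe_is_header_line("")
    raised = False
except IndexError:
    raised = True
```

The startswith form returns False on an empty line instead of raising, which is why rewriting the files with another tool can make the error disappear if it removes stray blank lines.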
Tags
bowtie, error, prep_reads, tophat

All times are GMT -8.