**jdanderson** · 09-22-2010, 08:38 PM

Tophat error message

So I was able to get TopHat to recognize hg19.fa by executing the commands from within the bowtie/indexes directory.
However, now I get the following error:

tophat -r 220 /home/johnathon/bowtie-0.12.7/indexes/hg19 hg19.fa /home/johnathon/maq-0.7.0_x86_64-linux/s_5_sequence s_5_sequence.fq

[Wed Sep 22 21:27:30 2010] Beginning TopHat run (v1.0.13)
-----------------------------------------------
[Wed Sep 22 21:27:30 2010] Preparing output location ./tophat_out/
[Wed Sep 22 21:27:30 2010] Checking for Bowtie index files
[Wed Sep 22 21:27:30 2010] Checking for reference FASTA file
[Wed Sep 22 21:27:30 2010] Checking for Bowtie
Bowtie version: 0.12.7.0
[Wed Sep 22 21:27:30 2010] Checking reads
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
seed length: 20bp
format: fasta
[FAILED]
Error: could not execute prep_reads

When I check the tophat_out folder for the log, I find the following output in prep_reads.log:

prep_reads v1.0.13
---------------------------
Error: cannot open reads file /home/johnathon/maq-0.7.0_x86_64-linux/s_5_sequence for reading

As I mentioned before, I used MAQ to convert it to a Sanger FASTQ, and I also ran bowtie-inspect on the hg19 index.

Can anyone provide some guidance?
Anyone else had this issue before?

**jdanderson** · 09-23-2010, 09:18 PM

On the off chance that this thread may benefit someone, I will try to continue to update it.

So I have had mediocre success with reinstalling Bowtie from source (the .src.zip) rather than using the pre-compiled version (on a suggestion from a friend), and with running my sequence.fq (which had been converted from the Solexa sequence.txt by the popular MAQ script) through Bowtie first and then feeding the output file into TopHat.

Now when I run TopHat as before, I get a new error:

tophat -r 220 /home/johnathon/bowtie-0.12.7/indexes/hg19 hg19.fa /home/johnathon/bowtie-0.12.7/indexes/s_5_sequence.fq s_5_sequences.fq

[Thu Sep 23 21:23:06 2010] Beginning TopHat run (v1.0.13)
-----------------------------------------------
[Thu Sep 23 21:23:06 2010] Preparing output location ./tophat_out/
[Thu Sep 23 21:23:06 2010] Checking for Bowtie index files
[Thu Sep 23 21:23:06 2010] Checking for reference FASTA file
[Thu Sep 23 21:23:06 2010] Checking for Bowtie
Bowtie version: 0.12.7.0
[Thu Sep 23 21:23:06 2010] Checking reads
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
Warning: found a read < 20bp in hg19.fa
seed length: 20bp
format: fasta
[Thu Sep 23 21:25:53 2010] Mapping reads against hg19 with Bowtie
[Thu Sep 23 21:26:03 2010] Joining segment hits
[Thu Sep 23 21:26:03 2010] Mapping reads against hg19 with Bowtie
[Thu Sep 23 21:26:06 2010] Joining segment hits
[Thu Sep 23 21:26:06 2010] Searching for junctions via segment mapping
Warning: junction database is empty!
[Thu Sep 23 21:27:51 2010] Joining segment hits
[Thu Sep 23 21:27:51 2010] Joining segment hits
[Thu Sep 23 21:27:51 2010] Reporting output tracks
[FAILED]
Error: Report generation failed with err = 1

The thread here titled "Running ~35 bp and >=50 RNASeq reads" may provide some guidance (it's the only helpful thread I could find). I will try trimming and report back the results.
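
Trimming reads to a common length amounts to truncating each sequence and its quality string at the same position. A minimal Python sketch of that idea, assuming plain 4-line FASTQ records; `trim_fastq` is a hypothetical helper written for illustration, not part of the FASTX Toolkit:

```python
from typing import Iterable, Iterator


def trim_fastq(lines: Iterable[str], length: int) -> Iterator[str]:
    """Yield 4-line FASTQ records truncated to at most `length` bases.

    Trimming is from the 3' end; reads already shorter than `length`
    pass through unchanged. A trailing incomplete record is dropped by zip().
    """
    it = (line.rstrip("\n") for line in lines)
    # zip() over the same iterator groups consecutive lines into 4-tuples.
    for header, seq, plus, qual in zip(it, it, it, it):
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Sequence and quality are cut at the same position so they stay paired.
        yield f"{header}\n{seq[:length]}\n{plus}\n{qual[:length]}\n"
```

To trim a whole file, pass an open file object and write the yielded strings out with `writelines`; the target length is up to you.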

**0xTc0** · 09-24-2010, 01:50 AM

Sorry if I misunderstood your problem, but wouldn't the right command simply be:

tophat -r 220 /home/johnathon/bowtie-0.12.7/indexes/hg19 /home/johnathon/maq-0.7.0_x86_64-linux/s_5_sequence.fq
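
TopHat's usage line is `tophat [options] <bowtie_index> <reads1[,reads2,...]> [reads1[,reads2,...]]`, so the reference FASTA is not a positional argument. In the earlier commands hg19.fa sat in the first reads position, which fits the "found a read < 20bp in hg19.fa" warnings. A small Python sketch that assembles the argument list in that order; `tophat_args` and its `.fa`/`.fasta` guard are hypothetical (TopHat itself does accept FASTA reads, so the guard is only a heuristic):

```python
def tophat_args(index_base, left_reads, right_reads=None, mate_inner_dist=None):
    """Build a TopHat command line following its usage line:

        tophat [options] <bowtie_index> <reads1[,...]> [reads2[,...]]

    Only the index basename is given; the genome FASTA is not passed.
    """
    reads = [left_reads] + ([right_reads] if right_reads else [])
    for path in reads:
        # Heuristic guard (assumption): reads here should not be the genome FASTA.
        if path.endswith((".fa", ".fasta")):
            raise ValueError(f"{path} looks like a reference FASTA, not a reads file")
    args = ["tophat"]
    if mate_inner_dist is not None:
        args += ["-r", str(mate_inner_dist)]  # -r/--mate-inner-dist
    # Options first, then the index basename, then the reads file(s).
    return args + [index_base] + reads
```

The resulting list could then be handed to `subprocess.run(args, check=True)`.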

**jdanderson** · 09-24-2010, 07:57 AM

Hello 0xTc0,

Thank you for your reply. I appreciate any help I can get! That's an interesting point; as you might guess, I do not have a strong computer background.

I have tried your suggested input, but now I get an altogether different error:

tophat -r 220 /home/johnathon/bowtie-0.12.7/indexes/hg19 /home/johnathon/bowtie-0.12.7/indexes/s_5_sequence.fq

[Fri Sep 24 08:18:33 2010] Beginning TopHat run (v1.0.13)
-----------------------------------------------
[Fri Sep 24 08:18:33 2010] Preparing output location ./tophat_out/
[Fri Sep 24 08:18:33 2010] Checking for Bowtie index files
[Fri Sep 24 08:18:33 2010] Checking for reference FASTA file
[Fri Sep 24 08:18:33 2010] Checking for Bowtie
Bowtie version: 0.12.7.0
[Fri Sep 24 08:18:33 2010] Checking reads
Error: file /home/johnathon/bowtie-0.12.7/indexes/s_5_sequence.fq does not appear to be a valid FASTA or FASTQ file
seed length: 136bp
format: fastq
quality scale: phred33 (default)
[Fri Sep 24 08:21:16 2010] Mapping reads against hg19 with Bowtie
[Fri Sep 24 08:21:39 2010] Joining segment hits
Traceback (most recent call last):
File "/home/johnathon/tophat-1.0.13/bin/tophat", line 1635, in <module>
sys.exit(main())
File "/home/johnathon/tophat-1.0.13/bin/tophat", line 1595, in main
user_supplied_juncs)
File "/home/johnathon/tophat-1.0.13/bin/tophat", line 1395, in spliced_alignment
segment_len)
File "/home/johnathon/tophat-1.0.13/bin/tophat", line 1085, in split_reads
reads_file = open(reads_filename)
IOError: [Errno 2] No such file or directory: './tophat_out/tmp//left_kept_reads_missing.fq'

There is a left_kept_reads.fq (without the "_missing") in the tophat_out directory, but when I checked it, it was empty.

I also checked the tophat_out log directory, opened segment_junc.log, and found the following error as well:

segment_juncs v1.0.13
---------------------------
Loading reference sequences...
Loading chr1...done
Loading chr2...done
Loading chr3...done
Loading chr4...done
Loading chr5...done
Loading chr6...done
Loading chr7...done
Loading chr8...done
Loading chr9...done
Loading chr10...done
Loading chr11...done
Loading chr12...done
Loading chr13...done
Loading chr14...done
Loading chr15...done
Loading chr16...done
Loading chr17...done
Loading chr18...done
Loading chr19...done
Loading chr20...done
Loading chr21...done
Loading chr22...done
Loading chrX...done
Loading chrY...done
Loading chrM...done
Found 0 potential split-segment junctions
Indexing extensions in ./tophat_out/tmp//left_kept_reads_missing.fq
Can't open file ./tophat_out/tmp//left_kept_reads_missing.fq for reading, skipping...
Indexing extensions in ./tophat_out/tmp//right_kept_reads_missing.fq
Can't open file ./tophat_out/tmp//right_kept_reads_missing.fq for reading, skipping...
Looking for junctions by island end pairings
Adding hits from segment file 0 to coverage map
Adding hits from segment file 1 to coverage map
Map covers 0 bases
Map covers 0 bases in sufficiently long segments
Map contains 1 good islands
0 are left looking bases
0 are right looking bases
Collecting potential splice sites in islands
reporting synthetic splice junctions...
Found 0 potential island-end pairing junctions
done
Reporting potential splice junctions...done
Reported 0 total possible splices

Any ideas about this? I haven't attempted trimming the s_5_sequence.fq file yet (as per the aforementioned thread).

**0xTc0** · 09-24-2010, 08:38 AM

Could you send me the first 4-8 lines of your FASTQ file? Could it be that
/bowtie-0.12.7/indexes/s_5_sequence.fq

is indexed or something like that?

Only the reference sequence (hg19) has to be indexed! For the reads, simply use the FASTQ file (unedited). By the way, if you use Solexa data, consider using the

--solexa-quals Use the Solexa scale for quality values in FASTQ files.

or

--solexa1.3-quals As of the Illumina GA pipeline version 1.3, quality scores are encoded in Phred-scaled base-64. Use this option for FASTQ files from pipeline 1.3 or later.

parameters.
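
Which of these options applies can usually be guessed from the lowest quality character in the file: Phred+33 starts at '!' (ASCII 33), old Solexa+64 scores can go down to ';' (ASCII 59, score -5), and Phred+64 starts at '@' (ASCII 64). A heuristic Python sketch of that check; `guess_quality_encoding` is a hypothetical helper, and a Phred+33 file in which every base scores 31 or higher would be misread as Phred+64:

```python
def guess_quality_encoding(quality_strings):
    """Guess the FASTQ quality encoding from the lowest ASCII code seen.

    Heuristic only (raises ValueError on empty input):
      lowest < 59       -> "phred33"  (Sanger; TopHat's default)
      59 <= lowest < 64 -> "solexa"   (pipeline < 1.3; --solexa-quals)
      lowest >= 64      -> "phred64"  (pipeline >= 1.3; --solexa1.3-quals)
    """
    lowest = min(ord(c) for q in quality_strings for c in q)
    if lowest < 59:
        return "phred33"
    if lowest < 64:
        return "solexa"
    return "phred64"
```

In practice you would feed it the quality lines (every fourth line) from the first few thousand records rather than from the whole file.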

http://tophat.cbcb.umd.edu/manual.html

Cheers, Michael

**jdanderson** · 09-24-2010, 11:51 AM

Hello Michael,

Thanks again for the reply.

So the Solexa (v1.6) seq.txt file was (supposedly) converted to Sanger FASTQ via MAQ's fq_all2std.pl sol2std and inspected by bowtie-inspect, hence I thought there would be no need to use the --solexa1.3-quals option. But I'll give it a shot nonetheless and report back the results.

The reason the seq.fq file is in the indexes directory is that, out of frustration, I ran the converted seq.fq through a Bowtie alignment (which seems to have been successful) to try to get a usable file format. Bowtie placed the output in the indexes directory (by default; I did not specify where it should go), and I have not bothered to move it.

Here are the first several lines of s_5_sequence.fq file:

SOLEXA2_0827_FC707M4AAXX:5:1:1004:21272#0/1 - chr1 9795118 GCTCGGGCAAAATGGTGGACGCCACTCAGGCTGATCTTGN A??@AAA@A?@?0>@??A>@A@@AA<>=>?64852/-,.% 0 0:G>N
SOLEXA2_0827_FC707M4AAXX:5:1:1005:9407#0/1 + chr10 52374466 NGGGAATGCCCTGCTGGGCTAACCTGTGTATTACAACGCT %/++-.0503A@?@AA?A>??9?@@>>><<=<5>9>AA@? 0 0:T>N
SOLEXA2_0827_FC707M4AAXX:5:1:1005:1901#0/1 - chrY 21152905 CCACTTTTAGGCTTAGGACCAGGTTCTAACTATCTAAAAN %%%%%%%%%%%%%%%%%%%%BBBB:>>>>>44482+.2/% 0 0:A>N
SOLEXA2_0827_FC707M4AAXX:5:1:1006:1817#0/1 + chr12 117383234 NGGCACCTTCCGGATAGCAGCATCTCTGACTATTCTTGCT %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0 0:G>N
SOLEXA2_0827_FC707M4AAXX:5:1:1006:7942#0/1 - chrM 780 TCAAAACGCTTAGCCTAGCCACACCCCCACGGGAAACAGN %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0 0:C>N

I am not sure exactly how well this matches up with Sanger FASTQ, but I do notice some variation in the read lengths, which is why I was going to try using the FASTX Toolkit's trimming function to standardize the lengths (as per this thread: http://seqanswers.com/forums/showthread.php?p=23066).
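
Since this file came out of a Bowtie run, its extra columns (strand, chromosome, offset, then a count field and a mismatch field such as "0:G>N") line up with Bowtie's default, non-SAM hit format. For '-' strand hits that format prints the reverse complement of the read and its qualities reversed. A minimal Python sketch for turning such hits back into FASTQ, assuming whitespace-separated fields and read names without spaces; `bowtie_hit_to_fastq` is a hypothetical helper:

```python
# Complement table for DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def bowtie_hit_to_fastq(line):
    """Convert one line of Bowtie's default (non-SAM) output back to FASTQ.

    Assumed columns (whitespace-separated): read name, strand, reference,
    0-based offset, read sequence, qualities, count, mismatch descriptors.
    """
    fields = line.split()
    name, strand, seq, qual = fields[0], fields[1], fields[4], fields[5]
    if strand == "-":
        # Undo Bowtie's reverse-complementing of reads on the minus strand.
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"
```

Applied to the first hit above, this gives the same sequence and qualities that MAQ's export2std later produces for that read. Bowtie's hit output only lists reads that aligned, though, so rebuilding FASTQ this way would silently drop unaligned reads; converting the original pipeline files avoids that loss.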

I will try the two methods mentioned above and report back. Any thoughts in the meantime?

Cheers, Johnathon

**0xTc0** · 09-25-2010, 05:20 AM

That is not FASTQ format, as TopHat correctly complains. FASTQ format looks like this:

@SOLEXA2_0827_FC707M4AAXX:5:1:1004:21272
GCTCGGGCAAAATGGTGGACGCCACTCAGGCTGATCTTGN
+
A??@AAA@A?@?0>@??A>@A@@AA<>=>?64852/-,.%
@SOLEXA2_0827_FC707M4AAXX:5:1:1005:9407
NGGGAATGCCCTGCTGGGCTAACCTGTGTATTACAACGCT
+
%/++-.0503A@?@AA?A>??9?@@>>><<=<5>9>AA@?

Try to convert your input file.

FASTQ Format

http://maq.sourceforge.net/fastq.shtml
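
The structure of one record can be checked mechanically: an '@' header, the bases, a '+' line (optionally repeating the read name), and a quality string exactly as long as the sequence. A minimal Python sketch of such a check; `is_fastq_record` is hypothetical, and restricting bases to A/C/G/T/N is an assumption that would reject IUPAC codes:

```python
def is_fastq_record(lines):
    """Return True if `lines` (4 strings, newlines stripped) form one FASTQ record."""
    if len(lines) != 4:
        return False
    header, seq, plus, qual = lines
    return (
        header.startswith("@")
        and plus.startswith("+")
        # Text after '+' is optional, but if present it must repeat the read name.
        and plus[1:] in ("", header[1:])
        and len(seq) > 0
        and len(seq) == len(qual)
        # Plain DNA bases only (assumption; IUPAC codes would fail here).
        and all(c in "ACGTNacgtn" for c in seq)
    )
```

A Bowtie hit line fails the very first test, because it does not start with '@' and carries everything on one line.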

HTH, Michael ;-)

**jdanderson** · 09-25-2010, 09:56 AM

Hello Michael,

Thanks again for all the valuable input; I am grateful.

So I did actually use MAQ-0.7.0_x86_64-linux fq_all2std.pl sol2std to convert the seq.txt file.

It seems to me that part of the difference between the two formats is the inclusion of a location id (e.g. #0/1 - chr1 9795118) and the trailing "0 0:G>N".

The reason for the chr id tag is that these files have already been aligned using GERALD in the Solexa pipeline (v1.6). Ultimately I am trying to use Cufflinks to analyze the expression levels of these RNA-seq samples. I am simply using MAQ, Bowtie and TopHat to convert the files into the format Cufflinks needs. That format is SAM, and Cufflinks suggests using TopHat to produce it.

The samples were run and initially analyzed by our sequencing core here at UCD. The set of output files they give you access to is limited, i.e. you only receive the aligned files like seq.txt, export.txt, etc.

Since this is the case, do you think I need to chop off the chr location id? I only need to convert the seq.txt into SAM format... do you have any ideas for a simpler way of doing this?

Also, I just converted the export.txt file using MAQ's fq_all2std.pl export2std to see if this is any help.
Here is a sample of the output, which I think looks a bit closer to FASTQ:

@SOLEXA2_0827:5:1:1004:21272/1
NCAAGATCAGCCTGAGTGGCGTCCACCATTTTGCCCGAGC
+
%.,-/25846?>=><AA@@A@>A??@>0?@?A@AAA@??A
@SOLEXA2_0827:5:1:1005:9407/1
NGGGAATGCCCTGCTGGGCTAACCTGTGTATTACAACGCT
+
%/++-.0503A@?@AA?A>??9?@@>>><<=<5>9>AA@?
@SOLEXA2_0827:5:1:1005:1901/1
NTTTTAGATAGTTAGAACCTGGTCCTAAGCCTAAAAGTGG

I am in the middle of running tophat on this newly converted s_5_export.fq file. I will post results when it's finished.

**jdanderson** · 09-26-2010, 01:49 PM

Hello,

Just wanted to report that MAQ's fq_all2std.pl export2std conversion worked perfectly, unlike the sol2std command I had used previously. export2std was the appropriate command because my files had already been aligned to hg19 by GERALD in the Solexa (v1.6) pipeline. The location id needed to be removed, and I believe (after a cursory perusal) that the conversion also helped with read length uniformity, in addition to adjusting the PHRED quality scores.
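
That quality adjustment is a per-character re-encoding. Phred+64 (pipeline 1.3 and later, per the TopHat option text quoted earlier) maps to Sanger Phred+33 by subtracting 31 from each character code. Older Solexa+64 scores measure odds rather than probabilities, so they need Q_phred = 10·log10(10^(Q_solexa/10) + 1). A minimal Python sketch of both conversions; the function names are hypothetical, and this illustrates the arithmetic rather than reproducing MAQ's fq_all2std.pl:

```python
import math


def phred64_to_phred33(qual):
    """Shift Phred+64 quality characters to Phred+33 (Sanger)."""
    # Same Phred score, different ASCII offset: 64 - 33 = 31.
    return "".join(chr(ord(c) - 31) for c in qual)


def solexa64_to_phred33(qual):
    """Convert Solexa+64 quality characters (pipeline < 1.3) to Phred+33."""
    out = []
    for c in qual:
        q_solexa = ord(c) - 64
        # Solexa scores are log-odds; convert to a Phred score, then round.
        q_phred = round(10 * math.log10(10 ** (q_solexa / 10) + 1))
        out.append(chr(q_phred + 33))
    return "".join(out)
```

For high scores the two conversions agree (Solexa 40 and Phred 40 both become 'I'); they only diverge noticeably for low-quality bases.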

I also successfully ran Cufflinks on this TopHat output and have visualized the data in the UCSC Genome Browser. I am now attempting to reformat the Cufflinks output for use in IGV from the Broad Institute (and in R when I work up the courage).

I also want to say thank you to Michael from Leipzig, you were very gracious in helping out a fledgling student. Also, thank you to everyone who posts a thread with a problem and the kind people who reply; this has been very helpful to me.

**nuclearriot** · 09-27-2010, 11:37 AM

tophat help!

Hello, I have no experience with any sort of programming, but am attempting to use TopHat. Can you please take me through a successful run on the sample data set? This is how I begin:

I have made a folder called tophat with the following folders: tophat-1.0.14.OSX_x86_64, bowtie-0.12.7, and test_data.

I begin in the Terminal on my Mac and enter:
cd desktop
cd tophat
cd tophat-1.0.14.OSX_x86_64

Now when I run ./tophat -r 20 test_ref reads_1.fq reads_2.fq, an error message appears:

Macintosh-99:~ common$ cd desktop
Macintosh-99:desktop common$ cd tophat
Macintosh-99:tophat common$ cd /Users/common/Desktop/tophat/tophat-1.0.14.OSX_x86_64
Macintosh-99:tophat-1.0.14.OSX_x86_64 common$ ./tophat
tophat:
TopHat maps short sequences from spliced transcripts to whole genomes.

Usage:
tophat [options] <bowtie_index> <reads1[,reads2,...,readsN]> [reads1[,reads2,...,readsN]]

Options:
-v/--version
-o/--output-dir <string> [ default: ./tophat_out ]
-a/--min-anchor <int> [ default: 8 ]
-m/--splice-mismatches <0-2> [ default: 0 ]
-i/--min-intron <int> [ default: 50 ]
-I/--max-intron <int> [ default: 500000 ]
-g/--max-multihits <int> [ default: 40 ]
-F/--min-isoform-fraction <float> [ default: 0.15 ]
--solexa-quals
--solexa1.3-quals (same as phred64-quals)
--phred64-quals (same as solexa1.3-quals)
-p/--num-threads <int> [ default: 1 ]
-G/--GFF <filename>
-j/--raw-juncs <filename>
-r/--mate-inner-dist <int>
--mate-std-dev <int> [ default: 20 ]
--no-novel-juncs
--no-gff-juncs
--no-coverage-search
--coverage-search
--no-closure-search
--closure-search
--fill-gaps
--microexon-search
--butterfly-search
--no-butterfly-search
--keep-tmp

Advanced Options:

--segment-mismatches <int> [ default: 2 ]
--segment-length <int> [ default: 25 ]
--min-closure-exon <int> [ default: 100 ]
--min-closure-intron <int> [ default: 50 ]
--max-closure-intron <int> [ default: 5000 ]
--min-coverage-intron <int> [ default: 50 ]
--max-coverage-intron <int> [ default: 20000 ]
--min-segment-intron <int> [ default: 50 ]
--max-segment-intron <int> [ default: 500000 ]

SAM Header Options (for embedding sequencing run metadata in output):
--rg-id <string> (read group ID)
--rg-sample <string> (sample ID)
--rg-library <string> (library ID)
--rg-description <string> (descriptive string, no tabs allowed)
--rg-platform-unit <string> (e.g Illumina lane ID)
--rg-center <string> (sequencing center name)
--rg-date <string> (ISO 8601 date of the sequencing run)
--rg-platform <string> (Sequencing platform descriptor)

for detailed help see http://tophat.cbcb.umd.edu/manual.html
Macintosh-99:tophat-1.0.14.OSX_x86_64 common$ cd ..
Macintosh-99:tophat common$ cd /Users/common/Desktop/tophat/bowtie-0.12.7
Macintosh-99:bowtie-0.12.7 common$ ./bowtie
No index, query, or output file specified!
Usage:
bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]

<m1> Comma-separated list of files containing upstream mates (or the
sequences themselves, if -c is set) paired with mates in <m2>
<m2> Comma-separated list of files containing downstream mates (or the
sequences themselves if -c is set) paired with mates in <m1>
<r> Comma-separated list of files containing Crossbow-style reads. Can be
a mixture of paired and unpaired. Specify "-" for stdin.
<s> Comma-separated list of files containing unpaired reads, or the
sequences themselves, if -c is set. Specify "-" for stdin.
<hit> File to write hits to (default: stdout)
Input:
-q query input files are FASTQ .fq/.fastq (default)
-f query input files are (multi-)FASTA .fa/.mfa
-r query input files are raw one-sequence-per-line
-c query sequences given on cmd line (as <mates>, <singles>)
-C reads and index are in colorspace
-Q/--quals <file> QV file(s) corresponding to CSFASTA inputs; use with -f -C
--Q1/--Q2 <file> same as -Q, but for mate files 1 and 2 respectively
-s/--skip <int> skip the first <int> reads/pairs in the input
-u/--qupto <int> stop after first <int> reads/pairs (excl. skipped reads)
-5/--trim5 <int> trim <int> bases from 5' (left) end of reads
-3/--trim3 <int> trim <int> bases from 3' (right) end of reads
--phred33-quals input quals are Phred+33 (default)
--phred64-quals input quals are Phred+64 (same as --solexa1.3-quals)
--solexa-quals input quals are from GA Pipeline ver. < 1.3
--solexa1.3-quals input quals are from GA Pipeline ver. >= 1.3
--integer-quals qualities are given as space-separated integers (not ASCII)
Alignment:
-v <int> report end-to-end hits w/ <=v mismatches; ignore qualities
or
-n/--seedmms <int> max mismatches in seed (can be 0-3, default: -n 2)
-e/--maqerr <int> max sum of mismatch quals across alignment for -n (def: 70)
-l/--seedlen <int> seed length for -n (default: 28)
--nomaqround disable Maq-like quality rounding for -n (nearest 10 <= 30)
-I/--minins <int> minimum insert size for paired-end alignment (default: 0)
-X/--maxins <int> maximum insert size for paired-end alignment (default: 250)
--fr/--rf/--ff -1, -2 mates align fw/rev, rev/fw, fw/fw (default: --fr)
--nofw/--norc do not align to forward/reverse-complement reference strand
--maxbts <int> max # backtracks for -n 2/3 (default: 125, 800 for --best)
--pairtries <int> max # attempts to find mate for anchor hit (default: 100)
-y/--tryhard try hard to find valid alignments, at the expense of speed
--chunkmbs <int> max megabytes of RAM for best-first search frames (def: 64)
Reporting:
-k <int> report up to <int> good alignments per read (default: 1)
-a/--all report all alignments per read (much slower than low -k)
-m <int> suppress all alignments if > <int> exist (def: no limit)
-M <int> like -m, but reports 1 random hit (MAPQ=0); requires --best
--best hits guaranteed best stratum; ties broken by quality
--strata hits in sub-optimal strata aren't reported (requires --best)
Output:
-t/--time print wall-clock time taken by search phases
-B/--offbase <int> leftmost ref offset = <int> in bowtie output (default: 0)
--quiet print nothing but the alignments
--refout write alignments to files refXXXXX.map, 1 map per reference
--refidx refer to ref. seqs by 0-based index rather than name
--al <fname> write aligned reads/pairs to file(s) <fname>
--un <fname> write unaligned reads/pairs to file(s) <fname>
--max <fname> write reads/pairs over -m limit to file(s) <fname>
--suppress <cols> suppresses given columns (comma-delim'ed) in default output
--fullref write entire ref name (default: only up to 1st space)
Colorspace:
--snpphred <int> Phred penalty for SNP when decoding colorspace (def: 30)
or
--snpfrac <dec> approx. fraction of SNP bases (e.g. 0.001); sets --snpphred
--col-cseq print aligned colorspace seqs as colors, not decoded bases
--col-cqual print original colorspace quals, not decoded quals
--col-keepends keep nucleotides at extreme ends of decoded alignment
SAM:
-S/--sam write hits in SAM format
--mapq <int> default mapping quality (MAPQ) to print for SAM alignments
--sam-nohead supppress header lines (starting with @) for SAM output
--sam-nosq supppress @SQ header lines for SAM output
--sam-RG <text> add <text> (usually "lab=value") to @RG line of SAM header
Performance:
-o/--offrate <int> override offrate of index; must be >= index's offrate
-p/--threads <int> number of alignment threads to launch (default: 1)
--mm use memory-mapped I/O for index; many 'bowtie's can share
--shmem use shared mem for index; many 'bowtie's can share
Other:
--seed <int> seed for random number generator
--verbose verbose output (for debugging)
--version print version information and quit
-h/--help print this usage message
Macintosh-99:bowtie-0.12.7 common$ cd ..
Macintosh-99:tophat common$ cd tophat-1.0.14.OSX_x86_64/
Macintosh-99:tophat-1.0.14.OSX_x86_64 common$ ./tophat -r 20 test_ref reads_1.fq reads_2.fq

[Mon Sep 27 15:35:35 2010] Beginning TopHat run (v1.0.14)
-----------------------------------------------
[Mon Sep 27 15:35:35 2010] Preparing output location ./tophat_out/
[Mon Sep 27 15:35:35 2010] Checking for Bowtie index files
Error: Could not find Bowtie index files test_ref.*
Macintosh-99:tophat-1.0.14.OSX_x86_64 common$

I have placed the test_ref files in the same folder as the Bowtie index files.

Can I have a more step-by-step procedure for the test_data? I am sure I will be able to extrapolate from there.

**nuclearriot** · 09-27-2010, 11:41 AM

Also, the data I eventually want to analyze is from the GERALD pipeline: s_N_sequence.txt, which is in FASTQ format:

@GIRG_FC30MG9:1:1:872:535
GTTTTGGAAATGGGAGAATAGATTCCCCTTAAACT
+GIRG_FC30MG9:1:1:872:535
YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYRRRRR

which should be compatible for alignment with Bowtie against rn4.ebwt (the prebuilt rat index).

I hope to get alignment and splice information for viewing in the UCSC Genome Browser.

Will this be a complicated task? I would appreciate some feedback. Thanks!

**jdanderson** · 09-27-2010, 01:49 PM

Hello Nuclearriot,

So your first output simply shows that the tophat command works when you're in the tophat directory. This demonstrates that your download and installation of tophat probably went okay. However, for the computer to recognize that command (tophat) from directories other than the tophat directory, you need to tell it where to find it. This is commonly referred to as adding the directory to your "PATH environment variable"... if this hasn't been done yet, it will eventually need to be. Let me know if you want some guidance on this; I would be more than happy to help.
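
The lookup the shell performs is simple: it walks the directories listed in PATH in order and runs the first executable file with that name. A Python sketch of that search, assuming a POSIX system; `find_on_path` is a hypothetical helper, and the standard library's `shutil.which` does the same job:

```python
import os


def find_on_path(command, path=None):
    """Mimic how a POSIX shell resolves a bare command name through PATH."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or ".", command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None  # the shell would report "command not found"
```

This is why ./tophat works inside the tophat directory (an explicit path skips the search) while plain tophat fails elsewhere until that directory is on PATH.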

As for your second output, the computer doesn't seem to be able to find your test_data files. This could be because you are not in the bowtie/indexes directory where you put them, which is why the tophat manual tells you to "cd test_data"... they want you to be in the test_data directory when you run this command (which is why it is important to have the tophat commands in your "PATH environment variable")...
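
The "Could not find Bowtie index files test_ref.*" message means the basename test_ref did not resolve to index files from the directory the command was run in. A rough Python sketch of that kind of check, assuming a Bowtie 1 index (the six .ebwt files bowtie-build writes); `missing_index_files` is hypothetical, and TopHat's own check may differ in detail:

```python
import os

# The six files bowtie-build (Bowtie 1) writes for an index basename.
BOWTIE1_SUFFIXES = (".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
                    ".rev.1.ebwt", ".rev.2.ebwt")


def missing_index_files(index_base):
    """List the Bowtie 1 index files that do not exist for `index_base`.

    A relative basename such as "test_ref" is resolved against the current
    working directory, so the result depends on where you run it from.
    """
    return [index_base + suffix for suffix in BOWTIE1_SUFFIXES
            if not os.path.isfile(index_base + suffix)]
```

An empty list means the basename is usable from the current directory; otherwise either change into the folder holding the .ebwt files or pass the full path, e.g. /path/to/indexes/test_ref.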

As for your last post, yes, you will be able to visualize this in the UCSC browser as a "custom track"... you can upload the file and even save your session if you want to, to let others see it.

Let me know if this helps or not, and if you have any more questions.

Regards-
Johnathon

**nuclearriot** · 09-27-2010, 03:53 PM

Johnathon,

Thank you so much for your response. Your advice is highly appreciated.
Yes, please help me set up the PATH environment variable.

-Shan

**jdanderson** · 09-27-2010, 06:39 PM

Hello Shan,

So it looks like you're using Mac OS X. Let me preface anything I'm about to say by stating that I am not that familiar with Macs. I run a Linux-based OS (Ubuntu 10.04.1), so the way I put the commands into the PATH may be slightly different from what you will have to do. The following link seemed to be useful for Mac users:

OS X: Change your PATH environment variable

http://www.tech-recipes.com/rx/2621/os_x_change_path_environment_variable/

The process appears to be rather similar to what I did. The above link explains how to go into your home directory and open your .profile file with a text editor (the link mentions vi and TextEdit for Macs). You then find the export PATH= line and add the directory containing your command to it. NOTE: for the version of tophat that I used, tophat-1.0.13, the appropriate directory was the bin directory, since this contained all the pertinent commands/files, e.g. export PATH=$PATH:/home/johnathon/tophat-1.0.13/bin (keeping $PATH at the front preserves the directories that were already there). It looks like you are using tophat 1.0.14, so it might be slightly different. You can tell by going into the various directories and seeing where the important commands are located within tophat 1.0.14.

This should be a permanent solution to the issue, i.e. every time you log in to your terminal it should be able to find the tophat command from any directory you try to execute it from. It might also be useful to add the bowtie directory (and cufflinks, if you're doing RNA-seq analysis) for future use. Note that PATH is only searched for executables, so data directories such as bowtie's indexes and reads don't need to go there.

Let me know if this helps you at all and whether you are able to get it to execute properly.

Also, it may be helpful to look at the following link. It's a primer on Unix and Perl for biologists by Dr. Korf at UCD (Perl is a common bioinformatics scripting language):

Unix & Perl Primer for Biologists

http://korflab.ucdavis.edu/Unix_and_Perl/index.html

Regards,
Johnathon
