  • Tophat error message

    Hello,
    So I am trying to use Tophat to align an RNA-seq data file (post-Gerald) against the hg19 genome. When I enter the following:

    tophat -r 220 /home/johnathon/bowtie-0.12.7/indexes/hg19 hg19.fa /home/johnathon/maq-0.7.0_x86_64-linux/s_5_sequence.fq

    This output is printed:

    [Wed Sep 22 15:17:32 2010] Beginning TopHat run (v1.0.13)
    -----------------------------------------------
    [Wed Sep 22 15:17:32 2010] Preparing output location ./tophat_out/
    [Wed Sep 22 15:17:32 2010] Checking for Bowtie index files
    [Wed Sep 22 15:17:32 2010] Checking for reference FASTA file
    [Wed Sep 22 15:17:32 2010] Checking for Bowtie
    Bowtie version: 0.12.7.0
    [Wed Sep 22 15:17:32 2010] Checking reads
    Error: could not open file hg19.fa

    I got the UCSC hg19 genome from the pre-built indexes on Tophat's homepage. The first time I ran it, Tophat reconstituted the FASTA file, hg19.fa, in its tophat_out folder, which I then moved into Bowtie's indexes folder.

    I have also converted the Gerald output file s_5_sequence.txt to s_5_sequence.fq via MAQ's fq_all2std.pl sol2std command.


    I am using:
    4GB RAM w/ Quad processor
    Ubuntu 10.04.1-desktop-amd64
    Bowtie 0.12.7-linux-x86_64
    MAQ 0.7.0-x86-64_linux
    Tophat 1.0.13
    UCSC hg19.ebwt (from Tophat homepage)
    Solexa pipeline 1.6 (i believe)

    Bowtie and Tophat seemed to install okay and passed the test runs (per their manuals), and they are both in the PATH (after some toil).

    Can anyone provide some input on this? I haven't noticed any threads on this so far. I would be very appreciative. Forgive my naive question, I am a newbie.
    Last edited by jdanderson; 09-22-2010, 03:08 PM.

  • #2
    Toph

    So I was able to get Tophat to recognize hg19.fa by executing the command from within the bowtie/indexes directory.
    However, now I get the following error:

    tophat -r 220 /home/johnathon/bowtie-0.12.7/indexes/hg19 hg19.fa /home/johnathon/maq-0.7.0_x86_64-linux/s_5_sequence s_5_sequence.fq

    [Wed Sep 22 21:27:30 2010] Beginning TopHat run (v1.0.13)
    -----------------------------------------------
    [Wed Sep 22 21:27:30 2010] Preparing output location ./tophat_out/
    [Wed Sep 22 21:27:30 2010] Checking for Bowtie index files
    [Wed Sep 22 21:27:30 2010] Checking for reference FASTA file
    [Wed Sep 22 21:27:30 2010] Checking for Bowtie
    Bowtie version: 0.12.7.0
    [Wed Sep 22 21:27:30 2010] Checking reads
    Warning: found a read < 20bp in hg19.fa
    Warning: found a read < 20bp in hg19.fa
    Warning: found a read < 20bp in hg19.fa
    Warning: found a read < 20bp in hg19.fa
    Warning: found a read < 20bp in hg19.fa
    Warning: found a read < 20bp in hg19.fa
    Warning: found a read < 20bp in hg19.fa
    Warning: found a read < 20bp in hg19.fa
    Warning: found a read < 20bp in hg19.fa
    Warning: found a read < 20bp in hg19.fa
    Warning: found a read < 20bp in hg19.fa
    Warning: found a read < 20bp in hg19.fa
    seed length: 20bp
    format: fasta
    [FAILED]
    Error: could not execute prep_reads


    When I check the tophat_out folder for the log, I find the following output in prep_reads.log:

    prep_reads v1.0.13
    ---------------------------
    Error: cannot open reads file /home/johnathon/maq-0.7.0_x86_64-linux/s_5_sequence for reading


    As I mentioned before, I used MAQ to convert it to a Sanger FASTQ and I also ran bowtie-inspect on the hg19 index.

    Can anyone provide some guidance?
    Anyone else had this issue before?


    • #3
      On the off chance that this thread may benefit someone, I will try to continue to update it.

      So I have had mediocre success by reinstalling Bowtie from the source archive (.src.zip) rather than the pre-compiled version (a suggestion from a friend). I then ran my sequence.fq (which had been converted from the Solexa sequence.txt by the popular MAQ script) through Bowtie first and used the output file as the input to Tophat.

      When I run Tophat as mentioned before, I now get a new error:

      tophat -r 220 /home/johnathon/bowtie-0.12.7/indexes/hg19 hg19.fa /home/johnathon/bowtie-0.12.7/indexes/s_5_sequence.fq s_5_sequences.fq

      [Thu Sep 23 21:23:06 2010] Beginning TopHat run (v1.0.13)
      -----------------------------------------------
      [Thu Sep 23 21:23:06 2010] Preparing output location ./tophat_out/
      [Thu Sep 23 21:23:06 2010] Checking for Bowtie index files
      [Thu Sep 23 21:23:06 2010] Checking for reference FASTA file
      [Thu Sep 23 21:23:06 2010] Checking for Bowtie
      Bowtie version: 0.12.7.0
      [Thu Sep 23 21:23:06 2010] Checking reads
      Warning: found a read < 20bp in hg19.fa
      Warning: found a read < 20bp in hg19.fa
      Warning: found a read < 20bp in hg19.fa
      Warning: found a read < 20bp in hg19.fa
      Warning: found a read < 20bp in hg19.fa
      Warning: found a read < 20bp in hg19.fa
      Warning: found a read < 20bp in hg19.fa
      Warning: found a read < 20bp in hg19.fa
      Warning: found a read < 20bp in hg19.fa
      Warning: found a read < 20bp in hg19.fa
      Warning: found a read < 20bp in hg19.fa
      Warning: found a read < 20bp in hg19.fa
      seed length: 20bp
      format: fasta
      [Thu Sep 23 21:25:53 2010] Mapping reads against hg19 with Bowtie
      [Thu Sep 23 21:26:03 2010] Joining segment hits
      [Thu Sep 23 21:26:03 2010] Mapping reads against hg19 with Bowtie
      [Thu Sep 23 21:26:06 2010] Joining segment hits
      [Thu Sep 23 21:26:06 2010] Searching for junctions via segment mapping
      Warning: junction database is empty!
      [Thu Sep 23 21:27:51 2010] Joining segment hits
      [Thu Sep 23 21:27:51 2010] Joining segment hits
      [Thu Sep 23 21:27:51 2010] Reporting output tracks
      [FAILED]
      Error: Report generation failed with err = 1


      The thread on here entitled "Running ~35 bp and >=50 RNASeq reads" may provide some guidance (it is the only helpful thread I could find). I will try trimming and report back the results.


      • #4
        Sorry if I misunderstood your problem, but wouldn't the right command simply be:

        tophat -r 220 /home/johnathon/bowtie-0.12.7/indexes/hg19 /home/johnathon/maq-0.7.0_x86_64-linux/s_5_sequence.fq
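
A minimal sketch of the argument order, assuming the usage line that TopHat itself prints (`tophat [options] <bowtie_index> <reads1[,reads2,...,readsN]> [reads1[,reads2,...,readsN]]`): the first positional argument is the index base name and everything after it is read as reads files, which would explain why hg19.fa was opened as a reads file in the earlier runs. The helper name `build_tophat_command` is hypothetical and only assembles the argument list; it does not run TopHat.

```python
def build_tophat_command(index_base, reads, mate_reads=None, mate_inner_dist=None):
    """Assemble an argument list of the form: tophat [options] <bowtie_index> <reads1> [reads2]."""
    # The index argument is a base name such as .../indexes/hg19, not a FASTA or .ebwt file.
    for suffix in (".fa", ".fasta", ".ebwt"):
        if index_base.endswith(suffix):
            raise ValueError("pass the index base name, not a file ending in " + suffix)
    command = ["tophat"]
    if mate_inner_dist is not None:
        command += ["-r", str(mate_inner_dist)]
    command.append(index_base)
    # Several reads files for one side are joined with commas, as in the usage line.
    command.append(",".join(reads))
    if mate_reads:
        command.append(",".join(mate_reads))
    return command


cmd = build_tophat_command(
    "/home/johnathon/bowtie-0.12.7/indexes/hg19",
    ["/home/johnathon/maq-0.7.0_x86_64-linux/s_5_sequence.fq"],
    mate_inner_dist=220,
)
print(" ".join(cmd))
```

Passing "hg19.fa" as the index or as an extra argument is rejected or avoided here on purpose; the reference FASTA is not part of the command line in this form.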


        • #5
          Hello OxTcO,

          Thank you for your reply. I appreciate any help I can get! That's an interesting point; as you might be able to guess, I do not have a strong computer background.

          I have tried your suggested input, but now I get an altogether different error:

          tophat -r 220 /home/johnathon/bowtie-0.12.7/indexes/hg19 /home/johnathon/bowtie-0.12.7/indexes/s_5_sequence.fq

          [Fri Sep 24 08:18:33 2010] Beginning TopHat run (v1.0.13)
          -----------------------------------------------
          [Fri Sep 24 08:18:33 2010] Preparing output location ./tophat_out/
          [Fri Sep 24 08:18:33 2010] Checking for Bowtie index files
          [Fri Sep 24 08:18:33 2010] Checking for reference FASTA file
          [Fri Sep 24 08:18:33 2010] Checking for Bowtie
          Bowtie version: 0.12.7.0
          [Fri Sep 24 08:18:33 2010] Checking reads
          Error: file /home/johnathon/bowtie-0.12.7/indexes/s_5_sequence.fq does not appear to be a valid FASTA or FASTQ file
          seed length: 136bp
          format: fastq
          quality scale: phred33 (default)
          [Fri Sep 24 08:21:16 2010] Mapping reads against hg19 with Bowtie
          [Fri Sep 24 08:21:39 2010] Joining segment hits
          Traceback (most recent call last):
          File "/home/johnathon/tophat-1.0.13/bin/tophat", line 1635, in <module>
          sys.exit(main())
          File "/home/johnathon/tophat-1.0.13/bin/tophat", line 1595, in main
          user_supplied_juncs)
          File "/home/johnathon/tophat-1.0.13/bin/tophat", line 1395, in spliced_alignment
          segment_len)
          File "/home/johnathon/tophat-1.0.13/bin/tophat", line 1085, in split_reads
          reads_file = open(reads_filename)
          IOError: [Errno 2] No such file or directory: './tophat_out/tmp//left_kept_reads_missing.fq'


          There is a left_kept_reads.fq (without the "missing") in the tophat_out directory, but when I checked it, it was empty.

          I checked the tophat_out log directory and opened segment_junc.log and found the following error as well:



          segment_juncs v1.0.13
          ---------------------------
          Loading reference sequences...
          Loading chr1...done
          Loading chr2...done
          Loading chr3...done
          Loading chr4...done
          Loading chr5...done
          Loading chr6...done
          Loading chr7...done
          Loading chr8...done
          Loading chr9...done
          Loading chr10...done
          Loading chr11...done
          Loading chr12...done
          Loading chr13...done
          Loading chr14...done
          Loading chr15...done
          Loading chr16...done
          Loading chr17...done
          Loading chr18...done
          Loading chr19...done
          Loading chr20...done
          Loading chr21...done
          Loading chr22...done
          Loading chrX...done
          Loading chrY...done
          Loading chrM...done
          Found 0 potential split-segment junctions
          Indexing extensions in ./tophat_out/tmp//left_kept_reads_missing.fq
          Can't open file ./tophat_out/tmp//left_kept_reads_missing.fq for reading, skipping...
          Indexing extensions in ./tophat_out/tmp//right_kept_reads_missing.fq
          Can't open file ./tophat_out/tmp//right_kept_reads_missing.fq for reading, skipping...
          Looking for junctions by island end pairings
          Adding hits from segment file 0 to coverage map
          Adding hits from segment file 1 to coverage map
          Map covers 0 bases
          Map covers 0 bases in sufficiently long segments
          Map contains 1 good islands
          0 are left looking bases
          0 are right looking bases
          Collecting potential splice sites in islands
          reporting synthetic splice junctions...
          Found 0 potential island-end pairing junctions
          done
          Reporting potential splice junctions...done
          Reported 0 total possible splices




          Any ideas about this? I haven't attempted trimming the s_5_sequence.fq file yet (as per the aforementioned thread).
          Last edited by jdanderson; 09-24-2010, 12:12 PM. Reason: mispelled word


          • #6
            Could you send me the first 4-8 lines of your reads file? Could it be that
            /bowtie-0.12.7/indexes/s_5_sequence.fq

            has been indexed or something like that?

            Only the reference sequence (hg19) has to be indexed! For the reads, simply use the FASTQ file (unedited). By the way, if you use Solexa data, consider using the

            --solexa-quals Use the Solexa scale for quality values in FASTQ files.

            or

            --solexa1.3-quals As of the Illumina GA pipeline version 1.3, quality scores are encoded in Phred-scaled base-64. Use this option for FASTQ files from pipeline 1.3 or later.

            parameters.
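
A rough way to check which of these options applies, sketched under the assumption that the lowest quality character in the file is representative: Phred+33 (Sanger) characters start at '!' (ASCII 33), Solexa+64 at ';' (ASCII 59) and Phred+64 at '@' (ASCII 64). The function name `guess_quality_encoding` is made up for illustration, and a file that contains only high-quality bases can be misjudged.

```python
def guess_quality_encoding(quality_strings):
    """Guess the FASTQ quality scale from the lowest ASCII code in the quality lines."""
    codes = [ord(ch) for quals in quality_strings for ch in quals]
    if not codes:
        raise ValueError("no quality characters given")
    lowest = min(codes)
    if lowest < 59:
        return "phred33"   # Sanger scale, no extra TopHat option needed
    if lowest < 64:
        return "solexa64"  # old Solexa scale, corresponds to --solexa-quals
    # Only a guess: a Phred+33 file with very high qualities would also land here.
    return "phred64"       # pipeline 1.3 or later, corresponds to --solexa1.3-quals


# Quality lines quoted in this thread.
print(guess_quality_encoding(["A??@AAA@A?@?0>@??A>@A@@AA<>=>?64852/-,.%"]))
print(guess_quality_encoding(["YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYRRRRR"]))
```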



            Cheers, Michael


            • #7
              Hello Michael,

              Thanks again for the reply.

              So the Solexa (v1.6) seq.txt file was (supposedly) converted to Sanger FASTQ via MAQ's fq_all2std.pl sol2std and inspected by bowtie-inspect, hence I thought there would be no need to use the --solexa1.3-quals option. But I'll give it a shot nonetheless and report back the results.

              The reason the seq.fq file is in the indexes directory is that, out of frustration, I ran the converted seq.fq through a Bowtie alignment (which seems to have been successful) to try to get a usable file format. Bowtie placed the output in the indexes directory (I did not specify where it should go) and I have not bothered to move it.

              Here are the first several lines of s_5_sequence.fq file:

              SOLEXA2_0827_FC707M4AAXX:5:1:1004:21272#0/1 - chr1 9795118 GCTCGGGCAAAATGGTGGACGCCACTCAGGCTGATCTTGN A??@AAA@A?@?0>@??A>@A@@AA<>=>?64852/-,.% 0 0:G>N
              SOLEXA2_0827_FC707M4AAXX:5:1:1005:9407#0/1 + chr10 52374466 NGGGAATGCCCTGCTGGGCTAACCTGTGTATTACAACGCT %/++-.0503A@?@AA?A>??9?@@>>><<=<5>9>AA@? 0 0:T>N
              SOLEXA2_0827_FC707M4AAXX:5:1:1005:1901#0/1 - chrY 21152905 CCACTTTTAGGCTTAGGACCAGGTTCTAACTATCTAAAAN %%%%%%%%%%%%%%%%%%%%BBBB:>>>>>44482+.2/% 0 0:A>N
              SOLEXA2_0827_FC707M4AAXX:5:1:1006:1817#0/1 + chr12 117383234 NGGCACCTTCCGGATAGCAGCATCTCTGACTATTCTTGCT %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0 0:G>N
              SOLEXA2_0827_FC707M4AAXX:5:1:1006:7942#0/1 - chrM 780 TCAAAACGCTTAGCCTAGCCACACCCCCACGGGAAACAGN %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0 0:C>N


              I am not sure exactly how well this matches up with Sanger FASTQ, but I do notice some variation in the read lengths, which is why I was going to try FASTX Toolkit's trimming function to standardize the lengths (as per this thread, http://seqanswers.com/forums/showthread.php?p=23066 ).
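
For comparison, the lines quoted above look like Bowtie's default alignment output (read name, strand, reference, offset, sequence, qualities, and two trailing columns) rather than FASTQ. The sketch below, with the hypothetical name `bowtie_line_to_fastq`, shows how such a line relates to a FASTQ record, assuming that column layout and assuming that Bowtie reports reverse-strand hits reverse-complemented with the qualities reversed. Using the original reads file is simpler than converting alignments back.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def bowtie_line_to_fastq(line):
    """Turn one line of Bowtie default output back into a 4-line FASTQ record."""
    line = line.rstrip("\r\n")
    # Bowtie writes tabs; text pasted into a forum may have lost them.
    fields = line.split("\t") if "\t" in line else line.split()
    if len(fields) < 6:
        raise ValueError("expected at least 6 columns, got %d" % len(fields))
    name, strand, _reference, _offset, seq, quals = fields[:6]
    if strand == "-":
        # Undo the reverse complement to recover the read as sequenced.
        seq = seq.translate(COMPLEMENT)[::-1]
        quals = quals[::-1]
    return "@" + name + "\n" + seq + "\n+\n" + quals + "\n"


example = ("SOLEXA2_0827_FC707M4AAXX:5:1:1004:21272#0/1 - chr1 9795118 "
           "GCTCGGGCAAAATGGTGGACGCCACTCAGGCTGATCTTGN "
           "A??@AAA@A?@?0>@??A>@A@@AA<>=>?64852/-,.% 0 0:G>N")
print(bowtie_line_to_fastq(example), end="")
```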

              I will try the two methods mentioned above and report back. Any thoughts in the meantime?

              Cheers, Johnathon
              Last edited by jdanderson; 09-24-2010, 12:13 PM. Reason: added my name


              • #8
                That is not FASTQ format, as Tophat correctly complains. FASTQ format looks like this:

                @SOLEXA2_0827_FC707M4AAXX:5:1:1004:21272
                GCTCGGGCAAAATGGTGGACGCCACTCAGGCTGATCTTGN
                +
                A??@AAA@A?@?0>@??A>@A@@AA<>=>?64852/-,.%
                @SOLEXA2_0827_FC707M4AAXX:5:1:1005:9407
                NGGGAATGCCCTGCTGGGCTAACCTGTGTATTACAACGCT
                +
                %/++-.0503A@?@AA?A>??9?@@>>><<=<5>9>AA@?

                Try to convert your input file.
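
A small sanity check along these lines can be run on the first few lines of a reads file before handing it to Tophat. It is only a sketch: the name `check_fastq` is hypothetical, and it assumes unwrapped records of exactly four lines each, as in the example above.

```python
def check_fastq(lines):
    """Return a list of problems found in 4-line FASTQ records (empty if none found)."""
    lines = [line.rstrip("\r\n") for line in lines]
    problems = []
    if len(lines) % 4 != 0:
        problems.append("line count %d is not a multiple of 4" % len(lines))
    complete = len(lines) - len(lines) % 4
    for start in range(0, complete, 4):
        header, seq, plus, qual = lines[start:start + 4]
        record = start // 4 + 1
        if not header.startswith("@"):
            problems.append("record %d: header does not start with '@'" % record)
        if not plus.startswith("+"):
            problems.append("record %d: third line does not start with '+'" % record)
        if len(seq) != len(qual):
            problems.append("record %d: sequence and quality lengths differ" % record)
    return problems


first_record = [
    "@SOLEXA2_0827_FC707M4AAXX:5:1:1004:21272",
    "GCTCGGGCAAAATGGTGGACGCCACTCAGGCTGATCTTGN",
    "+",
    "A??@AAA@A?@?0>@??A>@A@@AA<>=>?64852/-,.%",
]
print(check_fastq(first_record))
```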



                HTH, Michael ;-)


                • #9
                  Hello Michael,

                  Thanks again for all the valuable input; I am grateful.

                  So I did actually use MAQ-0.7.0_x86_64linux fq_all2std.pl sol2std to convert the seq.txt file.

                  It seems to me that part of the difference between the two formats is the inclusion of a location ID (e.g. "#0/1 - chr1 9795118") and the trailing "0 0:G>N".

                  The reason for the chr ID tag is that these files have already been aligned using Gerald in the Solexa pipeline (v1.6). Ultimately I am trying to use Cufflinks to analyze the expression levels of these RNA-seq samples. I am simply using MAQ, Bowtie and Tophat to convert the file format accordingly for Cufflinks. The file format I need for Cufflinks is SAM, and Cufflinks suggests using Tophat to obtain it.

                  The samples were run and initially analyzed by our sequencing core here at UCD. The set of output files they give you access to is limited; i.e. you only receive the aligned files like seq.txt, export.txt, etc.

                  Since this is the case, do you think I need to chop off the chr location ID? I only need to convert the seq.txt into SAM format... do you have any ideas for a simpler way of doing this?

                  Also, I just converted the export.txt file using MAQ's fq_all2std.pl export2std to see if this is any help.
                  Here is a sample of the output, which I think looks a bit closer to FASTQ:

                  @SOLEXA2_0827:5:1:1004:21272/1
                  NCAAGATCAGCCTGAGTGGCGTCCACCATTTTGCCCGAGC
                  +
                  %.,-/25846?>=><AA@@A@>A??@>0?@?A@AAA@??A
                  @SOLEXA2_0827:5:1:1005:9407/1
                  NGGGAATGCCCTGCTGGGCTAACCTGTGTATTACAACGCT
                  +
                  %/++-.0503A@?@AA?A>??9?@@>>><<=<5>9>AA@?
                  @SOLEXA2_0827:5:1:1005:1901/1
                  NTTTTAGATAGTTAGAACCTGGTCCTAAGCCTAAAAGTGG


                  I am in the middle of running Tophat on this newly converted s_5_export.fq file. I will post results when it's finished.


                  • #10
                    Hello,

                    Just wanted to report that MAQ's fq_all2std.pl export2std conversion worked perfectly, unlike the sol2std command that I had used previously. The export2std command was the appropriate one to use because my files had already been aligned to hg19 by Gerald in the Solexa (v1.6) pipeline. The location ID needed to be deleted, and I believe (after a cursory perusal) that it also helped with read length uniformity (in addition to the PHRED quality score adjustment).

                    I also successfully ran Cufflinks with this Tophat output and have visualized the data in the UCSC Genome Browser. I am now attempting to reformat the Cufflinks output for use in IGV from the Broad Institute (and in R when I work up the courage).

                    I also want to say thank you to Michael from Leipzig, you were very gracious in helping out a fledgling student. Also, thank you to everyone who posts a thread with a problem and the kind people who reply; this has been very helpful to me.


                    • #11
                      tophat help!

                      Hello, I have no experience with any sort of programming, but I am attempting to use TopHat. Can you please take me through a successful run on the sample data set? This is how I begin:

                      I have made a folder called tophat containing the following folders: tophat-1.0.14.OSX_x86_64, bowtie-0.12.7, and test_data.

                      I begin in the Terminal on the Mac and enter
                      cd desktop
                      cd tophat
                      cd tophat-1.0.14.OSX_x86_64

                      Now when I run ./tophat -r 20 test_ref reads_1.fq reads_2.fq

                      an error message appears:

                      Macintosh-99:~ common$ cd desktop
                      Macintosh-99:desktop common$ cd tophat
                      Macintosh-99:tophat common$ cd /Users/common/Desktop/tophat/tophat-1.0.14.OSX_x86_64
                      Macintosh-99:tophat-1.0.14.OSX_x86_64 common$ ./tophat
                      tophat:
                      TopHat maps short sequences from spliced transcripts to whole genomes.

                      Usage:
                      tophat [options] <bowtie_index> <reads1[,reads2,...,readsN]> [reads1[,reads2,...,readsN]]

                      Options:
                      -v/--version
                      -o/--output-dir <string> [ default: ./tophat_out ]
                      -a/--min-anchor <int> [ default: 8 ]
                      -m/--splice-mismatches <0-2> [ default: 0 ]
                      -i/--min-intron <int> [ default: 50 ]
                      -I/--max-intron <int> [ default: 500000 ]
                      -g/--max-multihits <int> [ default: 40 ]
                      -F/--min-isoform-fraction <float> [ default: 0.15 ]
                      --solexa-quals
                      --solexa1.3-quals (same as phred64-quals)
                      --phred64-quals (same as solexa1.3-quals)
                      -p/--num-threads <int> [ default: 1 ]
                      -G/--GFF <filename>
                      -j/--raw-juncs <filename>
                      -r/--mate-inner-dist <int>
                      --mate-std-dev <int> [ default: 20 ]
                      --no-novel-juncs
                      --no-gff-juncs
                      --no-coverage-search
                      --coverage-search
                      --no-closure-search
                      --closure-search
                      --fill-gaps
                      --microexon-search
                      --butterfly-search
                      --no-butterfly-search
                      --keep-tmp

                      Advanced Options:

                      --segment-mismatches <int> [ default: 2 ]
                      --segment-length <int> [ default: 25 ]
                      --min-closure-exon <int> [ default: 100 ]
                      --min-closure-intron <int> [ default: 50 ]
                      --max-closure-intron <int> [ default: 5000 ]
                      --min-coverage-intron <int> [ default: 50 ]
                      --max-coverage-intron <int> [ default: 20000 ]
                      --min-segment-intron <int> [ default: 50 ]
                      --max-segment-intron <int> [ default: 500000 ]

                      SAM Header Options (for embedding sequencing run metadata in output):
                      --rg-id <string> (read group ID)
                      --rg-sample <string> (sample ID)
                      --rg-library <string> (library ID)
                      --rg-description <string> (descriptive string, no tabs allowed)
                      --rg-platform-unit <string> (e.g Illumina lane ID)
                      --rg-center <string> (sequencing center name)
                      --rg-date <string> (ISO 8601 date of the sequencing run)
                      --rg-platform <string> (Sequencing platform descriptor)

                      for detailed help see http://tophat.cbcb.umd.edu/manual.html
                      Macintosh-99:tophat-1.0.14.OSX_x86_64 common$ cd ..
                      Macintosh-99:tophat common$ cd /Users/common/Desktop/tophat/bowtie-0.12.7
                      Macintosh-99:bowtie-0.12.7 common$ ./bowtie
                      No index, query, or output file specified!
                      Usage:
                      bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]

                      <m1> Comma-separated list of files containing upstream mates (or the
                      sequences themselves, if -c is set) paired with mates in <m2>
                      <m2> Comma-separated list of files containing downstream mates (or the
                      sequences themselves if -c is set) paired with mates in <m1>
                      <r> Comma-separated list of files containing Crossbow-style reads. Can be
                      a mixture of paired and unpaired. Specify "-" for stdin.
                      <s> Comma-separated list of files containing unpaired reads, or the
                      sequences themselves, if -c is set. Specify "-" for stdin.
                      <hit> File to write hits to (default: stdout)
                      Input:
                      -q query input files are FASTQ .fq/.fastq (default)
                      -f query input files are (multi-)FASTA .fa/.mfa
                      -r query input files are raw one-sequence-per-line
                      -c query sequences given on cmd line (as <mates>, <singles>)
                      -C reads and index are in colorspace
                      -Q/--quals <file> QV file(s) corresponding to CSFASTA inputs; use with -f -C
                      --Q1/--Q2 <file> same as -Q, but for mate files 1 and 2 respectively
                      -s/--skip <int> skip the first <int> reads/pairs in the input
                      -u/--qupto <int> stop after first <int> reads/pairs (excl. skipped reads)
                      -5/--trim5 <int> trim <int> bases from 5' (left) end of reads
                      -3/--trim3 <int> trim <int> bases from 3' (right) end of reads
                      --phred33-quals input quals are Phred+33 (default)
                      --phred64-quals input quals are Phred+64 (same as --solexa1.3-quals)
                      --solexa-quals input quals are from GA Pipeline ver. < 1.3
                      --solexa1.3-quals input quals are from GA Pipeline ver. >= 1.3
                      --integer-quals qualities are given as space-separated integers (not ASCII)
                      Alignment:
                      -v <int> report end-to-end hits w/ <=v mismatches; ignore qualities
                      or
                      -n/--seedmms <int> max mismatches in seed (can be 0-3, default: -n 2)
                      -e/--maqerr <int> max sum of mismatch quals across alignment for -n (def: 70)
                      -l/--seedlen <int> seed length for -n (default: 28)
                      --nomaqround disable Maq-like quality rounding for -n (nearest 10 <= 30)
                      -I/--minins <int> minimum insert size for paired-end alignment (default: 0)
                      -X/--maxins <int> maximum insert size for paired-end alignment (default: 250)
                      --fr/--rf/--ff -1, -2 mates align fw/rev, rev/fw, fw/fw (default: --fr)
                      --nofw/--norc do not align to forward/reverse-complement reference strand
                      --maxbts <int> max # backtracks for -n 2/3 (default: 125, 800 for --best)
                      --pairtries <int> max # attempts to find mate for anchor hit (default: 100)
                      -y/--tryhard try hard to find valid alignments, at the expense of speed
                      --chunkmbs <int> max megabytes of RAM for best-first search frames (def: 64)
                      Reporting:
                      -k <int> report up to <int> good alignments per read (default: 1)
                      -a/--all report all alignments per read (much slower than low -k)
                      -m <int> suppress all alignments if > <int> exist (def: no limit)
                      -M <int> like -m, but reports 1 random hit (MAPQ=0); requires --best
                      --best hits guaranteed best stratum; ties broken by quality
                      --strata hits in sub-optimal strata aren't reported (requires --best)
                      Output:
                      -t/--time print wall-clock time taken by search phases
                      -B/--offbase <int> leftmost ref offset = <int> in bowtie output (default: 0)
                      --quiet print nothing but the alignments
                      --refout write alignments to files refXXXXX.map, 1 map per reference
                      --refidx refer to ref. seqs by 0-based index rather than name
                      --al <fname> write aligned reads/pairs to file(s) <fname>
                      --un <fname> write unaligned reads/pairs to file(s) <fname>
                      --max <fname> write reads/pairs over -m limit to file(s) <fname>
                      --suppress <cols> suppresses given columns (comma-delim'ed) in default output
                      --fullref write entire ref name (default: only up to 1st space)
                      Colorspace:
                      --snpphred <int> Phred penalty for SNP when decoding colorspace (def: 30)
                      or
                      --snpfrac <dec> approx. fraction of SNP bases (e.g. 0.001); sets --snpphred
                      --col-cseq print aligned colorspace seqs as colors, not decoded bases
                      --col-cqual print original colorspace quals, not decoded quals
                      --col-keepends keep nucleotides at extreme ends of decoded alignment
                      SAM:
                      -S/--sam write hits in SAM format
                      --mapq <int> default mapping quality (MAPQ) to print for SAM alignments
                      --sam-nohead supppress header lines (starting with @) for SAM output
                      --sam-nosq supppress @SQ header lines for SAM output
                      --sam-RG <text> add <text> (usually "lab=value") to @RG line of SAM header
                      Performance:
                      -o/--offrate <int> override offrate of index; must be >= index's offrate
                      -p/--threads <int> number of alignment threads to launch (default: 1)
                      --mm use memory-mapped I/O for index; many 'bowtie's can share
                      --shmem use shared mem for index; many 'bowtie's can share
                      Other:
                      --seed <int> seed for random number generator
                      --verbose verbose output (for debugging)
                      --version print version information and quit
                      -h/--help print this usage message
                      Macintosh-99:bowtie-0.12.7 common$ cd ..
                      Macintosh-99:tophat common$ cd tophat-1.0.14.OSX_x86_64/
                      Macintosh-99:tophat-1.0.14.OSX_x86_64 common$ ./tophat -r 20 test_ref reads_1.fq reads_2.fq

                      [Mon Sep 27 15:35:35 2010] Beginning TopHat run (v1.0.14)
                      -----------------------------------------------
                      [Mon Sep 27 15:35:35 2010] Preparing output location ./tophat_out/
                      [Mon Sep 27 15:35:35 2010] Checking for Bowtie index files
                      Error: Could not find Bowtie index files test_ref.*
                      Macintosh-99:tophat-1.0.14.OSX_x86_64 common$

                      I have placed the test_ref files in the same folder as the Bowtie index files.

                      Can I have a more step-by-step procedure for the test_data? I am sure I will be able to extrapolate from there.


                      • #12
                        Also, the data I am eventually attempting to analyze is from the Gerald pipeline: s_N_sequence.txt, which is in FASTQ format:

                        @GIRG_FC30MG9:1:1:872:535
                        GTTTTGGAAATGGGAGAATAGATTCCCCTTAAACT
                        +GIRG_FC30MG9:1:1:872:535
                        YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYRRRRR

                        which should be compatible for alignment with Bowtie against rn4.ebwt (the prebuilt Rattus index).

                        I hope to get alignment and splice information for viewing on the UCSC Genome Browser.

                        Will this be a complicated task? I would appreciate some feedback. Thanks!


                        • #13
                          Hello Nuclearriot,

                          So your first output simply shows that when you're in the tophat directory, the tophat command works. This demonstrates that your download and installation of Tophat probably went okay. However, in order to get the computer to recognize the tophat command when you're in directories other than the tophat directory, you need to tell it where to find it. This is commonly referred to as putting the command into your "PATH environment variable"... if this hasn't been done yet, it will eventually need to be. Let me know if you want some guidance on this; I would be more than happy to help.

                          As for your second output, the computer doesn't seem to be able to find your test_data files (the test_ref.* index files). This could be because you are not in the bowtie/indexes directory where you put them, which is why the Tophat manual tells you to "cd test_data"... they want you to be in the test_data directory when you run this command (which is why it is important to have your tophat commands in your "PATH environment variable")...
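
One way to see what "Could not find Bowtie index files test_ref.*" is complaining about is to list which index files exist for a given base name. This is a sketch with a made-up function name, `missing_index_files`, assuming the six files that bowtie-build normally writes for a Bowtie 1 index; a relative base name such as test_ref is resolved against the current directory here, which is why the directory you run from matters.

```python
import os

# File name suffixes of a Bowtie 1 index as written by bowtie-build.
EBWT_SUFFIXES = (".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt")


def missing_index_files(index_base):
    """Return the index files that do not exist for the given base name."""
    return [index_base + suffix
            for suffix in EBWT_SUFFIXES
            if not os.path.isfile(index_base + suffix)]


# Run from the directory you start tophat in; an empty list means all six files were found.
print(missing_index_files("test_ref"))
```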

                          As for your last post, yes you will be able to visualize this on the UCSC browser as a "custom track"... you can upload the file and even save your session if you wanted to, to let others see it.

                          Let me know if this helps or not, and if you have any more questions.

                          Regards-
                          Johnathon


                          • #14
                            Johnathon,

                            Thank you so much for your response. Your advice is highly appreciated.
                            Yes, please help me set up the PATH environment variable.

                            -Shan



                            • #15
                              Hello Shan,

                              So it looks like you're using Mac OS. Let me preface anything I'm about to say by stating that I am not that familiar with Macs. I run a Linux-based OS (Ubuntu 10.04.1), so the way I put the commands into the PATH may be slightly different from what you will have to do. The following is a link that seemed to be useful for Mac users:

                              When you run a command from a UNIX or UNIX-like shell, the shell looks for the executable file using the directories listed in your PATH variable as a map. For convenience, adding directories to this environment variable means you don’t have to go hunting for a file each time you run it. Following these directions […]


                              The process appears to be rather similar to what I did. The link above explains how to go to your home directory and open your .profile file with a text editor (it mentions vi and TextEdit for Macs). You then find the export PATH= line and add the directory containing your command to it (/the/dir/to/your/command). NOTE: for the version of tophat that I used, tophat-1.0.13, the appropriate directory was the bin directory, since this contained all the pertinent commands/files, e.g. export PATH=$PATH:/home/johnathon/tophat-1.0.13/bin (keeping $PATH in the line preserves the directories that are already listed). It looks like you are using tophat 1.0.14, so it might be slightly different. You can tell by looking through the various directories to see where the important commands are located within tophat 1.0.14.

                              This should be a permanent solution to the issue; i.e. every time you log in to your terminal, it should be able to find the tophat command from any directory you try to execute it from. It may also be useful to put some of the common bowtie directories in there as well, e.g. indexes and reads, for future use (also cufflinks if you're doing RNA-seq analysis).
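The permanent version can be sketched as a helper that appends an export line to a startup file only if that line is not already there. The function name persist_path_dir is made up for this example, and both arguments in the usage comment are assumptions: which startup file your shell reads (.profile, .bash_profile, or another) depends on your system, so check before relying on it.

```shell
# Append an 'export PATH' line for a directory to a startup file,
# unless that exact line is already present.
persist_path_dir() {
    profile="$1"
    dir="$2"
    # \$PATH stays literal in the file, so existing entries are preserved
    # each time the startup file is read.
    line="export PATH=\"\$PATH:$dir\""
    touch "$profile"
    if ! grep -qxF "$line" "$profile"; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Example (both arguments are assumptions; adjust for your system):
# persist_path_dir "$HOME/.profile" "$HOME/tophat-1.0.13/bin"
# Then open a new terminal, or run:  . "$HOME/.profile"
```

Writing the line as PATH="$PATH:..." rather than PATH=... matters: without $PATH on the right-hand side, the new value would replace every directory already listed, and basic commands such as ls would stop being found.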

                              Let me know if this helps you at all and if you are able to get it to execute properly.

                              Also, it may be helpful to look at the following link. It's a primer on UNIX and Perl for biologists by Dr. Korf at UCD (Perl is a common bioinformatics scripting language):



                              Regards,
                              Johnathon
