  • LeonDK
    Member
    • Sep 2014
    • 69

    TopHat2 on multiple samples, avoid building Bowtie index from genes.fa each time?

    Hi all,

    Ultimately aiming at differential expression, I'm mapping human RNA-seq reads using tophat2 with the following command:
    Code:
    tophat2 --num-threads 12 --GTF /Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf /Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome myfastq_R1.fastq.gz myfastq_R2.fastq.gz
    Is it really necessary to repeat these steps:
    Code:
    [2014-09-18 10:38:45] Building transcriptome data files /tmp/genes
    [2014-09-18 10:39:21] Building Bowtie index from genes.fa
    for each sample? After all, the different samples are all mapped using the same Bowtie2Index/genome files and the same Genes/genes.gtf file.

    Cheers,
    Leon
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Have a look at the --transcriptome-index option, which is what you're looking for.


    • LeonDK
      Member
      • Sep 2014
      • 69

      #3
      Originally posted by dpryan
      Have a look at the --transcriptome-index option, which is what you're looking for.
      Hi dpryan,

      Thanks for the input regarding the --transcriptome-index option for tophat2. I looked it up in the TopHat2 manual. For other users who may encounter the same challenge, the trick is to run this command first:
      Code:
      tophat2 -G iGenomes/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf --transcriptome-index=transcriptome_data/known iGenomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
      and then call tophat2 for each sample with this command:
      Code:
      tophat2 --num-threads 12 --transcriptome-index=transcriptome_data/known iGenomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome myfastq_R1.fastq.gz myfastq_R2.fastq.gz
      When running the above command, you'll see:
      Code:
      [2014-09-18 12:12:04] Using pre-built transcriptome data..
      This is significantly faster when running multiple samples.
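
A minimal sketch of the two steps above as one script, assuming paired files named `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` (a naming assumption, not something the thread states). The `RUN` dry-run switch and the `tophat_out_<sample>` directory names are hypothetical conveniences. The paths are the ones from the commands above.

```shell
#!/bin/sh
set -eu

# Paths as used in the thread; adjust to your own layout.
GTF=iGenomes/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf
GENOME=iGenomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
TX_INDEX=transcriptome_data/known

# Hypothetical dry-run switch: by default commands are only printed.
# Run the script with RUN= (empty) to actually execute tophat2.
RUN=${RUN-echo}

# Step 1: build the transcriptome index once (no reads given).
build_index() {
    $RUN tophat2 -G "$GTF" --transcriptome-index="$TX_INDEX" "$GENOME"
}

# Step 2: map one paired-end sample against the pre-built index.
# $1 is the R1 file; the R2 file name is assumed to differ only in _R1/_R2.
map_sample() {
    r1=$1
    r2=$(printf '%s\n' "$r1" | sed 's/_R1\.fastq\.gz$/_R2.fastq.gz/')
    sample=$(basename "$r1" _R1.fastq.gz)
    # A per-sample output directory keeps runs from overwriting ./tophat_out.
    $RUN tophat2 --num-threads 12 --output-dir "tophat_out_$sample" \
        --transcriptome-index="$TX_INDEX" "$GENOME" "$r1" "$r2"
}

build_index
for r1 in "$@"; do
    map_sample "$r1"
done
```

Called as `sh map_all.sh *_R1.fastq.gz` (the script name is arbitrary), this prints one tophat2 command per sample. Prefixing the call with `RUN=` executes them instead. The `--output-dir` option matters here because tophat2 otherwise writes every run to `./tophat_out`.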

      The UCSC/hg19 data can be retrieved like so:
      Code:
      wget ftp://igenome:[email protected]/Homo_sapiens/UCSC/hg19/Homo_sapiens_UCSC_hg19.tar.gz
      Cheers,
      Leon


      • konika
        Member
        • Sep 2010
        • 14

        #4
        tophat not creating transcriptome indexes

        Hi
        In my case the following command doesn't start tophat2. tophat2 just shows me the available options, as if I had used a wrong option somewhere. Does anyone have an idea what's wrong here?
        The command I use:
        tophat2 -G /home/chawla/rna_seq_pipeline/gff/mouse_ensembl.gff --transcriptome-index=tdata /home/chawla/rna_seq_pipeline/gff/mouse_ensembl


        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #5
          Originally posted by konika
          Hi
          In my case the following command doesn't start tophat2. tophat2 just shows me the available options, as if I had used a wrong option somewhere. Does anyone have an idea what's wrong here?
          The command I use:
          tophat2 -G /home/chawla/rna_seq_pipeline/gff/mouse_ensembl.gff --transcriptome-index=tdata /home/chawla/rna_seq_pipeline/gff/mouse_ensembl
          You have to point tophat2 to the Bowtie2 index for the full genome. It appears that you are passing a GFF file instead of the Bowtie2 index at the end of your command. Refer to LeonDK's example in the posts above.
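
One way to catch this before launching tophat2 is to check that the last argument really is a Bowtie2 index basename. The sketch below relies on the Bowtie2 convention that an index with basename `genome` consists of files such as `genome.1.bt2` (or `genome.1.bt2l` for large indexes). The function names are hypothetical helpers, not part of TopHat or Bowtie2.

```shell
#!/bin/sh

# Succeed if $1 looks like a Bowtie2 index basename, i.e. the file
# <basename>.1.bt2 (or <basename>.1.bt2l for a large index) exists.
is_bowtie2_index() {
    [ -f "$1.1.bt2" ] || [ -f "$1.1.bt2l" ]
}

# Print a short verdict for a candidate basename; return non-zero if
# no index files are found so callers can stop before running tophat2.
check_index() {
    if is_bowtie2_index "$1"; then
        echo "ok: $1"
    else
        echo "missing Bowtie2 index files for basename: $1" >&2
        return 1
    fi
}
```

For example, `check_index /home/chawla/rna_seq_pipeline/gff/mouse_ensembl` would fail unless Bowtie2 index files with that basename exist in the gff directory. If no index exists yet, one can be built from the genome FASTA with `bowtie2-build`.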


          • konika
            Member
            • Sep 2010
            • 14

            #6
            Thanks, it was actually an old version of tophat, which also needs input reads in order to create the transcriptome index.
