SEQanswers


LeonDK 09-18-2014 01:20 AM

TopHat2 on multiple samples, avoid building Bowtie index from genes.fa each time?
 
Hi all,

Ultimately aiming at differential expression, I'm mapping human RNA-seq reads using tophat2 with the following command:
Code:

tophat2 --num-threads 12 --GTF /Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf /Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome myfastq_R1.fastq.gz myfastq_R2.fastq.gz
Is it really necessary to:
Code:

[2014-09-18 10:38:45] Building transcriptome data files /tmp/genes
[2014-09-18 10:39:21] Building Bowtie index from genes.fa

for each sample? After all, the different samples are all mapped using the same Bowtie2Index/genome files and the same Genes/genes.gtf file.

Cheers,
Leon

dpryan 09-18-2014 01:33 AM

Have a look at the --transcriptome-index option, which is what you're looking for.

LeonDK 09-18-2014 02:41 AM

Quote:

Originally Posted by dpryan (Post 150195)
Have a look at the --transcriptome-index option, which is what you're looking for.

Hi dpryan,

Thanks for the input regarding the --transcriptome-index option for tophat2. I looked it up in the TopHat2 manual. For other users who may encounter the same challenge, the trick is to run this command first:
Code:

tophat2 -G iGenomes/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf --transcriptome-index=transcriptome_data/known iGenomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
and then subsequently call tophat2 with this command:
Code:

tophat2 --num-threads 12 --transcriptome-index=transcriptome_data/known iGenomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome myfastq_R1.fastq.gz myfastq_R2.fastq.gz
After running the above command, you'll see
Code:

[2014-09-18 12:12:04] Using pre-built transcriptome data..
This is significantly faster when running multiple samples.
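
For anyone with many samples, the two steps can be wrapped in a small script. This is only a rough sketch, not something from the manual: the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming, the per-sample `tophat_out/<sample>` directories, the `DRY_RUN` switch, and the assumption that the index files are written as `<prefix>.1.bt2` are all mine. The `--output-dir` option is there so that samples do not overwrite each other in the default output directory.

```shell
#!/usr/bin/env bash
# Sketch: build the transcriptome index once, then reuse it per sample.
# Assumptions: paired files named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz,
# one output directory per sample, and an index whose Bowtie2 files are
# written as <prefix>.1.bt2.

GTF=iGenomes/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf
GENOME=iGenomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
TX_INDEX=transcriptome_data/known

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Step 1: build the transcriptome index only if it is not there yet.
build_index_once() {
    if [ ! -e "$TX_INDEX.1.bt2" ]; then
        run tophat2 -G "$GTF" --transcriptome-index="$TX_INDEX" "$GENOME"
    fi
}

# Step 2: map one sample, given the path to its R1 file.
map_sample() {
    local r1="$1"
    local r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    local sample
    sample=$(basename "$r1" _R1.fastq.gz)
    run tophat2 --num-threads 12 \
        --output-dir "tophat_out/$sample" \
        --transcriptome-index="$TX_INDEX" \
        "$GENOME" "$r1" "$r2"
}

# Main: only does work when R1 files are passed as arguments.
if [ "$#" -gt 0 ]; then
    build_index_once
    for r1 in "$@"; do
        map_sample "$r1"
    done
fi
```

Saved under a name of your choice and called with the R1 files as arguments, it only prints the commands by default; setting `DRY_RUN=0` in the environment makes it actually run them.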

The UCSC/hg19 data can be retrieved like so:
Code:

wget ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/Homo_sapiens/UCSC/hg19/Homo_sapiens_UCSC_hg19.tar.gz
Cheers,
Leon

konika 08-11-2015 05:48 AM

tophat not creating transcriptome indexes
 
Hi
In my case, the following command doesn't start tophat2. tophat2 just shows me the available options, as if I had used a wrong option somewhere. Does anyone have an idea what's wrong here?
The command I use:
tophat2 -G /home/chawla/rna_seq_pipeline/gff/mouse_ensembl.gff --transcriptome-index=tdata /home/chawla/rna_seq_pipeline/gff/mouse_ensembl

GenoMax 08-11-2015 06:14 AM

Quote:

Originally Posted by konika (Post 178893)
Hi
In my case, the following command doesn't start tophat2. tophat2 just shows me the available options, as if I had used a wrong option somewhere. Does anyone have an idea what's wrong here?
The command I use:
tophat2 -G /home/chawla/rna_seq_pipeline/gff/mouse_ensembl.gff --transcriptome-index=tdata /home/chawla/rna_seq_pipeline/gff/mouse_ensembl

You have to point the tophat2 process to the indexes for the full genome. It appears that you are giving a gff path instead of the bowtie2 index basename at the end of your command. Refer to LeonDK's example in the posts above.
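
A quick way to check the last argument before calling tophat2 is to test whether the Bowtie2 index files exist for that basename. This is just a sketch: the function name `is_bowtie2_index` is made up for illustration, and it only looks for the first index file (`.1.bt2`, or `.1.bt2l` for large indexes), not the complete set.

```shell
# Sketch: succeed if the given path looks like a Bowtie2 index basename.
# Only the first index file is checked; is_bowtie2_index is a hypothetical helper.
is_bowtie2_index() {
    [ -e "$1.1.bt2" ] || [ -e "$1.1.bt2l" ]
}

# Report the result for one candidate basename.
check_index() {
    if is_bowtie2_index "$1"; then
        echo "ok: $1"
    else
        echo "not a Bowtie2 index basename: $1"
        return 1
    fi
}
```

For example, `check_index /home/chawla/rna_seq_pipeline/gff/mouse_ensembl` would report a problem unless files such as `mouse_ensembl.1.bt2` exist in that directory.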

konika 08-11-2015 06:37 AM

Thanks. It was actually an old version of tophat, which also needs input reads in order to create the transcriptome index.


All times are GMT -8.
