SEQanswers

imsharmanitin 06-30-2015 03:38 AM

transcriptome_index Bowtie2 Index
 
Dear all,

As far as I understand, TopHat2 creates a transcriptome index, which also includes Bowtie2 index files, from the GTF and the genome file. Why, then, do we need to supply a Bowtie2 index when creating the transcriptome index for the first time?

I tried to create the transcriptome index without supplying Bowtie2 index files, and it gave the following error.

tophat2 -G ../data/Homo_sapiens.GRCh37.75.gtf --transcriptome-index ./transcriptome_index/ ../data/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa

[2015-06-30 12:25:22] Building transcriptome files with TopHat v2.0.13
-----------------------------------------------
[2015-06-30 12:25:22] Checking for Bowtie
Bowtie version: 2.2.4.0
[2015-06-30 12:25:24] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (../data/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.*.bt2)



However, after that I created the Bowtie2 index files and ran the same command again, and no error was produced.
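The error above comes from TopHat looking for files matching `<base>.*.bt2`, where `<base>` is the last argument on the command line. A minimal sketch of the same kind of check, assuming the standard six-file layout that `bowtie2-build` writes (`.1`, `.2`, `.3`, `.4`, `.rev.1`, `.rev.2`, each ending in `.bt2` or `.bt2l`); `has_bt2_index` is a hypothetical helper, not part of TopHat or Bowtie2:

```shell
# Return 0 if every expected Bowtie 2 index file exists for the given base name.
# has_bt2_index is a hypothetical helper, not part of TopHat or Bowtie 2.
has_bt2_index() {
    base=$1
    for suffix in 1 2 3 4 rev.1 rev.2; do
        # Small indexes end in .bt2, large ones in .bt2l; accept either.
        if [ ! -e "$base.$suffix.bt2" ] && [ ! -e "$base.$suffix.bt2l" ]; then
            return 1
        fi
    done
    return 0
}

# Base name exactly as it was passed to tophat2 in the failing command
GENOME_BASE="../data/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa"

if has_bt2_index "$GENOME_BASE"; then
    echo "Bowtie 2 index found for $GENOME_BASE"
else
    echo "No Bowtie 2 index found for $GENOME_BASE"
fi
```

Running this before TopHat shows whether the index needs to be built with `bowtie2-build` first.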

GenoMax 06-30-2015 03:46 AM

TopHat2 starts from the Bowtie2 index files for the entire human genome and then creates a "transcriptome-only" subset using the information in the GTF file. You only need to do this once, as you discovered.

imsharmanitin 06-30-2015 05:07 AM

So this would be the workflow:

1) Generate the whole-genome index using Bowtie2 from
Homo_sapiens.GRCh37.75.dna.primary_assembly.fa

2) Then use TopHat2 with
Homo_sapiens.GRCh37.75.gtf
to create a "transcriptome-only" subset from the whole-genome index files created in step 1

3) For subsequent runs, point TopHat2 to the directory containing the "transcriptome-only" subset using

--transcriptome-index <directory containing "transcriptome-only" subset>
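The three steps can be sketched as commands. This is only an illustration under stated assumptions: the index base name is the FASTA path without `.fa` (the command earlier in the thread used the `.fa` path itself as the base), `known` is a placeholder prefix inside the transcriptome index directory, and `reads.fq` and `tophat_out` are hypothetical names. The `run` helper is also hypothetical; it prints each command by default so nothing is executed until `DRY_RUN=0` is set.

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

GENOME_FA="../data/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa"
GENOME_BASE="${GENOME_FA%.fa}"           # assumed Bowtie 2 index base name
GTF="../data/Homo_sapiens.GRCh37.75.gtf"
TX_INDEX="./transcriptome_index/known"   # <dir>/<prefix>; "known" is a placeholder
READS="reads.fq"                         # hypothetical input file

# Step 1: whole-genome Bowtie 2 index
build_genome_index() {
    run bowtie2-build "$GENOME_FA" "$GENOME_BASE"
}

# Step 2: one-time transcriptome index build (GTF supplied, no reads)
build_transcriptome_index() {
    run tophat2 -G "$GTF" --transcriptome-index="$TX_INDEX" "$GENOME_BASE"
}

# Step 3: later runs reuse the transcriptome index; -G is no longer passed
align_reads() {
    run tophat2 -o "$1" --transcriptome-index="$TX_INDEX" "$GENOME_BASE" "$2"
}

build_genome_index
build_transcriptome_index
align_reads tophat_out "$READS"
```

Steps 1 and 2 are run once per genome and annotation; only step 3 is repeated for each sample.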

GenoMax 06-30-2015 05:11 AM

That is correct. When you use the --transcriptome-index option, consider the additional options (e.g. -T) that become relevant (scroll down to the --transcriptome-only option section): https://ccb.jhu.edu/software/tophat/manual.shtml#toph

imsharmanitin 06-30-2015 04:43 PM

If I understood correctly, when we use --transcriptome-index without -T, the reads are first mapped to the transcriptome index, and the reads that fail to map there are then mapped to the genome (just like using only the -G option).

On the other hand, if I use -T, mapping happens only against the transcriptome, and only those mappings are reported, as genomic mappings.

If I want to map to the genome only, both -G and -T should be avoided.

Also, as far as I have read and understood, mapping to both the transcriptome and the genome is the ideal approach, i.e. use of -T should be avoided. Am I right?
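For reference, the three mapping modes described in this post correspond to command lines along the following lines. This is a sketch, not a recommendation: `tophat_cmd` is a hypothetical helper that only prints the commands, and the index prefix `known`, the output directory `out`, and `reads.fq` are placeholder names.

```shell
# Hypothetical helper: print the tophat2 command line for a given mapping mode.
TX_INDEX="./transcriptome_index/known"   # placeholder <dir>/<prefix>
GENOME_BASE="../data/Homo_sapiens.GRCh37.75.dna.primary_assembly"
READS="reads.fq"                         # hypothetical input file

tophat_cmd() {
    case "$1" in
        genome-only)
            # No -G, no --transcriptome-index, no -T: genome mapping only
            echo "tophat2 -o out $GENOME_BASE $READS" ;;
        transcriptome-first)
            # Transcriptome first; reads that do not map there go to the genome
            echo "tophat2 -o out --transcriptome-index=$TX_INDEX $GENOME_BASE $READS" ;;
        transcriptome-only)
            # -T (--transcriptome-only): align to the transcriptome only
            echo "tophat2 -o out -T --transcriptome-index=$TX_INDEX $GENOME_BASE $READS" ;;
        *)
            echo "unknown mode: $1" >&2
            return 1 ;;
    esac
}

for mode in genome-only transcriptome-first transcriptome-only; do
    tophat_cmd "$mode"
done
```

The only difference between the last two modes is the -T flag, which makes it easy to run both on one sample and compare the results.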


All times are GMT -8.