SEQanswers

05-12-2016, 07:15 PM   #1
aurorasea1

Running TopHat/Bowtie2: index base name & transcript files help

Hello, I am new to analysing RNA-seq data and I am confused by the terminology used when running the programs.
Right now, I am trying to run TopHat and Bowtie2 using the UGENE software's workflow.
It requires me to enter:
1. Bowtie index base name
2. Known transcript file
3. Raw junctions

UGENE's tutorial page is not very detailed in its instructions, so I visited Bowtie's website.
I found these files available for download:
A) H. sapiens, NCBI GRCh38 (ftp://ftp.ncbi.nlm.nih.gov/genomes/a...e_index.tar.gz), a file of more than 3.5 GB
and also this:
B) H. sapiens, Ensembl GRCh37 (ftp://igenome:G3nom3s4u@ussd-ftp.ill..._GRCh37.tar.gz), a file of more than 18 GB

May I know if it's correct to use (A) as the index file, and to give GRCh38 as the Bowtie index base name?

And is it correct to call (B) the transcript file?

As for "raw junctions", where can I find the list of raw junctions?
Would really appreciate your help.
05-13-2016, 04:22 AM   #2
Michael.Ante

Hi,

A) and B) are different releases of the human genome assembly (GRCh38 vs. GRCh37), so don't mix them.
If you want a Bowtie index and its corresponding transcriptome index, download the FASTA files and the GTF from the same source. I'd recommend the Ensembl annotation (see fasta files here and the gtf here). There is some ongoing discussion about whether to use the primary assembly (all chromosomes) or the toplevel assembly (all chromosomes plus patches and haplotype sequences). To begin with, I'd start with the primary one.

After building the genome index with bowtie2-build (say you name it GRCh38.84), you can create the index for the transcripts with TopHat2.
Code:
tophat -G Homo_sapiens.GRCh38.84.gtf --transcriptome-index=transcriptome_data/known GRCh38.84
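For the index-building step that comes before the tophat call, a minimal sketch could look like the following. The FASTA file name is an assumption (use whatever you actually downloaded and unpacked), and the `index_cmd` helper is purely illustrative; only the base name GRCh38.84 comes from the text above.

```shell
#!/bin/sh
# Assumed input: the unpacked primary-assembly FASTA (file name is an assumption).
GENOME_FA="Homo_sapiens.GRCh38.dna.primary_assembly.fa"
# Index base name, as suggested in the post above.
INDEX_BASE="GRCh38.84"

# Hypothetical helper: print the bowtie2-build command for a FASTA and a base name.
# bowtie2-build takes the reference FASTA first and the index base name second.
index_cmd() {
    printf 'bowtie2-build %s %s\n' "$1" "$2"
}

# Show the command that would be run.
index_cmd "$GENOME_FA" "$INDEX_BASE"

# Build the index only if the tool is installed and the FASTA is present.
if command -v bowtie2-build >/dev/null 2>&1 && [ -f "$GENOME_FA" ]; then
    bowtie2-build "$GENOME_FA" "$INDEX_BASE"
fi
```

The base name given as the second argument is the same name you pass to tophat as the last argument in the command above, so the two have to match.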
Cheers,

Michael
05-14-2016, 12:28 AM   #3
aurorasea1

Hi Michael, thanks.
I'm running this in UGENE, but I get the following errors:
[2016-05-14 16:19:04] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2016-05-14 16:19:04] Checking for Bowtie
Bowtie version: 2.1.0.0
[2016-05-14 16:19:04] Checking for Samtools
Samtools version: 0.1.19.0
[2016-05-14 16:19:04] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (/Users/*.bt2)

Now it's asking for a *.bt2 file and I'm at a loss as to what type of file I should be using to run the analysis properly.
Thank you to all the experts here for your useful tips.
05-15-2016, 11:01 AM   #4
mastal

You need to make a Bowtie 2 index of the reference genome with the bowtie2-build command before you run tophat. This will produce six files with the suffixes .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2 and .rev.2.bt2, with the index base name you chose (usually the name of the genome) as the prefix.

You need to specify the path to the genome index files together with their prefix (the index base name) in your tophat command.
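As a rough sketch of what "path plus prefix" means, the function below checks that all six index files exist for a given base name. The function name `check_bt2_index` and its variables are made up for illustration. It assumes the standard small-index suffixes listed above; very large genomes get .bt2l files instead, which this does not check.

```shell
#!/bin/sh
# check_bt2_index BASE
# Returns 0 only if all six Bowtie 2 index files exist for BASE, where BASE is
# the directory plus the prefix, e.g. /path/to/index/GRCh38.84 (hypothetical path).
check_bt2_index() {
    bt2_base=$1
    bt2_missing=0
    for bt2_suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -f "${bt2_base}.${bt2_suffix}.bt2" ]; then
            # Report each missing file on stderr.
            echo "missing: ${bt2_base}.${bt2_suffix}.bt2" >&2
            bt2_missing=1
        fi
    done
    return "$bt2_missing"
}

# Example call; GRCh38.84 is the base name used earlier in the thread.
if check_bt2_index "GRCh38.84" 2>/dev/null; then
    echo "index found"
else
    echo "index not found: run bowtie2-build first"
fi
```

Whatever string makes this check pass is the value to enter as the Bowtie index base name, without the .1.bt2 part.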
07-08-2016, 05:58 AM   #5
maxter

Hi,

In UGENE, under Settings > Preferences > External Tools, you have to set the path for every program (tophat, bowtie, etc.).

You can build the index first, before running the whole workflow, via Tools > NGS data analysis > Build index for reads mapping. That worked for me; now I'm just trying to figure out how to retrieve all the information from the Tuxedo protocol.

Regards