SEQanswers

Old 08-23-2012, 02:14 AM   #1
wariobrega
Member
 
Location: Quadraro, Rome

Join Date: Jul 2012
Posts: 11
GTF usage in Tophat

I am trying to use TopHat to find novel splice junctions in zebrafish RNA-seq data generated with the Illumina CAGE protocol. I am quite new to TopHat and am running several trials to find the best combination of options for my samples, but I don't completely understand the --GTF option (paired with the --transcriptome-index option).

As stated in the TopHat manual for the --GTF option:
Quote:
Supply TopHat with a set of gene model annotations and/or known transcripts, as a GTF 2.2 or GFF3 formatted file. If this option is provided, TopHat will first extract the transcript sequences and use Bowtie to align reads to this virtual transcriptome first. Only the reads that do not fully map to the transcriptome will then be mapped on the genome. The reads that did map on the transcriptome will be converted to genomic mappings (spliced as needed) and merged with the novel mappings and junctions in the final tophat output.
Please note that the values in the first column of the provided GTF/GFF file (column which indicates the chromosome or contig on which the feature is located), must match the name of the reference sequence in the Bowtie index you are using with TopHat. You can get a list of the sequence names in a Bowtie index by typing:

bowtie-inspect --names your_index

So before using a known annotation file with this option please make sure that the 1st column in the annotation file uses the exact same chromosome/contig names (case sensitive) as shown by the bowtie-inspect command above.
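The name check described in the manual passage above can be sketched in a few lines of Python. This is only an illustration: it assumes the output of bowtie-inspect --names has already been read into a list with one sequence name per entry, and the function names and example data are hypothetical.

```python
def gtf_seqnames(gtf_lines):
    """Collect the distinct values of column 1 (chromosome/contig) of a GTF/GFF."""
    names = set()
    for line in gtf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def names_missing_from_index(gtf_lines, index_names):
    """Return GTF sequence names with no exact (case-sensitive) match in the index."""
    index_set = {n.strip() for n in index_names if n.strip()}
    return sorted(gtf_seqnames(gtf_lines) - index_set)


# Hypothetical data: the index uses "chr1"/"chr2", the annotation also has "1" and "Chr2".
index_names = ["chr1", "chr2"]
gtf_lines = [
    "#!genome-build example",
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";",
    "1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id \"g2\"; transcript_id \"t2\";",
    "Chr2\tsrc\texon\t500\t600\t.\t-\t.\tgene_id \"g3\"; transcript_id \"t3\";",
]
print(names_missing_from_index(gtf_lines, index_names))
```

An empty result would mean every name in column 1 of the annotation also appears in the index; any name listed would have to be renamed on one side before running with --GTF.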
As far as I understand, TopHat uses the GTF file to build an index (provided the GTF file matches the Bowtie index in terms of sequence names and positions). This index can be reused with the --transcriptome-index option.
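As a rough sketch of that reuse pattern, the snippet below only builds the two command lines and prints them; it does not run TopHat. The file names, index names and the helper function are hypothetical, and it assumes the behaviour described in the manual, where the first run is given both -G and --transcriptome-index and later runs are given only --transcriptome-index.

```python
def tophat_args(index_base, reads, out_dir, transcriptome_index, gtf=None):
    """Build a TopHat argument list (hypothetical helper, nothing is executed)."""
    args = ["tophat", "-o", out_dir]
    if gtf is not None:
        # First run: the annotation is supplied so the transcriptome index gets built.
        args += ["-G", gtf]
    args.append("--transcriptome-index=" + transcriptome_index)
    args.append(index_base)
    args += reads
    return args


# Hypothetical names for the Bowtie index, reads and output directories.
first = tophat_args("zebrafish_index", ["sample1.fq"], "out_sample1",
                    "transcriptome_data/known", gtf="genes.gtf")
later = tophat_args("zebrafish_index", ["sample2.fq"], "out_sample2",
                    "transcriptome_data/known")
print(" ".join(first))
print(" ".join(later))
```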

After that, TopHat aligns the reads against this "GTF index" and discards all the reads that match it perfectly, focusing on the reads that do not align to the index in order to find new splice sites. Is this correct? If it is, two questions arise:

1) Will the reads be aligned against this GTF index without even trying to splice them? Does the perfect-match step happen before the splicing algorithm?

2) Which reference is better to use with this option: a reference genome or a reference transcriptome? And why?

Thanks for your answers!

Daniele
Old 08-24-2012, 12:45 AM   #2
glados
Member
 
Location: Aperture Science

Join Date: Mar 2012
Posts: 59

As I've understood it, TopHat first uses the information in the annotation GTF to map all the reads that match known genes. After that you'll be left with a bunch of reads that did not match known genes; these will be mapped as usual to the genome. Possibly they represent novel genes or something else. You have to supply a reference gene model for this, i.e. the known transcriptome, as the GTF file. The genome is what you are supposed to supply to TopHat in the form of a Bowtie index.
Old 09-03-2012, 06:45 AM   #3
wariobrega
Member
 
Location: Quadraro, Rome

Join Date: Jul 2012
Posts: 11

Quote:
Originally Posted by glados
As I've understood it, TopHat first uses the information in the annotation GTF to map all the reads that match known genes. After that you'll be left with a bunch of reads that did not match known genes; these will be mapped as usual to the genome. Possibly they represent novel genes or something else. You have to supply a reference gene model for this, i.e. the known transcriptome, as the GTF file. The genome is what you are supposed to supply to TopHat in the form of a Bowtie index.
Thank you, glados, I found out what was not working!
Old 12-17-2012, 07:11 AM   #4
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137

So you can supply TopHat with a GTF file of annotated transcripts via the --GTF option, and those transcripts will be the first place where reads are mapped, followed by the whole genome, with or without novel junction discovery in this second stage. As I understand it, this is the behaviour from TopHat 1.4 onward.
I'm curious to know how it worked before 1.4. I think you could already give TopHat a GTF file, but it used it second. Am I right? If so, what is the difference between using the GTF file first and using it second, after the genome?
Old 02-26-2015, 01:07 AM   #5
archana2287
Junior Member
 
Location: INDIA

Join Date: Feb 2015
Posts: 5

Hello everyone

The TopHat manual says:

-T/--transcriptome-only Only align the reads to the transcriptome and report only those mappings as genomic mappings.

How does this differ from -G? (As I understand it, -G does the same thing: it extracts the reads that map to the transcripts present in the GTF file.)


I ran the mapping in two different ways.
TopHat mapping with -G but without -T:

python tophat.py -p 8 -G jsn.gff -o LIB_SG323_FJSN_Trans refernece.fa 1_fastq_1 1_fastq_2

and with both -T and -G:

python tophat.py -p 8 -T -G jsn.gff -o LIB_SG323_FJSN_Trans refernece.fa 1_fastq_1 1_fastq_2

I got different FPKM values from the two runs. How does running TopHat with the first command differ from running it with the second?
Old 02-26-2015, 10:42 AM   #6
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Let's look at the manual about the '-G' option

Quote:
Supply TopHat with a set of gene model annotations and/or known transcripts, as a GTF 2.2 or GFF3 formatted file. If this option is provided, TopHat will first extract the transcript sequences and use Bowtie to align reads to this virtual transcriptome first. Only the reads that do not fully map to the transcriptome will then be mapped on the genome. The reads that did map on the transcriptome will be converted to genomic mappings (spliced as needed) and merged with the novel mappings and junctions in the final tophat output.
Compare to '-T'

Quote:
Only align the reads to the transcriptome and report only those mappings as genomic mappings.
I hope that it is obvious that the two map reads in different ways. The first should be a super-set of the second.
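To make the superset point concrete, here is a toy model in Python. It is not how TopHat is implemented: reads are just labels, "alignment" is a set lookup, and all names and data are hypothetical. It only mirrors the two manual descriptions: transcriptome first, then genome for the leftovers, versus transcriptome only.

```python
def map_reads(reads, transcriptome_hits, genome_hits, transcriptome_only=False):
    """Toy model: return {read: source} for every read that receives a mapping."""
    mapped = {}
    leftover = []
    for read in reads:
        # Stage 1: try the known transcripts first (the -G behaviour).
        if read in transcriptome_hits:
            mapped[read] = "transcriptome"
        else:
            leftover.append(read)
    if not transcriptome_only:
        # Stage 2: reads that missed the transcriptome go to the genome.
        for read in leftover:
            if read in genome_hits:
                mapped[read] = "genome"
    return mapped


# Hypothetical reads: r3 only aligns to the genome, r4 aligns nowhere.
reads = ["r1", "r2", "r3", "r4"]
transcriptome_hits = {"r1", "r2"}
genome_hits = {"r1", "r2", "r3"}

with_g = map_reads(reads, transcriptome_hits, genome_hits)
with_g_and_t = map_reads(reads, transcriptome_hits, genome_hits,
                         transcriptome_only=True)
print(sorted(with_g))
print(sorted(with_g_and_t))
print(set(with_g_and_t) <= set(with_g))
```

In this toy model r3 is reported only when the genome stage runs. Extra reads of that kind are one plausible reason for FPKM values to differ between the two runs.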

Tags
bowtie, gtf, tophat, transcriptome
