SEQanswers

08-05-2011, 05:19 AM   #1
louis7781x
Member
 
Location: Hong Kong

Join Date: Oct 2010
Posts: 74
tophat with/without annotation, and cufflinks with annotation?

Hi,

I want to know what happens if I

(1) run TopHat with annotation and use accepted_hits.bam to run Cufflinks with annotation

(2) run TopHat without annotation and use accepted_hits.bam to run Cufflinks with annotation

Does anyone know how the output differs between the two?

Thanks

Best regards!

08-10-2011, 07:35 PM   #2
rhcr56
Junior Member
 
Location: Columbia, MO

Join Date: Aug 2011
Posts: 7
Annotation

I am interested in this issue as well. The manual describes that TopHat will use exon records in the annotation file to build a set of known splice junctions for each gene, and will attempt to align reads to these junctions. I have a suspicion that without supplying annotation the software might be more sensitive to finding novel junctions, but that's just a guess. To test this, I have recently run several lanes/samples of data using both criteria with TopHat and the last of the Cufflinks data should be completed by this weekend. Is there anything specific in which you are interested? If so, let me know and I'd gladly look into it.

08-10-2011, 07:36 PM   #3
rhcr56
Junior Member
 
Location: Columbia, MO

Join Date: Aug 2011
Posts: 7
Annotation

Also, keep me in the loop and let me know if you find anything. Thanks!

Last edited by rhcr56; 08-10-2011 at 07:41 PM.

08-11-2011, 01:21 AM   #4
chenyao
Member
 
Location: Beijing

Join Date: Jul 2011
Posts: 74

How can you use TopHat without annotation? You need a reference to map the reads.

08-11-2011, 01:26 AM   #5
louis7781x
Member
 
Location: Hong Kong

Join Date: Oct 2010
Posts: 74

Quote:
Originally Posted by chenyao
How can you use TopHat without annotation? You need a reference to map the reads.

You can read the paper, or ask the authors, to find out how it detects splice junctions without annotation.


http://bioinformatics.oxfordjournals.org/content/25/9/1105.abstract


TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.

You can run TopHat either with or without annotation.

Of course it needs a reference sequence; "annotation" means the GTF file, not the reference sequence.
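
As a rough sketch of that distinction: the Bowtie index (built from the reference sequence) is always required, while -G only adds the optional annotation. The file names genome_index, reads.fastq and genes.gtf and the output directory names are placeholders, not taken from this thread. The Cufflinks step assumes its -G option, which quantifies against the supplied GTF. The functions only print the commands, so nothing is executed until you pipe the output to sh.

```shell
#!/bin/sh
# Placeholder file names (assumptions); adjust to your own data.
INDEX=genome_index      # Bowtie index prefix built from the reference sequence
READS=reads.fastq       # RNA-Seq reads
GTF=genes.gtf           # gene annotation, only used when requested

# Print the TopHat command for one run; pass "gtf" as mode to add -G.
tophat_cmd() {
    mode=$1
    outdir=$2
    if [ "$mode" = "gtf" ]; then
        echo "tophat -G $GTF -o $outdir $INDEX $READS"
    else
        echo "tophat -o $outdir $INDEX $READS"
    fi
}

# Print the Cufflinks command that uses the annotation on a TopHat result.
cufflinks_cmd() {
    outdir=$1
    echo "cufflinks -G $GTF -o $outdir/cufflinks $outdir/accepted_hits.bam"
}

# Workflow (1): TopHat with annotation, then Cufflinks with annotation.
tophat_cmd gtf th_with_gtf
cufflinks_cmd th_with_gtf

# Workflow (2): TopHat without annotation, then Cufflinks with annotation.
tophat_cmd nogtf th_no_gtf
cufflinks_cmd th_no_gtf
```

Both workflows use the same index and reads; the only difference on the TopHat side is the presence of -G.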

Last edited by louis7781x; 08-11-2011 at 01:36 AM.

08-11-2011, 01:34 AM   #6
chenyao
Member
 
Location: Beijing

Join Date: Jul 2011
Posts: 74

Quote:
Originally Posted by louis7781x
You can read the paper, or ask the authors, to find out how it detects splice junctions without annotation.


http://bioinformatics.oxfordjournals.org/content/25/9/1105.abstract


TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.

You can run TopHat either with or without annotation.

Of course it needs a reference sequence; "annotation" means the GTF file, not the reference sequence.

But I don't see any TopHat option for supplying the annotation file.

08-11-2011, 01:35 AM   #7
louis7781x
Member
 
Location: Hong Kong

Join Date: Oct 2010
Posts: 74

Quote:
Originally Posted by chenyao
But I don't see any TopHat option for supplying the annotation file.

Supplying your own junctions:

The options below allow you to validate your own junctions with your RNA-Seq data. Note that the chromosome names in the files provided with the options below must match the names in the Bowtie index. These names are case-sensitive.

-G/--GTF <GTF 2.2 file>
Supply TopHat with a list of gene model annotations. TopHat will use the exon records in this file to build a set of known splice junctions for each gene, and will attempt to align reads to these junctions even if they would not normally be covered by the initial mapping.

-j/--raw-juncs <.juncs file>
Supply TopHat with a list of raw junctions. Junctions are specified one per line, in a tab-delimited format. Records look like:
<chrom> <left> <right> <+/->

left and right are zero-based coordinates, and specify the last character of the left sequence to be spliced to the first character of the right sequence, inclusive. That is, the last and the first positions of the flanking exons. Users can convert junctions.bed (one of the TopHat outputs) to this format using
bed_to_juncs < junctions.bed > new_list.juncs
where bed_to_juncs can be found under the same folder as tophat.

--no-novel-juncs
Only look for reads across junctions indicated in the supplied GFF or junctions file. (Ignored without -G/-j.)
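
To make the coordinate convention concrete, here is a minimal awk sketch of the arithmetic implied by that description. It assumes junctions.bed is a 12-column BED file in which each junction has exactly two blocks (the flanking overhangs). The function name bed_to_juncs_sketch is made up for illustration; in practice, use the bed_to_juncs script that ships with TopHat.

```shell
#!/bin/sh
# Read BED12 junction records on stdin and print <chrom> <left> <right> <strand>,
# where left is the last base of the left exon and right is the first base of
# the right exon, both zero-based.
bed_to_juncs_sketch() {
    awk -F '\t' '
        /^track/ { next }                  # skip the BED track header line
        NF >= 12 {
            split($11, size, ",")          # blockSizes: left and right overhang
            left  = $2 + size[1] - 1       # chromStart + left overhang - 1
            right = $3 - size[2]           # chromEnd - right overhang
            printf "%s\t%d\t%d\t%s\n", $1, left, right, $6
        }
    '
}

# Usage, with the file names from the manual excerpt:
#   bed_to_juncs_sketch < junctions.bed > new_list.juncs
```

For a record starting at 100 and ending at 300 with block sizes 20 and 30, this gives left = 119 and right = 270.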

08-19-2011, 12:06 PM   #8
schaffer
Member
 
Location: San Diego

Join Date: Apr 2009
Posts: 12
TopHat with or without GTF file?

I have been using TopHat without a GTF file and counted the mapped reads with htseq. The results looked good. Now I am wondering whether I should be using a GTF file. I will run it again with a GTF file to look at the differences. Has anyone else noticed which works better?

08-19-2011, 01:35 PM   #9
emilyjia2000
Member
 
Location: usa

Join Date: May 2011
Posts: 59

When I run TopHat without annotation, the percentage of aligned reads is not as good as with other aligners, say Elandrna. Of course, when we tested the alignment coverage, we chose the RefSeq annotation. Does anyone know what's going on?

08-25-2011, 09:03 AM   #10
schaffer
Member
 
Location: San Diego

Join Date: Apr 2009
Posts: 12
A few more reads with annotation file

Hi,
I found a few more reads when designating an annotation file.
Lana

08-31-2011, 03:32 PM   #11
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279

The advantage of using the annotation GTF is in mapping reads from lowly expressed transcripts that cross an exon-exon junction. In de novo mode, a junction is only defined if a read spans it with at least 8 bp on each side (you can modify this setting):

-a/--min-anchor-length <int> The "anchor length". TopHat will report junctions spanned by reads with at least this many bases on each side of the junction. Note that individual spliced alignments may span a junction with fewer than this many bases on one side. However, every junction involved in spliced alignments is supported by at least one read with this many bases on each side. This must be at least 3 and the default is 8.

So when you provide the GTF, many of the reads in low-coverage regions now align even if only 4 bp fall in one exon, because the GTF basically says "this is a known junction, go ahead and align to it".

In all my comparisons, the alignment rate is slightly better when using the GTF. The size of the improvement depends on the number of reads, as a low read count increases the number of undefined junctions due to the anchor-length setting. As the number of reads increases, so does the chance of defining the same junction in de novo mode. Obviously, the read length also makes a difference: the longer your reads, the less the GTF annotation improves the percent aligned.

By default I always use the GTF option.
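
A back-of-the-envelope way to see the read-length effect, as a toy model only: it assumes a junction is equally likely to fall at any position within a read, which is a simplification and not something from the TopHat documentation. The function name anchored_splits is made up for illustration. It counts how many split positions in a read leave at least the anchor length on both sides.

```shell
#!/bin/sh
# anchored_splits READ_LEN [MIN_ANCHOR]
# Count the split positions inside a read that leave at least MIN_ANCHOR
# bases on BOTH sides of a junction (MIN_ANCHOR defaults to 8, like -a).
anchored_splits() {
    read_len=$1
    min_anchor=${2:-8}
    n=$((read_len - 2 * min_anchor + 1))
    if [ "$n" -lt 0 ]; then n=0; fi
    echo "$n"
}

# A read of length L has L-1 possible split positions in total.
for len in 36 50 100; do
    echo "read length $len: $(anchored_splits "$len") of $((len - 1)) split positions are anchored on both sides"
done
```

With the default anchor of 8, this prints 21 of 35 for 36 bp reads, 35 of 49 for 50 bp reads, and 85 of 99 for 100 bp reads. Under this toy model, the share of junction-crossing reads that depend on a known junction shrinks as reads get longer.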

09-13-2011, 08:14 AM   #12
pageskipro
Junior Member
 
Location: Boulder, Colorado

Join Date: Jan 2011
Posts: 8

What comparison (tools, files) are you using to determine better alignment rates? For example, do you look at coverageBed output, or at FPKM and confidence levels?

Can you say a bit about the tools and resources you use to compare your TopHat/Cufflinks results with and without a GTF?

Thanks!

Cynthia

09-13-2011, 10:00 AM   #13
schaffer
Member
 
Location: San Diego

Join Date: Apr 2009
Posts: 12

Cynthia,
I use HTSeq to count the numbers of reads per gene.

http://www-huber.embl.de/users/ander...c/history.html

Then I see that I get more reads for some genes when I use the GTF option.

Lana
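
A minimal sketch of that kind of per-gene comparison, assuming two htseq-count style output files (tab-separated gene ID and read count), one per TopHat run. The file names and the function name more_with_gtf are placeholders for illustration.

```shell
#!/bin/sh
# more_with_gtf WITH_GTF.counts WITHOUT_GTF.counts
# List genes whose count is higher in the run that used the GTF.
# Output columns: gene ID, count without GTF, count with GTF.
more_with_gtf() {
    awk -F '\t' '
        NR == FNR { base[$1] = $2; next }            # first file read: run without GTF
        ($1 in base) && ($2 + 0) > (base[$1] + 0) {  # second file read: run with GTF
            printf "%s\t%d\t%d\n", $1, base[$1], $2
        }
    ' "$2" "$1"
}

# Usage, with hypothetical file names:
#   more_with_gtf with_gtf.counts no_gtf.counts
```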

09-13-2011, 10:27 AM   #14
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279

In my case, just samtools flagstat or Picard CollectAlignmentSummaryMetrics.

Code:
samtools flagstat MyTophatBam.bam > MyMetrics.txt

or

java -Xmx2g -jar CollectAlignmentSummaryMetrics.jar INPUT=MyTophatBam.bam OUTPUT=MyMetrics.txt VALIDATION_STRINGENCY=SILENT REFERENCE_SEQUENCE=MyGenome.fa ASSUME_SORTED=true IS_BISULFITE_SEQUENCED=false
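
To compare two runs from their flagstat reports, something like the following could work. It is only a sketch: it assumes the usual flagstat line layout ("N + M mapped (...)"), and the report file names and function names are placeholders. Note that it counts alignment records, so reads reported at several locations may be counted more than once.

```shell
#!/bin/sh
# Read `samtools flagstat` text on stdin and print the QC-passed mapped count.
mapped_count() {
    awk '$4 == "mapped" { print $1; exit }'
}

# mapped_gain WITH_GTF.flagstat WITHOUT_GTF.flagstat
# Print how many more mapped records the run with the GTF produced.
mapped_gain() {
    with=$(mapped_count < "$1")
    without=$(mapped_count < "$2")
    echo $((with - without))
}

# Usage, with hypothetical file names:
#   samtools flagstat th_with_gtf/accepted_hits.bam > with_gtf.flagstat
#   samtools flagstat th_no_gtf/accepted_hits.bam > no_gtf.flagstat
#   mapped_gain with_gtf.flagstat no_gtf.flagstat
```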

10-16-2011, 11:50 PM   #15
songyj
Member
 
Location: china

Join Date: Sep 2011
Posts: 15
Some confusion

Quote:
Originally Posted by Jon_Keats
The advantage of using the annotation GTF is in mapping reads from lowly expressed transcripts that cross an exon-exon junction. In de novo mode, a junction is only defined if a read spans it with at least 8 bp on each side (you can modify this setting):

-a/--min-anchor-length <int> The "anchor length". TopHat will report junctions spanned by reads with at least this many bases on each side of the junction. Note that individual spliced alignments may span a junction with fewer than this many bases on one side. However, every junction involved in spliced alignments is supported by at least one read with this many bases on each side. This must be at least 3 and the default is 8.

So when you provide the GTF, many of the reads in low-coverage regions now align even if only 4 bp fall in one exon, because the GTF basically says "this is a known junction, go ahead and align to it".

In all my comparisons, the alignment rate is slightly better when using the GTF. The size of the improvement depends on the number of reads, as a low read count increases the number of undefined junctions due to the anchor-length setting. As the number of reads increases, so does the chance of defining the same junction in de novo mode. Obviously, the read length also makes a difference: the longer your reads, the less the GTF annotation improves the percent aligned.

By default I always use the GTF option.

Sorry, I may be confusing "anchor-length" with "segment-length".
If I use --segment-length 25, each read is cut into segments of at least that length, but --min-anchor-length 8 is clearly smaller than that.
Does this mean that a piece of my read as short as 8 bp can support a junction, and that it does not have to be longer than my segment length?

thanks

song

03-08-2012, 07:07 AM   #16
debzootec
Junior Member
 
Location: Brasil

Join Date: Aug 2011
Posts: 1
TopHat

Has anyone managed to run TopHat with the template (annotation) file? I am having problems with the GTF.

03-20-2012, 06:06 AM   #17
biokari
Junior Member
 
Location: France

Join Date: Feb 2011
Posts: 4

Hi,

I don't have a GTF file, but I have a multi-FASTA file with gene annotations. Do you know if I can use this file for mapping, and how? Or do I have to convert this FASTA file to GTF format? If so, do you know how?

Thanks

08-01-2012, 01:34 AM   #18
dvanic
Member
 
Location: Sydney, Australia

Join Date: Jan 2012
Posts: 61

Hi!
Wanted to bump this thread up a bit:
What happens if you have pseudogenes? Will using TopHat with a reference annotation (e.g. the Gencode12 Comprehensive, which does not include pseudogenes) bias you against proper mapping of reads to these regions?

08-07-2012, 02:23 PM   #19
drosoform
Junior Member
 
Location: Oregon

Join Date: Apr 2012
Posts: 6

Quote:
Originally Posted by dvanic
Hi!
Wanted to bump this thread up a bit:
What happens if you have pseudogenes? Will using TopHat with a reference annotation (e.g. the Gencode12 Comprehensive, which does not include pseudogenes) bias you against proper mapping of reads to these regions?

I am also wondering about this and would be interested in hearing an answer.

04-05-2013, 07:09 AM   #20
blanco
Member
 
Location: Iceland

Join Date: Apr 2012
Posts: 28

I would also be interested in a clear answer to this question.

My worry is that supplying TopHat with an annotation file biases it toward aligning reads to genes in the annotation rather than to potentially novel genes.
All times are GMT -8.

