Thread | Thread Starter | Forum | Replies | Last Post |
How to run Tophat with annotation file | masylichu | Bioinformatics | 2 | 09-06-2011 08:25 PM |
[tophat-fusion]The priority of annotation is refGene or ensGene? | louis7781x | Bioinformatics | 0 | 08-15-2011 06:21 AM |
tophat/cufflinks for novel genome annotation | darked89 | Bioinformatics | 1 | 11-18-2010 07:53 AM |
GO annotation | Ramet | Bioinformatics | 0 | 09-07-2010 03:14 AM |
Annotation | johnsequence | Bioinformatics | 1 | 02-10-2010 05:02 AM |
#1
Member
Location: Hong Kong Join Date: Oct 2010
Posts: 74
Hi,
I want to know how the output differs between these two workflows:
(1) run TopHat with an annotation file, then run Cufflinks with annotation on the resulting accepted_hits.bam;
(2) run TopHat without an annotation file, then run Cufflinks with annotation on the resulting accepted_hits.bam.
Does anyone know what the difference in the results is? Thanks.
Best regards!
#2
Junior Member
Location: Columbia, MO Join Date: Aug 2011
Posts: 7
I am interested in this issue as well. According to the manual, TopHat uses the exon records in the annotation file to build a set of known splice junctions for each gene and will attempt to align reads to these junctions. I suspect that without an annotation the software might be more sensitive in finding novel junctions, but that's just a guess. To test this, I have recently run several lanes/samples of data through TopHat both ways, and the last of the Cufflinks runs should finish by this weekend. Is there anything specific you are interested in? If so, let me know and I'd gladly look into it.
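The manual's description of how an annotation is turned into known junctions can be illustrated with a short Python sketch. It groups GTF exon records by transcript_id, sorts each transcript's exons, and treats every pair of consecutive exons as one junction, reported in the zero-based convention of TopHat's .juncs format (last base of the left exon, first base of the right exon). This is only an assumption about the general idea, not TopHat's own code; the helper name `junctions_from_gtf` and the simplified attribute parsing are hypothetical.

```python
import re
from collections import defaultdict

# Simplified GTF attribute parsing (assumption): only transcript_id "..." is read.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def junctions_from_gtf(lines):
    """Hypothetical helper: derive (chrom, left, right, strand) junctions from GTF exons.

    GTF coordinates are 1-based and inclusive. The returned left/right follow the
    .juncs convention: zero-based position of the last base of the left exon and
    of the first base of the right exon.
    """
    exons = defaultdict(list)  # (chrom, strand, transcript_id) -> [(start, end), ...]
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # only exon records define junctions
        match = TRANSCRIPT_ID.search(cols[8])
        if match is None:
            continue
        exons[(cols[0], cols[6], match.group(1))].append((int(cols[3]), int(cols[4])))

    junctions = set()  # a set, so junctions shared by several transcripts appear once
    for (chrom, strand, _tid), spans in exons.items():
        spans.sort()
        for (_, left_end), (right_start, _) in zip(spans, spans[1:]):
            # 1-based inclusive -> 0-based: subtract 1 from both positions.
            junctions.add((chrom, left_end - 1, right_start - 1, strand))
    return sorted(junctions)


if __name__ == "__main__":
    toy_gtf = [
        'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
        'chr1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
        'chr1\tdemo\texon\t500\t600\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    ]
    for chrom, left, right, strand in junctions_from_gtf(toy_gtf):
        print(f"{chrom}\t{left}\t{right}\t{strand}")
```

Deduplicating through a set reflects that several transcripts of one gene often share the same exon-exon junction.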
#3
Junior Member
Location: Columbia, MO Join Date: Aug 2011
Posts: 7
Also, keep me in the loop and let me know if you find anything. Thanks!
Last edited by rhcr56; 08-10-2011 at 08:41 PM.
#4
Member
Location: Beijing Join Date: Jul 2011
Posts: 74
How can you use TopHat without annotation? You need a reference to map the reads.
#5
Member
Location: Hong Kong Join Date: Oct 2010
Posts: 74
Quote:
http://bioinformatics.oxfordjournals.org/content/25/9/1105.abstract
"TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites."
With TopHat you can choose to run with or without annotation. Of course it needs a reference sequence; by annotation I mean the GTF file, not the reference sequence.
Last edited by louis7781x; 08-11-2011 at 02:36 AM.
#6
Member
Location: Beijing Join Date: Jul 2011
Posts: 74
Quote:
#7
Member
Location: Hong Kong Join Date: Oct 2010
Posts: 74
Quote:
The options below allow you to validate your own junctions with your RNA-Seq data. Note that the chromosome names in the files provided with the options below must match the names in the Bowtie index. These names are case-sensitive.
-G/--GTF <GTF 2.2 file>
Supply TopHat with a list of gene model annotations. TopHat will use the exon records in this file to build a set of known splice junctions for each gene, and will attempt to align reads to these junctions even if they would not normally be covered by the initial mapping.
-j/--raw-juncs <.juncs file>
Supply TopHat with a list of raw junctions. Junctions are specified one per line, in a tab-delimited format. Records look like:
<chrom> <left> <right> <+/->
left and right are zero-based coordinates, and specify the last character of the left sequence to be spliced to the first character of the right sequence, inclusive. That is, the last and the first positions of the flanking exons. Users can convert junctions.bed (one of the TopHat outputs) to this format using
bed_to_juncs < junctions.bed > new_list.juncs
where bed_to_juncs can be found in the same folder as tophat.
--no-novel-juncs
Only look for reads across junctions indicated in the supplied GFF or junctions file. (ignored without -G/-j)
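For readers who want to see the coordinate arithmetic behind the junctions.bed to .juncs conversion, here is a minimal Python sketch. It assumes junctions.bed uses the BED12 layout TopHat writes, with two blocks per record (one anchor on each side of the junction) and blockStarts given as offsets from chromStart. It is an illustration, not the bed_to_juncs script shipped with TopHat, and the name `bed_line_to_juncs` is hypothetical.

```python
import sys


def bed_line_to_juncs(line):
    """Hypothetical helper: convert one BED12 junction record to a .juncs line.

    Returns None for header ("track ...") lines or records without two blocks.
    """
    cols = line.split()
    if len(cols) < 12 or cols[0] == "track":
        return None
    chrom, chrom_start, strand = cols[0], int(cols[1]), cols[5]
    # blockSizes / blockStarts are comma-separated and may end with a trailing comma.
    sizes = [int(x) for x in cols[10].split(",") if x]
    starts = [int(x) for x in cols[11].split(",") if x]
    if len(sizes) < 2 or len(starts) < 2:
        return None
    # Last base of the left anchor block, i.e. last base of the left exon (0-based).
    left = chrom_start + starts[0] + sizes[0] - 1
    # First base of the right anchor block, i.e. first base of the right exon (0-based).
    right = chrom_start + starts[1]
    return f"{chrom}\t{left}\t{right}\t{strand}"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (script name is a placeholder):
    #   python bed_to_juncs_sketch.py junctions.bed > new_list.juncs
    with open(sys.argv[1]) as bed:
        for raw in bed:
            converted = bed_line_to_juncs(raw)
            if converted is not None:
                print(converted)
```

The resulting file is what -j/--raw-juncs expects; combining it with --no-novel-juncs restricts spliced alignments to the listed junctions.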
#8
Member
Location: San Diego Join Date: Apr 2009
Posts: 12
I have been using TopHat without a GTF file and counting the mapped reads with HTSeq. The results looked good. Now I am wondering whether I should be using a GTF file. I will run it again with the GTF file to look at the differences. Has anyone else noticed which is best?
#9
Member
Location: usa Join Date: May 2011
Posts: 59
I ran TopHat without annotation, and the percentage of aligned reads is not as good as with other aligners, e.g. Elandrna. Of course, when we assess the alignment coverage, we use the RefSeq annotation. Does anyone know what's going on?
#10
Member
Location: San Diego Join Date: Apr 2009
Posts: 12
Hi,
I found a few more aligned reads when supplying an annotation file.
Lana
#11
Senior Member
Location: Phoenix, AZ Join Date: Mar 2010
Posts: 279
The advantage of using the annotation GTF lies in mapping reads from lowly expressed transcripts that cross an exon-exon junction. In de novo mode a junction is only defined if a read spans it with at least 8 bp on each side. (You can modify this setting:
-a/--min-anchor-length <int> The "anchor length". TopHat will report junctions spanned by reads with at least this many bases on each side of the junction. Note that individual spliced alignments may span a junction with fewer than this many bases on one side. However, every junction involved in spliced alignments is supported by at least one read with this many bases on each side. This must be at least 3 and the default is 8.)
So when you provide the GTF, many of the reads in low-coverage regions now align even if only 4 bp fall in one exon, as the GTF basically says "this is a known junction, go ahead and align to it". In all my comparisons, the alignment rate is slightly better when using the GTF. The size of the improvement depends on the number of reads: a low read count increases the number of undefined junctions due to the anchor-length setting, and as the number of reads increases, so does the chance of defining the same junction in de novo mode. The read length also makes a difference: the longer your reads, the less the GTF annotation improves the percentage aligned. By default I always use the GTF option.
#12
Junior Member
Location: Boulder, Colorado Join Date: Jan 2011
Posts: 8
What comparison (tools, files) are you using to determine better alignment rates? For example, do you look at coverageBed output, or at FPKM and confidence intervals?
Can you say a bit about the tools and resources you use to compare your TopHat/Cufflinks results with and without a GTF? Thanks!
Cynthia
#13
Member
Location: San Diego Join Date: Apr 2009
Posts: 12
Cynthia,
I use HTSeq to count the number of reads per gene: http://www-huber.embl.de/users/ander...c/history.html
Then I see that I am getting more reads for some genes when I use the GTF option.
Lana
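To see which genes gain reads when the GTF is supplied, the two HTSeq count tables can be compared directly. A minimal Python sketch, assuming the usual htseq-count output of one tab-separated `gene<TAB>count` line per gene plus summary rows whose names start with `__` (such as `__no_feature`); the helper names and file names are placeholders.

```python
import sys


def read_counts(lines):
    """Parse htseq-count style lines into {gene: count}, skipping '__' summary rows."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        gene, count = fields[0], fields[-1]  # first column is the ID, last is the count
        if gene.startswith("__"):
            continue
        counts[gene] = int(count)
    return counts


def count_differences(without_gtf, with_gtf):
    """Return [(gene, count_without, count_with, delta)] for genes whose count changed,
    sorted by largest gain first (ties broken by gene name)."""
    diffs = []
    for gene in set(without_gtf) | set(with_gtf):
        before = without_gtf.get(gene, 0)
        after = with_gtf.get(gene, 0)
        if before != after:
            diffs.append((gene, before, after, after - before))
    diffs.sort(key=lambda row: (-row[3], row[0]))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder names): python compare_counts.py counts_noGTF.txt counts_GTF.txt
    with open(sys.argv[1]) as f_without, open(sys.argv[2]) as f_with:
        table = count_differences(read_counts(f_without), read_counts(f_with))
    for gene, before, after, delta in table:
        print(f"{gene}\t{before}\t{after}\t{delta:+d}")
```

Listing only the genes whose counts change keeps the output short, since most genes are expected to be unaffected by the annotation.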
#14
Senior Member
Location: Phoenix, AZ Join Date: Mar 2010
Posts: 279
In my case, just samtools flagstat or Picard CollectAlignmentSummaryMetrics:
Code:
samtools flagstat MyTophatBam.bam > MyMetrics.txt
# or
java -Xmx2g -jar CollectAlignmentSummaryMetrics.jar INPUT=MyTophatBam.bam OUTPUT=MyMetrics.txt VALIDATION_STRINGENCY=SILENT REFERENCE_SEQUENCE=MyGenome.fa ASSUME_SORTED=true IS_BISULFITE_SEQUENCED=false
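To compare the runs with and without a GTF side by side, one option is to pull the mapped count out of each flagstat report. A minimal Python sketch, assuming the samtools flagstat line format `<passed> + <failed> mapped (...)`; the helper names and file names are hypothetical. Because TopHat's accepted_hits.bam normally contains only aligned reads, the percentage flagstat prints for it is not the alignment rate, so the sketch divides by an input read count you supply. flagstat counts alignment records, so reads reported at several locations can push this ratio above the true fraction of reads aligned.

```python
import re
import sys

# Matches "<passed> + <failed> mapped (" but not "... primary mapped (" (newer samtools).
MAPPED_LINE = re.compile(r"^(\d+) \+ (\d+) mapped \(")


def mapped_from_flagstat(text):
    """Return the QC-passed 'mapped' count from samtools flagstat output."""
    for line in text.splitlines():
        match = MAPPED_LINE.match(line.strip())
        if match:
            return int(match.group(1))
    raise ValueError("no 'mapped' line found in flagstat output")


def alignment_rate(mapped, input_reads):
    """Mapped records divided by input reads (may exceed 1 when multi-mappers are counted)."""
    return mapped / input_reads


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage (placeholder names): python compare_flagstat.py noGTF.flagstat GTF.flagstat N_INPUT_READS
    n_input = int(sys.argv[3])
    for path in sys.argv[1:3]:
        with open(path) as fh:
            mapped = mapped_from_flagstat(fh.read())
        print(f"{path}\tmapped={mapped}\trate={alignment_rate(mapped, n_input):.2%}")
```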
#15
Member
Location: china Join Date: Sep 2011
Posts: 15
Quote:
If I use "--segment-length 25", a read is cut into segments of at least this length. But "--min-anchor-length 8" is smaller than this length. Does that mean my reads can actually be cut into 8 bp pieces to support a junction, rather than having to be at least my segment length? Thanks,
song
#16
Junior Member
Location: Brasil Join Date: Aug 2011
Posts: 1
Has anyone run TopHat with the template file? I am having problems with the GTF.
#17
Junior Member
Location: France Join Date: Feb 2011
Posts: 4
Hi,
I don't have a GTF file, but I have a multi-FASTA file with gene annotations. Do you know if I can use this file for mapping, and how? Do I have to convert this FASTA file to GTF format? Do you know how? Thanks
#18
Member
Location: Sydney, Australia Join Date: Jan 2012
Posts: 61
Hi!
Wanted to bump this thread up a bit: what happens with pseudogenes? Will running TopHat with a reference annotation (e.g. GENCODE 12 Comprehensive, which does not include pseudogenes) bias you against properly mapping reads to these regions?
#19
Junior Member
Location: Oregon Join Date: Apr 2012
Posts: 6
I am also wondering about this and would be interested in hearing an answer.
#20
Member
Location: Iceland Join Date: Apr 2012
Posts: 28
I would also be interested in a clear answer to this question.
My worry is that supplying TopHat with an annotation file biases it toward aligning to genes in the annotation instead of potentially novel genes.