
**rhcr56** · 08-10-2011, 07:35 PM

Annotation

I am interested in this issue as well. The manual says that TopHat will use the exon records in the annotation file to build a set of known splice junctions for each gene, and will attempt to align reads to these junctions. I suspect that without an annotation the software might be more sensitive to novel junctions, but that's just a guess. To test this, I have recently run several lanes/samples through TopHat both with and without annotation, and the last of the Cufflinks runs should be completed by this weekend. Is there anything specific you are interested in? If so, let me know and I'd gladly look into it.

**rhcr56** · 08-10-2011, 07:36 PM

Annotation

Also, keep me in the loop and let me know if you find anything. Thanks!

**chenyao** · 08-11-2011, 01:21 AM

How can you use TopHat without annotation? You need a reference to map the reads.

**louis7781x** · 08-11-2011, 01:26 AM

Originally posted by chenyao

How can you use TopHat without annotation? You need a reference to map the reads.

You can read the paper, or ask the authors, to see how it detects splice junctions without annotation.

http://bioinformatics.oxfordjournals.org/content/25/9/1105.abstract

TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.

You can run TopHat either with or without annotation.

Of course it needs a reference sequence; "annotation" means the GTF file, not the reference sequence.

**chenyao** · 08-11-2011, 01:34 AM

Originally posted by louis7781x

You can read the paper, or ask the authors, to see how it detects splice junctions without annotation.

http://bioinformatics.oxfordjournals.org/content/25/9/1105.abstract

TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.

You can run TopHat either with or without annotation.

Of course it needs a reference sequence; "annotation" means the GTF file, not the reference sequence.

But I don't see any TopHat option for supplying an annotation file.

**louis7781x** · 08-11-2011, 01:35 AM

Originally posted by chenyao

But I don't see any TopHat option for supplying an annotation file.

Supplying your own junctions:

The options below allow you to validate your own junctions with your RNA-Seq data. Note that the chromosome names in the files provided with the options below must match the names in the Bowtie index. These names are case-sensitive.

-G/--GTF <GTF 2.2 file>

Supply TopHat with a list of gene model annotations. TopHat will use the exon records in this file to build a set of known splice junctions for each gene, and will attempt to align reads to these junctions even if they would not normally be covered by the initial mapping.

-j/--raw-juncs <.juncs file>

Supply TopHat with a list of raw junctions. Junctions are specified one per line, in a tab-delimited format. Records look like:
<chrom> <left> <right> <+/->

left and right are zero-based coordinates, and specify the last character of the left sequence to be spliced to the first character of the right sequence, inclusive. That is, the last and the first positions of the flanking exons. Users can convert junctions.bed (one of the TopHat outputs) to this format using bed_to_juncs < junctions.bed > new_list.juncs, where bed_to_juncs can be found in the same folder as tophat.

--no-novel-juncs

Only look for reads across junctions indicated in the supplied GFF or junctions file. (ignored without -G/-j)
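For anyone curious how the two coordinate systems relate, here is a minimal sketch of that conversion in shell/awk. It is not the bundled bed_to_juncs script. It assumes junctions.bed is standard tab-delimited BED12 (block sizes in column 11, block starts in column 12), and the function name bed12_to_juncs is made up for illustration.

```shell
# Sketch: convert BED12 junction records to the raw junction format
# (<chrom> <left> <right> <strand>, zero-based, tab-delimited).
# Assumption: columns 11 and 12 hold comma-separated block sizes and starts.
bed12_to_juncs() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    $1 !~ /^track/ && NF >= 12 {
      n = split($11, sizes, ",")
      split($12, starts, ",")
      for (b = 2; b <= n; b++) {
        # Skip the empty element left by a trailing comma
        if (sizes[b] == "" || starts[b] == "") continue
        left  = $2 + starts[b-1] + sizes[b-1] - 1   # last base of left block
        right = $2 + starts[b]                      # first base of right block
        print $1, left, right, $6
      }
    }'
}
```

Usage would look like bed12_to_juncs < junctions.bed > new_list.juncs. For real work the bed_to_juncs script that ships with TopHat is probably the safer choice.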

**schaffer** · 08-19-2011, 12:06 PM

TopHat with or without GTF file?

I have been using TopHat without a GTF file and counting the mapped reads with HTSeq. The results looked good. Now I am wondering whether I should be using a GTF file. I will run it again with a GTF file to look at the differences. Has anyone else noticed which works better?

**emilyjia2000** · 08-19-2011, 01:35 PM

I ran TopHat without annotation, and the percentage of aligned reads is not as good as with other aligners, say Elandrna. Of course, when we tested the alignment coverage, we chose the RefSeq annotation. Does anyone know what's going on?

**schaffer** · 08-25-2011, 09:03 AM

A few more reads with annotation file

Hi,
I found a few more reads when designating an annotation file.
Lana

**Jon_Keats** · 08-31-2011, 03:32 PM

The advantage of using the annotation GTF is in mapping reads from lowly expressed transcripts that cross an exon-exon junction. In de novo mode, a junction is only defined if a read flanks it on both sides by at least 8 bp (you can modify this setting):

-a/--min-anchor-length <int> The "anchor length". TopHat will report junctions spanned by reads with at least this many bases on each side of the junction. Note that individual spliced alignments may span a junction with fewer than this many bases on one side. However, every junction involved in spliced alignments is supported by at least one read with this many bases on each side. This must be at least 3 and the default is 8.

So when you provide the GTF, many of the reads in low-coverage regions now align even if only 4 bp fall in one exon, because the GTF basically says "this is a known junction, go ahead and align to it."
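To make the arithmetic concrete, here is a toy sketch of the anchor rule as described in the manual text. It is not TopHat's actual code; the function name and argument order are hypothetical. It only asks whether a single read has at least the minimum anchor length on both sides of a junction.

```shell
# Toy check of the anchor rule: does a read of length $1, split by a junction
# after its first $2 bases, have at least $3 bases on both sides?
# Function name and argument order are hypothetical.
anchor_ok() {
  read_len=$1; left=$2; min_anchor=${3:-8}   # 8 is the documented default
  right=$((read_len - left))
  if [ "$left" -ge "$min_anchor" ] && [ "$right" -ge "$min_anchor" ]; then
    echo yes
  else
    echo no
  fi
}
```

For example, anchor_ok 36 4 prints no (only 4 bp on the left side), while anchor_ok 36 10 prints yes. A read like the first one could not define a new junction by itself, but could still be placed on a junction that is already known.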

In all my comparisons, the alignment rate is slightly better when using the GTF. The size of the improvement depends on the number of reads, since a low read count increases the number of junctions left undefined by the anchor-length setting. As the number of reads increases, so does the chance of defining the same junction in de novo mode. The read length also makes a difference: the longer your reads, the less the GTF annotation improves the percent aligned.

By default I always use the GTF option.
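A with/without comparison of this kind could be set up roughly as follows. This is only a sketch: the helper name, the output directory names, and all file names are placeholders, and -o (output directory) is a TopHat option that is not discussed in this thread. The function prints the two commands rather than running them, so they can be checked first.

```shell
# Print (not run) the two TopHat commands for a with/without-GTF comparison.
# Arguments: Bowtie index prefix, reads file, GTF file (all placeholders).
print_tophat_runs() {
  index=$1; reads=$2; gtf=$3
  echo "tophat -o tophat_no_gtf $index $reads"
  echo "tophat -G $gtf -o tophat_with_gtf $index $reads"
}
```

Calling print_tophat_runs my_index reads.fastq genes.gtf shows both command lines; piping the output to sh would execute them.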

**pageskipro** · 09-13-2011, 08:14 AM

What comparisons (tools, files) are you using to determine better alignment rates? For example, do you look at coverageBed output, or FPKM and confidence levels?

Can you say a bit about the tools and resources you use to compare your TopHat/Cufflinks results with and without a GTF?

Thanks!

Cynthia

**schaffer** · 09-13-2011, 10:00 AM

Cynthia,
I use HTSeq to count the numbers of reads per gene.

http://www-huber.embl.de/users/anders/HTSeq/doc/history.html

Then I see that I am getting more reads for some genes when I use the GTF option.

Lana

**Jon_Keats** · 09-13-2011, 10:27 AM

In my case, just samtools flagstat or Picard CollectAlignmentSummaryMetrics.

Code:

samtools flagstat MyTophatBam.bam > MyMetrics.txt

or

java -Xmx2g -jar CollectAlignmentSummaryMetrics.jar INPUT=MyTophatBam.bam OUTPUT=MyMetrics.txt VALIDATION_STRINGENCY=SILENT REFERENCE_SEQUENCE=MyGenome.fa ASSUME_SORTED=true IS_BISULFITE_SEQUENCED=false
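To compare two runs, the mapped counts can be pulled out of the flagstat reports with a little awk. This is a rough sketch with made-up function names. It assumes the report has a line of the form "N mapped (" or "N + M mapped (", and as far as I know flagstat counts alignment records, so a read that maps to several places is counted more than once.

```shell
# Extract the "mapped" count from a samtools flagstat report.
# Matches "N mapped (" and "N + M mapped (" layouts, but not "primary mapped".
mapped_count() {
  awk '/(^| )[0-9]+ mapped \(/ { print $1; exit }' "$1"
}

# Difference in mapped counts: second report minus first.
mapped_gain() {
  echo $(( $(mapped_count "$2") - $(mapped_count "$1") ))
}
```

For example, mapped_gain NoGtfMetrics.txt WithGtfMetrics.txt (hypothetical file names) prints how many more alignments the GTF run produced.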

**songyj** · 10-16-2011, 11:50 PM

some confusion

Originally posted by Jon_Keats

The advantage of using the annotation GTF is in mapping reads from lowly expressed transcripts that cross an exon-exon junction. In de novo mode, a junction is only defined if a read flanks it on both sides by at least 8 bp (you can modify this setting):

-a/--min-anchor-length <int> The "anchor length". TopHat will report junctions spanned by reads with at least this many bases on each side of the junction. Note that individual spliced alignments may span a junction with fewer than this many bases on one side. However, every junction involved in spliced alignments is supported by at least one read with this many bases on each side. This must be at least 3 and the default is 8.

So when you provide the GTF, many of the reads in low-coverage regions now align even if only 4 bp fall in one exon, because the GTF basically says "this is a known junction, go ahead and align to it."

In all my comparisons, the alignment rate is slightly better when using the GTF. The size of the improvement depends on the number of reads, since a low read count increases the number of junctions left undefined by the anchor-length setting. As the number of reads increases, so does the chance of defining the same junction in de novo mode. The read length also makes a difference: the longer your reads, the less the GTF annotation improves the percent aligned.

By default I always use the GTF option.

Sorry, I may be confusing "anchor length" with "segment length".
If I use --segment-length 25, my reads are cut into segments of at least that length, but --min-anchor-length 8 is clearly smaller than that.
Does this mean that as few as 8 bp of a read can support a junction, so the supporting piece does not have to be as long as my segment length?

thanks

song

tophat with/without annotation,and cufflink with annotation?
