**pmiguel** · 07-07-2011, 03:14 AM

You can find it here:

http://www.ensembl.org/info/data/ftp/index.html

--
Phillip

**HSV-1** · 10-18-2012, 06:39 AM

The Ensembl file is much bigger than the one from UCSC. Why?

**mbblack** · 10-19-2012, 06:32 AM

Originally posted by HSV-1

The Ensembl file is much bigger than the one from UCSC. Why?

How did you get your file from UCSC? If you make a RefGene-based GTF from the Table Browser, it only includes coding features. The pre-built GTF from Ensembl includes all coding and non-coding features. Plus, the actual annotations are longer text strings (all the Ensembl accessions for gene ID, exon ID, transcript ID, name, biotype, ...), so in raw text the Ensembl file will be larger.

Also note that the UCSC file uses the notation "chr1", etc., while the first column in the Ensembl file will just be "1", etc. (some software expects the "chr" prefix).
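If a downstream tool expects UCSC-style names, one option is to rewrite the first column of the Ensembl GTF. The following is only a minimal sketch: the function names and file paths are hypothetical, and it assumes the only differences are the missing "chr" prefix and the mitochondrial name ("MT" in Ensembl, "chrM" in UCSC). Unplaced scaffolds may be named differently between the two sources and would need a proper lookup table, which this does not attempt.

```python
def add_chr_prefix(line: str) -> str:
    """Return a GTF line with a UCSC-style sequence name in column 1."""
    if line.startswith("#") or not line.strip():
        return line  # leave header comments and blank lines untouched
    seqname, sep, rest = line.partition("\t")
    if seqname == "MT":
        seqname = "chrM"  # Ensembl "MT" corresponds to UCSC "chrM"
    elif not seqname.startswith("chr"):
        seqname = "chr" + seqname  # assumption: simple prefixing is enough
    return seqname + sep + rest


def convert_gtf(src_path: str, dst_path: str) -> None:
    """Copy a GTF file, converting column 1 on every feature line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(add_chr_prefix(line))
```

For example, `convert_gtf("ensembl.gtf", "ensembl.chr.gtf")` (both paths are placeholders). Whichever naming you choose, it has to match the sequence names in the genome index you align against.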

**HSV-1** · 10-19-2012, 05:48 PM

That is probably the reason.
How do I fix it?
With the same sequence data and the Ensembl GTF, I should get more accepted hits from TopHat.

Originally posted by mbblack

How did you get your file from UCSC? If you make a RefGene-based GTF from the Table Browser, it only includes coding features. The pre-built GTF from Ensembl includes all coding and non-coding features. Plus, the actual annotations are longer text strings (all the Ensembl accessions for gene ID, exon ID, transcript ID, name, biotype, ...), so in raw text the Ensembl file will be larger.

Also note that the UCSC file uses the notation "chr1", etc., while the first column in the Ensembl file will just be "1", etc. (some software expects the "chr" prefix).

**mbblack** · 10-22-2012, 06:17 AM

Originally posted by HSV-1

With the same sequence data and the Ensembl GTF, I should get more accepted hits from TopHat.

No, not for a reasonably mature genome such as the rat. Ensembl's build may include a handful of novel and/or predicted coding genes, but not many. Ensembl Rat rel. 66.34 had 22,938 coding genes, 22,921 of which were known and have RefSeq annotation (I only know this as I'm writing up data that used 66.34 as the reference - you would have to look on Ensembl's web site for the stats for the current release).

The annotation really should not have any significant effect on your summarized mapping results for a mature feature set like the rat's. It would only matter if there were a large number of novel, unknown or predicted genes in one annotation versus another, or if the splice boundaries of the annotation features were still largely undetermined. But once summarized by gene, your mapped count data should be unaffected, given that the genome build is fairly well characterized and stable at this point.
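To see for yourself how much two annotations differ, you could count the distinct genes per biotype in each GTF. This is a rough sketch rather than a definitive tool: the function name is made up, and it assumes each feature line carries `gene_id` and `gene_biotype` attributes in column 9. Files without a `gene_biotype` attribute (some older Ensembl releases, or a Table Browser export) will have all their genes counted under "unknown".

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def genes_per_biotype(lines):
    """Count distinct gene_id values per gene_biotype in GTF lines."""
    genes = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF feature line
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        # Assumption: the biotype is stored in a gene_biotype attribute.
        biotype = attrs.get("gene_biotype", "unknown")
        genes[biotype].add(gene_id)
    return {biotype: len(ids) for biotype, ids in genes.items()}
```

It takes any iterable of lines, so an open file handle works: `with open("annotation.gtf") as fh: print(genes_per_biotype(fh))` (the path is a placeholder).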

GTF file for rat