  • which is the main transcript

    For a specific gene, how can I tell which transcript is the main one in Ensembl? There is a list of several transcripts, and the one considered the main one is not always the first or the longest. So how is the most important transcript defined?

  • #2
    There isn't necessarily a "most important" transcript. The most relevant one will vary by tissue, developmental stage and treatment condition.


    • #3
      transcripts ensembl

      But where can I find this information? In Ensembl I only see the list of transcripts. However, articles in the literature usually address only one of the transcripts.


      • #4
        Depends on the organism you're working on. Some organisms have expression databases and others don't. You'll have to find out whether they exist and, if so, check them.


        • #5
          You could refer back to the transcript that codes for the protein in RefSeq or CCDS (if you want just "one" transcript). If this is a non-human/mouse gene then CCDS won't work.


          • #6
            I am working on human. The gene I am working on has several transcripts, and more than one of them is protein coding, but all the work in the literature relates to only one of them. The two transcripts differ by one exon. Both transcripts have a CCDS.


            • #7
              Can you clarify what exactly you are looking to do with this?

              If only one protein is referred to in the literature, then perhaps that is the dominant isoform. As Devon mentioned, there could be a tissue-, cell- or development-specific need for other versions.

              You could look in Illumina Bodymap data (or a more specific place like the TCGA data) to see if there is evidence for presence of specific versions in different tissues/conditions.


              • #8
                transcript

                Exactly. I think one of the transcripts is the dominant form; I just wonder how one can know which one is dominant.


                • #9
                  This may help.

                  Explore and download data on alternative splicing annotations and principal isoforms with the APPRIS Database, WebServer and WebServices.


                  APPRIS: Annotating principal splice isoforms


                  • #10
                    The exact term is the "canonical transcript", which is generally the longest transcript.
                    You'll find many posts on this somewhat controversial topic, if you google "canonical transcript".
                    Different databases may also not have the same canonical transcript for a given gene.

                    Here is the definition of the canonical transcript from Ensembl.
                    "For human, the canonical transcript for a gene is set according to the following hierarchy: 1. Longest CCDS translation with no stop codons. 2. If no (1), choose the longest Ensembl/Havana merged translation with no stop codons. 3. If no (2), choose the longest translation with no stop codons. 4. If no translation, choose the longest non-protein-coding transcript."
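The hierarchy quoted above can be sketched in a few lines of Python. This is only an illustration of the four quoted rules, not Ensembl's actual code: the `Transcript` record, its field names and the `pick_canonical` helper are all made up for this example, and the quote does not say how ties are broken, so the first candidate in input order is kept.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transcript:
    # Hypothetical record; field names are assumptions for this sketch.
    transcript_id: str
    length: int                     # transcript length in bases
    translation_length: int = 0     # 0 means the transcript has no translation
    has_ccds: bool = False          # translation matches a CCDS entry
    is_merged: bool = False         # Ensembl/Havana merged model
    has_internal_stop: bool = False


def pick_canonical(transcripts: List[Transcript]) -> Optional[Transcript]:
    """Apply the four quoted rules in order and return the first hit."""
    # Translations containing stop codons are excluded from rules 1 to 3.
    coding = [t for t in transcripts
              if t.translation_length > 0 and not t.has_internal_stop]
    tiers = [
        [t for t in coding if t.has_ccds],   # rule 1: longest CCDS translation
        [t for t in coding if t.is_merged],  # rule 2: longest merged translation
        coding,                              # rule 3: longest translation
    ]
    for tier in tiers:
        if tier:
            return max(tier, key=lambda t: t.translation_length)
    # Rule 4: no usable translation, so take the longest non-coding transcript.
    noncoding = [t for t in transcripts if t.translation_length == 0]
    return max(noncoding, key=lambda t: t.length) if noncoding else None


example = [
    Transcript("T1", 3000, 500, is_merged=True),
    Transcript("T2", 2000, 400, has_ccds=True),
    Transcript("T3", 5000),
]
print(pick_canonical(example).transcript_id)
```

In this made-up example the CCDS transcript T2 is chosen even though T1 has the longer translation, which is why the canonical transcript is not always the longest one.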


                    Strangely enough, you can't get the canonical transcript ID through Ensembl's biomaRt.
                    You can get it using the Ensembl Perl (ugh!) API though.


                    UCSC, on the other hand, has a table called knownCanonical, which you can download with the UCSC Table Browser.
                    I generally prefer using Ensembl, but in this case, UCSC is the one that provides the simplest method to get the canonical transcript.
                    Their method for defining the canonical transcript is murky though, since it is not always the longest transcript. There is some human curation involved.
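Once the knownCanonical table has been downloaded, filtering a transcript list down to the canonical ones is straightforward. The sketch below assumes a tab-separated dump with a `#`-prefixed header line and the column order chrom, chromStart, chromEnd, clusterId, transcript, protein; check the header of your own download before relying on it. The rows and transcript IDs are invented for illustration.

```python
import csv
import io

# Assumed column order of the downloaded table (verify against your file).
ASSUMED_COLUMNS = ["chrom", "chromStart", "chromEnd",
                   "clusterId", "transcript", "protein"]


def read_canonical_ids(handle):
    """Collect the transcript IDs listed in a knownCanonical-style dump."""
    ids = set()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, the '#' header line and truncated rows.
        if not row or row[0].startswith("#") or len(row) < len(ASSUMED_COLUMNS):
            continue
        record = dict(zip(ASSUMED_COLUMNS, row))
        ids.add(record["transcript"])
    return ids


# Invented rows standing in for a real download.
example = io.StringIO(
    "#chrom\tchromStart\tchromEnd\tclusterId\ttranscript\tprotein\n"
    "chr1\t100\t900\t1\ttxA.1\ttxA.1\n"
    "chr2\t500\t1500\t2\ttxC.2\ttxC.2\n"
)
canonical = read_canonical_ids(example)
all_transcripts = ["txA.1", "txB.1", "txC.2"]
kept = [t for t in all_transcripts if t in canonical]
print(kept)
```

With a real file, replace the `io.StringIO` object with `open(path, newline="")`.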


                    • #11
                      @turnersd: Thanks for sharing that site.

                      Looking at the example they list on the site, there are still 2 principal isoforms listed for this gene (http://appris.bioinfo.cnio.es/#/data...099899?db=hg38), so @litali may be left with the same conundrum.


                      • #12
                        @GenoMax

                        The UCSC Genome Group suggests just picking one at random.

                        "Thank you for your question about the knownCanonical table. Unfortunately, the issue of a gene being assigned multiple transcripts is still present in our most recent versions of the knownCanonical table. We are looking at different solutions to this complex problem, and hope to have this resolved in a future version of the UCSC Genes track. For the transcript you mentioned in your email, one of our engineers suggests arbitrarily choosing which of the two transcripts to keep and which to discard.

                        [...]

                        Matthew Speir
                        UCSC Genome Bioinformatics Group"


                        Edit: Just clicked on the link you provided. The two longest transcripts have exactly the same length, so they're obviously both reported as being the canonical, or principal, transcript. So the computationally simplest way to determine the canonical transcript is to report the longest transcript. If there is a tie in length, report the first transcript in numeric order. Biologically, it doesn't make much sense, but it is computationally simple. Given that different databases report different transcripts, even this algorithm will not always return the same canonical transcript for different databases.
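A minimal sketch of that rule, assuming transcript lengths are already available as a dictionary. The helper names and the IDs are made up, and "numeric order" is read here as the first run of digits in the transcript ID.

```python
import re


def numeric_part(transcript_id):
    """Return the first run of digits in the ID as an integer (0 if none)."""
    match = re.search(r"\d+", transcript_id)
    return int(match.group()) if match else 0


def longest_transcript(lengths):
    """Pick the longest transcript; break ties by the lowest numeric ID.

    `lengths` maps transcript ID -> length. Raises ValueError if it is empty.
    """
    return min(lengths, key=lambda tid: (-lengths[tid], numeric_part(tid)))


# Invented IDs and lengths: two transcripts tie at 2500 bases.
example = {
    "ENST00000000302": 2500,
    "ENST00000000105": 2500,
    "ENST00000000999": 1800,
}
print(longest_transcript(example))
```

The two tied transcripts are separated only by their ID numbers, which shows how arbitrary the tie-break is from a biological point of view.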
                        Last edited by blancha; 10-12-2015, 05:00 AM. Reason: Added information on example given by GenoMax


                        • #13
                          I've posted the APPRIS flags for principal isoforms below.
                          Really, I think the algorithm I posted above is the most computationally straightforward manner of identifying the canonical transcript.


                          Principal Isoform flags

                          APPRIS selects a single CDS variant for each gene as the 'PRINCIPAL' isoform based on the range of protein features. Principal isoforms are tagged with the numbers 1 to 5, with 1 being the most reliable. The flags are defined as follows:

                          PRINCIPAL:1

                          Transcript(s) expected to code for the main functional isoform based solely on the core modules in the APPRIS database. The APPRIS core modules map protein structural and functional information and cross-species conservation to the annotated variants.

                          PRINCIPAL:2

                          Where the APPRIS core modules are unable to choose a clear principal variant (approximately 25% of human protein coding genes), the database chooses two or more of the CDS variants as "candidates" to be the principal variant.

                          If one (but no more than one) of these candidates has a distinct CCDS identifier it is selected as the principal variant for that gene. A CCDS identifier shows that there is consensus between RefSeq and GENCODE/Ensembl for that variant, guaranteeing that the variant has cDNA support.

                          PRINCIPAL:3

                          Where the APPRIS core modules are unable to choose a clear principal variant and more than one of the variants have distinct CCDS identifiers, APPRIS selects the variant with the lowest CCDS identifier as the principal variant. The lower the CCDS identifier, the earlier it was annotated.

                          Consensus CDS annotated earlier are likely to have more cDNA evidence. Consecutive CCDS identifiers are not included in this flag, since they will have been annotated in the same release of CCDS. These are distinguished with the next flag.

                          PRINCIPAL:4

                          Where the APPRIS core modules are unable to choose a clear principal CDS and there is more than one variant with distinct (but consecutive) CCDS identifiers, APPRIS selects the longest CCDS isoform as the principal variant.

                          PRINCIPAL:5

                          Where the APPRIS core modules are unable to choose a clear principal variant and none of the candidate variants are annotated by CCDS, APPRIS selects the longest of the candidate isoforms as the principal variant.
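If the flags have been downloaded as one label per transcript, picking the most reliable principal isoform for a gene could look like the sketch below. It assumes labels are plain strings of the form `PRINCIPAL:<n>`; the input dictionary, the `principal_isoforms` helper and the `not_principal` placeholder are made up for illustration, and any label that does not start with `PRINCIPAL` is simply ignored.

```python
def principal_isoforms(labels):
    """Return the transcript(s) carrying the lowest PRINCIPAL flag number.

    `labels` maps transcript ID -> label string such as 'PRINCIPAL:2'.
    A list is returned because more than one transcript can share a flag.
    """
    flagged = {}
    for transcript_id, label in labels.items():
        kind, _, number = label.partition(":")
        if kind == "PRINCIPAL" and number.isdigit():
            flagged[transcript_id] = int(number)
    if not flagged:
        return []
    best = min(flagged.values())  # 1 is the most reliable flag
    return sorted(tid for tid, n in flagged.items() if n == best)


# Invented labels for the transcripts of a single gene.
example = {"txA": "PRINCIPAL:3", "txB": "PRINCIPAL:1", "txC": "not_principal"}
print(principal_isoforms(example))
```

When two transcripts carry the same flag, both come back, so the conundrum raised earlier in the thread still has to be resolved some other way.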


                          • #14
                            We are leaving @litali more or less where (s)he was when this thread was started.

                            Or maybe not. Dare we say that the most "important" transcript is the one that is most abundant/prevalent?

                            The possibility remains that the longest canonical variant may not be the most prevalent.
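If per-transcript expression estimates are available (for example from the Bodymap or TCGA data mentioned earlier), the most abundant transcript in each tissue can be picked out as sketched below. The nested dictionary layout, the helper name and all abundance values are invented for illustration.

```python
def dominant_by_sample(expression):
    """Return the most abundant transcript for each sample.

    `expression` maps sample name -> {transcript ID -> abundance}.
    Samples with no measurements are skipped.
    """
    return {
        sample: max(values, key=values.get)
        for sample, values in expression.items()
        if values
    }


# Made-up abundances for two transcripts of one gene in two tissues.
example = {
    "liver": {"txA": 12.0, "txB": 3.5},
    "brain": {"txA": 1.0, "txB": 8.2},
}
print(dominant_by_sample(example))
```

In this invented case the dominant transcript differs between the two tissues, which is the situation described at the start of the thread.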
