  • GTF difference in CDS and exon start position

    Hi all,

    I have a question about how features are annotated in some GTF files. Quite often, the CDS and exon features assigned to the same exon have different start positions. For example, within a single transcript there can be several features annotated to exon 3, including both a CDS and an exon feature, and their start positions may differ. As this is neither the first nor the last exon, I assumed the extra sequence could not be 5' or 3' UTR, so what is it?

    I know CDS is the coding sequence and exon is the exon, but if part of the exon does not code, why is it included in the exon feature, rather than the exon and CDS having the same coordinates? What is that part, if not a coding region?

    Also, in the below example of what I mean, why is the start codon in exon 3? Why would this not be the first exon of that transcript if coding only starts there? Why are the previous exons annotated?

    Any help on this would be great as I'm trying to write a script to pull out the exome using a GTF.

    Code:
    chr3    protein_coding  exon    195880  195990  .       +       .        gene_id "ENSBTAG00000000584"; transcript_id "ENSBTAT00000056645"; exon_number "1"; gene_biotype "protein_coding";
    chr3    protein_coding  exon    202306  202479  .       +       .        gene_id "ENSBTAG00000000584"; transcript_id "ENSBTAT00000056645"; exon_number "2"; gene_biotype "protein_coding";
    chr3    protein_coding  exon    204057  204213  .       +       .        gene_id "ENSBTAG00000000584"; transcript_id "ENSBTAT00000056645"; exon_number "3"; gene_biotype "protein_coding";
    chr3    protein_coding  CDS     204069  204213  .       +       0        gene_id "ENSBTAG00000000584"; transcript_id "ENSBTAT00000056645"; exon_number "3"; gene_biotype "protein_coding"; protein_id "ENSBTAP00000051775";
    chr3    protein_coding  start_codon     204069  204071  .       +       0        gene_id "ENSBTAG00000000584"; transcript_id "ENSBTAT00000056645"; exon_number "3"; gene_biotype "protein_coding";
    chr3    protein_coding  exon    206914  208046  .       +       .        gene_id "ENSBTAG00000000584"; transcript_id "ENSBTAT00000056645"; exon_number "4"; gene_biotype "protein_coding";
    chr3    protein_coding  CDS     206914  208046  .       +       2        gene_id "ENSBTAG00000000584"; transcript_id "ENSBTAT00000056645"; exon_number "4"; gene_biotype "protein_coding"; protein_id "ENSBTAP00000051775";
    chr3    protein_coding  exon    208701  208733  .       +       .        gene_id "ENSBTAG00000000584"; transcript_id "ENSBTAT00000056645"; exon_number "5"; gene_biotype "protein_coding";
    chr3    protein_coding  CDS     208701  208733  .       +       0        gene_id "ENSBTAG00000000584"; transcript_id "ENSBTAT00000056645"; exon_number "5"; gene_biotype "protein_coding"; protein_id "ENSBTAP00000051775";
    regards,
    Anthony

  • #2
    Hi Anthony,

    According to the annotation, your transcript has 2 introns in its 5'UTR and its start codon in the third exon. This means that the 5'UTR consists of exon 1, exon 2 and the first part of exon 3 (from 204057 to 204068). The coding sequence starts in the third exon, at position 204069.
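
A minimal Python sketch of that arithmetic, assuming a plus-strand transcript and the 1-based inclusive coordinates from the GTF lines in the question. The function name `five_prime_utr_plus` is made up for illustration, and a minus-strand transcript would need the mirrored logic.

```python
# Exon and CDS intervals (1-based, inclusive) copied from the GTF lines in the question.
exons = [(195880, 195990), (202306, 202479), (204057, 204213),
         (206914, 208046), (208701, 208733)]
cds = [(204069, 204213), (206914, 208046), (208701, 208733)]


def five_prime_utr_plus(exons, cds):
    """Return the exonic segments upstream of the first CDS base (plus strand only)."""
    cds_start = min(start for start, _ in cds)
    utr = []
    for start, end in sorted(exons):
        if end < cds_start:
            # Whole exon lies upstream of the start codon.
            utr.append((start, end))
        elif start < cds_start:
            # Exon contains the start codon: keep only the part before it.
            utr.append((start, cds_start - 1))
    return utr


utr_segments = five_prime_utr_plus(exons, cds)
utr_length = sum(end - start + 1 for start, end in utr_segments)

print(utr_segments)  # [(195880, 195990), (202306, 202479), (204057, 204068)]
print(utr_length)    # 297
```

The three segments are exon 1, exon 2 and the first 12 bases of exon 3, which matches the description above.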

    Also, in the below example of what I mean, why is the start codon in exon 3?
    Because that is where translation starts in this transcript (according to the annotation).

    Why would this not be the first exon of that transcript if coding only starts there?
    Because the exonic structure is usually identified from the genomic alignment of the transcript sequence, not from the protein sequence. Exons are numbered according to their positions within the mRNA molecule. "Exon" refers to transcription and "CDS" to translation. These are two different biological mechanisms, so there is no reason why translation should start within the first exon (remember that exons are concatenated in a processed mRNA). Things can be a bit more complicated when it comes to introns within the 3'UTR because of nonsense-mediated decay, but that is another story.
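
To make the "exons are concatenated" point concrete, here is a small sketch that maps a genomic position onto the spliced transcript. It assumes a plus-strand transcript and 1-based inclusive coordinates; `genomic_to_mrna` is a hypothetical helper name, not a function from any library.

```python
def genomic_to_mrna(pos, exons):
    """Map a genomic position to a 1-based position in the spliced transcript.

    Assumes a plus-strand transcript and 1-based inclusive exon intervals.
    """
    offset = 0  # number of exonic bases in the exons already passed
    for start, end in sorted(exons):
        if start <= pos <= end:
            return offset + (pos - start + 1)
        offset += end - start + 1
    raise ValueError(f"position {pos} is not inside any exon")


# Exon intervals copied from the GTF lines in the question.
exons = [(195880, 195990), (202306, 202479), (204057, 204213),
         (206914, 208046), (208701, 208733)]

start_codon_mrna = genomic_to_mrna(204069, exons)
print(start_codon_mrna)  # 298
```

In the spliced mRNA the start codon therefore begins at base 298, after 297 bases of 5'UTR contributed by exons 1 to 3.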

    Why are the previous exons annotated?
    Why not? An exon is defined as a part of the transcript that is retained in the mature RNA after splicing; the definition does not rely on any protein-coding property. Moreover, non-coding RNA is important. After all, keep in mind that >95% of the RNA material in a human cell does not code for any protein...
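
For the script mentioned in the question, one possible starting point is to keep exon and CDS intervals separate per transcript and then decide which set to export. This is only a sketch: it assumes tab-delimited GTF with nine columns and attributes written as `key "value";`, and the names `collect_intervals` and `ATTR_RE` are made up for illustration.

```python
import re
from collections import defaultdict

# Matches attribute pairs written as: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def collect_intervals(lines, features=("exon", "CDS")):
    """Group (start, end) intervals by transcript_id and feature type."""
    intervals = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GTF record
        feature, start, end, attributes = fields[2], fields[3], fields[4], fields[8]
        if feature not in features:
            continue
        attrs = dict(ATTR_RE.findall(attributes))
        intervals[attrs["transcript_id"]][feature].append((int(start), int(end)))
    return intervals


# Two records from the question, rebuilt with real tab separators.
attrs = 'gene_id "ENSBTAG00000000584"; transcript_id "ENSBTAT00000056645"; exon_number "3";'
demo_lines = [
    "\t".join(["chr3", "protein_coding", "exon", "204057", "204213", ".", "+", ".", attrs]),
    "\t".join(["chr3", "protein_coding", "CDS", "204069", "204213", ".", "+", "0", attrs]),
]

demo = collect_intervals(demo_lines)
print(demo["ENSBTAT00000056645"]["exon"])  # [(204057, 204213)]
print(demo["ENSBTAT00000056645"]["CDS"])   # [(204069, 204213)]
```

Exporting the `exon` intervals gives all transcribed exonic sequence, UTRs included, while exporting the `CDS` intervals gives only the protein-coding part. Which of the two counts as "the exome" depends on what the downstream analysis needs.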

    Hope it helps,
    s.
