  • yu_chem
    Member
    • Mar 2015
    • 23

    How to decide the side (5' or 3') close to TSS in GTF

    Hi everybody,

    I'd like to ask about TSS (transcription start site) annotation.
    I'm currently analyzing total RNA-seq data from the ENCODE project.
    I want to check the reported relationship between gene expression and the DNA methylation state upstream of genes.

    So I need to know the approximate TSS coordinates of my target genes. I don't need exact coordinates from CAGE-seq or similar methods.

    So far, I have downloaded the GTF annotation file from the ENCODE project (from the URL below)

    and checked its contents (shown below).

    Now I'm trying to work out which side of a gene record (5' or 3') is close to the TSS, for example:
    chr1 HAVANA gene 11869 (5') 14409 (3')

    Is it right that the side closest to the coordinates of exon_number 1 is near the TSS?

    If annotation data or an easy method exists, could you tell me about it?

    Best regards

    ##description: evidence-based annotation of the human genome (GRCh38), version 24 (Ensembl 83)
    ##provider: GENCODE
    ##contact: [email protected]
    ##format: gtf
    ##date: 2015-12-03
    chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
    chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
    chr1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
    chr1 HAVANA exon 12613 12721 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
  • mastal
    Senior Member
    • Mar 2009
    • 666

    #2
    In your GTF file, compare the exon annotations of genes on the + strand with those of genes on the - strand. The gene you have shown above is on the + strand.

    • yu_chem
      Member
      • Mar 2015
      • 23

      #3
      Dear mastal,

      Thank you for your answer, and sorry for my basic question.
      Do you mean that, following the mechanism of transcription (from the site below), once the strand of a gene is known, the side close to the TSS is determined with no exception?

      That is, using the 5'/3' labels from my first post (start and end columns): if the gene is on the + strand, the TSS is on the 5' side; if the gene is on the - strand, it is on the 3' side.

      Sorry, I should have been more careful.

      Thank you for the quick answer.

      • mastal
        Senior Member
        • Mar 2009
        • 666

        #4
        I meant that you should check whether, for genes on the - strand, the exon closest to the rightmost end of the gene is labelled as exon_number 1 or not.

        Usually the chromosomal coordinates for a gene are given from left to right, so even for genes on the minus strand, the start coordinate in the file is lower than the end coordinate, although transcription begins at the higher (end) coordinate.
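The left-to-right convention described above can be turned into a small rule. The following is a minimal sketch, assuming that the gene or transcript start is an acceptable approximation of the TSS; the function name `tss_position` is hypothetical and not part of any library.

```python
def tss_position(start: int, end: int, strand: str) -> int:
    """Return the approximate TSS for a GTF gene or transcript record.

    GTF coordinates are 1-based with start <= end on both strands, so the
    strand column decides which of the two is the 5' end of the transcript.
    """
    if strand == "+":
        return start
    if strand == "-":
        return end
    raise ValueError(f"unsupported strand: {strand!r}")


# Gene records quoted in this thread.
print(tss_position(11869, 14409, "+"))    # DDX11L1, plus strand
print(tss_position(800879, 817712, "-"))  # RP11-206L10.8, minus strand
```

Raising an error for any strand other than + or - is a design choice for this sketch; a record with an unknown strand has no well-defined TSS side.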

        • yu_chem
          Member
          • Mar 2015
          • 23

          #5
          Dear mastal,

          Thank you for your answer.
          I checked: for genes on the minus strand, the exon closest to the rightmost end of the gene (column B in the example below) is labelled as exon_number 1,
          and
          the transcript start coordinates (column A below) are lower than the transcript end coordinates (column B below).

          That is, this GTF file follows the standard you described (chromosomal coordinates for a gene are usually given from left to right).

          So I should interpret it this way: if the gene is on the + strand, the coordinate in column A is close to the TSS; if the gene is on the - strand, the coordinate in column B is close to the TSS.

          Is that correct?

          Thank you in advance for your help.


          chr1 HAVANA gene 800879 (column A) 817712 (column B) . - . gene_id "ENSG00000230092.7"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; level 2; tag "overlapping_locus"; havana_gene "OTTHUMG00000002403.3";
          chr1 HAVANA transcript 800879 817712 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";
          chr1 HAVANA exon 817373 817712 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; exon_number 1; exon_id "ENSE00001746491.1"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";
          chr1 HAVANA exon 810067 810170 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; exon_number 2; exon_id "ENSE00001674926.2"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";
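The check described in this post can also be done programmatically. This is a rough sketch using only the coordinates quoted in the thread; the helper name `five_prime_end` and the `examples` dictionary are illustrative and not taken from any tool.

```python
def five_prime_end(start, end, strand):
    """Coordinate of the 5' end of a feature, given GTF start <= end."""
    return start if strand == "+" else end


# (start, end, strand) for each transcript and its exon_number 1,
# copied from the records quoted in this thread.
examples = {
    "ENST00000456328.2": {
        "transcript": (11869, 14409, "+"),
        "exon1": (11869, 12227, "+"),
    },
    "ENST00000447500.4": {
        "transcript": (800879, 817712, "-"),
        "exon1": (817373, 817712, "-"),
    },
}

for transcript_id, rec in examples.items():
    # exon_number 1 should share its 5' end with the transcript itself.
    same = five_prime_end(*rec["transcript"]) == five_prime_end(*rec["exon1"])
    print(transcript_id, "exon 1 starts at the TSS:", same)
```

For both transcripts, the 5' end of exon_number 1 coincides with the 5' end of the transcript, which agrees with the manual check above.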

          • mastal
            Senior Member
            • Mar 2009
            • 666

            #6
            Originally posted by yu_chem:
            "So I should interpret it this way: if the gene is on the + strand, the coordinate in column A is close to the TSS; if the gene is on the - strand, the coordinate in column B is close to the TSS.

            Is that correct?"

            Yes, that is correct.
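For the original goal of looking at methylation upstream of genes, the confirmed rule could be applied to every gene record in the file. The sketch below assumes a tab-separated GTF (as in the real file, unlike the space-separated paste above) and an arbitrary 2000 bp upstream window; `gene_tss_windows`, the `upstream` parameter and the shortened attribute strings in `sample` are all illustrative assumptions.

```python
import re


def gene_tss_windows(gtf_lines, upstream=2000):
    """Yield (gene_name, chrom, strand, tss, win_start, win_end) per gene record."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene-level records
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        name = match.group(1) if match else "."
        if strand == "+":
            tss = start
            window = (max(1, tss - upstream), tss - 1)
        elif strand == "-":
            tss = end
            # Not clipped to the chromosome length, which this sketch does not know.
            window = (tss + 1, tss + upstream)
        else:
            continue  # strand unknown, no TSS side can be chosen
        yield name, chrom, strand, tss, window[0], window[1]


# Two gene records from this thread, with attributes shortened for brevity.
sample = [
    "##format: gtf",
    'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972.5"; gene_name "DDX11L1";',
    'chr1\tHAVANA\tgene\t800879\t817712\t.\t-\t.\tgene_id "ENSG00000230092.7"; gene_name "RP11-206L10.8";',
]

for row in gene_tss_windows(sample):
    print(*row, sep="\t")
```

To run this on the downloaded annotation, the `sample` list could be replaced by an open file handle. Note that this uses one TSS per gene; genes with several transcripts may have alternative start sites, so transcript records could be used instead if that matters.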

            • yu_chem
              Member
              • Mar 2015
              • 23

              #7
              Dear mastal,

              I really appreciate your kind response to my question.
