  • How to decide the side (5' or 3') close to TSS in GTF

    Hi everybody,

    I'd like to ask about TSS (transcription start site) annotation.
    I am analyzing total RNA-seq data from the ENCODE project, and I want to check the reported relationship between gene expression and the DNA methylation state upstream of genes.

    So I need to know the approximate TSS coordinates of my target genes. I don't need exact coordinates such as those from CAGE-seq and so on.

    So far, I have downloaded the GTF annotation file from the ENCODE project (from the URL below) and looked at its contents (pasted at the end of this post).

    At present, I am trying to work out how to decide which side (5' or 3') of a feature is close to the TSS, for example:
    chr1 HAVANA gene 11869 (5') 14409 (3')

    Is it right that the side closest to the coordinates of exon_number 1 is near the TSS?

    If annotation data or an easy method for this exists, would you tell me about it?

    Best regards

    ##description: evidence-based annotation of the human genome (GRCh38), version 24 (Ensembl 83)
    ##provider: GENCODE
    ##contact: [email protected]
    ##format: gtf
    ##date: 2015-12-03
    chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
    chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
    chr1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
    chr1 HAVANA exon 12613 12721 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";

  • #2
    In your GTF file, compare the exon annotations of genes on the + strand with those of genes on the - strand. The gene you have shown above is on the + strand.



    • #3
      Dear mastal

      Thank you for your answer, and sorry for my basic question.
      Do you mean that, following the transcription mechanism (from the site below), once the strand of a gene is known, the side close to the TSS is determined with "no exception"?

      That is, using the 5'/3' labels from my example (lower and higher coordinate): if the gene is on the + strand, the TSS is on the 5' side; if the gene is on the - strand, it is on the 3' side.

      Sorry, I should have been more careful.

      Thank you for the quick answer.



      • #4
        I meant that you should check whether, for genes on the - strand, the exon closest to the rightmost end of the gene is labelled as exon 1 or not.

        Chromosomal coordinates in a GTF file are usually given from left to right, so even for genes on the minus strand the value in the start column is lower than the value in the end column.
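
The check described above can be sketched in a few lines of Python. This is only a minimal illustration, not a tool from this thread: the helper names `exon_number` and `first_exon_is_at_5prime` are made up for the example, and it assumes the exons of one transcript have already been collected as `(start, end, exon_number)` tuples.

```python
import re


def exon_number(attributes):
    """Extract exon_number from a GTF attribute string (hypothetical helper).

    Accepts both `exon_number 1;` and `exon_number "1";` styles.
    Returns None when the attribute is absent.
    """
    match = re.search(r'exon_number "?(\d+)"?', attributes)
    return int(match.group(1)) if match else None


def first_exon_is_at_5prime(exons, strand):
    """Check that exon 1 sits at the transcript's 5' end.

    exons  : list of (start, end, exon_number) tuples for ONE transcript
    strand : "+" or "-"

    On "+", exon 1 should have the lowest start coordinate.
    On "-", exon 1 should have the highest end coordinate.
    """
    first = next(e for e in exons if e[2] == 1)
    if strand == "+":
        return first[0] == min(e[0] for e in exons)
    if strand == "-":
        return first[1] == max(e[1] for e in exons)
    raise ValueError(f"unexpected strand: {strand!r}")


# Exons copied from the two transcripts quoted in this thread.
plus_exons = [(11869, 12227, 1), (12613, 12721, 2)]
minus_exons = [(817373, 817712, 1), (810067, 810170, 2)]

print(first_exon_is_at_5prime(plus_exons, "+"))
print(first_exon_is_at_5prime(minus_exons, "-"))
```

If the function returns False for minus-strand transcripts in some other annotation file, that file numbers its exons by genomic position rather than by transcription order, and exon_number 1 should not be used to locate the TSS there.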



        • #5
          Dear mastal

          Thank you for your answer.
          I checked: for genes on the minus strand, the exon closest to the rightmost end of the gene (column (B) in the example below) is labelled as exon_number 1,
          and
          the start coordinate (column (A) below) is lower than the end coordinate (column (B) below).

          That is, this GTF file follows the usual convention you described (chromosomal coordinates for a gene are given from left to right).

          So I should interpret it as follows: if the gene is on the + strand, the coordinate in column (A) is close to the TSS; if the gene is on the - strand, the coordinate in column (B) is close to the TSS.

          Is that correct?

          Thank you in advance for your help.


          chr1 HAVANA gene 800879(column A) 817712(column B) . - . gene_id "ENSG00000230092.7"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; level 2; tag "overlapping_locus"; havana_gene "OTTHUMG00000002403.3";
          chr1 HAVANA transcript 800879 817712 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";
          chr1 HAVANA exon 817373 817712 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; exon_number 1; exon_id "ENSE00001746491.1"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";
          chr1 HAVANA exon 810067 810170 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; exon_number 2; exon_id "ENSE00001674926.2"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";



          • #6
            Originally posted by yu_chem
            So I should interpret it as follows: if the gene is on the + strand, the coordinate in column (A) is close to the TSS; if the gene is on the - strand, the coordinate in column (B) is close to the TSS.

            Is that correct?
            Yes, that is correct.
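
The rule confirmed here can be written down as a short Python sketch. It is an illustration only: the function names `parse_gtf_line`, `tss_of` and `upstream_window` are hypothetical, the input is assumed to be a tab-separated GTF record (real GTF files use tabs, unlike the space-separated paste in this thread), and the 1000 bp default window is an arbitrary choice, not a recommendation.

```python
def parse_gtf_line(line):
    """Split one tab-separated GTF record into the fields needed here."""
    cols = line.rstrip("\n").split("\t")
    return {
        "chrom": cols[0],
        "feature": cols[2],
        "start": int(cols[3]),   # column (A): always the lower coordinate
        "end": int(cols[4]),     # column (B): always the higher coordinate
        "strand": cols[6],
    }


def tss_of(start, end, strand):
    """Return the 1-based TSS coordinate: start on '+', end on '-'."""
    if strand == "+":
        return start
    if strand == "-":
        return end
    raise ValueError(f"unexpected strand: {strand!r}")


def upstream_window(start, end, strand, size=1000):
    """Return (lo, hi), the `size` bases immediately upstream of the TSS.

    Upstream means lower coordinates on '+' and higher coordinates on '-'.
    The window is clipped at position 1 on the '+' strand; it is NOT
    clipped at the chromosome end on the '-' strand (length unknown here).
    """
    tss = tss_of(start, end, strand)
    if strand == "+":
        return max(1, tss - size), tss - 1
    return tss + 1, tss + size


# The minus-strand gene from this thread, rebuilt as a tab-separated line.
line = "\t".join(
    ["chr1", "HAVANA", "gene", "800879", "817712", ".", "-", ".",
     'gene_id "ENSG00000230092.7";']
)
rec = parse_gtf_line(line)
print(tss_of(rec["start"], rec["end"], rec["strand"]))
print(upstream_window(rec["start"], rec["end"], rec["strand"]))
```

Applying this to `transcript` records rather than `gene` records gives one TSS per isoform, which may matter when a gene has several alternative start sites.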



            • #7
              Dear mastal

              I really appreciate your sincere response to my question.
