  • yu_chem
    Member
    • Mar 2015
    • 23

    How to decide the side (5' or 3') close to TSS in GTF

    Hi everybody

    I'd like to ask about TSS (transcription start site) annotation.
    I'm currently analyzing total RNA-seq data from the ENCODE project.
    I want to check the reported relationship between gene expression and the DNA methylation state upstream of the gene.

    So I need to know the approximate TSS coordinates of my target genes. I don't need exact coordinates from CAGE-seq and so on.

    So far, I have downloaded the GTF annotation file from the ENCODE project (from the URL below)

    and checked its contents (shown below).

    Now I am trying to work out which end (5' or 3') is close to the TSS:
    chr1 HAVANA gene 11869 (5') 14409 (3')

    Is it correct that the end nearest the coordinates of exon_number 1 is close to the TSS?

    If annotation data or an easy method exists, could you tell me about it?

    Best regards

    ##description: evidence-based annotation of the human genome (GRCh38), version 24 (Ensembl 83)
    ##provider: GENCODE
    ##contact: [email protected]
    ##format: gtf
    ##date: 2015-12-03
    chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
    chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
    chr1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
    chr1 HAVANA exon 12613 12721 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
  • mastal
    Senior Member
    • Mar 2009
    • 666

    #2
    In your gtf file, compare the exon annotations of genes that are on the + strand with those that are on the - strand. The gene you have shown above is on the + strand.


    • yu_chem
      Member
      • Mar 2015
      • 23

      #3
      Dear mastal

      Thank you for your answer, and sorry for my basic question.
      Do you mean that, given the transcription mechanism (from the site below), once the strand of a gene is known, the side close to the TSS is determined with "no exception"?

      That is, if the gene is on the + strand, it is the 5' side; if the gene is on the - strand, it is the 3' side.

      Sorry, I should be more careful.

      Thank you for quick answer.


      • mastal
        Senior Member
        • Mar 2009
        • 666

        #4
        I meant that you should check whether, for genes on the - strand, the exon closest to the rightmost end of the gene is labelled as exon1 or not.

        Usually the chromosomal coordinates for a gene are given from left to right, so even for genes on the minus strand, the start coordinate listed for a transcript is lower than its end coordinate.
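The check described above can be sketched in Python. This is a minimal illustration, not a full GTF parser: it assumes tab-separated records (the forum display collapses tabs to spaces), and the helper names `parse_record`, `exon1_is_at_5prime_end` and `make_record` are hypothetical. The example records are abbreviated from the GENCODE v24 lines quoted in this thread.

```python
import re

# exon_number is unquoted in the records quoted in this thread; accepting
# optional quotes is an assumption made to tolerate files that quote it.
EXON_NUMBER = re.compile(r'exon_number "?(\d+)"?;')


def parse_record(line):
    """Pull the columns needed for the check out of one GTF record."""
    cols = line.rstrip("\n").split("\t")
    match = EXON_NUMBER.search(cols[8])
    return {
        "feature": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "strand": cols[6],
        "exon_number": int(match.group(1)) if match else None,
    }


def exon1_is_at_5prime_end(transcript, exons):
    """True if exon 1 touches the transcript end expected to hold the TSS."""
    exon1 = next(e for e in exons if e["exon_number"] == 1)
    if transcript["strand"] == "+":
        return exon1["start"] == transcript["start"]
    return exon1["end"] == transcript["end"]


def make_record(feature, start, end, strand, attrs):
    """Build a tab-separated GTF line for the examples (hypothetical helper)."""
    return "\t".join(
        ["chr1", "HAVANA", feature, str(start), str(end), ".", strand, ".", attrs]
    )


plus_tx = parse_record(make_record("transcript", 11869, 14409, "+",
                                   'transcript_id "ENST00000456328.2";'))
plus_exons = [
    parse_record(make_record("exon", 11869, 12227, "+", "exon_number 1;")),
    parse_record(make_record("exon", 12613, 12721, "+", "exon_number 2;")),
]

minus_tx = parse_record(make_record("transcript", 800879, 817712, "-",
                                    'transcript_id "ENST00000447500.4";'))
minus_exons = [
    parse_record(make_record("exon", 817373, 817712, "-", "exon_number 1;")),
    parse_record(make_record("exon", 810067, 810170, "-", "exon_number 2;")),
]

print(exon1_is_at_5prime_end(plus_tx, plus_exons))    # True
print(exon1_is_at_5prime_end(minus_tx, minus_exons))  # True
```

If the second call returns True for minus-strand transcripts in your own file, exon 1 sits at the rightmost end and the file numbers exons in transcription order.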


        • yu_chem
          Member
          • Mar 2015
          • 23

          #5
          Dear mastal

          Thank you for your answer.
          I checked that, for genes on the minus strand, the exon closest to the rightmost end of the gene (column (B) in the example below) is labelled as exon_number 1,
          and that
          the transcript start coordinates (column (A) below) are lower than the transcript end coordinates (column (B) below).

          That is, this GTF file follows the usual convention you described (chromosomal coordinates for a gene are given from left to right).

          So I should interpret it as follows: if the gene is on the + strand, the coordinate in column (A) is close to the TSS; if the gene is on the - strand, the coordinate in column (B) is close to the TSS.

          Is that correct?

          Thank you in advance for your help.


          chr1 HAVANA gene 800879(column A) 817712(column B) . - . gene_id "ENSG00000230092.7"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; level 2; tag "overlapping_locus"; havana_gene "OTTHUMG00000002403.3";
          chr1 HAVANA transcript 800879 817712 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";
          chr1 HAVANA exon 817373 817712 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; exon_number 1; exon_id "ENSE00001746491.1"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";
          chr1 HAVANA exon 810067 810170 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; exon_number 2; exon_id "ENSE00001674926.2"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";


          • mastal
            Senior Member
            • Mar 2009
            • 666

            #6
            Originally posted by yu_chem
            So I should interpret it as follows: if the gene is on the + strand, the coordinate in column (A) is close to the TSS; if the gene is on the - strand, the coordinate in column (B) is close to the TSS.

            Is that correct?
            Yes, that is correct.
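The rule confirmed above can be written as a few lines of Python. This is only a sketch under stated assumptions: the function name `tss_from_gtf_line` is hypothetical, records are assumed to be tab-separated, and the two example lines are abbreviated from the gene records quoted in this thread.

```python
def tss_from_gtf_line(line):
    """Return (chrom, tss, strand) for one tab-separated GTF record.

    Column 4 (start) is taken as the TSS on the + strand and
    column 5 (end) on the - strand, as discussed in this thread.
    """
    cols = line.rstrip("\n").split("\t")
    chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
    if strand == "+":
        return chrom, start, strand
    if strand == "-":
        return chrom, end, strand
    raise ValueError(f"cannot place a TSS without a strand: {strand!r}")


plus_gene = "\t".join(
    ["chr1", "HAVANA", "gene", "11869", "14409", ".", "+", ".",
     'gene_id "ENSG00000223972.5";']
)
minus_gene = "\t".join(
    ["chr1", "HAVANA", "gene", "800879", "817712", ".", "-", ".",
     'gene_id "ENSG00000230092.7";']
)

print(tss_from_gtf_line(plus_gene))   # ('chr1', 11869, '+')
print(tss_from_gtf_line(minus_gene))  # ('chr1', 817712, '-')
```

Note that a gene record spans all of its transcripts, so this gives the outermost 5' boundary of the gene. Applying the same function to the transcript records gives one approximate TSS per transcript, which may be the better choice when a gene has several isoforms.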


            • yu_chem
              Member
              • Mar 2015
              • 23

              #7
              Dear mastal

              I really appreciate your kind response to my question.
