  • How to decide the side (5' or 3') close to TSS in GTF

    Hi everybody

    I’d like to ask about TSS (transcription start site) annotation.
    I’m analyzing total RNA-seq data from the ENCODE project,
    and I want to check the reported relationship between gene expression and the DNA methylation state upstream of genes.

    So I need to know the approximate TSS coordinates of my target genes. I don’t need exact coordinates from CAGE-seq or similar methods.

    So far,
    I have downloaded the GTF annotation file from the ENCODE project (from the URL below)

    and checked its contents (shown below).

    Now I’m trying to work out which side (5’ or 3’) of an annotated feature is close to the TSS, for example:
    chr1 HAVANA gene 11869 (5’) 14409 (3’)

    Is it right that the side closest to the coordinates of exon_number 1 is near the TSS?

    If annotation data or an easy method exists, could you tell me about it?

    Best regards

    ##description: evidence-based annotation of the human genome (GRCh38), version 24 (Ensembl 83)
    ##provider: GENCODE
    ##contact: [email protected]
    ##format: gtf
    ##date: 2015-12-03
    chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
    chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
    chr1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
    chr1 HAVANA exon 12613 12721 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";

  • #2
    In your gtf file, compare the exon annotations of genes that are on the + strand with those that are on the - strand. The gene you have shown above is on the + strand.


    • #3
      Dear mastal

      Thank you for your answer, and sorry for my basic question.
      Do you mean that, following the mechanism of transcription (see the site below), once the strand of a gene is known, the side close to the TSS is determined with no exceptions?

      That is, if a gene is on the + strand, the TSS is on the side I labelled 5'; if it is on the - strand, the TSS is on the side I labelled 3'.

      Sorry, I should be more careful.

      Thank you for the quick answer.


      • #4
        I meant that you should check whether, for genes on the - strand, the exon closest to the rightmost end of the gene is labelled as exon1 or not.

        Chromosomal coordinates for a gene are usually given from left to right, so even for genes on the minus strand, the start coordinate in the file is lower than the end coordinate.
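
        One way to run this check over a whole file is sketched below in Python. It is only a sketch: the helper names (`parse_attributes`, `first_exon_at_tss_end`) are made up for illustration, and it assumes tab-separated GTF lines carrying `transcript_id` and `exon_number` attributes, as in the GENCODE excerpts in this thread. The sample lines reuse the exon coordinates quoted in the thread, with the attribute column shortened.

```python
def parse_attributes(attr_field):
    """Turn the 9th GTF column into a dict.

    Assumes attribute values contain no ';' characters; for repeated
    keys (such as tag) only the last value is kept.
    """
    attrs = {}
    for item in attr_field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def first_exon_at_tss_end(gtf_lines):
    """Map transcript_id -> True if exon_number 1 lies at the expected end.

    Expected end: leftmost exon for '+' transcripts, rightmost for '-'.
    """
    exons = {}  # transcript_id -> list of (exon_number, start, end, strand)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = parse_attributes(fields[8])
        record = (int(attrs["exon_number"]), int(fields[3]), int(fields[4]), fields[6])
        exons.setdefault(attrs["transcript_id"], []).append(record)

    result = {}
    for transcript_id, records in exons.items():
        _, start, end, strand = min(records, key=lambda r: r[0])  # exon_number 1
        if strand == "+":
            result[transcript_id] = start == min(r[1] for r in records)
        else:
            result[transcript_id] = end == max(r[2] for r in records)
    return result


def make_line(start, end, strand, transcript_id, exon_number):
    """Build a shortened, tab-separated exon line for the demonstration."""
    attrs = f'transcript_id "{transcript_id}"; exon_number {exon_number};'
    return "\t".join(["chr1", "HAVANA", "exon", str(start), str(end), ".", strand, ".", attrs])


sample = [
    make_line(11869, 12227, "+", "ENST00000456328.2", 1),
    make_line(12613, 12721, "+", "ENST00000456328.2", 2),
    make_line(817373, 817712, "-", "ENST00000447500.4", 1),
    make_line(810067, 810170, "-", "ENST00000447500.4", 2),
]
check = first_exon_at_tss_end(sample)
print(check)
```

        On a real file, the same function could be fed an open file handle instead of the `sample` list; any transcript mapped to False would be worth inspecting by hand.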


        • #5
          Dear mastal

          Thank you for your answer.
          I checked: for genes on the minus strand, the exon closest to the rightmost end of the gene (column (B) in the example below) is labelled as exon_number 1,
          and
          the transcript start coordinates (column (A) below) are lower than the transcript end coordinates (column (B) below).

          That is, this GTF file follows the convention you described (chromosomal coordinates for a gene are usually given from left to right).

          So I should interpret it as follows: if the gene is on the + strand, the coordinate in column (A) is close to the TSS; if the gene is on the - strand, the coordinate in column (B) is close to the TSS.

          Is that correct?

          Thank you in advance for your help.


          chr1 HAVANA gene 800879(column A) 817712(column B) . - . gene_id "ENSG00000230092.7"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; level 2; tag "overlapping_locus"; havana_gene "OTTHUMG00000002403.3";
          chr1 HAVANA transcript 800879 817712 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";
          chr1 HAVANA exon 817373 817712 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; exon_number 1; exon_id "ENSE00001746491.1"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";
          chr1 HAVANA exon 810067 810170 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; exon_number 2; exon_id "ENSE00001674926.2"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";


          • #6
            Originally posted by yu_chem
            So I should interpret, If strand of gene is +, coordinates of (A) column is close to TSS. If strand of gene is -, coordinates of (B) column is close to TSS.

            Is it correct?
            Yes, that is correct.
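
            As a rough illustration of this rule, the Python sketch below picks the TSS coordinate from the start or end column depending on the strand, and also derives an upstream window of the kind that could be compared with methylation data. The function names and the 1000 bp default window are assumptions made for this example, not part of any GENCODE or ENCODE tool. Coordinates are 1-based and inclusive, as in GTF.

```python
def tss_from_feature(start, end, strand):
    """Return the TSS coordinate of a GTF gene or transcript feature.

    start and end are the 4th and 5th GTF columns, strand is the 7th.
    """
    if strand == "+":
        return start
    if strand == "-":
        return end
    raise ValueError(f"unexpected strand: {strand!r}")


def upstream_window(start, end, strand, size=1000):
    """Return (window_start, window_end) covering `size` bases upstream of the TSS.

    The TSS itself is excluded. The window is clipped at position 1 on the
    left; it is NOT clipped at the chromosome end, which is unknown here.
    """
    tss = tss_from_feature(start, end, strand)
    if strand == "+":
        return max(1, tss - size), tss - 1
    return tss + 1, tss + size


# Gene lines quoted in this thread.
ddx11l1_tss = tss_from_feature(11869, 14409, "+")     # DDX11L1, + strand
rp11_tss = tss_from_feature(800879, 817712, "-")      # RP11-206L10.8, - strand
print(ddx11l1_tss, rp11_tss)
print(upstream_window(11869, 14409, "+"))
print(upstream_window(800879, 817712, "-"))
```

            Note that a gene line only gives the outermost coordinates of the gene. Individual transcripts of the same gene can start elsewhere, so applying the same function to the transcript lines gives one TSS per transcript.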


            • #7
              Dear mastal

              I really appreciate your kind response to my question.
