Hi everybody,
I’d like to ask about TSS (transcription start site) annotation.
I am currently analyzing total RNA-seq data from the ENCODE project.
I want to check the reported relationship between gene expression and the DNA methylation state upstream of each gene.
So I need to know the approximate TSS coordinates of my target genes. I don’t need exact coordinates from CAGE-seq and so on.
So far,
I have downloaded the GTF annotation file from the ENCODE project (from the URL below)
and looked at its contents (excerpt below).
Now I am trying to work out how to decide which end of a feature (5’ or 3’) is the one close to the TSS. For example:
chr1 HAVANA gene 11869 (5’) 14409 (3’)
Is it correct that the end nearest the coordinates of exon_number 1 is the one near the TSS?
If an annotation file or an easy method for this exists, could you tell me about it?
Best regards
##description: evidence-based annotation of the human genome (GRCh38), version 24 (Ensembl 83)
##provider: GENCODE
##contact: [email protected]
##format: gtf
##date: 2015-12-03
chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 12613 12721 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
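For concreteness, here is a minimal sketch of the approach I have in mind. It assumes the strand column (column 7) decides the 5’ end: in GTF the start is always less than or equal to the end, so on the + strand the TSS would be the start coordinate, and on the - strand it would be the end coordinate. The function names and the 1000 bp upstream window are my own illustrative choices, not part of GENCODE or ENCODE. It works per transcript, because different transcripts of one gene can start at different positions.

```python
import re

def parse_attributes(attr_field):
    """Parse the 9th GTF column into a dict (a repeated key keeps its last value)."""
    pairs = re.findall(r'(\S+) ("[^"]*"|[^;\s]+);', attr_field)
    return {key: value.strip('"') for key, value in pairs}

def tss_from_gtf_line(line, feature="transcript"):
    """Return (chrom, tss, strand, gene_name, transcript_id), or None if the line is skipped.

    Assumes a tab-separated GTF line with 1-based, inclusive coordinates.
    """
    if line.startswith("#"):
        return None  # header / comment line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != feature:
        return None
    chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
    if strand not in ("+", "-"):
        return None  # strand unknown, so the 5' end cannot be decided
    tss = start if strand == "+" else end
    attrs = parse_attributes(fields[8])
    return chrom, tss, strand, attrs.get("gene_name"), attrs.get("transcript_id")

def upstream_window(tss, strand, size=1000):
    """Return (start, end) of the `size` bp upstream of the TSS, 1-based inclusive.

    The default of 1000 bp is an arbitrary illustrative value.
    """
    if strand == "+":
        return max(1, tss - size), tss - 1
    return tss + 1, tss + size

# The DDX11L1 transcript line from the excerpt, shortened and tab-separated.
EXAMPLE_LINE = (
    "chr1\tHAVANA\ttranscript\t11869\t14409\t.\t+\t.\t"
    'gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; '
    'gene_name "DDX11L1"; level 2;'
)

if __name__ == "__main__":
    record = tss_from_gtf_line(EXAMPLE_LINE)
    print(record)
    print(upstream_window(record[1], record[2]))
```

If this reasoning is right, running it over every `transcript` line of the GTF would give one approximate TSS per transcript, and the upstream window could then be intersected with the methylation data.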