SEQanswers

Old 05-19-2017, 06:09 AM   #1
yu_chem
Member
 
Location: Japan

Join Date: Mar 2015
Posts: 17
How to decide the side (5' or 3') close to TSS in GTF

Hi everybody,

I'd like to ask about TSS (transcription start site) annotation.
I'm currently analyzing total RNA-seq data from the ENCODE project.
I want to check the reported relationship between gene expression and the DNA methylation state upstream of genes.

So I need to know the approximate TSS coordinates of my target genes. I don't need exact coordinates from CAGE-seq and so on.

So far, I have downloaded the GTF annotation file from the ENCODE project (from the URL below)
https://www.encodeproject.org/data-s...nce-sequences/
and checked its contents (shown at the end of this post).

Now I'm trying to work out how to decide which side (5' or 3') is close to the TSS:
chr1 HAVANA gene 11869 (5') 14409 (3')

Is it right that the side closest to the coordinates of exon_number 1 is near the TSS?

If annotation data or an easy method exists, could you tell me about it?

Best regards

##description: evidence-based annotation of the human genome (GRCh38), version 24 (Ensembl 83)
##provider: GENCODE
##contact: [email protected]
##format: gtf
##date: 2015-12-03
chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 12613 12721 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
Old 05-19-2017, 06:54 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662

In your gtf file, compare the exon annotations of genes that are on the + strand with those that are on the - strand. The gene you have shown above is on the + strand.
Old 05-19-2017, 08:42 AM   #3
yu_chem
Member
 
Location: Japan

Join Date: Mar 2015
Posts: 17

Dear mastal,

Thank you for your answer, and sorry for my basic question.
Do you mean that, given the transcription mechanism (see the site below), once the strand of a gene is known, the side close to the TSS is determined with "no exception"?

That is, if the gene is on the + strand, the TSS is on the side I labelled 5' (the start column); if the gene is on the - strand, it is on the side I labelled 3' (the end column).

Sorry, I should have been more careful.

Thank you for the quick answer.

https://www.ncbi.nlm.nih.gov/books/N...ort=objectonly
Old 05-19-2017, 10:28 AM   #4
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662

I meant that you should check whether, for genes on the - strand, the exon closest to the rightmost end of the gene is labelled as exon 1 or not.

Chromosomal coordinates for a gene are usually given from left to right, so even for genes on the minus strand, the start coordinate in the file is lower than the end coordinate.
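
The check described above can be sketched in Python. This is only a minimal sketch, assuming a standard tab-separated GTF with nine columns (the lines pasted in this thread show spaces because the forum flattened the tabs) and attributes of the form key "value";. The helper names (parse_attributes, exon1_is_at_tss_end, make_line) are made up for illustration, and the sample records are shortened copies of the two transcripts shown in this thread.

```python
def parse_attributes(field):
    """Parse the ninth GTF column (key "value"; pairs) into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip('"')  # handles both 1 and "1"
    return attrs


def exon1_is_at_tss_end(gtf_lines):
    """Map transcript_id -> True if exon_number 1 touches the TSS end.

    For + transcripts exon 1 should begin at the transcript start;
    for - transcripts exon 1 should finish at the transcript end.
    """
    transcripts = {}  # transcript_id -> (start, end, strand)
    first_exons = {}  # transcript_id -> (start, end) of exon_number 1
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        feature, start, end, strand = cols[2], int(cols[3]), int(cols[4]), cols[6]
        attrs = parse_attributes(cols[8])
        tid = attrs.get("transcript_id")
        if feature == "transcript":
            transcripts[tid] = (start, end, strand)
        elif feature == "exon" and attrs.get("exon_number") == "1":
            first_exons[tid] = (start, end)

    result = {}
    for tid, (t_start, t_end, strand) in transcripts.items():
        if tid not in first_exons:
            continue
        e_start, e_end = first_exons[tid]
        result[tid] = (e_start == t_start) if strand == "+" else (e_end == t_end)
    return result


def make_line(feature, start, end, strand, attributes):
    """Build one tab-separated GTF line (illustration helper only)."""
    return "\t".join(
        ["chr1", "HAVANA", feature, str(start), str(end), ".", strand, ".", attributes]
    )


# Shortened copies of the records shown in this thread.
PLUS = 'gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2";'
MINUS = 'gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4";'
SAMPLE_LINES = [
    make_line("transcript", 11869, 14409, "+", PLUS),
    make_line("exon", 11869, 12227, "+", PLUS + " exon_number 1;"),
    make_line("exon", 12613, 12721, "+", PLUS + " exon_number 2;"),
    make_line("transcript", 800879, 817712, "-", MINUS),
    make_line("exon", 817373, 817712, "-", MINUS + " exon_number 1;"),
    make_line("exon", 810067, 810170, "-", MINUS + " exon_number 2;"),
]

print(exon1_is_at_tss_end(SAMPLE_LINES))
```

If every transcript comes back True, exon_number counts from the 5' end of the transcript on both strands in that file.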
Old 05-20-2017, 02:45 AM   #5
yu_chem
Member
 
Location: Japan

Join Date: Mar 2015
Posts: 17

Dear mastal,

Thank you for your answer.
I checked: for genes on the minus strand, the exon closest to the rightmost end of the gene (column (B) in the example below) is labelled as exon_number 1,
and
the transcript start coordinates (column (A) below) are lower than the transcript end coordinates (column (B) below).

That is, this GTF file follows the standard you described (chromosomal coordinates for a gene are usually given from left to right).

So I should interpret it as follows: if the gene is on the + strand, the coordinate in column (A) is close to the TSS; if the gene is on the - strand, the coordinate in column (B) is close to the TSS.

Is that correct?

Thank you in advance for your help.


chr1 HAVANA gene 800879(column A) 817712(column B) . - . gene_id "ENSG00000230092.7"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; level 2; tag "overlapping_locus"; havana_gene "OTTHUMG00000002403.3";
chr1 HAVANA transcript 800879 817712 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";
chr1 HAVANA exon 817373 817712 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; exon_number 1; exon_id "ENSE00001746491.1"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";
chr1 HAVANA exon 810067 810170 . - . gene_id "ENSG00000230092.7"; transcript_id "ENST00000447500.4"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "RP11-206L10.8"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "RP11-206L10.8-002"; exon_number 2; exon_id "ENSE00001674926.2"; level 2; tag "basic"; transcript_support_level "5"; havana_gene "OTTHUMG00000002403.3"; havana_transcript "OTTHUMT00000448550.2";
Old 05-20-2017, 02:56 AM   #6
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662

Quote:
Originally Posted by yu_chem
So I should interpret it as follows: if the gene is on the + strand, the coordinate in column (A) is close to the TSS; if the gene is on the - strand, the coordinate in column (B) is close to the TSS.

Is that correct?
Yes, that is correct.
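
The rule confirmed here can be written as a small Python sketch: take the start column as the approximate TSS for + records and the end column for - records. The function names and the 1000 bp default window are assumptions chosen for illustration, not values from the GTF file or from ENCODE. Coordinates are treated as 1-based and inclusive, as in GTF.

```python
def tss_position(start, end, strand):
    """Approximate TSS of a GTF gene/transcript record (1-based)."""
    if strand == "+":
        return start  # column (A)
    if strand == "-":
        return end    # column (B)
    raise ValueError(f"unexpected strand: {strand!r}")


def upstream_window(start, end, strand, size=1000):
    """Return (low, high), 1-based inclusive, covering `size` bp upstream.

    `size` is an arbitrary illustrative default. The window is not
    clipped to the chromosome length on the minus strand.
    """
    tss = tss_position(start, end, strand)
    if strand == "+":
        return max(1, tss - size), tss - 1
    return tss + 1, tss + size


# The two genes shown in this thread.
print(tss_position(11869, 14409, "+"), upstream_window(11869, 14409, "+"))
print(tss_position(800879, 817712, "-"), upstream_window(800879, 817712, "-"))
```

Note that a gene record spans all of its transcripts, so this gives the outermost annotated start; for a specific isoform, apply the same rule to its transcript record instead.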
Old 05-20-2017, 06:56 AM   #7
yu_chem
Member
 
Location: Japan

Join Date: Mar 2015
Posts: 17

Dear mastal

I really appreciate your sincere response to my question.

Tags
annotation, gtf, method, tss
