SEQanswers





12-16-2010, 11:30 AM   #1
Boel
Member
 
Location: Stockholm, Sweden

Join Date: Oct 2009
Posts: 62
Converting DNA position to transcript position

Hi Friends,

I have a simple problem that many of you must have dealt with before me. I have a DNA position showing variation (~SNP) within an exon of a gene/transcript. Is there already a script that converts a DNA position to a "transcript position" given a GTF file? If so, I would be really happy to use it.

Thanks!
Boel
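A first step for any such script is collecting the exons of each transcript from the GTF file. A minimal Python sketch, assuming the standard nine tab-separated GTF columns and a `transcript_id` attribute on every exon line; `load_exons` is a hypothetical name, not part of any library.

```python
from collections import defaultdict


def load_exons(gtf_lines):
    """Group GTF 'exon' records by transcript_id.

    Returns {transcript_id: (chrom, strand, [(start, end), ...])}, with the
    exons sorted by genomic start. GTF coordinates are 1-based and inclusive.
    """
    exons = defaultdict(list)
    where = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only exon features (not CDS, UTR, gene, ...)
        # Column 9 looks like: gene_id "G1"; transcript_id "T1";
        attrs = {}
        for item in fields[8].split(";"):
            key, _, value = item.strip().partition(" ")
            if key:
                attrs[key] = value.strip('"')
        tid = attrs["transcript_id"]
        exons[tid].append((int(fields[3]), int(fields[4])))
        where[tid] = (fields[0], fields[6])
    return {tid: (*where[tid], sorted(pairs)) for tid, pairs in exons.items()}
```

An open file handle can be passed directly, since it yields one line at a time. Each isoform ends up as its own entry, so the choice of isoform stays explicit.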
12-16-2010, 12:52 PM   #2
husamia
Member
 
Location: cinci

Join Date: Apr 2010
Posts: 66

I think you mean you have a chromosomal position such as chr1:222222 and a DNA change A>T, and you want to know the coding-sequence change relative to the start of the coding sequence (the ATG). If this isn't what you want, then give an example. If it is, the problem is that a gene may have more than one version of its coding sequence (isoforms), and you have to decide which isoform you want; that is probably why no tool does this automatically. I have done it myself based on Ensembl's exon definitions. I found errors in the UCSC browser, which is another place you can go; because I want highly accurate, manually annotated exons, Ensembl worked best for me. There are a lot of other issues that I won't go into. It's not as straightforward as it seems. Most people have genes of interest, in which case you have to prepare the data yourself.
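Once the variant's transcript position is known, the step to a coding-sequence position and codon is plain arithmetic. A minimal sketch, assuming you already know the transcript position of the A of the ATG (for example by mapping the GTF `start_codon` feature of the chosen isoform the same way as the variant). Upstream bases get negative numbers with no position 0, following the HGVS `c.` convention; the stop codon and 3' UTR (HGVS `c.*N`) are not handled. Both function names are hypothetical.

```python
def cds_position(tx_pos, cds_start_tx):
    """Convert a 1-based transcript position to a coding-sequence position.

    `cds_start_tx` is the transcript position of the A of the ATG.
    Bases upstream of the ATG get negative numbers (-1 is the base just
    before the A); there is no position 0, as in HGVS c. numbering.
    """
    if tx_pos >= cds_start_tx:
        return tx_pos - cds_start_tx + 1
    return tx_pos - cds_start_tx


def codon_of(cds_pos):
    """Return (codon number, position within the codon), both 1-based."""
    if cds_pos < 1:
        raise ValueError("position is upstream of the start codon")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1
```

For instance, with the ATG starting at transcript position 51, transcript position 57 is coding position 7, the first base of codon 3.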
12-16-2010, 01:01 PM   #3
Boel
Member
 
Location: Stockholm, Sweden

Join Date: Oct 2009
Posts: 62

Hi husamia, and thanks for your reply.
No, I am not interested in the coding consequence, just in the position in the transcript, i.e. in the mRNA sequence.

For example, if the DNA position is chr1:30000 and it falls within the first exon of gene X, I want to know the position in the mRNA (position 1 if gene X starts at chr1:30000). If a gene has several isoforms, this will be reflected in my GTF file. It's a fairly simple mathematical exercise, just very nitty-gritty to do, so I wanted to hear whether someone already had a simple script. Thanks though.
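The exercise described above comes down to walking the exons in transcript order and summing their lengths until the exon containing the position is reached. A minimal Python sketch, assuming 1-based inclusive (start, end) exon coordinates as in a GTF file and a strand of '+' or '-'; `genomic_to_transcript` is a hypothetical name.

```python
def genomic_to_transcript(pos, exons, strand):
    """Return the 1-based transcript (mRNA) position of genomic position `pos`.

    `exons` is a list of (start, end) pairs in 1-based inclusive genomic
    coordinates. Returns None if `pos` is not exonic.
    """
    # Walk exons in transcript (5'->3') order: ascending on '+',
    # descending on '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    offset = 0  # transcript bases contained in the exons already passed
    for start, end in ordered:
        if start <= pos <= end:
            if strand == "+":
                return offset + (pos - start) + 1
            return offset + (end - pos) + 1
        offset += end - start + 1
    return None  # intronic or outside the transcript
```

For the chr1:30000 example, with a hypothetical first exon spanning 30000 to 30500 on the plus strand, `genomic_to_transcript(30000, [(30000, 30500)], "+")` returns 1. Each isoform has its own exon list, so the same genomic position will generally get a different transcript position in each isoform.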
12-16-2010, 01:46 PM   #4
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177

I had to do this exact exercise myself (though going further, to the amino acid, as husamia described). I wrote my own script, but it is not simple. It uses the BioPerl module Bio::Coordinate::GeneMapper, which is meant for exactly these kinds of transformations between coordinate spaces. But to use it, everything must be a Bio::SeqFeature object. Since I was working in Arabidopsis, I already had a Bio::DB::SeqFeature database of TAIR9 set up (as the back end for GBrowse). If you are conversant with some serious BioPerl, I could offer some guidance.
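Transformations between coordinate spaces are usually needed in both directions. The reverse direction (transcript position back to the genome) is the same exon bookkeeping, sketched here in plain Python; this is not the Bio::Coordinate::GeneMapper API, which is not reproduced. Exons are assumed to be 1-based inclusive (start, end) pairs as in a GTF file, and `transcript_to_genomic` is a hypothetical name.

```python
def transcript_to_genomic(tx_pos, exons, strand):
    """Map a 1-based transcript position back to a 1-based genomic position.

    `exons` holds (start, end) pairs in 1-based inclusive genomic coordinates.
    Returns None if `tx_pos` lies outside the transcript.
    """
    if tx_pos < 1:
        return None
    # Exons in transcript (5'->3') order: ascending on '+', descending on '-'
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = tx_pos  # transcript bases still to be consumed
    for start, end in ordered:
        length = end - start + 1
        if remaining <= length:
            # '+' counts up from the exon start, '-' counts down from its end
            return start + remaining - 1 if strand == "+" else end - remaining + 1
        remaining -= length
    return None
```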
12-16-2010, 01:57 PM   #5
Boel
Member
 
Location: Stockholm, Sweden

Join Date: Oct 2009
Posts: 62

Hi kmcarr,

I'm looking into Biopython, and there is some functionality there. I might cross over to BioPerl later on if I feel the need. Thanks a lot.
12-17-2010, 01:53 AM   #6
joa_ds
Member
 
Location: belgium

Join Date: Dec 2008
Posts: 52

Drop me an email at joachim dot deschrijver at ugent dot be.

I have such a script ready in Perl that you could use.
01-05-2011, 01:46 AM   #7
Giulietta
Junior Member
 
Location: UK

Join Date: Nov 2010
Posts: 8

Ensembl's Variant Effect Predictor may be of use here. If you enter a genomic position and allele(s), it will tell you the position in the cDNA and in the protein (if there is one), and the amino acid change. Have a look at the example:

http://www.ensembl.org/info/website/upload/var.html

It's available online, or through the API:

http://www.ensembl.org/tools.html

Email us at helpdesk@ensembl.org for more help.
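Preparing many variants for the Variant Effect Predictor is easy to automate. A minimal sketch, assuming VEP's simple whitespace-separated input format as documented for later VEP releases (chromosome, start, end, REF/ALT alleles, strand, optional identifier); only single-base substitutions are handled, because indels use different start/end conventions. `vep_default_line` is a hypothetical helper, not part of VEP.

```python
def vep_default_line(chrom, pos, ref, alt, strand="+", ident=None):
    """Format one single-base substitution as a VEP default-format line:
    chromosome, start, end, REF/ALT, strand[, identifier], tab-separated.
    """
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("this sketch only handles single-base substitutions")
    # For an SNV the start and end coordinates are the same base
    fields = [str(chrom), str(pos), str(pos), f"{ref}/{alt}", strand]
    if ident is not None:
        fields.append(ident)
    return "\t".join(fields)
```

For example, the chr1:222222 A>T change mentioned earlier in the thread would become `vep_default_line("1", 222222, "A", "T")`.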
01-05-2011, 04:58 AM   #8
husamia
Member
 
Location: cinci

Join Date: Apr 2010
Posts: 66

Quote:
Originally Posted by Giulietta
Have a look at the example:

http://www.ensembl.org/info/website/upload/var.html
The link [http://uswest.ensembl.org/info/website/upload/var.html] gives a 404 error, but I think the correct link is [http://uswest.ensembl.org/Homo_sapie...oadVariations]
01-05-2011, 06:06 AM   #9
Giulietta
Junior Member
 
Location: UK

Join Date: Nov 2010
Posts: 8

Quote:
Originally Posted by husamia

Sorry about the broken link; we will endeavor to fix it.

The link at www.ensembl.org is working:

http://www.ensembl.org/info/website/upload/var.html

Try changing uswest to www (and go back to the UK site if it redirects you again!). The UploadVariations link you quote is not quite the one I was trying to point you to.

Cheers.
10-28-2014, 09:15 AM   #10
amias
Junior Member
 
Location: united states

Join Date: Sep 2014
Posts: 3

Quote:
Originally Posted by Boel
Hi kmcarr,

I'm looking into Biopython, and there is some functionality there. I might cross over to BioPerl later on if I feel the need. Thanks a lot.

Hi Boel, could you share the Biopython functionality you used to convert genomic coordinates to transcript coordinates? I have a GFF file in which I would like to convert the genomic coordinates of UTRs and CDSs to transcript coordinates, but I am having a hard time finding a script or function that can do this. Thanks!
10-29-2014, 07:43 AM   #11
m_two
Member
 
Location: USA

Join Date: Mar 2010
Posts: 50

Ensembl VEP is your best bet for custom annotation (fast, robust, reliable, and easily automated):

http://useast.ensembl.org/info/docs/...ep_custom.html
http://useast.ensembl.org/info/docs/...vep_cache.html
10-29-2014, 11:57 AM   #12
amias
Junior Member
 
Location: united states

Join Date: Sep 2014
Posts: 3

Quote:
Originally Posted by m_two
Ensembl VEP is your best bet for custom annotation (fast, robust, reliable, and easily automated):

http://useast.ensembl.org/info/docs/...ep_custom.html
http://useast.ensembl.org/info/docs/...vep_cache.html

As far as I understand from the documentation, Ensembl VEP requires variant information as input. The sites I would like to convert are not SNP positions but miRNA target sites, so I could not use VEP for that conversion.
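A target site or UTR segment is an interval rather than a variant, but it can be converted without VEP by mapping both ends and re-ordering them, since on the minus strand the genomic start becomes the 3' end of the site. A minimal sketch, assuming 1-based inclusive exon and site coordinates (as in GTF/GFF) and that both ends of each site are exonic; if an intron lies between the two ends, the result is the spliced stretch of mRNA between them. The function name is hypothetical.

```python
def genomic_interval_to_transcript(start, end, exons, strand):
    """Map a 1-based inclusive genomic interval (e.g. a miRNA target site)
    to a 1-based inclusive transcript interval, returned as (low, high).

    Returns None if either end of the interval is not exonic.
    """
    # Exons in transcript (5'->3') order: ascending on '+', descending on '-'
    ordered = sorted(exons, reverse=(strand == "-"))

    def to_tx(pos):
        offset = 0  # transcript bases in the exons already passed
        for s, e in ordered:
            if s <= pos <= e:
                return offset + (pos - s if strand == "+" else e - pos) + 1
            offset += e - s + 1
        return None

    a, b = to_tx(start), to_tx(end)
    if a is None or b is None:
        return None
    # On the '-' strand the mapped ends come out reversed, so re-order them
    return (min(a, b), max(a, b))
```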
06-04-2015, 05:55 AM   #13
SrCardgage
my other car is a limozeen
 
Location: New Haven, CT, USA

Join Date: Feb 2012
Posts: 23

You basically need to subtract the position of the transcription start site (TSS) from the position of the variant. This information is available in several places; the source I use is the UCSC Table Browser.

http://genome.ucsc.edu/cgi-bin/hgTables

The values for clade, genome, and assembly should be obvious.

Group: Genes and Gene Predictions
Track: RefSeq Genes
Table: refGene
Output format: all fields from selected table
Output file: refGene_human (or whatever your organism is)
File type returned: gzip (speeds up the download a lot)

Unzip the file and either load it into an SQL table set up with the refGene schema (click the "describe table schema" button for details), or programmatically search the unzipped text file for your gene to pull its TSS.
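Subtracting the TSS gives the mRNA position directly (as position minus TSS plus 1) only while the variant lies in the first exon. Further downstream the intron lengths must be left out, and on the minus strand the TSS is txEnd and positions count downward. A minimal sketch of the plain-text route under those rules, assuming the tab-separated refGene dump described above, including its `#bin ...` header line, with columns in refGene schema order (exonStarts, exonEnds and name2 are columns 10, 11 and 13). All function names are hypothetical.

```python
def parse_refgene_line(line):
    """Parse one line of a UCSC refGene dump ("all fields from selected table").

    UCSC stores exonStarts 0-based and exonEnds as half-open ends, so adding
    1 to each start gives 1-based inclusive exon intervals.
    """
    f = line.rstrip("\n").split("\t")
    starts = [int(x) + 1 for x in f[9].rstrip(",").split(",")]
    ends = [int(x) for x in f[10].rstrip(",").split(",")]
    return {"name": f[1], "chrom": f[2], "strand": f[3], "gene": f[12],
            "exons": list(zip(starts, ends))}


def find_gene(lines, gene):
    """Return parsed records for every refGene isoform whose name2 is `gene`."""
    records = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the "#bin name chrom ..." header line
        if line.split("\t")[12] == gene:
            records.append(parse_refgene_line(line))
    return records


def tss(record):
    """1-based genomic position of the transcription start site."""
    exons = record["exons"]
    return exons[0][0] if record["strand"] == "+" else exons[-1][1]


def transcript_position(pos, record):
    """1-based mRNA position of genomic `pos`, counting exonic bases only."""
    exons = record["exons"]
    if not any(s <= pos <= e for s, e in exons):
        return None  # intronic or outside the transcript
    if record["strand"] == "+":
        # exonic bases from the TSS up to and including pos
        return sum(min(e, pos) - s + 1 for s, e in exons if s <= pos)
    # '-' strand: exonic bases from the TSS (highest coordinate) down to pos
    return sum(e - max(s, pos) + 1 for s, e in exons if e >= pos)
```

`find_gene` returns one record per RefSeq isoform, so the isoform still has to be chosen before interpreting the position.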

If you don't know databases, searching the plain text will be faster in the short run. But if this is part of a major pipeline that you will run a lot, it would be worthwhile to become comfortable with a relational database system and with embedding calls to that database in your language of choice. That may sound like a major hurdle, but all the information you need is on the web. Message me if you need help finding resources to get started.
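For the database route, a minimal sketch using Python's built-in `sqlite3` module in place of a full SQL server. It stores only a subset of the refGene columns (the reduced table layout and the index name are assumptions, not the UCSC schema) and returns the 1-based TSS of every isoform of a gene, using txStart + 1 on the plus strand and txEnd on the minus strand.

```python
import sqlite3

# Reduced refGene layout: only the columns needed here. The full schema
# (see "describe table schema" in the Table Browser) has more.
SCHEMA = """CREATE TABLE refGene (
    name TEXT, chrom TEXT, strand TEXT, txStart INTEGER, txEnd INTEGER,
    exonStarts TEXT, exonEnds TEXT, name2 TEXT)"""


def load_refgene(conn, lines):
    """Load selected columns of a tab-separated refGene dump into SQLite."""
    conn.execute(SCHEMA)
    rows = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header line
        f = line.rstrip("\n").split("\t")
        rows.append((f[1], f[2], f[3], int(f[4]), int(f[5]), f[9], f[10], f[12]))
    conn.executemany("INSERT INTO refGene VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_name2 ON refGene(name2)")  # fast gene lookup
    conn.commit()


def tss_by_gene(conn, gene):
    """Return (transcript, chrom, strand, 1-based TSS) for each isoform of gene."""
    cur = conn.execute(
        "SELECT name, chrom, strand, "
        "CASE strand WHEN '+' THEN txStart + 1 ELSE txEnd END "
        "FROM refGene WHERE name2 = ? ORDER BY name", (gene,))
    return cur.fetchall()
```

Passing a file path instead of `":memory:"` to `sqlite3.connect` keeps the table on disk between pipeline runs.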
All times are GMT -8.