SEQanswers

04-22-2009, 02:37 PM   #1
Layla
Member
Location: London
Join Date: Sep 2008
Posts: 58
Genomic coordinates to gene names

Hi All,

I'm trying to get a list of gene names (preferably HUGO names) for 90,000 genomic coordinates (BED file). I'm very confused by BioMart's API, and Ensembl's interface is taking hours. I've spent hours on UCSC and can't see any option to retrieve this information. Any help with another method to achieve this would be appreciated.

L
04-22-2009, 03:25 PM   #2
ECO
--Site Admin--
Location: SF Bay Area, CA, USA
Join Date: Oct 2007
Posts: 1,358

Not sure if they are HUGO names, but seems like the refFlat table in the Table Browser will get you there.

Click "define regions", paste in your BED file, and get output.

Edit: looks like it's limited to 1,000 entries.
04-22-2009, 05:02 PM   #3
Michael.James.Clark
Senior Member
Location: Palo Alto
Join Date: Apr 2009
Posts: 213

You should be able to get the entire refFlat file from UCSC's table browser. That file will include the RefSeq IDs, start and end positions of the gene, and the gene name.

I think their gene name is the HUGO name.
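
For anyone who wants that file in BED form, here is a minimal Python sketch of the conversion. It assumes the standard refFlat column order (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) and a tab-separated file. The function names `refflat_to_bed` and `write_bed` are made up for this illustration and are not part of any UCSC tool.

```python
# Column order of the UCSC refFlat table (tab-separated).
REFFLAT_COLUMNS = (
    "geneName", "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
)


def refflat_to_bed(lines):
    """Yield BED6 tuples (chrom, start, end, gene name, score, strand)."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and the '#'-prefixed header the Table Browser emits.
        if not line or line.startswith("#"):
            continue
        rec = dict(zip(REFFLAT_COLUMNS, line.split("\t")))
        # refFlat coordinates are already 0-based, half-open, like BED.
        yield (rec["chrom"], int(rec["txStart"]), int(rec["txEnd"]),
               rec["geneName"], 0, rec["strand"])


def write_bed(rows, handle):
    """Write BED tuples to an open text handle, one tab-separated row per line."""
    for row in rows:
        handle.write("\t".join(str(field) for field in row) + "\n")
```

To use it, open refFlat.txt for reading and an output file for writing, then call `write_bed(refflat_to_bed(src), out)`. The gene name ends up in the fourth (name) column, and the score column is filled with a placeholder 0.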
04-22-2009, 07:33 PM   #4
quinlana
Senior Member
Location: Charlottesville
Join Date: Sep 2008
Posts: 119
BED Tools for comparing genomic intervals

Hi,
I recently completed a new suite of BED Tools for addressing such questions.

They are available for 64-bit Linux and Intel Macs at:
http://people.virginia.edu/~arq5x/bedtools.html

Specifically, in the case of your question, you would download RefSeq (not sure if they are HUGO names) from the UCSC Table browser.

Then run intersectBed -a <yourfile> -b refSeqFromUCSC.bed -wb

The -wb option will write the entire RefSeq entry so that you can track the name associated with each overlap.
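
To make the idea concrete, here is a rough Python sketch of the same kind of lookup: for each query interval, report the names of the genes it overlaps. It is not how intersectBed is implemented, only an illustration of the overlap test on 0-based, half-open BED coordinates. The function names `build_index` and `overlapping_genes` and the gene names in the example are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(genes):
    """Group (chrom, start, end, name) gene records by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end, name in genes:
        by_chrom[chrom].append((start, end, name))
    index = {}
    for chrom, records in by_chrom.items():
        records.sort()
        # Keep a parallel list of start positions for binary search.
        index[chrom] = ([rec[0] for rec in records], records)
    return index


def overlapping_genes(index, chrom, start, end):
    """Return sorted unique gene names overlapping the half-open interval [start, end)."""
    if chrom not in index:
        return []
    starts, records = index[chrom]
    # Only genes that start before the query ends can overlap it.
    hi = bisect_left(starts, end)
    # Of those, keep the genes that end after the query starts.
    return sorted({name for _, g_end, name in records[:hi] if g_end > start})


# Hypothetical gene records, for illustration only.
genes = [
    ("chr1", 100, 200, "GENE_A"),
    ("chr1", 150, 300, "GENE_B"),
    ("chr2", 100, 200, "GENE_C"),
]
index = build_index(genes)
print(overlapping_genes(index, "chr1", 180, 190))  # ['GENE_A', 'GENE_B']
```

The binary search only trims genes that start after the query ends, so this is a simple sketch rather than a tuned interval index. For a large input, a dedicated tool is likely the better choice.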

If you have further questions, just shout. Nicely.
04-23-2009, 02:13 AM   #5
Layla
Member
Location: London
Join Date: Sep 2008
Posts: 58

Thanks guys, the refFlat file is useful; I was not aware of it.

Thanks ECO, but yes, it's limited to 1,000 coordinates. Not the best way for 90,000 coordinates.

Quinlana, I downloaded BED Tools and ran intersectBed from the bin folder, but I got an error message:
./intersectBed -a mygenomiccoordinates.bed -b genome_ucsc.bed -wb
ERROR:
bash: ./intersectBed: Bad CPU type in executable

L
04-23-2009, 03:05 AM   #6
quinlana
Senior Member
Location: Charlottesville
Join Date: Sep 2008
Posts: 119
OS Type?

Hi Layla,
Apologies for that. What OS and processor are you using? The Linux version should work on 64-bit Red Hat and Ubuntu. Regardless, I'll post the source later today so you can compile the programs on your system. Sorry for the trouble, I just finished testing all of these tools yesterday and they work on all of our systems. However, I haven't been diligent about trying them out for every Linux flavor.

Best, Aaron
04-23-2009, 04:25 AM   #7
Layla
Member
Location: London
Join Date: Sep 2008
Posts: 58

Hi Aaron,

No worries, thank you for the help!

My machine is a Mac running OS X 10.4.11.
Processor: 2.4 GHz Intel Core 2 Duo

Cheers!
L
04-23-2009, 04:43 AM   #8
quinlana
Senior Member
Location: Charlottesville
Join Date: Sep 2008
Posts: 119

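Gotcha. I believe the original Core Duo processors are 32-bit, though the Core 2 Duo is a 64-bit chip, so the mismatch may come from OS X 10.4 or from how the binary was built. Either way, email me at aaronquinlan [at] gmail and I'll send you a pre-compiled version for your machine.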
04-17-2014, 02:23 PM   #9
nturaga
Junior Member
Location: baltimore
Join Date: Apr 2014
Posts: 1

Hi

I am still having problems using the refFlat file with BED Tools. I downloaded the refFlat.txt file for hg18, but this file is not in BED format. Is there a command-line tool that simply adds the gene symbol to my input file, which is in "chr", "start", "end" format (i.e. BED)? If this question is redundant, please excuse me and point me to the right page so I can follow the instructions stepwise and annotate my BED file with gene symbols.

Thanks