SEQanswers

05-04-2011, 02:24 PM   #1
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535
Go from list of genes to all exon coordinates?

Hey all,

I want to use eArray to create a custom capture set of baits for a few hundred genes. I'm inexperienced with anything outside the wet lab, and from the website it appears that I can't just upload a list of genes; instead, I have to upload a list of the exon coordinates within the genes that I want baits designed for. What would be the easiest way to go from a list of genes to a list of these exon coordinates? Thanks a lot for any help.
05-04-2011, 02:39 PM   #2
doc.ramses
Member
 
Location: Planet Earth

Join Date: Jan 2011
Posts: 26

If I remember correctly, you can use accession numbers instead of gene names, separated by a |.
Getting exon positions from a list of gene names is possible in, e.g., Ensembl BioMart.
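To illustrate the BioMart route: suppose you export a tab-separated table with a header row and the columns gene name, chromosome, exon start and exon end, in that order (this column layout is an assumption; you pick the attributes yourself in BioMart). Ensembl reports 1-based, inclusive coordinates while BED starts are 0-based, so only the start needs shifting. Ensembl also names chromosomes without the "chr" prefix that UCSC uses. A minimal sketch, with a hypothetical function name and made-up example rows:

```python
import csv
import io


def biomart_exons_to_bed(tsv_text, add_chr_prefix=True):
    """Convert a hypothetical BioMart exon export to BED lines.

    Assumed columns (tab-separated, header row first): gene name,
    chromosome, exon start, exon end, with 1-based inclusive coordinates.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header row
    bed = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        gene, chrom, start, end = row
        if add_chr_prefix and not chrom.startswith("chr"):
            chrom = "chr" + chrom  # Ensembl "1" -> UCSC-style "chr1"
        # BED start is 0-based and the end is exclusive: only the start moves.
        bed.append(f"{chrom}\t{int(start) - 1}\t{end}\t{gene}")
    return bed


# Made-up example row, not real gene coordinates.
example = "Gene name\tChromosome\tExon start\tExon end\nGENE1\t1\t1001\t1200\n"
print("\n".join(biomart_exons_to_bed(example)))  # chr1	1000	1200	GENE1
```

Mitochondrial names differ too (Ensembl "MT" versus UCSC "chrM"), which this sketch does not handle.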
05-04-2011, 04:17 PM   #3
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535

Quote:
Originally Posted by doc.ramses
If I remember correctly, you can use accession numbers instead of gene names, separated by a |.
Getting exon positions from a list of gene names is possible in, e.g., Ensembl BioMart.
Getting accession numbers wouldn't be too bad, but would it select just the exons as opposed to the entire gene? I have a hard time believing there is no fairly easy, straightforward way to do this. Thanks for the tip on Ensembl; I will look at that.
05-05-2011, 01:32 AM   #4
doc.ramses
Member
 
Location: Planet Earth

Join Date: Jan 2011
Posts: 26

Quote:
Originally Posted by Heisman
Getting accession numbers wouldn't be too bad, but would it select just the exons as opposed to the entire gene?
If you use the "exon finder", it will do exactly this. My advice is to ask an Agilent representative to do the design for you, as eArray is indeed not very user-friendly.
05-05-2011, 06:58 AM   #5
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535

Quote:
Originally Posted by doc.ramses
If you use the "exon finder", it will do exactly this. My advice is to ask an Agilent representative to do the design for you, as eArray is indeed not very user-friendly.
OK, I think I have it figured out, but I'll definitely email them and see if they're willing to do the design (we will be placing a big order, so hopefully they'll be more amenable), as that would obviously be the easiest route. Thanks a lot!
05-05-2011, 07:24 AM   #6
doc.ramses
Member
 
Location: Planet Earth

Join Date: Jan 2011
Posts: 26

They definitely will. They will also take a more detailed look at GC content, etc. And if you're placing a big order, let them do the job and earn the money.
05-05-2011, 08:00 AM   #7
adamdeluca
Member
 
Location: Iowa City, IA

Join Date: Jul 2010
Posts: 95

Here is a general procedure you can follow if you want to try it yourself.

1. Go to the UCSC Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables
2. Select group "Genes and Gene Prediction Tracks", track "UCSC Genes", table knownGene
(or use the refGene table if you prefer RefSeq genes)
3. Paste in your list of gene identifiers
4. Set the output format to BED
5. Restrict the output to coding exons only
6. Save the file

7. Use bedtools to merge overlapping regions, pad them as you feel appropriate, etc.
8. Load the track back into the UCSC Genome Browser to spot-check the regions
9. Convert it into a format eArray accepts
IIRC it is chr1:100-1000 (BED starts are 0-based, hence the +1 below; header lines such as "track" are skipped)
Conversion program:
Code:
awk '!/^(track|browser|#)/ {print $1 ":" ($2+1) "-" $3}' myRegions.bed > myRegions.txt
10. Upload to Agilent
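The conversion in step 9 can also be sketched in Python, which makes the coordinate shift explicit: a BED record chr1 99 1000 covers bases 100 to 1000 in 1-based, inclusive numbering. The chr1:100-1000 target format is only the poster's recollection of what eArray wants, and the function name is hypothetical.

```python
def bed_to_region_strings(bed_lines):
    """Turn BED records into chr:start-end strings (1-based, inclusive).

    Mirrors the awk one-liner: header lines (track/browser/#) are skipped,
    1 is added to the 0-based BED start, and the exclusive BED end already
    equals the 1-based inclusive end.
    """
    regions = []
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split()[:3]  # whitespace split, like awk
        regions.append(f"{chrom}:{int(start) + 1}-{end}")
    return regions


bed = ["track name=myRegions", "chr1\t99\t1000\texonA"]
print(bed_to_region_strings(bed))  # ['chr1:100-1000']
```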
05-05-2011, 09:30 AM   #8
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535

adamdeluca, thank you for your post. I'm with you on steps 1-6. I've never used bedtools, but I could probably figure it out if necessary. I'm curious why one would expect to have overlapping regions. Also, when loading the track back into UCSC to spot-check it, where exactly would I load it, and what would I be checking for? Thanks a lot!
05-05-2011, 09:49 AM   #9
adamdeluca
Member
 
Location: Iowa City, IA

Join Date: Jul 2010
Posts: 95

Quote:
Originally Posted by Heisman
adamdeluca, thank you for your post. I'm with you on steps 1-6. I've never used bedtools, but I could probably figure it out if necessary. I'm curious why one would expect to have overlapping regions. Also, when loading the track back into UCSC to spot-check it, where exactly would I load it, and what would I be checking for? Thanks a lot!
Exons will be duplicated for every splice form of the gene. This is because UCSC stores one record per transcript, so an exon shared by several transcripts appears once for each of them.
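To see where the duplicates come from, here is a minimal sketch that expands one transcript row into coding-exon intervals by clipping each exon to the CDS, roughly what a "coding exons" export does. It assumes knownGene-style columns (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds); refGene rows carry an extra leading bin column that would have to be dropped first. The two transcripts below are made up.

```python
def coding_exons(genepred_row):
    """Yield (chrom, start, end, name) coding-exon intervals for one transcript.

    Assumes knownGene-style, tab-separated columns with 0-based coordinates
    and comma-terminated exonStarts/exonEnds lists.
    """
    f = genepred_row.rstrip("\n").split("\t")
    name, chrom = f[0], f[1]
    cds_start, cds_end = int(f[5]), int(f[6])
    starts = [int(x) for x in f[8].split(",") if x]
    ends = [int(x) for x in f[9].split(",") if x]
    for s, e in zip(starts, ends):
        # Clip the exon to the CDS; UTR-only exons become empty and are dropped.
        s, e = max(s, cds_start), min(e, cds_end)
        if s < e:
            yield (chrom, s, e, name)


# Two hypothetical splice forms of one gene that share their first and last exons.
rows = [
    "tx1\tchr1\t+\t100\t900\t150\t850\t3\t100,400,700,\t200,500,900,",
    "tx2\tchr1\t+\t100\t900\t150\t850\t2\t100,700,\t200,900,",
]
intervals = [iv for r in rows for iv in coding_exons(r)]
print(len(intervals), len({iv[1:3] for iv in intervals}))  # 5 3
```

Five records describe only three distinct regions, which is why a merge step follows.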

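To run the bedtools merge (sort the input by chromosome and start first; newer bedtools versions require sorted input):
Code:
sort -k1,1 -k2,2n in.bed > in.sorted.bed
mergeBed -i in.sorted.bed -d 60 > out.bed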
This will combine any features that are <=60bp apart into a single feature.
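The idea behind the -d option can be sketched in a few lines of Python: sort, then keep extending the current block while the next feature starts no more than max_gap bases after the current end. This is an illustration of the rule, not bedtools' own code, and the function name is hypothetical.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge (chrom, start, end) intervals whose gap is <= max_gap bp.

    With 0-based, half-open coordinates the gap is next.start - current.end,
    so max_gap=0 merges overlapping and book-ended features.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical exons: a duplicate, one 40 bp away, one 100 bp away.
exons = [("chr1", 150, 200), ("chr1", 150, 200), ("chr1", 240, 300), ("chr1", 400, 500)]
print(merge_intervals(exons, max_gap=60))
# [('chr1', 150, 300), ('chr1', 400, 500)]
```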
You can also use slopBed (which needs a genome file of chromosome sizes, given with -g) to extend the baits a bit into the introns, if that is desirable.
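Padding is a simple operation, sketched below: widen every interval by the same number of bases on each side and clamp the result to the chromosome. The clamping is why slopBed asks for chromosome sizes; here a plain dictionary stands in for that genome file, and the chromosome length used is made up.

```python
def pad_intervals(intervals, pad, chrom_sizes):
    """Extend each (chrom, start, end) by `pad` bp on both sides.

    Ends are clamped to [0, chromosome length]; chrom_sizes plays the role
    of the genome file that slopBed reads with -g.
    """
    padded = []
    for chrom, start, end in intervals:
        padded.append((chrom, max(0, start - pad), min(chrom_sizes[chrom], end + pad)))
    return padded


sizes = {"chr1": 1000}  # hypothetical chromosome length
print(pad_intervals([("chr1", 10, 200), ("chr1", 900, 990)], pad=25, chrom_sizes=sizes))
# [('chr1', 0, 225), ('chr1', 875, 1000)]
```

Padding after merging can make neighbouring regions overlap again, so it may be worth running the merge once more afterwards.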

To perform the sanity check, add a custom track: from the main page, under the "Genomes" tab, click the "add custom tracks" button. Look at a few of the exons you intend to target and make sure the design region looks the way you expect. You will also want to make sure that all of the genes you really care about are included; they sometimes get missed because of difficulties parsing gene names.
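For the spot check, a BED file with a leading "track" line uploads as a named custom track. A minimal sketch that writes one; the track name, description and file name are placeholders, and visibility=pack is just one of the allowed display modes.

```python
import os
import tempfile


def write_custom_track(path, intervals, name="baitRegions",
                       description="Planned capture regions"):
    """Write (chrom, start, end) intervals as a BED custom track file.

    The first line is a UCSC "track" line that names the track; the name
    and description defaults here are placeholders.
    """
    with open(path, "w") as out:
        out.write(f'track name="{name}" description="{description}" visibility=pack\n')
        for chrom, start, end in intervals:
            out.write(f"{chrom}\t{start}\t{end}\n")


path = os.path.join(tempfile.mkdtemp(), "baits.bed")  # hypothetical file name
write_custom_track(path, [("chr1", 150, 300)])
with open(path) as fh:
    print(fh.read())
```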
05-05-2011, 11:46 AM   #10
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535

Ok, excellent. Thanks a bunch!
05-06-2011, 12:10 AM   #11
steven
Senior Member
 
Location: Southern France

Join Date: Aug 2009
Posts: 269

You can also use Galaxy for step 7. There should be a "send results to Galaxy" checkbox in the UCSC interface. Working with command-line tools is more powerful, though.
All times are GMT -8.