**ECO** · 08-03-2008, 06:51 AM

Hey Fabio,

Keep in mind I'm not a programmer, so I'm sure someone else here has a better solution! But it's pretty easy to retrieve gene names (or anything really) using the Table Browser at UCSC combined with some simple Perl. I've used the following subroutine to get info about any gene (from the "knownGene" table) given the chromosome, start and end position. It will undoubtedly need updating as it's a few years old, and could certainly be coded better (it uses LWP::Simple).

Code:

use LWP::Simple;    # provides get()

# Look up a knownGene record for a region.
# Returns a hash ref for the first matching record, or undef if none is found.
sub knownGene{
    my %knowngene;
    my ($chr,$start,$end) = @_;
    my $location = "chr" . $chr . ":" . $start . "-" . $end;

    my $p = "http://genome.cse.ucsc.edu/cgi-bin/hgText?";
    my $q = "db=hg16&table=hg16.knownGene&phase=Get+all+fields&position=$location&submit=submit&";

    my $c = get($p . $q);
    return undef unless defined $c;    # the request failed

    foreach my $line (split /\n/, $c) {
        next if $line =~ /^\#/;        # skip comment/header lines
        if ($line =~ /^(\w+)\s+(\w+)\s+([-+])\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+([\w\,]+)\s+([\w\,]+)/){
            $knowngene{'name'} = $1;
            $knowngene{'chr'} = $2;
            $knowngene{'strand'} = $3;
            $knowngene{'txStart'} = $4;
            $knowngene{'txEnd'} = $5;
            $knowngene{'cdsStart'} = $6;
            $knowngene{'cdsEnd'} = $7;
            $knowngene{'exonCount'} = $8;
            $knowngene{'exonStarts'} = $9;
            $knowngene{'exonEnds'} = $10;
            return \%knowngene;        # stop at the first record that parses
        }
    }
    return undef;                      # no line matched the expected layout
}

I can't do it right now, but it's pretty easy to adapt this to read in a list of "chr:XXXXXX-YYYYYY" data and output the genes. Hope that helps.
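
As a rough illustration of the adaptation mentioned above, here is a minimal sketch of parsing "chr:XXXXXX-YYYYYY" strings into the three values the subroutine expects. The helper name `parse_region` is made up for this example, and the handling of an optional "chr" prefix and of commas inside numbers is an assumption about what the input might look like.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one "chr:start-end" string into (chr, start, end).
# Returns an empty list if the string does not look like a region.
sub parse_region {
    my ($text) = @_;
    # Accept an optional "chr" prefix and commas inside the numbers (assumption).
    if ($text =~ /^\s*(?:chr)?(\w+):([\d,]+)-([\d,]+)\s*$/) {
        my ($chr, $start, $end) = ($1, $2, $3);
        tr/,//d for $start, $end;          # strip thousands separators
        return ($chr, $start + 0, $end + 0);
    }
    return;
}

# Two well-formed regions and one malformed line.
for my $line ("chr1:1000-2000", "chrX:15,000-16,500", "not a region") {
    my ($chr, $start, $end) = parse_region($line);
    if (defined $chr) {
        print "chr=$chr start=$start end=$end\n";
    } else {
        print "skipped: $line\n";
    }
}
```

The chromosome comes back without the "chr" prefix because the subroutine above adds the prefix itself; each parsed triple could then be passed to it in a loop.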

**fabio25** · 08-03-2008, 07:35 AM

Hi ECO, thank you very much for your reply. My problem is that I'm not familiar with Perl scripting, so I'll start to learn it. Until now I've worked only in R and Bioconductor, but unfortunately I didn't find any package to properly manage ChIP-seq data. Sorry for the stupid question... where do you insert the Perl code?

**kmay** · 08-03-2008, 07:36 AM

Dear Fabio,

This question is more complex than it seems at first glance.
With the large number of regions that come out of an NGS experiment, many of them won't fall into annotated regions. So is the gene name really what you want, or is it rather the transcript, exon, promoter, UTR, and so on?
NGS is not always strand-specific, so you need to look at both the sense and the antisense strand, upstream as well as downstream.

An easy way to get all this annotation for a BED file is RegionMiner.

If you are interested in just the gene names overlapping with your regions, ECO's script might help.

Cheers

Klaus
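
To make the strand point above concrete, here is a small sketch of how "upstream" and "downstream" flip with the gene's strand. It is not how RegionMiner works; the function name `relative_position` and the closed-interval coordinates are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: describe where a region lies relative to a gene,
# taking the gene's strand into account. Coordinates are treated as
# closed intervals (an assumption for this sketch).
sub relative_position {
    my ($r_start, $r_end, $g_start, $g_end, $strand) = @_;
    return "overlap" if $r_start <= $g_end && $r_end >= $g_start;

    # True when the region has smaller coordinates than the gene.
    my $before = $r_end < $g_start;
    # On the + strand smaller coordinates are upstream; on - it is reversed.
    my $upstream = $strand eq '+' ? $before : !$before;
    my $distance = $before ? $g_start - $r_end : $r_start - $g_end;
    my $side     = $upstream ? "upstream" : "downstream";
    return "$side ($distance bp)";
}

print relative_position(100, 200, 150, 900, '+'), "\n";   # overlap
print relative_position(100, 200, 500, 900, '+'), "\n";   # upstream (300 bp)
print relative_position(100, 200, 500, 900, '-'), "\n";   # downstream (300 bp)
```

The same region is upstream of a + strand gene and downstream of a - strand gene, which is why an unstranded experiment needs both possibilities checked.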

**fabio25** · 08-03-2008, 08:14 AM

hi Kmay,
thank you very much for your help. I was trying to use RegionMiner (Genomatix), but my BED file (raw data) was too big, around 60 MB, and the server told me that I cannot upload it. Then I uploaded the .wig file (analyzed by someone else) to the UCSC browser and downloaded it as a BED file, but the Table Browser didn't include the data points, only the chromosomal locations. Do you know what I can do?

**kmay** · 08-03-2008, 08:35 AM

Fabio,

Before uploading the data, you have to cluster the raw data into regions of significant tag enrichment. Annotating the raw data will most likely give you almost every gene in the genome.
You cannot upload all the raw tags to the online version for visualization or annotation (as I said, the latter does not seem very useful to me). For that you would need to have GGA on site.
Our clustering is available only on the GGA.
However, you might give Shirley Liu's MACS a try and upload the cluster regions afterwards.

Cheers

Klaus
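
To show what "clustering raw tags into regions" means in the simplest possible terms, here is a toy sketch. It is neither the MACS method nor the GGA clustering: it only merges nearby tags and keeps clusters with enough tags, with no statistical test for enrichment. The function name `cluster_tags`, the gap and the minimum tag count are all assumptions chosen for illustration, and the tags are made-up data.

```perl
use strict;
use warnings;

# Toy clustering: merge tags that lie within $gap bp of the current cluster
# and keep clusters with at least $min_tags tags. Each tag is [chr, start, end].
sub cluster_tags {
    my ($tags, $gap, $min_tags) = @_;
    my @sorted = sort { $a->[0] cmp $b->[0] || $a->[1] <=> $b->[1] } @$tags;
    my @clusters;
    my $cur;
    for my $t (@sorted) {
        my ($chr, $start, $end) = @$t;
        if ($cur && $cur->{chr} eq $chr && $start <= $cur->{end} + $gap) {
            $cur->{end} = $end if $end > $cur->{end};   # extend the cluster
            $cur->{n}++;
        } else {
            push @clusters, $cur if $cur && $cur->{n} >= $min_tags;
            $cur = { chr => $chr, start => $start, end => $end, n => 1 };
        }
    }
    push @clusters, $cur if $cur && $cur->{n} >= $min_tags;
    return @clusters;
}

# Made-up tags: two clusters and one isolated tag that gets dropped.
my @tags = (
    ["chr1", 100, 135], ["chr1", 120, 155], ["chr1", 150, 185],
    ["chr1", 5000, 5035],
    ["chr2", 300, 335], ["chr2", 310, 345],
);
for my $c (cluster_tags(\@tags, 10, 2)) {
    print join("\t", @$c{qw(chr start end n)}), "\n";
}
```

The resulting cluster regions are far fewer than the raw tags, which is what makes them small enough to upload and meaningful to annotate.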

**ECO** · 08-03-2008, 09:43 AM

Originally posted by fabio25

Hi ECO, thank you very much for your reply. My problem is that I'm not familiar with Perl scripting, so I'll start to learn it. Until now I've worked only in R and Bioconductor, but unfortunately I didn't find any package to properly manage ChIP-seq data. Sorry for the stupid question... where do you insert the Perl code?

Hey Fabio. Klaus is right, there are more comprehensive solutions out there, but they are costly, and rarely let you do the exact analysis you need.

If you are interested in learning Perl (which will undoubtedly help you at some point), there are a ton of great free resources out there for learning it, like this one: http://www.perl.com/pub/a/2000/10/begperl1.html

You'll need some sort of interpreter if you're working on Windows... ActivePerl is a good place to start. Good luck!

**olus** · 08-25-2008, 04:53 AM

Hi Fabio,
I've never used Galaxy for NGS data, but you could give it a try:

Galaxy

http://main.g2.bx.psu.edu/

Galaxy is a community-driven web-based analysis platform for life science research.

**fabio25** · 08-26-2008, 08:52 AM

by fabio

Hi Gbucci,
thanks a lot for your advice. I tried once to work with it, but it doesn't work so well with custom tracks in wig format. It's probably me, and I'll try again. May I ask what you usually use?

**kmay** · 08-26-2008, 09:09 AM

Fabio,

can I get your data by FTP? I'll do a quick run on it and send you the results. It will take about 15 minutes.

if it helps...

Klaus

**dcfargo** · 08-26-2008, 10:45 AM

If your organism is in Ensembl, you can use the BioMart tool to extract genes (or other elements) by location.

**fabio25** · 08-26-2008, 04:03 PM

Hi dcfargo,
I did that, but it's not very precise. When I do it in R, it also retrieves the surrounding genes. I'll probably have to try it on the website.

**fabio25** · 08-26-2008, 04:14 PM

Hi kmay,
I would like to do that, but the data are not mine and I cannot send them.
However, I'm determined to find an open-source way to deal with these data, but if I can't, I'll work with GGA, as you suggested before. Thank you very much.

**zee** · 08-26-2008, 08:47 PM

Fabio,

Galaxy and the UCSC Table Browser should do exactly what you need. Use some basic logic before trying to do it all in one go. I would:

1) Choose a subset of my query data, e.g. grep -w "chr1" file.bed > chr1.mydata.bed
2) Go to the UCSC Table Browser
3) Select the Gene Table
4) Select the Union/Intersection option
5) Intersect chr1.mydata.bed with the Genes track
6) Output the intersection results in comma/tab-separated format
7) Import the file into MS Excel or some other spreadsheet program

If this works, then you just need to generalize it to your whole dataset and not try to do too many steps at once. This is only one possible solution and there are probably more elegant open-source methods.
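
For anyone who wants to see what the intersection in steps 1 to 6 boils down to, here is a minimal local sketch of the same logic. It does not use the Table Browser at all: the gene records and BED lines are made-up stand-ins, the function name `overlapping_genes` is hypothetical, and BED-style half-open coordinates are assumed.

```perl
use strict;
use warnings;

# Hypothetical helper: names of genes on the same chromosome that overlap
# a region. Coordinates are assumed to be BED-style half-open intervals
# (start inclusive, end exclusive).
sub overlapping_genes {
    my ($region, $genes) = @_;
    my ($chr, $start, $end) = @$region;
    return map  { $_->{name} }
           grep { $_->{chr} eq $chr && $_->{start} < $end && $_->{end} > $start }
           @$genes;
}

# Made-up gene records standing in for a downloaded gene table.
my @genes = (
    { name => "geneA", chr => "chr1", start => 1000, end => 5000 },
    { name => "geneB", chr => "chr1", start => 4000, end => 9000 },
    { name => "geneC", chr => "chr2", start => 1000, end => 5000 },
);

# Made-up BED lines standing in for file.bed.
my @bed_lines = ("chr1\t4500\t4600", "chr1\t20000\t20100", "chr2\t900\t1001");

for my $line (@bed_lines) {
    my @region = split /\t/, $line;
    next unless $region[0] eq "chr1";            # step 1: chr1 subset only
    my @hits = overlapping_genes(\@region, \@genes);
    # step 6: tab-separated output, one line per region
    print join("\t", @region, @hits ? join(",", @hits) : "none"), "\n";
}
```

Regions with no overlapping gene are reported as "none" rather than dropped, so it is easy to see how many regions fall outside annotated genes.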

**olus** · 08-27-2008, 12:29 AM

Originally posted by fabio25

Hi Gbucci,
thanks a lot for your advice. I tried once to work with it, but it doesn't work so well with custom tracks in wig format. It's probably me, and I'll try again. May I ask what you usually use?

Hi Fabio,
when I deal with a long list of [chr\tstart\tend\tstrand] genomic coordinates, I use a Perl script much like the one ECO suggested. The script parses your file, reads in the coordinates and passes them to the remote UCSC database using a MySQL query.
I'm quite sure there is a Bioconductor way of doing it, but I can't tell you more since I've never tried it. You may want to have a look at the BioC mailing list.

Ask if you need help with perl scripting.

My Best

G
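
The script described above is not shown in the thread, so the following is only a guess at its general shape: a sketch that reads [chr\tstart\tend\tstrand] lines and builds an overlap query for each. The function name `overlap_query` is made up, and the table and column names (knownGene, chrom, txStart, txEnd) are assumptions about the database schema that should be checked before use. Nothing here connects to a database.

```perl
use strict;
use warnings;

# Hypothetical helper: build an overlap query for one region. The "?"
# placeholders are meant to be filled in by a database driver, not by hand.
# Table and column names are assumptions about the schema.
sub overlap_query {
    my ($chr, $start, $end) = @_;
    my $sql = "SELECT name, chrom, strand, txStart, txEnd FROM knownGene "
            . "WHERE chrom = ? AND txStart < ? AND txEnd > ?";
    # Bind values in placeholder order: chromosome, region end, region start.
    return ($sql, "chr$chr", $end, $start);
}

# Made-up input lines in chr/start/end/strand layout.
for my $line ("1\t1000\t2000\t+", "X\t500\t900\t-") {
    my ($chr, $start, $end, $strand) = split /\t/, $line;
    my ($sql, @bind) = overlap_query($chr, $start, $end);
    print "$sql\n  bind: @bind\n";
}
```

With the DBI module, the statement would typically go through `prepare` and the bind values through `execute`; the connection details for the remote server are deliberately left out here and would have to come from the UCSC documentation.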

retrieve gene name