#1 |
Member
Location: US Join Date: Oct 2011
Posts: 47
|
Hey,
Hopefully this question is appropriate for this thread (edited).
1) I need to compute the amino acid substitution mutation frequency in a gene, and I want to normalize these frequencies across all genes by dividing each one by the length of the respective gene. E.g. (A1T frequency in TP53) = (# of samples)/1254. My question is: where do I find the sizes of all genes?
2) If I need to compute frequencies of gene mutations in general (not just non-synonymous substitutions but also insertions and deletions), how would I then normalize the mutation frequency for each gene based on its size? What size should I use (gene chr position end - gene chr position start)?
Thanks.
Last edited by angerusso; 01-02-2012 at 05:20 PM.
#2 |
Member
Location: Oregon Join Date: Feb 2011
Posts: 29
|
The question will need more background/context information wherever it ends up, starting with basic information such as your data sources ...
#3 |
Member
Location: US Join Date: Oct 2011
Posts: 47
|
The data were generated in-house for 60 tumor samples and contain gene info and the nature of each mutation (amino acid substitution, insertion-deletion, etc.).
What other background are you looking for? I thought the question was pretty straightforward, but please let me know if I can make it clearer so that we can have a more productive discussion.
#5 |
Genome Informatics Facility
Location: Iowa @isugif Join Date: Sep 2009
Posts: 105
|
For starters, it would be helpful to answer your question if we knew the following:
1. Do you have a particular organism that you are working with?
2. Do you have the FASTA file of the genes that you want the lengths of?
3. Do you want the length of the cDNA or of the genes with introns?
4. Do you care about splice variants, or will the longest splice variant work?
#6 |
Senior Member
Location: Phoenix, AZ Join Date: Mar 2010
Posts: 279
|
Gene sizes can be downloaded for most organisms using the BioMart tool on Ensembl or, though it is not my favorite source, from UCSC using the Table Browser. I suspect you are fine normalizing to either the coding region or the full mRNA space, but I'd vote for coding if the question is about consequential changes that are easy to understand. Take a look at Wood et al., Science 318:1108 (2007) for a nice example of this type of analysis.
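As a rough illustration of normalizing to the coding region, here is a minimal Python sketch. It assumes you already have a per-gene mutation count and a per-gene coding length in base pairs; the function name, the gene names, and all numbers are hypothetical placeholders, not values from this thread or from the paper.

```python
def mutations_per_kb(mutation_counts, coding_lengths_bp):
    """Return mutations per kilobase of coding sequence for each gene.

    mutation_counts:   dict of gene symbol -> number of observed mutations
    coding_lengths_bp: dict of gene symbol -> coding length in base pairs
    Genes with a missing or zero length are skipped rather than guessed.
    """
    rates = {}
    for gene, count in mutation_counts.items():
        length = coding_lengths_bp.get(gene)
        if not length:
            continue
        rates[gene] = count * 1000 / length
    return rates


# Hypothetical example: the same raw count in a short and a long gene.
counts = {"GENE_A": 6, "GENE_B": 6}
lengths = {"GENE_A": 1200, "GENE_B": 12000}
print(mutations_per_kb(counts, lengths))
# {'GENE_A': 5.0, 'GENE_B': 0.5}
```

Both genes carry six mutations, but the shorter one ends up with a ten-fold higher rate, which is the effect the length normalization is meant to expose.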
#7 |
Member
Location: US Join Date: Oct 2011
Posts: 47
|
Here are the answers to your questions. Thanks so much again. I will wait for the replies.
1. Do you have a particular organism that you are working with? - Human
2. Do you have the FASTA file of the genes that you want the lengths of? - No
3. Do you want the length of the cDNA or of the genes with introns? - The data were generated using Illumina exome sequencing, so I understand I need the total length of all exons in a gene (how would I get this info for a set of 1000 genes?). Am I right?
4. Do you care about splice variants, or will the longest splice variant work? - Well, I don't know the answer. Should I take the total exon length to be the length of the longest transcript?
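Question 4 comes down to two different numbers: the length of the longest transcript, and the length of the union of all exons across transcripts (each base counted once). A minimal Python sketch of both follows, assuming exon coordinates are 1-based and inclusive, as in Ensembl; the transcript names and coordinates are made up for illustration.

```python
def union_exon_length(exons):
    """Number of bases covered by at least one exon.

    exons: iterable of (start, end) tuples, 1-based inclusive coordinates
    on the same chromosome. Overlapping or adjacent exons are merged so
    that no base is counted twice.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end + 1:
            # Gap before this exon: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlaps or touches the current block: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def longest_transcript_length(transcripts):
    """Spliced length of the longest transcript (sum of its exon lengths)."""
    return max(
        sum(end - start + 1 for start, end in exons)
        for exons in transcripts.values()
    )


# Hypothetical gene with two transcripts sharing one exon.
transcripts = {
    "tx1": [(100, 199), (300, 399)],
    "tx2": [(150, 249), (300, 399), (500, 549)],
}
all_exons = [exon for exons in transcripts.values() for exon in exons]
print(longest_transcript_length(transcripts))  # 250
print(union_exon_length(all_exons))            # 300
```

The two values differ whenever transcripts use different exons, so whichever one is chosen should be applied consistently to every gene.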
#8 | |
Genome Informatics Facility
Location: Iowa @isugif Join Date: Sep 2009
Posts: 105
|
I would start by looking at the reference that Jon Keats provided, as it will likely show how other people have solved this particular problem for human tumor cells. It is always good to be able to reference a paper and say we did it the same way as these people.
#9 | |
Member
Location: US Join Date: Oct 2011
Posts: 47
|
I read the paper and I didn't see where it mentions how they account for gene size and composition when normalizing. Did I miss something?
Also, I am simply looking for how to download the exon lengths of all genes (based on HUGO gene symbol). Thank you for recommending the BioMart tool. I am not familiar with it at all but will give it a try. If you have done a similar query before, do you have an example that could help with downloading exon lengths using BioMart? Thanks.
#10 |
Member
Location: US Join Date: Oct 2011
Posts: 47
|
Hi,
In addition: I tried to download the following tables, but I could not find a header or README file.
ftp://ftp.ensembl.org/pub/current_my...65_37/exon*.gz
Would you be able to help me with this table so I can figure out which column corresponds to exon length?
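A table of exon coordinates usually stores a start and an end rather than a length, so the length has to be derived as end - start + 1 (for 1-based inclusive coordinates). The Python sketch below assumes tab-separated rows and takes the column positions as parameters, because the actual column order of the file above is not known here and has to be checked against the database schema; the column indices and exon IDs in the example are hypothetical.

```python
def exon_lengths(lines, id_col, start_col, end_col):
    """Map exon ID -> exon length from tab-separated rows.

    lines: iterable of text lines, one exon per line.
    id_col, start_col, end_col: 0-based column positions. These are
    assumptions to be verified against the table's schema.
    Coordinates are treated as 1-based and inclusive.
    """
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        start = int(fields[start_col])
        end = int(fields[end_col])
        lengths[fields[id_col]] = end - start + 1
    return lengths


# Hypothetical rows: exon ID, start, end.
rows = ["EXON_1\t100\t199\n", "EXON_2\t300\t349\n"]
print(exon_lengths(rows, id_col=0, start_col=1, end_col=2))
# {'EXON_1': 100, 'EXON_2': 50}
```

For a gzipped file, the lines can come from `gzip.open(path, "rt")` in the standard library, which yields text lines in the same way.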