SEQanswers


angerusso 01-02-2012 02:13 PM

Lengths of all genes
 
Hey,

Hopefully this question is appropriate for this thread (Edited)

1) I need to compute the frequency of amino acid substitution mutations in each gene, and I want to normalize these frequencies across all genes by dividing each one by the length of the respective gene.
E.g. (normalized A1T frequency in TP53) = (# of samples)/1254
My question is: where do I find the sizes of all genes?

2) What if I need to compute frequencies of gene mutations in general (not just non-synonymous substitutions but also insertions and deletions)? How would I then normalize the mutation frequency for each gene by its size? Which size should I use (gene chr end position - gene chr start position)?

Thanks.

tomc 01-02-2012 11:51 PM

The question will need more background/context information wherever it ends up,

starting with basic information such as your data sources ...

angerusso 01-03-2012 07:50 AM

The data was generated in-house for 60 tumor samples and contains gene info and the nature of each mutation (amino acid substitution, insertion-deletion, etc.).

What other background are you looking for? I thought the question was pretty straightforward, but please let me know if I can make it clearer so that we can have a more productive discussion.

severin 01-03-2012 08:04 AM

background
 
For starters, it would be helpful to answer your question if we knew the following:

Do you have a particular organism that you are working with?
Do you have the fasta file of the genes that you want the lengths of?
Do you want the length of the cDNA or the genes with introns?
Do you care about splice variants or will the longest splice variant work?

Jon_Keats 01-03-2012 08:09 AM

Gene sizes can be downloaded for most organisms using the BioMart tool on Ensembl or, though it is not my favorite source, from UCSC using the Table Browser. I suspect you are fine normalizing to either the coding region or the full mRNA space, but I'd vote for coding if the question is about consequential changes that are easy to understand. Take a look at Wood et al. Science 318:1108 (2007) for a nice example of this type of analysis.
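
As a rough illustration of normalizing to the coding region, here is a minimal Python sketch that converts per-gene mutation counts into mutations per kilobase of coding sequence. The function name, the gene symbols, and all numbers are made up for illustration; the real coding lengths would come from whatever table you download.

```python
from collections import Counter


def mutations_per_kb(mutated_genes, coding_length_bp):
    """Return mutations per kilobase of coding sequence for each gene.

    mutated_genes: iterable with one gene symbol per observed mutation.
    coding_length_bp: dict mapping gene symbol -> coding length in bp.
    Genes with an unknown or zero length are skipped, not guessed.
    """
    counts = Counter(mutated_genes)
    rates = {}
    for gene, n_mutations in counts.items():
        length = coding_length_bp.get(gene)
        if not length:
            continue
        rates[gene] = n_mutations / (length / 1000.0)
    return rates


# Toy input (hypothetical genes and lengths, for illustration only).
calls = ["GENE_A", "GENE_A", "GENE_B", "GENE_A", "GENE_C"]
lengths = {"GENE_A": 1500, "GENE_B": 500}
rates = mutations_per_kb(calls, lengths)
```

Here GENE_C is dropped because no length is available for it. Whether to count one mutation per sample or every observed call is a separate choice that this sketch does not make for you.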

angerusso 01-03-2012 09:28 AM

Here are the answers to your questions. Thanks so much again. I will wait for the replies.

1. Do you have a particular organism that you are working with?
- Human

2. Do you have the fasta file of the genes that you want the lengths of?
- No

3. Do you want the length of the cDNA or the genes with introns?
- The data was generated using Illumina (exome sequencing)
- So I understand I need the total length of all exons in a gene (how would I get this info for a set of 1000 genes?). Am I right?

4. Do you care about splice variants or will the longest splice variant work?
- Well, I don't know the answer. Should I take the total exon length to be the length of the longest transcript?
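
To make the two options concrete, here is a small Python sketch comparing (a) the union of all exons across a gene's transcripts with (b) the exon length of the longest single transcript. The transcript names and coordinates are invented, and coordinates are assumed to be 1-based and inclusive; this is only one way to frame the choice, not a recommendation from the thread.

```python
def merged_exon_length(exons):
    """Length in bp of the union of 1-based, inclusive (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end + 1:
            # Gap before this exon: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent exon: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def longest_transcript_length(transcripts):
    """Largest summed exon length over the transcripts of one gene."""
    return max(
        sum(end - start + 1 for start, end in exons)
        for exons in transcripts.values()
    )


# Hypothetical gene with two splice variants.
toy = {
    "tx1": [(100, 199), (300, 399)],
    "tx2": [(100, 199), (350, 449), (600, 649)],
}
all_exons = [exon for exons in toy.values() for exon in exons]
union_bp = merged_exon_length(all_exons)    # 300 bp
longest_bp = longest_transcript_length(toy)  # 250 bp
```

The union is never smaller than the longest transcript, so the two definitions give different denominators whenever splice variants use different exons.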

severin 01-03-2012 09:46 AM

I would start by looking at the reference that Jon Keats provided, as it will likely show how other people have solved this particular problem in the past for humans and tumor cells. It is always good to be able to reference a paper and say we did it the same way as these authors.

angerusso 01-05-2012 11:48 AM

I read the paper and didn't see where they describe how they normalize for gene size and composition. Did I miss something?

Also, I am simply looking for a way to download the exon lengths of all genes (based on HUGO gene symbol). Thank you for recommending the BioMart tool. I am not familiar with it at all but will give it a try. If you have done a similar query in the past, do you have an example that can help with downloading exon lengths using BioMart?

Thanks.
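
For the post-download step, here is a minimal Python sketch that turns a tab-separated exon export into one exon length per gene symbol. The column names gene_symbol, exon_start, and exon_end are assumptions made for this example and will need to be changed to match the header of the actual export; the toy rows are invented. Exons shared between transcripts are counted once, and overlapping exons are not double-counted.

```python
import csv
import io
from collections import defaultdict


def exon_bp_per_gene(tsv_text, gene_col="gene_symbol",
                     start_col="exon_start", end_col="exon_end"):
    """Sum non-redundant exon bases per gene from tab-separated text.

    Column names are hypothetical defaults; coordinates are assumed
    to be 1-based and inclusive.
    """
    exons = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        exons[row[gene_col]].add((int(row[start_col]), int(row[end_col])))

    lengths = {}
    for gene, intervals in exons.items():
        total, last_end = 0, 0
        for start, end in sorted(intervals):
            start = max(start, last_end + 1)  # skip bases already counted
            if end >= start:
                total += end - start + 1
                last_end = end
        lengths[gene] = total
    return lengths


# Toy export with a repeated exon and an overlapping exon.
toy_rows = [
    ["gene_symbol", "exon_start", "exon_end"],
    ["GENE_A", "100", "199"],
    ["GENE_A", "100", "199"],
    ["GENE_A", "150", "249"],
    ["GENE_B", "10", "19"],
]
toy_tsv = "\n".join("\t".join(row) for row in toy_rows) + "\n"
lengths = exon_bp_per_gene(toy_tsv)
```

For a real file, the text could be read with open(path).read() and passed in unchanged, provided the export includes a header row.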




angerusso 01-05-2012 12:30 PM

Hi,

In addition: I tried to download the following tables, but I could not find a header or README file.

ftp://ftp.ensembl.org/pub/current_my...65_37/exon*.gz

Would you be able to help me with this table so I can figure out which column corresponds to exon length?
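
A note on the table itself: a dump like this usually has no single "length" column, so the length has to be derived as end - start + 1 from the start and end coordinate columns. Below is a minimal Python sketch of that arithmetic. The column positions (0 for the exon id, 2 and 3 for start and end) are assumptions that should be checked against the table definition shipped with the dump, and the rows are made up.

```python
def exon_lengths(lines, id_col=0, start_col=2, end_col=3):
    """Map exon id -> length in bp from header-less, tab-separated rows.

    The column indices are assumptions (0-based) and must be verified
    against the schema of the table being read. Coordinates are taken
    to be 1-based and inclusive, hence the +1.
    """
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[start_col]), int(fields[end_col])
        lengths[fields[id_col]] = end - start + 1
    return lengths


# Invented rows: exon id, region id, start, end, strand.
rows = [
    "101\t27511\t1000\t1099\t1\n",
    "102\t27511\t2000\t2049\t-1\n",
]
per_exon = exon_lengths(rows)
```

For a gzipped file, the lines can be streamed with gzip.open(path, "rt") from the standard library and passed straight to the function.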

