SEQanswers

Old 01-02-2012, 02:13 PM   #1
angerusso
Member
 
Location: US

Join Date: Oct 2011
Posts: 47
Question Lengths of all genes

Hey,

Hopefully this question is appropriate for this thread (Edited)

1) I need to compute the frequency of amino acid substitution mutations in each gene, and I want to normalize these frequencies across all genes by dividing each one by the length of its gene.
E.g. (A1T frequency in TP53) = (# of samples)/1254
My question is: where do I find the sizes of all genes?

2) What if I need to compute frequencies of gene mutations in general (not just non-synonymous substitutions but also insertions and deletions)? How would I then normalize the mutation frequency for each gene by its size? Which size should I use (gene chr end position - gene chr start position)?
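
To make the normalization concrete, here is a minimal sketch of the calculation described above: the fraction of samples with a mutation in a gene, divided by the gene length and scaled to per kilobase. The function name, the gene "GENE_B", the counts, and the per-kb scaling are all made up for illustration; 1254 is just the length used in the example above, and which length is appropriate is the open question.

```python
def per_kb_rate(mutated_samples: int, total_samples: int, length_bp: int) -> float:
    """Fraction of samples mutated in a gene, per kilobase of gene length."""
    if total_samples <= 0 or length_bp <= 0:
        raise ValueError("total_samples and length_bp must be positive")
    return (mutated_samples / total_samples) / length_bp * 1000.0


# Hypothetical inputs: number of samples (out of 60) with a mutation in each gene,
# and the length in bp chosen for each gene.
total_samples = 60
mutated = {"TP53": 30, "GENE_B": 6}
lengths = {"TP53": 1254, "GENE_B": 5000}

rates = {gene: per_kb_rate(n, total_samples, lengths[gene]) for gene, n in mutated.items()}
for gene, rate in rates.items():
    print(f"{gene}\t{rate:.4f} mutated-sample fraction per kb")
```

The same function works for question 2; only the choice of `length_bp` changes.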

Thanks.

Last edited by angerusso; 01-02-2012 at 05:20 PM.
Old 01-02-2012, 11:51 PM   #2
tomc
Member
 
Location: Oregon

Join Date: Feb 2011
Posts: 29
Default

The question will need more background/context information wherever it ends up,

starting with basic information such as your data sources ...
Old 01-03-2012, 07:50 AM   #3
angerusso
Member
 
Location: US

Join Date: Oct 2011
Posts: 47
Default

The data were generated in-house for 60 tumor samples and contain gene info and the nature of each mutation (amino acid substitution, insertion-deletion, etc.).

What other background are you looking for? I thought the question was pretty straightforward, but please let me know if I can make it clearer so that we can have a more productive discussion.
Old 01-03-2012, 08:04 AM   #5
severin
Genome Informatics Facility
 
Location: Iowa @isugif

Join Date: Sep 2009
Posts: 105
Default background

For starters, it would help us answer your question if we knew the following:

Do you have a particular organism that you are working with?
Do you have the fasta file of the genes that you want the lengths of?
Do you want the length of the cDNA or the genes with introns?
Do you care about splice variants or will the longest splice variant work?
Old 01-03-2012, 08:09 AM   #6
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279
Default

Gene sizes can be downloaded for most organisms using the BioMart tool on Ensembl or, though it is not my favorite source, from UCSC using the Table Browser. I suspect you are fine normalizing to either the coding region or the full mRNA space, but I'd vote for coding if the question is about consequential changes that are easy to understand. Take a look at Wood et al. Science 318:1108 (2007) for a nice example of this type of analysis.
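
As a rough sketch of what to do with such a download: assuming you export a tab-separated file with one row per transcript, the snippet below keeps the longest coding length per gene. The column headers "Gene name" and "CDS Length" and the example rows are assumptions for illustration; check the header line of your own export and pass the real names.

```python
import csv
import io


def longest_cds_per_gene(tsv_text, gene_col="Gene name", cds_col="CDS Length"):
    """Return {gene: longest CDS length} from a tab-separated, per-transcript table."""
    best = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        value = (row.get(cds_col) or "").strip()
        if not value:
            continue  # skip transcripts without a coding length (e.g. non-coding)
        length = int(value)
        gene = row[gene_col]
        if length > best.get(gene, 0):
            best[gene] = length
    return best


# Hypothetical export: header names and values are made up.
example = (
    "Gene name\tTranscript stable ID\tCDS Length\n"
    "GENE_A\tT1\t900\n"
    "GENE_A\tT2\t1200\n"
    "GENE_A\tT3\t\n"
    "GENE_B\tT4\t300\n"
)

print(longest_cds_per_gene(example))
```

Taking the longest coding transcript is only one possible convention; it is used here because the thread raises it as an option.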
Old 01-03-2012, 09:28 AM   #7
angerusso
Member
 
Location: US

Join Date: Oct 2011
Posts: 47
Default

Here are the answers to your questions. Thanks so much again. I will wait for the replies.

1. Do you have a particular organism that you are working with?
- Human

2. Do you have the fasta file of the genes that you want the lengths of?
- No

3. Do you want the length of the cDNA or the genes with introns?
- The data were generated using Illumina (exome sequencing)
- So I understand I need the length of all exons in a gene (how would I get this info for a set of 1000 genes?). Am I right?

4. Do you care about splice variants or will the longest splice variant work?
- Well, I don't know the answer. Should I take the total exon length to be the length of the longest transcript?
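
To illustrate the difference between the two options in point 4, here is a small sketch that computes both the merged (union) exon length across all transcripts of a gene and the exon length of its longest transcript. The transcript IDs and coordinates are made up, and coordinates are assumed to be 1-based and inclusive.

```python
def merged_length(intervals):
    """Total bases covered by a set of (start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def longest_transcript_length(transcripts):
    """Largest summed exon length over the transcripts of one gene."""
    return max(sum(end - start + 1 for start, end in exons) for exons in transcripts.values())


# Hypothetical gene with two transcripts (exon coordinates are made up).
transcripts = {
    "T1": [(100, 199), (300, 399)],
    "T2": [(150, 249), (500, 549)],
}
all_exons = [exon for exons in transcripts.values() for exon in exons]

print("union of exons:", merged_length(all_exons))
print("longest transcript:", longest_transcript_length(transcripts))
```

The two numbers differ whenever transcripts use different exons, so whichever convention is chosen should be applied to every gene.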
Old 01-03-2012, 09:46 AM   #8
severin
Genome Informatics Facility
 
Location: Iowa @isugif

Join Date: Sep 2009
Posts: 105
Default

I would start by looking at the reference that Jon Keats provided, as it will likely show how other people have solved your particular problem for human tumor samples. It is always good to be able to reference a paper and say we did it the same way as these people.

Quote:
Originally Posted by angerusso View Post
Here are the answers to your questions. Thanks so much again. I will wait for the replies.

1. Do you have a particular organism that you are working with?
- Human

2. Do you have the fasta file of the genes that you want the lengths of?
- No

3. Do you want the length of the cDNA or the genes with introns?
- The data were generated using Illumina (exome sequencing)
- So I understand I need the length of all exons in a gene (how would I get this info for a set of 1000 genes?). Am I right?

4. Do you care about splice variants or will the longest splice variant work?
- Well, I don't know the answer. Should I take the total exon length to be the length of the longest transcript?
Old 01-05-2012, 11:48 AM   #9
angerusso
Member
 
Location: US

Join Date: Oct 2011
Posts: 47
Default

I read the paper and I didn't see where they describe how they normalize against gene size and composition. Did I miss something?

Also, I am simply looking for a way to download the exon lengths of all genes (based on HUGO gene symbol). Thank you for recommending the BioMart tool. I am not familiar with it at all but will give it a try. If you have done a similar query before, do you have an example that would help with downloading exon lengths using BioMart?

Thanks.


Quote:
Originally Posted by Jon_Keats View Post
Gene sizes can be downloaded for most organisms using the BioMart tool on Ensembl or, though it is not my favorite source, from UCSC using the Table Browser. I suspect you are fine normalizing to either the coding region or the full mRNA space, but I'd vote for coding if the question is about consequential changes that are easy to understand. Take a look at Wood et al. Science 318:1108 (2007) for a nice example of this type of analysis.
Old 01-05-2012, 12:30 PM   #10
angerusso
Member
 
Location: US

Join Date: Oct 2011
Posts: 47
Default

Hi,

In addition: I tried to download the following tables, but I could not find a header or README file.

ftp://ftp.ensembl.org/pub/current_my...65_37/exon*.gz

Would you be able to help me with this table so I can figure out which column corresponds to exon length?
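
For reference, a table of this kind may not have an exon length column at all; the length can be derived from start and end coordinates as end - start + 1, assuming 1-based inclusive coordinates. The sketch below does that for header-less tab-separated rows. The column positions and the example rows are assumptions, not the actual layout of the file above; as far as I know the column order is defined in the .sql schema file that sits next to the table dumps, so check it before trusting any index.

```python
def exon_lengths(lines, id_col, start_col, end_col):
    """Return {exon id: length} from header-less tab-separated rows.

    Column indices are 0-based and must be supplied by the caller.
    Coordinates are assumed to be 1-based and inclusive.
    """
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        lengths[fields[id_col]] = int(fields[end_col]) - int(fields[start_col]) + 1
    return lengths


# Hypothetical rows: id, region id, start, end, strand (layout is assumed).
rows = [
    "1\t7\t1000\t1099\t1\n",
    "2\t7\t2000\t2049\t-1\n",
]

print(exon_lengths(rows, id_col=0, start_col=2, end_col=3))
```

To run it on a real dump, pass an open file handle (for example from `gzip.open(path, "rt")`) instead of the `rows` list.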