  • angerusso
    Member
    • Oct 2011
    • 47

    Lengths of all genes

    Hey,

    Hopefully this question is appropriate for this thread (Edited)

    1) I need to compute amino acid substitution mutation frequencies in a gene, and I want to normalize these frequencies across all genes by dividing each one by the length of its gene.
    E.g. (A1T frequency in TP53) = (# of samples)/1254
    My question is: where do I find the sizes of all genes?

    2) If I need to compute frequencies of gene mutations in general (not just non-synonymous substitutions but also insertions and deletions), how would I normalize the mutation frequency for each gene by its size? What size should I use (gene chr end position - gene chr start position)?
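
For illustration, the normalization described in 1) can be sketched as a small Python function. The function name and the per-1000-bp scaling are assumptions made for this example; 1254 is the length used in the example above, and 6 mutated samples out of 60 are hypothetical numbers.

```python
def normalized_frequency(n_mutated_samples, n_samples, length_bp, per_bp=1000):
    """Fraction of samples carrying the mutation, scaled per `per_bp` bases.

    length_bp is whatever gene size is chosen for normalization
    (coding length, total exon length, or genomic span).
    """
    if n_samples <= 0 or length_bp <= 0:
        raise ValueError("n_samples and length_bp must be positive")
    sample_fraction = n_mutated_samples / n_samples
    return sample_fraction / length_bp * per_bp


# Hypothetical numbers: 6 of 60 samples mutated, gene length 1254 bp.
tp53_like = normalized_frequency(6, 60, 1254)
print(round(tp53_like, 4))
```

Scaling per 1000 bp only keeps the numbers readable; comparing genes works the same with any constant factor.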

    Thanks.
    Last edited by angerusso; 01-02-2012, 05:20 PM.
  • tomc
    Member
    • Feb 2011
    • 29

    #2
    The question will need more background/context information wherever it ends up,

    starting with basic information such as your data sources ...


    • angerusso
      Member
      • Oct 2011
      • 47

      #3
      The data was generated in-house for 60 tumor samples and contains gene info and the nature of each mutation (amino acid substitution, insertion-deletion, etc.)

      What other background are you looking for? I thought the question was pretty straightforward, but please let me know if I can make it clearer so we can have a more productive discussion.


        • severin
          Genome Informatics Facility
          • Sep 2009
          • 105

          #5
          background

          For starters, it would be helpful to answer your question if we knew the following:

          Do you have a particular organism that you are working with?
          Do you have the fasta file of the genes that you want the lengths of?
          Do you want the length of the cDNA or the genes with introns?
          Do you care about splice variants or will the longest splice variant work?


          • Jon_Keats
            Senior Member
            • Mar 2010
            • 279

            #6
            Gene sizes can be downloaded for most organisms using the BioMart tool on Ensembl or, though it's not my favorite source, from UCSC using the Table Browser. I suspect you are fine normalizing to either the coding region or the full mRNA space, but I'd vote for coding if the question is about consequential changes that are easy to understand. Take a look at Wood et al. Science 318:1108 (2007) for a nice example of this type of analysis.
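
As a minimal sketch of what "normalizing to the coding region" could look like once exon coordinates have been exported: overlapping exons from different transcripts are merged so that each base is counted once. The function name and the coordinates are made up for this example, and 1-based inclusive coordinates are assumed.

```python
def merged_length(intervals):
    """Total bases covered by (start, end) intervals, counting overlaps once.

    Coordinates are assumed to be 1-based and inclusive.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or directly adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# Hypothetical exons: the first two overlap and are merged into 100..250.
exons = [(100, 200), (150, 250), (400, 450)]
print(merged_length(exons))  # 151 + 51 = 202
```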


            • angerusso
              Member
              • Oct 2011
              • 47

              #7
              Here are the answers to your questions. Thanks so much again. I will wait for the replies.

              1. Do you have a particular organism that you are working with?
              - Human

              2. Do you have the fasta file of the genes that you want the lengths of?
              - No

              3. Do you want the length of the cDNA or the genes with introns?
              - The data is generated using Illumina (Exome sequencing)
              - So I understand I need the total length of all exons in a gene (how would I get this info for a set of 1000 genes?). Am I right?

              4. Do you care about splice variants or will the longest splice variant work?
              - Well, I don't know the answer. Should I take the total exon length of the longest transcript as the gene length?
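
One way to make the "longest transcript" option concrete, as a sketch only: sum the exon lengths per transcript and keep the largest total for each gene. The row layout (gene, transcript, exon start, exon end), the gene and transcript names, and the 1-based inclusive coordinates are all assumptions for this example.

```python
from collections import defaultdict


def longest_transcript_lengths(exon_rows):
    """Map each gene to (transcript, exonic length) of its longest transcript.

    exon_rows: iterable of (gene, transcript, exon_start, exon_end),
    with 1-based inclusive coordinates. Exons within a single transcript
    are assumed not to overlap.
    """
    per_transcript = defaultdict(int)
    for gene, transcript, start, end in exon_rows:
        per_transcript[(gene, transcript)] += end - start + 1

    best = {}
    for (gene, transcript), length in per_transcript.items():
        if gene not in best or length > best[gene][1]:
            best[gene] = (transcript, length)
    return best


# Hypothetical gene with two transcripts; tx2 has one extra exon.
rows = [
    ("GENE_A", "tx1", 1, 100),
    ("GENE_A", "tx1", 201, 300),
    ("GENE_A", "tx2", 1, 100),
    ("GENE_A", "tx2", 201, 300),
    ("GENE_A", "tx2", 401, 450),
]
print(longest_transcript_lengths(rows))  # {'GENE_A': ('tx2', 250)}
```

The alternative is to merge the exons of all transcripts of a gene; which one fits better depends on how the mutations were annotated.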


              • severin
                Genome Informatics Facility
                • Sep 2009
                • 105

                #8
                I would start by looking at the reference that Jon Keats provided, as it will likely show how other people have solved this particular problem in human tumor cells. It is always good to be able to cite a paper and say we did it the same way as these people.


                • angerusso
                  Member
                  • Oct 2011
                  • 47

                  #9
                  I read the paper but didn't see where they describe normalizing against gene size and composition. Did I miss something?

                  Also, I am simply looking for a way to download the exon lengths of all genes (by HUGO gene symbol). Thank you for recommending the BioMart tool. I am not familiar with it at all but will give it a try. If you have run a similar query before, do you have an example that would help with downloading exon lengths using BioMart?

                  Thanks.



                  • angerusso
                    Member
                    • Oct 2011
                    • 47

                    #10
                    Hi,

                    In addition: I tried to download the following tables, but I couldn't find a header or README file.

                    ftp://ftp.ensembl.org/pub/current_my...65_37/exon*.gz

                    Would you be able to help me with this table so I can figure out which column corresponds to exon length?
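
A hedged sketch of how exon lengths could be derived from a headerless, tab-separated table dump like this one. A table of this kind typically stores start and end coordinates rather than a length column, so the length is end - start + 1. The column positions used here (exon ID first, start and end in the third and fourth columns) are an assumption and should be checked against the table definitions for that release; the sample rows are made up.

```python
def exon_lengths(lines, id_col=0, start_col=2, end_col=3):
    """Return {exon_id: length} from tab-separated rows without a header.

    Column indices are 0-based and are ASSUMED defaults; adjust them to
    match the actual table definition. Coordinates are treated as
    1-based inclusive, so length = end - start + 1.
    """
    lengths = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        start, end = int(fields[start_col]), int(fields[end_col])
        lengths[fields[id_col]] = end - start + 1
    return lengths


# Made-up rows in the assumed layout: id, region id, start, end, strand.
sample = [
    "1\t27511\t1000\t1099\t1\n",
    "2\t27511\t2000\t2249\t-1\n",
]
print(exon_lengths(sample))  # {'1': 100, '2': 250}
```

For a compressed dump, the same function can be fed an open file, e.g. `with gzip.open("exon.txt.gz", "rt") as fh: exon_lengths(fh)`, where the local file name is hypothetical.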
