  • Lengths of all genes

    Hey,

    Hopefully this question is appropriate for this thread (Edited)

    1) I need to compute the amino acid substitution mutation frequency in a gene, and I want to normalize these frequencies across all genes by dividing each frequency by the length of the respective gene.
    E.g. (A1T frequency in TP53) = (# of samples)/1254
    My question is where do I find the size of all genes?

    2) If I need to compute frequencies of gene mutations in general (including not just non-synonymous substitutions but also insertions and deletions), how would I then normalize the mutation frequency for each gene by its size? What size should I use (gene chr position end - gene chr position start)?
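One way to make the normalization in both questions concrete is sketched below. It assumes that a per-gene count of mutated samples and a per-gene length in base pairs are already available. The function names, the per-kilobase scaling, and the numbers in the usage note are illustrative assumptions, not values from the data described here.

```python
def normalized_mutation_rate(n_mutated_samples, n_samples, gene_length_bp, per_bp=1000):
    """Fraction of samples carrying a mutation, per `per_bp` bases of gene length.

    `gene_length_bp` is whichever denominator is chosen (coding length,
    total exon length, or genomic span); the choice is up to the analyst.
    """
    if n_samples <= 0 or gene_length_bp <= 0:
        raise ValueError("n_samples and gene_length_bp must be positive")
    return (n_mutated_samples / n_samples) / gene_length_bp * per_bp


def genomic_span(start, end):
    """Length of a gene from 1-based inclusive chromosome coordinates."""
    return end - start + 1
```

For instance, `normalized_mutation_rate(6, 60, 1200)` gives the mutated fraction per kilobase for a hypothetical gene mutated in 6 of 60 samples. Note that the genomic span includes introns, which exome sequencing does not cover.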

    Thanks.
    Last edited by angerusso; 01-02-2012, 05:20 PM.

  • #2
    the question will need more background/context information wherever it ends up.

    starting with basic information such as your data sources ...



    • #3
      The data was generated in-house for 60 tumor samples and contains gene info and the nature of each mutation (amino acid substitution, insertion-deletion, etc.).

      What other background are you looking for? I thought the question was pretty straightforward, but please let me know if I can make it clearer so that we can have a more productive discussion.

        • #5
          background

          For starters, it would be helpful to answer your question if we knew the following:

          Do you have a particular organism that you are working with?
          Do you have the fasta file of the genes that you want the lengths of?
          Do you want the length of the cDNA or the genes with introns?
          Do you care about splice variants or will the longest splice variant work?



          • #6
            Gene sizes can be downloaded for most organisms using the BioMart tool on Ensembl or, though it is not my favorite source, from UCSC using the Table Browser. I suspect you are fine normalizing to either the coding region or the full mRNA space, but I'd vote for coding if the question is about consequential changes that are easy to understand. Take a look at Wood et al. Science 318:1108 (2007) for a nice example of this type of analysis.



            • #7
              Here are the answers to your questions. Thanks so much again. I will wait for the replies.

              1. Do you have a particular organism that you are working with?
              - Human

              2. Do you have the fasta file of the genes that you want the lengths of?
              - No

              3. Do you want the length of the cDNA or the genes with introns?
              - The data is generated using Illumina (Exome sequencing)
              - So I understand I need the length of all exons in a gene (how would I get this info for a set of 1000 genes?). Am I right?

              4. Do you care about splice variants or will the longest splice variant work?
              - Well, I don't know the answer. Should I take the total exon length of the longest transcript as the gene length?
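The two length definitions raised in answers 3 and 4 can be compared with a short sketch. It assumes exon coordinates are already in hand as (gene, transcript, start, end) rows with 1-based inclusive positions; the row layout and function names are hypothetical. For each gene it reports the union of all exons (overlapping bases counted once) and the exon length of the longest transcript.

```python
from collections import defaultdict


def union_length(intervals):
    """Bases covered by 1-based inclusive (start, end) intervals, overlaps counted once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def gene_lengths(exon_rows):
    """Map gene -> (union exon length, exon length of the longest transcript).

    Assumes exons within a single transcript do not overlap each other.
    """
    exons_by_gene = defaultdict(list)
    tx_len_by_gene = defaultdict(lambda: defaultdict(int))
    for gene, transcript, start, end in exon_rows:
        exons_by_gene[gene].append((start, end))
        tx_len_by_gene[gene][transcript] += end - start + 1
    return {
        gene: (union_length(ivs), max(tx_len_by_gene[gene].values()))
        for gene, ivs in exons_by_gene.items()
    }
```

Whichever definition is used, applying the same one to every gene keeps the normalized frequencies comparable.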



              • #8
                I would start by looking at the reference that Jon Keats provided, as it will likely show how other people have solved this particular problem for human tumor samples. It is always good to be able to cite a paper and say that you did it the same way.

                Originally posted by angerusso
                Here are the answers to your questions. Thanks so much again. I will wait for the replies.

                1. Do you have a particular organism that you are working with?
                - Human

                2. Do you have the fasta file of the genes that you want the lengths of?
                - No

                3. Do you want the length of the cDNA or the genes with introns?
                - The data is generated using Illumina (Exome sequencing)
                - So I understand I need the length of all exons in a gene (how would I get this info for a set of 1000 genes?). Am I right?

                4. Do you care about splice variants or will the longest splice variant work?
                - Well, I don't know the answer. Should I take the total exon length of the longest transcript as the gene length?



                • #9
                  I read the paper and I didn't see where they describe how they normalize against gene size and composition. Did I miss something?

                  Also, I am simply looking for a way to download the exon lengths of all genes (based on HUGO gene symbol). Thank you for recommending the BioMart tool. I am not familiar with it at all but will give it a try. If you have run a similar query before, do you have an example that would help with downloading exon lengths using BioMart?
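A rough sketch of what a scripted BioMart request for exon coordinates could look like follows. Everything specific in it is an assumption to verify in the BioMart web interface before use: the service URL, the dataset name `hsapiens_gene_ensembl`, and the attribute names. The attribute for the gene symbol is deliberately left out because its name should be looked up in the interface.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import quoteattr

# Assumed service endpoint; check the current Ensembl BioMart documentation.
MART_URL = "http://www.ensembl.org/biomart/martservice"


def build_biomart_query(attributes, dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query requesting `attributes` as a TSV table with a header."""
    attr_xml = "".join(f"<Attribute name={quoteattr(a)} />" for a in attributes)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="1" '
        'uniqueRows="1" datasetConfigVersion="0.6">'
        f'<Dataset name={quoteattr(dataset)} interface="default">{attr_xml}</Dataset>'
        "</Query>"
    )


def fetch_biomart(query_xml, url=MART_URL):
    """Send the query to the service and return the response text (needs network access)."""
    data = urllib.parse.urlencode({"query": query_xml}).encode()
    with urllib.request.urlopen(url, data=data) as response:
        return response.read().decode()


# Assumed attribute names for gene, transcript, and exon coordinates.
EXON_ATTRIBUTES = [
    "ensembl_gene_id",
    "ensembl_transcript_id",
    "exon_chrom_start",
    "exon_chrom_end",
]
```

Calling `fetch_biomart(build_biomart_query(EXON_ATTRIBUTES))` would return one row per exon, from which each exon length is end - start + 1. The same table can also be exported by hand from the BioMart web page.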

                  Thanks.


                  Originally posted by Jon_Keats
                  Gene sizes can be downloaded for most organisms using the BioMart tool on Ensembl or, though it is not my favorite source, from UCSC using the Table Browser. I suspect you are fine normalizing to either the coding region or the full mRNA space, but I'd vote for coding if the question is about consequential changes that are easy to understand. Take a look at Wood et al. Science 318:1108 (2007) for a nice example of this type of analysis.



                  • #10
                    Hi,

                    In addition, I tried to download the following tables, but I could not find a header or README file.

                    ftp://ftp.ensembl.org/pub/current_my...65_37/exon*.gz

                    Would you be able to help me with this table so I can figure out which column corresponds to exon length?
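A dump of this kind may not contain an exon length column at all, in which case the length has to be derived as end - start + 1. The sketch below assumes the file is a headerless, tab-separated, gzip-compressed table, and that the exon ID, start, and end sit in columns 0, 2, and 3. Those column positions are an assumption and should be checked against the table definition that ships with the dump (typically a `.sql` schema file in the same directory).

```python
import csv
import gzip


def exon_lengths(path, id_col=0, start_col=2, end_col=3):
    """Return {exon id: length} from a headerless, tab-separated, gzipped table.

    The default column indices are assumptions; confirm them against the
    table definition before trusting the result.
    """
    lengths = {}
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Coordinates are treated as 1-based and inclusive.
            lengths[row[id_col]] = int(row[end_col]) - int(row[start_col]) + 1
    return lengths
```

These are per-exon lengths. Getting per-gene totals still requires linking exons to transcripts and genes through the other tables in the dump.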
