  • Proper statistical test for high-throughput data correlations?

    Dear Forum Members,
    I have been analyzing correlations between ENCODE data and my own data. Specifically, I have been looking at overlapping (or intersecting) coordinates between data sets to assess colocalization. Now I would like to add some statistics to the analysis. The data is basically the number of colocalized features in my data compared to the number of colocalized features generated by random iterations.

    So, e.g., I could have something like this: from my data, 900 colocalize and 300 do not, and then by random iterations 200 colocalize while 700 don't.

    900 200

    300 700

    Is it appropriate to apply Fisher's exact test here, or should I opt for something different? I have approx. 60 such four-value tables for which I need to determine statistical significance.

    I appreciate any comments on this.
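
    For reference, here is a minimal sketch of what a one-sided Fisher's exact test computes on one such four-value table, using only the Python standard library. The function names are made up for this example, and the table layout (observed counts in the first column, random counts in the second) is an assumption about how the numbers above are arranged. Note that Fisher's test treats both columns as real, independent samples, which may not hold when one column comes from simulations.

```python
from math import exp, lgamma, log


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_one_sided(a, b, c, d):
    """One-sided p-value for enrichment in cell `a` of the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    row and column totals whose top-left cell is at least `a`.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    log_denom = log_comb(total, col1)
    log_terms = [
        log_comb(row1, x) + log_comb(total - row1, col1 - x) - log_denom
        for x in range(a, min(row1, col1) + 1)
    ]
    # log-sum-exp so that very small probabilities do not underflow mid-sum
    peak = max(log_terms)
    log_p = peak + log(sum(exp(t - peak) for t in log_terms))
    return min(1.0, exp(log_p))


# Table as posted: 900/300 from the data, 200/700 from random iterations
p_value = fisher_one_sided(900, 200, 300, 700)
print(p_value)
```

    With about 60 tables, the p-values would also need a multiple-testing correction; the simplest option is Bonferroni, i.e. comparing each p-value against 0.05 / 60.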

  • #2
    In reply to puggie:
    Hi- Rather than applying the Fisher test, I would perform a (large) number of simulations to produce a distribution of the number of features that colocalize by chance. Then I would see where my observed value falls on this null distribution. If it falls in the tails, then my observation is unlikely to be due to chance.
    I think the tricky problem of this approach is to produce realistic null distributions given the genome under study.

    I think this paper and the associated R package GenometriCorr might be useful http://www.ncbi.nlm.nih.gov/pubmed/22693437.

    Just some thoughts...
    Dario
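
    The simulation approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: a single chromosome, half-open (start, end) intervals, and uniform re-placement of each feature, which is exactly the kind of null-model choice that needs care. All function names and the toy coordinates are made up for the example, and it does not reproduce what GenometriCorr does.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def make_overlap_counter(reference):
    """Return a function counting query intervals that hit any reference interval."""
    ref = sorted(reference)
    starts = [s for s, _ in ref]
    # running maximum of end coordinates, so overlapping references are handled
    max_end = list(accumulate((e for _, e in ref), max))

    def count(query):
        hits = 0
        for s, e in query:
            k = bisect_left(starts, e)  # number of references starting before e
            if k > 0 and max_end[k - 1] > s:
                hits += 1
        return hits

    return count


def shuffle_uniform(query, genome_length, rng):
    """Re-place each query interval uniformly at random, keeping its length."""
    shuffled = []
    for s, e in query:
        length = e - s
        new_start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def empirical_p(query, reference, genome_length, n_iter=1000, seed=0):
    """Observed overlap count, the simulated null counts, and an empirical p-value."""
    rng = random.Random(seed)
    count = make_overlap_counter(reference)
    observed = count(query)
    null = [count(shuffle_uniform(query, genome_length, rng)) for _ in range(n_iter)]
    exceed = sum(1 for x in null if x >= observed)
    # +1 in numerator and denominator so the estimate is never exactly zero
    return observed, null, (exceed + 1) / (n_iter + 1)


# Toy data: every query feature sits inside a reference feature
reference = [(i * 1000, i * 1000 + 100) for i in range(100)]
query = [(i * 1000 + 10, i * 1000 + 60) for i in range(50)]
observed, null, p = empirical_p(query, reference, genome_length=100_000, n_iter=200)
print(observed, p)
```

    The p-value here only says how unusual the observed count is relative to uniformly placed features; a different shuffling scheme would give a different null distribution.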


    • #3
      Thanks for your reply, I will look into that. I should also mention that the table I showed above is erroneous.

      It should be like
      900 200
      300 1000

      So, e.g., I have 1200 features: 900 colocalize while 300 don't. Then, by 100 random iterations (the computer picking random features), I get 200 (averaged) that colocalize by chance while 1000 don't.

      The numbers are just examples; the point was to show that the feature set for the random simulations is the same size as the original data.
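
      One option, instead of averaging the 100 iterations into a single number, is to keep the per-iteration counts and compare the observed count against them directly. The sketch below assumes those counts are available as a list; the values used here are placeholders invented for the example, not real results, and the function name is hypothetical.

```python
from statistics import mean, stdev


def summarize_against_null(observed, null_counts):
    """Compare one observed count with the counts from the random iterations."""
    n = len(null_counts)
    exceed = sum(1 for x in null_counts if x >= observed)
    mu, sd = mean(null_counts), stdev(null_counts)
    return {
        # +1 correction: the observed value counts as one draw from the null
        "p_empirical": (exceed + 1) / (n + 1),
        # smallest p-value this number of iterations can ever produce
        "p_floor": 1 / (n + 1),
        # distance from the null mean in null standard deviations
        "z": (observed - mu) / sd if sd > 0 else float("nan"),
    }


# Placeholder counts scattered around 200, standing in for 100 real iterations
null_counts = [190 + (i * 7) % 21 for i in range(100)]
result = summarize_against_null(900, null_counts)
print(result)
```

      With 100 iterations the smallest attainable empirical p-value is 1/101 (about 0.0099). That is larger than a Bonferroni threshold of 0.05 / 60 (about 0.00083) for 60 tables, so more iterations would be needed for an empirical p-value to pass such a threshold.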


      • #4
        In reply to dariober:
        That is invalid, since you have to pick a distribution to sample from "randomly", e.g. uniform, normal, Poisson, etc.


        • #5
          In reply to rskr:
          Hi- I agree. When I said that one has to pick a realistic null distribution, I was referring to this problem. At most you can say that the observed data doesn't come from a random uniform, normal, or whatever distribution you sampled from. Does that make sense? (I'd like to hear more opinions about the question puggie posted.)
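
          To illustrate how much the choice of null matters, here is a toy sketch in which the same two feature sets are tested against two different uniform nulls: one that re-places features anywhere on the chromosome, and one that re-places them only inside the region where both sets happen to lie. Everything here (coordinates, region boundaries, function names) is invented for the example, and the reference intervals are assumed to be sorted and non-overlapping.

```python
import random
from bisect import bisect_left


def count_hits(query, ref_starts, ref_ends):
    """Count query intervals overlapping a sorted, non-overlapping reference set."""
    hits = 0
    for s, e in query:
        k = bisect_left(ref_starts, e) - 1  # last reference starting before e
        if k >= 0 and ref_ends[k] > s:
            hits += 1
    return hits


def empirical_p(query, reference, region, n_iter, seed):
    """Empirical p-value when query intervals are re-placed uniformly in `region`."""
    lo, hi = region
    rng = random.Random(seed)
    starts = [s for s, _ in reference]
    ends = [e for _, e in reference]
    observed = count_hits(query, starts, ends)
    exceed = 0
    for _ in range(n_iter):
        shuffled = []
        for s, e in query:
            new_s = rng.randrange(lo, hi - (e - s) + 1)
            shuffled.append((new_s, new_s + e - s))
        if count_hits(shuffled, starts, ends) >= observed:
            exceed += 1
    return (exceed + 1) / (n_iter + 1)


# Toy data: both sets sit in the first 10% of a 1 Mb "chromosome",
# with no deliberate relationship between them inside that region.
reference = [(i * 1000, i * 1000 + 200) for i in range(100)]
query = [(i * 487, i * 487 + 50) for i in range(200)]

p_whole_chromosome = empirical_p(query, reference, (0, 1_000_000), n_iter=500, seed=1)
p_same_region = empirical_p(query, reference, (0, 100_000), n_iter=500, seed=1)
print(p_whole_chromosome, p_same_region)
```

          Against the whole-chromosome null the overlap looks highly unusual, while against the restricted null it does not. A small p-value therefore only rejects the particular null that was simulated.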
