
  • transcription factor binding sites enrichment analysis

    Hi,

    I need a tool that does TFBS enrichment analysis.

    I have two BED files, one with enhancers and one with promoters. I need to find their overlap with TFBS (downloaded from the UCSC browser) and then run a statistical test to see whether the overlap could have arisen by chance. Could you please give me some pointers?

    Thanks
    Ashu

  • #2
    Hello,
    To search for transcription factor binding site enrichment, I usually use the HOMER software.

    I would suggest running your two BED files independently and then comparing the outputs to check whether they overlap some of the same TFBS.

    Paolo



    • #3
      Thanks Paolo, I think I did not phrase my question clearly. What I have to do is:

      1. Check the overlap between the TFBS BED file and the enhancer BED file.
      2. Check the overlap between the same TFBS BED file and a set of control regions (in a BED file like the enhancer one).
      3. For every TF, count the binding sites overlapping the enhancers and those overlapping the random (control) regions. This will tell me whether the TFBS overlap with enhancers is higher or lower than with the random control regions. I also have to find the p-value associated with this comparison.
      4. Repeat the same steps with the promoter BED file instead of the enhancers.

      Would HOMER do something like this?

      Ashu



      • #4
        Originally posted by ashuchawla:
        1. Check overlap between tfbs bed file and enhancer bed file.
        2. Then check the overlap between the same tfbs bed file and a control region (like the enhancer bed file)

        You should be able to do this using "intersectBed" from BEDTools (http://bedtools.readthedocs.org/en/l...intersect.html).
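
For anyone who wants to see what the overlap counting amounts to, here is a minimal pure-Python sketch. It is not how BEDTools is implemented, and the function names (`overlaps`, `count_tfbs_overlaps`) and the toy coordinates are made up for illustration. It assumes BED-style half-open intervals and counts, for each TF, how many of its binding sites touch at least one region.

```python
from collections import Counter

def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end

def count_tfbs_overlaps(tfbs, regions):
    """Count, per TF, the binding sites that overlap at least one region.

    tfbs:    list of (chrom, start, end, tf_name) tuples
    regions: list of (chrom, start, end) tuples
    """
    # Group regions by chromosome so each site is only compared to its own chromosome.
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    counts = Counter()
    for chrom, start, end, tf in tfbs:
        candidates = by_chrom.get(chrom, [])
        if any(overlaps(start, end, r_start, r_end) for r_start, r_end in candidates):
            counts[tf] += 1
    return counts

# Toy data (hypothetical coordinates, not from any real annotation).
tfbs = [
    ("chr1", 100, 110, "CTCF"),
    ("chr1", 500, 510, "CTCF"),
    ("chr1", 205, 215, "SP1"),
    ("chr2", 50, 60, "SP1"),
]
enhancers = [("chr1", 90, 120), ("chr1", 200, 300)]

enhancer_counts = count_tfbs_overlaps(tfbs, enhancers)
print(dict(enhancer_counts))
```

Running the same function on a control BED file gives the second set of counts. The scan is quadratic per chromosome, so for genome-scale files a dedicated interval tool is the more sensible choice.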



        • #5
          Yes, I did use that tool to get the overlaps, but I also need p-values for these overlaps, and bedtools does not provide them.



          • #6
            ashuchawla, it sounds like you're describing a Monte Carlo simulation to test for a difference from a random control. You will most likely have to write that test yourself; it's fairly easy to do in R.



            • #7
              I do not understand Monte Carlo simulation. Could a chi-square test be used instead? I have been told to make 100 sets of control regions. I am totally confused.



              • #8
                Do you want to perform a significance test for each enhancer, or on mean number of TFs in all enhancers (or the TFs/base in enhancers)?

                In the first case, you want a Monte Carlo simulation, which is not as complicated as the name suggests. It just means that you generate a set of random control regions with the same size distribution as your enhancer regions. The counts of TF binding sites in the control regions give you the distribution of counts you would expect by chance. From that you can get an empirical p-value for each enhancer region by looking at what proportion of the control distribution is at least as high as your enhancer count (for a test of enrichment). The resolution of your p-value is 1/(number of controls), so with only 100 controls the lowest p-value you can get is 0.01; 10,000 controls give you a resolution of 0.0001. If you test each enhancer separately, you will also want to apply a multiple testing correction.

                In the second case you can just generate your control regions - at least as many as you have enhancer regions. Then you perform a simple statistical test for difference of distributions - the two-sample Kolmogorov-Smirnov test would be appropriate for empirical distributions like these. You *could* also do this case using Monte Carlo, but the KS test is easier.
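
The Monte Carlo idea from the first case can be sketched in a few lines of Python (the thread suggests R; Python is used here only for illustration). This is a rough sketch under stated assumptions: `random_controls` and `empirical_p_value` are hypothetical helper names, the chromosome sizes and counts are toy values, and controls are placed uniformly on the same chromosome without excluding gaps or the enhancers themselves. The p-value uses the common (r + 1) / (n + 1) convention, so with n controls the smallest attainable value is 1/(n + 1) rather than exactly 1/n.

```python
import random

def random_controls(regions, chrom_sizes, rng):
    """Return one control set: for each region, a random interval of the
    same length on the same chromosome (assumption: uniform placement)."""
    controls = []
    for chrom, start, end in regions:
        length = end - start
        # randint is inclusive on both ends, so the interval always fits.
        new_start = rng.randint(0, chrom_sizes[chrom] - length)
        controls.append((chrom, new_start, new_start + length))
    return controls

def empirical_p_value(observed, null_counts):
    """One-sided (enrichment) empirical p-value: share of control counts
    at least as large as the observed count, with a +1 correction."""
    as_extreme = sum(1 for count in null_counts if count >= observed)
    return (as_extreme + 1) / (len(null_counts) + 1)

# Toy inputs (hypothetical values).
regions = [("chr1", 100, 300), ("chr2", 50, 80)]
chrom_sizes = {"chr1": 1000, "chr2": 500}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
controls = random_controls(regions, chrom_sizes, rng)
print(controls)

# Counts of overlapping binding sites in 10 control sets vs. the real regions.
null_counts = [3, 5, 4, 6, 2, 5, 7, 4, 3, 5]
print(empirical_p_value(7, null_counts))
```

In practice you would call `random_controls` once per control set (100, 10,000, ...), count binding-site overlaps in each set, and pass those counts as `null_counts`.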



                • #9
                  I have to perform a significance test for each transcription factor. A TF will have multiple binding sites (multiple overlaps), say "x", in my enhancer file. The same TF will have, say, "y" overlaps with a control set of regions (by chance). Now, the odds ratio would be (prob of x / prob of y). I need a test that tells me whether x is significant or due to chance, with a corresponding p-value.
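
One way to set up this per-TF comparison, sketched in Python under several assumptions: x is the overlap count in enhancers, the y values come from repeated control sets (e.g. the 100 sets mentioned earlier), enrichment is summarized as x divided by the mean control count rather than a formal odds ratio, and Benjamini-Hochberg is picked as the multiple testing correction (the thread does not name one). All function names and numbers below are illustrative.

```python
from statistics import mean

def empirical_p(observed, null_counts):
    """One-sided empirical p-value with the (r + 1) / (n + 1) convention."""
    as_extreme = sum(1 for count in null_counts if count >= observed)
    return (as_extreme + 1) / (len(null_counts) + 1)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Toy counts (hypothetical): x per TF in enhancers, y per TF in 9 control sets.
observed = {"CTCF": 40, "SP1": 12, "GATA1": 9}
control = {
    "CTCF": [10, 12, 9, 11, 13, 8, 10, 12, 11],
    "SP1": [11, 13, 12, 10, 14, 12, 11, 13, 12],
    "GATA1": [9, 8, 10, 9, 11, 7, 9, 10, 8],
}

tfs = sorted(observed)
fold = {tf: observed[tf] / mean(control[tf]) for tf in tfs}
p = [empirical_p(observed[tf], control[tf]) for tf in tfs]
q = benjamini_hochberg(p)

for tf, p_val, q_val in zip(tfs, p, q):
    print(f"{tf}\tfold={fold[tf]:.2f}\tp={p_val:.3f}\tq={q_val:.3f}")
```

With only 9 control sets the p-values are very coarse; the point is the structure (one null distribution per TF, then a correction across TFs), not the numbers.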




                  • #10
                    You could try the genomic association tester: http://code.google.com/p/genomic-association-tester/



                    • #11
                      Count the number of sequence reads from each ChIP-seq sample in each of your genome regions (make sure to account for the strand shift by using wide regions or shifting the reads), so you get a regions × samples matrix of read counts. Then you can import this into DESeq to do a variance-stabilized t-test comparing signal in one class with signal in another class, or a GLM for more complicated models.
