SEQanswers





09-10-2013, 09:54 AM   #1
ashuchawla
Member
 
Location: san diego

Join Date: Jan 2012
Posts: 38
Transcription factor binding site enrichment analysis

Hi,

I need a tool that does TFBS enrichment analysis.

I have two BED files, one of enhancers and one of promoters. I need to find their overlap with TFBS (downloaded from the UCSC Genome Browser) and then run a statistical test to check whether the overlap is greater than expected by chance. Could you give me some pointers?

Thanks
Ashu
09-18-2013, 12:26 AM   #2
paolo.kunder
Member
 
Location: Milano, Italy

Join Date: Aug 2011
Posts: 92

Hello,
For transcription factor binding site enrichment analysis I usually use the HOMER software:
http://biowhat.ucsd.edu/homer/ngs/peakMotifs.html

I would suggest running your two BED files independently and then comparing the outputs to check whether they share any TFBS.

Paolo
09-18-2013, 08:09 AM   #3
ashuchawla
Member
 
Location: san diego

Join Date: Jan 2012
Posts: 38

Thanks Paolo, I don't think I phrased my question correctly. What I have to do is:

1. Check the overlap between the TFBS BED file and the enhancer BED file.
2. Check the overlap between the same TFBS BED file and a set of control regions (a BED file like the enhancer file).
3. For every TF, compare the number of its binding sites overlapping the enhancers with the number overlapping the random (control) regions. This tells me whether the TFBS overlap with enhancers is greater or smaller than with the random control regions. I also need the p-value associated with this comparison.
4. Repeat the same steps with the promoter BED file instead of the enhancers.
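Steps 1 to 3 boil down to counting, per TF, how many of its binding sites overlap a given region set. A minimal Python sketch, assuming the BED files have already been parsed into tuples and that the TF name is in the fourth column (the parsing is not shown, and the function names are hypothetical, not part of any tool mentioned in this thread):

```python
from bisect import bisect_left
from collections import Counter, defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def count_tf_overlaps(tfbs, regions):
    """Count, per TF, how many of its binding sites overlap at least one region.

    tfbs:    iterable of (chrom, start, end, tf_name), BED-style 0-based half-open
    regions: iterable of (chrom, start, end), e.g. enhancers or one control set
    """
    per_chrom = defaultdict(list)
    for chrom, start, end in regions:
        per_chrom[chrom].append((start, end))
    # After merging, regions on a chromosome are disjoint and sorted by start,
    # so their ends are sorted too.
    merged = {c: merge_intervals(ivs) for c, ivs in per_chrom.items()}
    starts = {c: [s for s, _ in ivs] for c, ivs in merged.items()}

    counts = Counter()
    for chrom, start, end, tf in tfbs:
        ivs = merged.get(chrom)
        if not ivs:
            continue
        # Last merged region starting before this site ends; it has the largest
        # end among all candidate regions, so it is the only one to check.
        i = bisect_left(starts[chrom], end) - 1
        if i >= 0 and ivs[i][1] > start:
            counts[tf] += 1
    return counts
```

Running it once on the enhancers gives the observed count for every TF, and running it on each control set gives the matching control counts (TFs with no overlap simply read as 0 from the `Counter`).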

Would HOMER do something like this?

Ashu
09-18-2013, 08:23 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,976

Quote:
Originally Posted by ashuchawla
1. Check overlap between tfbs bed file and enhancer bed file.
2. Then check the overlap between the same tfbs bed file and a control region (like the enhancer bed file)
You should be able to do this using intersectBed from BEDTools (http://bedtools.readthedocs.org/en/l...intersect.html).
09-18-2013, 08:39 AM   #5
ashuchawla
Member
 
Location: san diego

Join Date: Jan 2012
Posts: 38

Yes, I did use that tool to get the overlaps, but I also need p-values for them, and BEDTools doesn't provide those.
09-18-2013, 09:04 AM   #6
Blahah404
Member
 
Location: Cambridge, UK

Join Date: Dec 2011
Posts: 48

ashuchawla, it sounds like you're talking about a Monte Carlo simulation to test for a difference from a random control. You will most likely have to write that test yourself; it's fairly easy to do in R.
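The first ingredient of such a simulation is a set of random control regions. A minimal Python sketch, assuming each control region keeps the length and chromosome of one enhancer and is placed uniformly at random on that chromosome (the function name and the `chrom_sizes` input are hypothetical). It does not avoid assembly gaps, blacklisted regions, or overlaps with the enhancers; BEDTools' `shuffle` command (with `-chrom` and `-excl`) covers those cases.

```python
import random


def random_controls(regions, chrom_sizes, rng):
    """Return one control set: each region moved to a uniformly random position
    on its own chromosome, keeping its length, so the size distribution matches.

    regions:     list of (chrom, start, end), BED-style half-open
    chrom_sizes: dict chrom -> length in bp (hypothetical input, e.g. parsed
                 from a chrom.sizes file)
    rng:         random.Random instance, seeded for reproducibility
    """
    controls = []
    for chrom, start, end in regions:
        length = end - start
        # randrange excludes its upper bound, so new_start + length <= size.
        new_start = rng.randrange(0, chrom_sizes[chrom] - length + 1)
        controls.append((chrom, new_start, new_start + length))
    return controls


if __name__ == "__main__":
    enhancers = [("chr1", 1000, 1500), ("chr1", 8000, 8200), ("chr2", 300, 1300)]
    sizes = {"chr1": 10_000, "chr2": 5_000}
    rng = random.Random(42)
    # e.g. 100 control sets, as suggested later in the thread
    control_sets = [random_controls(enhancers, sizes, rng) for _ in range(100)]
    print(control_sets[0])
```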
09-18-2013, 10:06 AM   #7
ashuchawla
Member
 
Location: san diego

Join Date: Jan 2012
Posts: 38

I do not understand Monte Carlo simulation. Could a chi-square test be used instead? I have been told to make 100 sets of control regions. I am totally confused.
09-18-2013, 11:03 AM   #8
Blahah404
Member
 
Location: Cambridge, UK

Join Date: Dec 2011
Posts: 48

Do you want to perform a significance test for each enhancer, or on the mean number of TFs across all enhancers (or TFs per base in enhancers)?

In the first case, you want a Monte Carlo simulation, which is not as complicated as the name suggests. It just means that you generate a set of random controls with the same distribution of sizes as your enhancer regions. The counts of TFs in the control regions give you the distribution of TF counts you would expect by chance. From that you can get an empirical p-value for each enhancer region: the proportion of the control distribution that is equal to or greater than your enhancer count. The resolution of your p-value is 1/(number of controls), so with only 100 controls the lowest p-value you can get is 0.01, while 10,000 controls give a resolution of 0.0001. Because you are testing each enhancer separately, you would also want to apply a multiple testing correction.
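A minimal Python sketch of the empirical p-value and a multiple testing correction. Two choices here are assumptions rather than something stated in the post: the p-value uses the common (r + 1)/(n + 1) convention, which never returns exactly 0 (with 100 controls its minimum is 1/101 rather than 0.01), and Benjamini-Hochberg is just one possible correction.

```python
def empirical_p(observed, null_counts):
    """One-sided empirical p-value for enrichment.

    observed:    count in the real region (e.g. TFBS in one enhancer)
    null_counts: counts from the random control regions
    Uses (r + 1) / (n + 1), where r is the number of controls at least as
    large as the observed count.
    """
    r = sum(1 for c in null_counts if c >= observed)
    return (r + 1) / (len(null_counts) + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `empirical_p(95, list(range(100)))` counts the five controls from 95 to 99 and returns 6/101.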

In the second case, you can just generate your control regions, at least as many as you have enhancer regions. Then you perform a simple statistical test for a difference of distributions; the two-sample Kolmogorov-Smirnov test would be appropriate for empirical distributions like these. You *could* also do this case using Monte Carlo, but the KS test is easier.
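In practice you would call R's `ks.test(x, y)` or SciPy's `scipy.stats.ks_2samp`; the dependency-free Python sketch below only illustrates what the two-sample test computes. `D` is the largest gap between the two empirical CDFs (for example, TF counts per enhancer versus per control region), and the p-value uses the asymptotic approximation from Numerical Recipes. Counts contain many ties, so this p-value is only approximate. The function names are hypothetical.

```python
import math


def kolmogorov_q(lam):
    """Q_KS(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    total = 0.0
    for k in range(1, 101):
        term = 2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-12:
            return min(1.0, max(0.0, total))
    return 1.0  # series did not converge: lam is tiny, so p is effectively 1


def ks_two_sample(a, b):
    """Return (D, p) for the two-sample Kolmogorov-Smirnov test."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both ECDFs past every copy of x so tied values are handled.
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    return d, kolmogorov_q(lam)
```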
09-18-2013, 11:13 AM   #9
ashuchawla
Member
 
Location: san diego

Join Date: Jan 2012
Posts: 38

I have to perform a significance test for each transcription factor. A TF will have multiple binding sites (multiple overlaps), say x, in my enhancer file. The same TF will have, say, y overlaps with a control set of regions (i.e. by chance). The enrichment ratio would then be (prob of x / prob of y). I need a test that tells me whether x is significant or due to chance, along with a corresponding p-value.
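For this per-TF design (x overlaps with the enhancers versus y overlaps with each of, say, the 100 control sets mentioned earlier in the thread), the Monte Carlo test can be sketched in Python as below. It assumes the per-TF counts have already been computed for the enhancers and for every control set (for example with intersectBed); the function name and input layout are hypothetical. Because the question is whether overlap is higher or lower than in the controls, it reports a one-sided p-value in each direction, together with x divided by the mean control count.

```python
from statistics import mean


def per_tf_monte_carlo(observed, control_sets):
    """Per-TF Monte Carlo test against random control sets.

    observed:     dict tf -> x, overlaps with the enhancers
    control_sets: list of dicts tf -> y_i, one dict per random control set
    Returns dict tf -> (ratio, p_enriched, p_depleted).
    """
    n = len(control_sets)
    results = {}
    for tf, x in observed.items():
        ys = [c.get(tf, 0) for c in control_sets]
        mu = mean(ys)
        ratio = x / mu if mu > 0 else float("nan")
        # (r + 1) / (n + 1) empirical p-values, one per direction.
        p_enriched = (1 + sum(y >= x for y in ys)) / (n + 1)
        p_depleted = (1 + sum(y <= x for y in ys)) / (n + 1)
        results[tf] = (ratio, p_enriched, p_depleted)
    return results
```

With 100 control sets the smallest attainable p-value is 1/101, and since every TF is tested, the p-values still need a multiple testing correction.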


09-18-2013, 12:13 PM   #10
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

You could try the Genomic Association Tester (GAT): http://code.google.com/p/genomic-association-tester/
09-25-2013, 09:47 AM   #11
jwfoley
Senior Member
 
Location: Stanford

Join Date: Jun 2009
Posts: 179

Count the number of sequence reads from each ChIP-seq sample in each of your genomic regions (make sure to account for the strand shift, either by using wide regions or by shifting the reads), so that you get a regions × samples matrix of read counts. Then you can import this into DESeq and use its negative binomial test to compare signal in one class with signal in another, or a GLM for more complicated models.
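A rough Python sketch of building the regions × samples count matrix. It assumes single-end reads already parsed into (chrom, start, end, strand) tuples and moves each read's 5' end downstream on its own strand by a hypothetical `shift` of about half the fragment length (100 bp is an arbitrary placeholder). Reading BAM files and running DESeq itself (in R) are not shown, and the function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def shifted_positions(reads, shift):
    """Move each read's 5' end `shift` bp downstream on its own strand.
    Returns {chrom: sorted positions}."""
    positions = defaultdict(list)
    for chrom, start, end, strand in reads:
        if strand == "+":
            positions[chrom].append(start + shift)
        else:
            # On the minus strand the 5' end is the last base (end - 1).
            positions[chrom].append(end - 1 - shift)
    for chrom in positions:
        positions[chrom].sort()
    return positions


def count_matrix(regions, samples, shift=100):
    """Return (sample_names, rows): one row per region, one column per sample.

    regions: list of (chrom, start, end), BED-style half-open
    samples: dict sample_name -> list of (chrom, start, end, strand) reads
    shift:   hypothetical half fragment length in bp
    """
    names = list(samples)
    positions = {name: shifted_positions(samples[name], shift) for name in names}
    rows = []
    for chrom, start, end in regions:
        row = []
        for name in names:
            pos = positions[name].get(chrom, [])
            # Number of shifted positions p with start <= p < end.
            row.append(bisect_left(pos, end) - bisect_left(pos, start))
        rows.append(row)
    return names, rows
```

The rows can then be written out as a tab-delimited table with one column per sample and loaded into R for DESeq.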
All times are GMT -8.

