Hi guys,
I'm very new to ChIP-seq and my lab has no real previous experience with it, so I apologize for my lack of knowledge. My post is quite long, so I organized it so that you can jump to the section that interests you: the technical details are in point 1) and the problem is described in point 2).
1) I'm working on a transcription factor with no relevant target described yet, so I couldn't really optimize my protocol by qPCR. What people working in the field told me to do instead was to show that I can IP my protein of interest and that I get a good enrichment (more than 5x) in the amount of DNA recovered with my antibody over the control IgG (measured with Qubit). I'm fully aware that this is not optimal, but I was in a hurry and these people have experience in the field, so it looked like my best option. We sent the samples to a facility and got 60*10^6 reads for the immunoprecipitated DNA and for the input DNA. We then sent the data to a friend of ours who is a bioinformatician, and he returned a file with 15,000 MACS peaks linked to about 6,000 genes. The good thing is that we have microarray data for the WT and the KO of our transcription factor.
2) First, 6,000 genes seemed enormous to us, but we went through a few ChIP-seq papers that report 4,000-5,000 genes, so it might not be as surprising as we first thought. The real surprise is that while the p-values look good to us, the corresponding FDR is always sky high. For instance, many of the genes have an FDR of 100 while their p-value is below 10^-100. As we have no real experience with this, we don't know where the problem comes from. The only explanation I have found is this: when I display the peaks in the UCSC browser I get very nice enrichment, but every time I find a peak in my IP I can also find a smaller peak at the same spot in the input, as in this image:
So from what I (poorly) understood after reading the MACS paper, the p-value reflects how significant the peak is in the IP sample, while the FDR reflects the chance of finding a peak with a comparably significant p-value in the control sample? I guess this would be the easiest explanation for what we observe?
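To make my reading of the FDR concrete, here is a minimal Python sketch of how I understand the empirical FDR from the MACS paper: peaks are also called with the two samples swapped (control treated as IP), and for each IP peak the FDR is the number of swapped "control" peaks at least as significant divided by the number of IP peaks at least as significant, reported as a percentage. The function name, the toy scores, and the cap at 100 are my own assumptions for illustration, and this is not the actual MACS code. Scores stand for -10*log10(p-value).

```python
from bisect import bisect_left


def empirical_fdr_percent(chip_scores, control_scores):
    """Empirical FDR (in %) for each ChIP peak score.

    Hypothetical helper, not part of MACS. For a ChIP peak with score s:
        FDR = 100 * (# control peaks with score >= s)
                  / (# ChIP peaks with score >= s)
    where "control peaks" are peaks called after swapping IP and control.
    """
    chip_sorted = sorted(chip_scores)
    ctrl_sorted = sorted(control_scores)
    fdrs = []
    for s in chip_scores:
        # Count peaks whose score is at least s in each sorted list.
        n_chip = len(chip_sorted) - bisect_left(chip_sorted, s)
        n_ctrl = len(ctrl_sorted) - bisect_left(ctrl_sorted, s)
        # n_chip >= 1 because s itself is in chip_sorted.
        # Capping at 100 is an assumption made for this sketch.
        fdrs.append(min(100.0, 100.0 * n_ctrl / n_chip))
    return fdrs


# Toy numbers (made up): scores are -10*log10(p-value).
chip = [1000.0, 500.0, 100.0]
control = [1200.0, 600.0, 50.0]
fdr = empirical_fdr_percent(chip, control)
```

With these made-up numbers, the two strongest IP peaks each get an FDR of 100 despite tiny p-values, simply because the swapped call produced equally strong control peaks. Whether this is what happens in our data is exactly what I am unsure about.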
Has any of you run into the same problem?
JC