
  • Nice p-value, bad FDR

    Hi guys,

    I'm very new to the field of ChIP-seq and my lab has no previous experience with it, so I apologize for my lack of knowledge. My post is quite long, so I organized it so that you can jump to the section that interests you: technical details are in point 1) and the problem is detailed in point 2).

    1) I'm working on a transcription factor with no relevant target described yet, so I couldn't really optimize my protocol based on qPCR. What people working in the field told me to do instead was to show that I'm able to IP my protein of interest and that I get a good enrichment (more than 5x) in the amount of DNA recovered with my antibody over the control IgG (measured with Qubit). I'm fully aware that this is not optimal, but I was in a hurry and these people have experience in the field, so it looked like my best option. We sent the samples to a facility and got 60*10^6 reads for the immunoprecipitated DNA and the input DNA. We then sent the data to a friend of ours who is a bioinformatician, and he returned a file with 15,000 MACS peaks linked to roughly 6,000 genes. The good thing is that we have microarray data for the WT and the KO of our transcription factor.
    2) First, these 6,000 genes seemed enormous to us, but we went through a few papers that describe ChIP-seq and report 4,000-5,000 genes, so it might not be as surprising as we thought. The surprise is that while the p-values look good to us, the corresponding FDR is always sky high. For instance, many of the genes have an FDR of 100% while their respective p-value is below 10^-100. As we have no real experience with this, we don't know where the problem is coming from. The only explanation I have found is that when I display the peaks in the UCSC browser I see very nice enrichment, but "every time" I find a peak in my IP, I can find a smaller peak at the same spot in the input, as in this image:



    So from what I (poorly) understood after reading the MACS paper, the p-value reflects how significant the peak is in the IP sample, while the FDR reflects the chance of finding a peak with an equally significant p-value in the control sample? I guess this would be the simplest explanation for what we observe?

    Have any of you run into the same problem?


    JC

  • #2
    MACS FDR is heavily influenced by what your background looks like, and I've seen many a decent ChIP-seq lane where the FDR was utterly unusable.

    Plus it has the amazing property that stronger peaks can have worse FDR than smaller ones if there is just a handful of strong sites in your background control (centromeres for example).

    On the other hand, what you're showing in the picture doesn't look like enrichment to me - you should see the peaks rising above the noise on the same lane. Maybe the one to the right, but 2.5x isn't much enrichment, so maybe MACS is right for once.

    What do your best peaks look like?


    • #3
      Hi ffinkernagel, and thank you very much for your feedback.

      Indeed, it does not really come out of the background in this case. I will receive the normalized wig files later today. For now I can only look at the xls file that my friend gave me, with the gene name, FDR etc., and at the non-normalized files (which are enough to see that "every time" there's a peak in the IP, there's one in the input).

      I'll provide a screenshot of my best peaks once I've received the files. Do you have any advice on how I should choose these best peaks? Should I choose them based on the FC, the FDR or the p-value?


      • #4
        Start with the peaks with the most reads. Well, actually, skip the first 10 or so that have x-thousand reads - they're usually artifacts. You're looking for something above the 95th percentile but below the 99th.
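
        A rough way to apply that selection rule is sketched below in Python. Everything here is an assumption for illustration: the function name, the (name, read_count) tuple layout, and the choice to compute percentiles over all peaks before dropping the very top ones. It is not part of MACS.

```python
def pick_inspection_peaks(peaks, skip_top=10, low_pct=95, high_pct=99):
    """Return peaks whose read count lies between two percentiles.

    peaks: list of (name, read_count) tuples (hypothetical layout).
    The skip_top strongest peaks are always excluded, since they are
    often artifacts.
    """
    ranked = sorted(peaks, key=lambda p: p[1])  # ascending read count
    n = len(ranked)
    lo = n * low_pct // 100                     # index of the lower percentile
    hi = n * high_pct // 100                    # index of the upper percentile
    hi = max(0, min(hi, n - skip_top))          # never include the top skip_top peaks
    return ranked[lo:hi]


# Toy data: 1000 made-up peaks with read counts 1..1000
toy_peaks = [(f"peak_{i}", i) for i in range(1, 1001)]
picked = pick_inspection_peaks(toy_peaks)
print(len(picked), picked[0], picked[-1])
```

        With real data you would load the peak names and tag counts from the MACS xls file and look at the returned peaks in the browser.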

        Another thing to consider: Do you know how many unique locations your 60 million reads mapped to?
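
        For the unique-locations question, a minimal Python sketch is shown below. It assumes each mapped read has already been reduced to a (chromosome, start, strand) tuple; the function name and the toy reads are made up for illustration, and parsing the actual alignment file is left out.

```python
def unique_position_fraction(reads):
    """Count distinct mapping positions among mapped reads.

    reads: iterable of (chrom, start, strand) tuples (hypothetical layout).
    Returns (n_unique_positions, n_reads, fraction_unique).
    """
    seen = set()
    total = 0
    for read in reads:
        total += 1
        seen.add(read)  # identical chrom/start/strand collapse to one entry
    fraction = len(seen) / total if total else 0.0
    return len(seen), total, fraction


# Toy data: three reads stacked on one position plus three distinct ones
toy_reads = [("chr1", 100, "+")] * 3 + [
    ("chr1", 100, "-"),
    ("chr1", 250, "+"),
    ("chr2", 100, "+"),
]
n_unique, n_reads, frac = unique_position_fraction(toy_reads)
print(n_unique, n_reads, round(frac, 2))
```

        A low fraction means many reads pile up on the same positions, which usually points to a low-complexity library. Note that holding 60 million tuples in a Python set needs a lot of memory, so this is only meant to show the idea.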


        • #5
          Indeed, a small p-value with a high FDR is not rare.
          Quote from the MACS FAQ page:
          "I'm looking into using the FDR values calculated from MACS, and am getting some odd behavior. In one case, the relationship seems inverse of what I'd expect with small p-values corresponding to higher FDR's. In the other case I see that, up to a point, a smaller p-value corresponds to a lower FDR. Beyond a certain point, though, the FDR starts to rise with smaller p-values, which seems off to me. Any thoughts out there on this? (From Tim Reddy)
          In MACS, the FDR values and p-values are not necessary to be correlated monotonically. For a certain p-value we calculate how many peaks can be called from treatment against control, and how many peaks can be called from control against treatment by this p-value as cutoff. Then use these two numbers to compute FDR. At last we can assign FDR for every p-value. Sometimes, there are several peaks in control sample with very significant p-values, so the FDR for this low p-value can be quite high."
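
          To make the quoted explanation concrete, here is a small Python sketch of an empirical FDR computed that way. It is an illustration under stated assumptions, not MACS code: the function name is made up, the FDR is taken as (control peaks passing the cutoff) / (treatment peaks passing the cutoff) expressed as a percentage and capped at 100, and the scores are invented -10*log10(p-value) style values, so a score of 1000 stands for p = 10^-100.

```python
def empirical_fdr(treatment_scores, control_scores):
    """Return {cutoff: FDR in %} using each treatment score as a cutoff.

    treatment_scores: scores of peaks called from treatment against control.
    control_scores: scores of peaks called from control against treatment.
    Higher score = smaller p-value.
    """
    fdr = {}
    for cutoff in sorted(set(treatment_scores)):
        n_treat = sum(s >= cutoff for s in treatment_scores)
        n_ctrl = sum(s >= cutoff for s in control_scores)
        fdr[cutoff] = min(100.0, 100.0 * n_ctrl / n_treat)
    return fdr


# Toy data (invented): eight treatment peaks, and two very strong peaks
# in the swapped control-vs-treatment call
treatment = [50, 60, 80, 100, 150, 200, 400, 1000]
control = [900, 1200]

fdr = empirical_fdr(treatment, control)
for cutoff, value in fdr.items():
    print(cutoff, round(value, 1))
```

          In this toy case the weakest cutoff gets an FDR of 25% while the strongest treatment peak gets 100%, because the two strong control peaks still pass every cutoff while fewer and fewer treatment peaks do. That is the non-monotonic behaviour described in the quote.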


          • #6
            Hi guys,

            Sorry for the delay but I couldn't obtain the data faster.
            I now have the information for one peak, taken at random, that matches my transcriptomic data: Here
            Although the peak in the input is lower than in the previous example, the FDR is around 95%.
            Most of the peaks I went through look like the image I posted previously (while 12,000 of the 15,000 peaks have a fold enrichment higher than 5x, I observe a similar peak in the input). I think that this paper may explain what I observe: http://www.pnas.org/content/106/35/14926.full
            Do you have any recommendations about what I should do with these data and, if they are suitable for analysis, how to analyze them properly?

            Thank you in advance
