  • MACS2 ChIP-SEQ ANALYSIS WITH BIOLOGICAL REPLICATES

    I am having problems with the broad mark H3K27me3, my biological replicates and identifying differential enrichment between treatment groups

    I have performed ChIP-seq in honeybee for the mark H3K27me3 on 3 treatment groups; each treatment has two biological replicates and one input. The reads from each sample were aligned to the reference genome using bowtie, with 70-85% of reads mapping, which gives between 20M and 50M mapped reads per sample. I have tried lots of different peak callers, including diffReps, PeakSeq, CLC Genomics, MACS and MACS2. MACS2 seems to be the only one that can really deal with broad marks. I have had to keep duplicate reads in the analysis because when they are removed I get no peaks. The peak sets I get from MACS2 show a vast difference in the number of peaks, both between biological replicates and between treatment groups. The difference between treatment groups could well be biologically relevant, but I am worried about how to deal with the differences between biological replicates. I have noticed that people combine their replicates, i.e. concatenate or merge the files before the analysis, but when I do this the result seems to be biased towards one of the replicates.
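
For reference, a broad-peak MACS2 run that keeps duplicates, as described above, might be assembled roughly like this. It is only a sketch: the file names, the sample name and the effective genome size are placeholders (assumptions), not values from this thread, and the helper function is hypothetical. The options themselves (`-t`, `-c`, `-f`, `-g`, `-n`, `--broad`, `--keep-dup`) are standard `macs2 callpeak` options.

```python
import shlex


def build_macs2_broad_command(treatment_bam, control_bam, name,
                              genome_size, keep_dup="all"):
    """Return the argument list for a broad-peak `macs2 callpeak` run.

    All values are supplied by the caller; nothing here is specific
    to the data set discussed in this thread.
    """
    return [
        "macs2", "callpeak",
        "-t", treatment_bam,          # ChIP sample
        "-c", control_bam,            # input (control) sample
        "-f", "BAM",
        "-g", str(genome_size),       # effective genome size (placeholder)
        "-n", name,                   # prefix for the output files
        "--broad",                    # broad-peak mode for marks such as H3K27me3
        "--keep-dup", str(keep_dup),  # "all", "auto" or an integer cap
    ]


# Hypothetical file names and genome size, for illustration only.
cmd = build_macs2_broad_command("W1.bam", "W_input.bam", "W1_H3K27me3",
                                genome_size="2.5e8", keep_dup=5)
print(shlex.join(cmd))
```

Building the command as a list makes it easy to run the same settings on every replicate separately (for example with `subprocess.run(cmd, check=True)`) instead of merging the files first.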

  • #2
    What is your duplication rate?

    Even if you call no peaks, you can correlate overall signal as described in the link provided here: https://groups.google.com/forum/#!to...t/AO6mldNxIQI/

    I would try that and see if your replicates give higher correlations than your non-replicates.

    • #3
      wigCorrelation results

      Hey

      To answer your first question: when the duplicates are left out of the analysis, MACS reports a redundant rate as high as 0.42 in my treatments. When using --keep-dup 5, the redundant rate is reduced to 0.05.
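
To make the effect of the duplicate cap concrete, here is a minimal sketch of how a redundant rate could be computed for a given cap. It assumes the redundant rate is the fraction of reads discarded because more than `keep_dup` reads share the same chromosome, position and strand; the function name and the toy reads are hypothetical, and MACS's own calculation may differ in detail.

```python
from collections import Counter


def redundant_rate(read_positions, keep_dup=1):
    """Fraction of reads discarded when at most `keep_dup` reads are kept
    per (chromosome, position, strand) location."""
    if not read_positions:
        return 0.0
    counts = Counter(read_positions)
    kept = sum(min(n, keep_dup) for n in counts.values())
    return 1.0 - kept / len(read_positions)


# Toy example: 10 reads piled on one position plus 1 read elsewhere.
reads = [("chr1", 100, "+")] * 10 + [("chr1", 200, "+")]
rate_cap1 = redundant_rate(reads, keep_dup=1)  # keeps 2 of 11 reads
rate_cap5 = redundant_rate(reads, keep_dup=5)  # keeps 6 of 11 reads
print(round(rate_cap1, 3), round(rate_cap5, 3))
```

Raising the cap lowers the reported rate only because fewer reads are counted as redundant; the underlying library is unchanged.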

      I have performed the wigCorrelation, which may have thrown a massive spanner into the works.

      Some of my samples correlate much better with non-replicates (marked with *) than with their own replicates.

      correlation between replicates
      W1 vs W2 = -0.007
      A1 vs A2 = 0.314
      Q2 vs Q3 = 0.906

      correlation between non replicates
      W1 vs Q2 = -0.319
      W1 vs Q3 = -0.282
      W1 vs A1 = 0.082 *
      W1 vs A2 = 0.187 *
      W2 vs Q2 = 0.642 *
      W2 vs Q3 = 0.625 *
      W2 vs A1 = 0.602 *
      W2 vs A2 = 0.195 *
      A1 vs Q2 = 0.603 *
      A1 vs Q3 = 0.584 *
      A2 vs Q2 = 0.068
      A2 vs Q3 = 0.053

      It is obvious that I can't combine my replicates now, but where to from here?

      Thanks
      Megan
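
The comparison above can be made systematic with a short script. This is a sketch that uses only the correlation values reported in this post; the helper name `best_partner` is hypothetical. For each sample it compares the correlation with its own replicate against the best correlation with any non-replicate.

```python
# Correlations exactly as reported in the post above.
replicate_pairs = {
    ("W1", "W2"): -0.007,
    ("A1", "A2"): 0.314,
    ("Q2", "Q3"): 0.906,
}
non_replicate_pairs = {
    ("W1", "Q2"): -0.319, ("W1", "Q3"): -0.282,
    ("W1", "A1"): 0.082, ("W1", "A2"): 0.187,
    ("W2", "Q2"): 0.642, ("W2", "Q3"): 0.625,
    ("W2", "A1"): 0.602, ("W2", "A2"): 0.195,
    ("A1", "Q2"): 0.603, ("A1", "Q3"): 0.584,
    ("A2", "Q2"): 0.068, ("A2", "Q3"): 0.053,
}


def best_partner(sample, pairs):
    """Return (partner, r) for the highest correlation involving `sample`."""
    candidates = [
        (a if b == sample else b, r)
        for (a, b), r in pairs.items()
        if sample in (a, b)
    ]
    return max(candidates, key=lambda item: item[1])


flagged = []
for pair in replicate_pairs:
    for sample in pair:
        _, rep_r = best_partner(sample, replicate_pairs)
        partner, other_r = best_partner(sample, non_replicate_pairs)
        if other_r > rep_r:
            # This sample looks more like a non-replicate than its replicate.
            flagged.append(sample)
            print(f"{sample}: replicate r={rep_r:.3f}, "
                  f"but r={other_r:.3f} with {partner}")

flagged.sort()
```

A sample that ends up in `flagged` is a candidate for closer inspection, not automatically a failed experiment.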

      • #4
        I think your next step is to try to figure out what's going on. I'd start with W1 and W2 as they seem to be horribly correlated but should be biological replicates.

        So I would do a few things with those two in particular. First, make a list of metrics for each: total reads, total aligned reads, duplication rate, etc. I don't know if each was sequenced on one lane or multiple; regardless, I would run all of the raw reads through FastQC and see if that shows anything. I don't know if they are single- or paired-end, but see if MACS2 reported a similar d value for each (and look at the Bioanalyzer run for each of them to see if the libraries had similar fragment size distributions). Also look at some of the peak regions in a viewer such as IGV; do they look at all similar between the two samples? Check some of the highest-scoring peak regions as well as a broader view.

        • #5
          Originally posted by leaskimo
          I am having problems with the broad mark H3K27me3, my biological replicates and identifying differential enrichment between treatment groups
          In my opinion, histone marks like H3K27me3, H3K36me3 are just too broad for MACS2 to effectively capture.

          I have tried many (about 10) different peak callers, and I think SICER really stands out (in a good way) in how it performs. It seems to effectively capture both small and large gaps in signal, and unifies peaks where they need to be unified. So far it's by far the best broad peak caller I've tried.

          • #6
            Originally posted by apredeus
            In my opinion, histone marks like H3K27me3, H3K36me3 are just too broad for MACS2 to effectively capture.

            I have tried many (about 10) different peak callers, and I think SICER really stands out (in a good way) in how it performs. It seems to effectively capture both small and large gaps in signal, and unifies peaks where they need to be unified. So far it's by far the best broad peak caller I've tried.
            Which peak callers have you tried? What metrics did you use to evaluate their performance?

            • #7
              I've tried MACS, MACS2, SICER, SISSR, Rseg, BroadPeak, HotSpot, and I really can't remember what else. I've also experimented with settings on those peak callers quite a bit, especially on MACS2, SICER and Rseg.

              As for the metrics, I've found that simple visual inspection of the TDF files for the ChIP-seq and input, together with the BED file of the called peaks, makes the differences very obvious. I'll try to look for screenshots I've made but I'm not sure I'll be able to find them.

              At any rate, if anyone has an opinion different from mine, I'd love to hear it.

              • #8
                Just a reminder: if you can wait two months, you will see how the people from ENCODE (Anshul) handle broad peaks in a stable way.

                https://groups.google.com/forum/#!to...nt/yG8M8Sx_eTM

                • #9
                  I second SICER for histone marks. MACS2 is the right pick for transcription factor ChIP. The wigCorrelation is still concerning though.

                  One thing you should be sure to check is your input read distribution for both W1 and W2. It kinda looks like one of those replicates just may not have worked at all, since you would expect a near-zero correlation when a successful ChIP-seq is compared with what is basically nothing.

                  Also, correlation between different treatments could be caused by input bias or sequencing bias. So, if you had a poor batch of crosslinking or maybe library prep wasn't so good, and certain groups all went through those steps together, that may explain W2 being more highly related to Q2, Q3 and A1.

                  So you might group your samples by date processed through the various steps and see if that explains anything?

                  • #10
                    Originally posted by harryzs
                    Just a reminder: if you can wait two months, you will see how the people from ENCODE (Anshul) handle broad peaks in a stable way.

                    https://groups.google.com/forum/#!to...nt/yG8M8Sx_eTM
                    Sweet, thanks for the reminder. I should re-run some of the peak calling I've done in the past and post some screenshots here, should be fun. But maybe I'll wait until they publish their findings and/or recommended software and settings.

                    • #11
                      Originally posted by apredeus
                      In my opinion, histone marks like H3K27me3, H3K36me3 are just too broad for MACS2 to effectively capture.

                      I have tried many (about 10) different peak callers, and I think SICER really stands out (in a good way) in how it performs. It seems to effectively capture both small and large gaps in signal, and unifies peaks where they need to be unified. So far it's by far the best broad peak caller I've tried.
                      May I ask a question: for H3K27me3 (human/mouse), how many reads (what depth) do we need to get "good" results, in your experience?
                      Last edited by harryzs; 10-08-2013, 11:13 AM.

                      • #12
                        It really depends on the quality of the ChIP-seq experiment, i.e. the signal-to-noise ratio. As a general rule, I think ENCODE recommends a higher number of reads for "broad" marks (20M or so). This, however, would not save you at all if your library is bad and has a lot of noise. So I would say 10M aligned unique reads is the lowest you want to go.

                        As an example of an amazingly clean library I can give this sample: GSE38046 (GSM932947 - GSM932951) from the laboratory of M. Busslinger. It has about 23M reads with pretty low duplicate rates (in ChIP-seq analysis, I always turn on filtering of identical reads; both MACS and SICER do it by default). In general, the quality of their ChIP-seq data is astounding, the best I've ever seen. Those guys are surely doing something right.

                        The same experiment done by C. Murre (GSM987809) also displays a pretty good signal-to-noise ratio and correlates with the Busslinger lab ChIP-seq very well. That sample adds up to 16M aligned reads.

                        • #13
                          Originally posted by apredeus
                          It really depends on the quality of the ChIP-seq experiment, i.e. the signal-to-noise ratio. As a general rule, I think ENCODE recommends a higher number of reads for "broad" marks (20M or so). This, however, would not save you at all if your library is bad and has a lot of noise. So I would say 10M aligned unique reads is the lowest you want to go.

                          As an example of an amazingly clean library I can give this sample: GSE38046 (GSM932947 - GSM932951) from the laboratory of M. Busslinger. It has about 23M reads with pretty low duplicate rates (in ChIP-seq analysis, I always turn on filtering of identical reads; both MACS and SICER do it by default). In general, the quality of their ChIP-seq data is astounding, the best I've ever seen. Those guys are surely doing something right.

                          The same experiment done by C. Murre (GSM987809) also displays a pretty good signal-to-noise ratio and correlates with the Busslinger lab ChIP-seq very well. That sample adds up to 16M aligned reads.
                          Great. Thank you very much for sharing.
