
  • "Offset" in the correlation between two ChIP-seq biological replicates

    Hello.
    I was looking at the correlation of signal intensities between two biological replicates (ChIP-Seq) and I got a strange plot. I was wondering if anyone has seen something like this before, or if anyone has any ideas of why this is happening.

    I counted the number of reads in 1kb windows across the whole genome, and plotted replicate 1 vs. replicate 2. The image shows both the raw signal and the normalized "reads per million" signal.
    There seems to be some kind of an "offset" between two trend lines in the plot:



    I have looked at the GC content, and position relative to genes (exons, introns, TSS, TTS) of the windows that are in the two different "offsets", and nothing came out as significantly different.
    Any ideas would be very much appreciated!
    Many thanks!
    ines
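
A minimal sketch of the window counting and reads-per-million (RPM) normalization described in the post above, assuming reads are available as (chromosome, start position) pairs. The function names and the toy reads are hypothetical and are not taken from the poster's code.

```python
from collections import Counter

WINDOW = 1000  # 1 kb windows, as in the post


def count_reads_in_windows(read_starts, window=WINDOW):
    """Count reads per (chromosome, window index).

    read_starts: iterable of (chrom, 0-based start position) pairs.
    """
    counts = Counter()
    for chrom, pos in read_starts:
        counts[(chrom, pos // window)] += 1
    return counts


def reads_per_million(counts):
    """Scale raw window counts by the sample's total read count, in millions."""
    total = sum(counts.values())
    return {key: n * 1_000_000 / total for key, n in counts.items()}


# Toy reads, invented for illustration only.
rep1_reads = [("chr1", 10), ("chr1", 950), ("chr1", 1200), ("chrY", 5000)]
raw = count_reads_in_windows(rep1_reads)
rpm = reads_per_million(raw)
print(raw[("chr1", 0)], rpm[("chr1", 0)])
```

Because RPM multiplies every window of a sample by the same factor, it rescales the axes of the scatter plot but cannot merge two trend lines that have different slopes.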

  • #2
    I think you should take a look at some of the extreme points in the plot in a genome browser and see if the raw data shows any evidence of weirdness, e.g. PCR artifacts, which would be visible as read stacking (same read sequence, same start and end position).
    If this was the case then removing duplicate reads from the data would reduce/eliminate the effect in the plots.
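
A rough sketch of position-based duplicate removal, assuming each read is a dict with `chrom`, `start`, `end` and `strand` keys (a hypothetical layout, chosen for illustration). Dedicated tools apply more criteria than this, so it is a stand-in for the idea rather than a reimplementation of any of them.

```python
def remove_duplicates(reads):
    """Keep the first read seen for each (chrom, start, end, strand) key."""
    seen = set()
    kept = []
    for read in reads:
        key = (read["chrom"], read["start"], read["end"], read["strand"])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Toy reads, invented for illustration: the first two stack exactly.
reads = [
    {"chrom": "chr1", "start": 100, "end": 150, "strand": "+"},
    {"chrom": "chr1", "start": 100, "end": 150, "strand": "+"},
    {"chrom": "chr1", "start": 120, "end": 170, "strand": "+"},
]
deduplicated = remove_duplicates(reads)
print(len(reads), len(deduplicated))
```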

    If the reads in both samples look normal, then it may be worth manually checking some peaks to make sure that the code you have written to generate the plots is correct.

    Maybe rather than plotting for the whole genome, break it out by chromosome and see whether it is a genome-wide or a localized effect?
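
One way to break the comparison out by chromosome is to compute the replicate correlation separately for each one. The sketch below assumes windows are given as (chromosome, replicate 1 signal, replicate 2 signal) tuples; the names and toy values are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_by_chromosome(windows):
    """windows: iterable of (chrom, rep1_signal, rep2_signal) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for chrom, rep1, rep2 in windows:
        grouped[chrom][0].append(rep1)
        grouped[chrom][1].append(rep2)
    return {chrom: pearson(xs, ys) for chrom, (xs, ys) in grouped.items()}


# Toy values, invented for illustration only.
windows = [
    ("chr1", 1, 2), ("chr1", 2, 4), ("chr1", 3, 6),
    ("chrY", 1, 3), ("chrY", 2, 2), ("chrY", 3, 1),
]
per_chrom = correlation_by_chromosome(windows)
print(per_chrom)
```

A single correlation coefficient can hide a two-armed pattern, so plotting each chromosome separately is still more informative than the number alone.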



    • #3
      Hello pmcget,
      I tried the same plot with and without duplicates. The one I posted is without duplicates, which I removed using Picard.

      I don’t think the code has any errors. I tried the same code with other datasets and the plots look fine, with good correlations between biological replicates. I also tried different software (DiffBind) to count reads, and it produces the same plot. I used DiffBind to count reads in regions that are called as peaks in both replicates, so instead of considering 1kb windows across the genome, DiffBind considers only the genomic regions overlapped by peaks. The same plot is still produced, with that "double" correlation.

      I just looked at the correlation plot by chromosome, and here is the result:



      • #4
        Is it possible that your peaks are enriched for repetitive sequences? There is a lot of inter-individual variation in some of these sequences, e.g. microsatellites/STRs.

        You could look at overlap of the 2 offset groups with the overall repeatmasker track - or even subtypes of repeat.

        You could also do a quick check of some of the extreme peaks and see if the reads are piling up over regions that are annotated as repeats, e.g. by uploading the raw reads and peak information into IGV and then loading the RepeatMasker track from UCSC.
        It would be very useful to see an example of the 2 sorts of peaks in each replicate (i.e. the raw reads for the peak in a genome viewer).

        Are these biological samples from normal tissue/diseased tissue/cell lines/cancer lines? The biological origin of the samples might give some clue...

        Maybe the cells in one of your replicates are undergoing synchronized mitosis and the ChIP'd protein is a marker of this process?



        • #5
          Hello pmcget,
          These are T47D cells, a human breast cancer cell line, and the ChIP-Seq is for the transcription factor FOXA1.
          I have checked a few peaks against the RepeatMasker track, and I didn't find any striking difference or piling up of reads (at least by browsing the regions by eye).

          In the meantime, I started to consider a batch effect.
          The two biological replicates were done on two different days, alongside other ChIP-Seq experiments.

          I tried to remove the batch effects using limma in R with the "removeBatchEffect" function.
          My matrix consists of 6 columns and ~2 million rows (each row is a 1kb window). The 6 columns correspond to 3 ChIP-seq experiments prepared in duplicate, with each duplicate prepared on a different day.

          The design matrix is something like this:
          sample  cell   replicate  batch
          1       T47D   rep1       1
          2       T47D   rep2       2
          3       MCF7   rep1       1
          4       MCF7   rep2       2
          5       ZR751  rep1       1
          6       ZR751  rep2       2
          After running the batch effect correction (limma), I re-plotted the pairwise comparison between two biological replicates (e.g. the two replicates in T47D cells) and it looks a bit better:



          I am not sure how to interpret this, but it seems like the batch effect pushes some of the signal apart and creates that weird-looking plot.
          What are your thoughts?

          Thanks very much!
          ines
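
To make the batch correction in the post above more concrete, here is a simplified stand-in for the idea, not limma's implementation: `removeBatchEffect` fits a linear model, whereas this sketch only subtracts each batch's mean deviation within a single window. For a balanced layout like the one in the table above, with no other covariates supplied, the two should behave similarly. The function name and the row values are hypothetical, and the values are assumed to be on a log scale.

```python
def center_batches(row, batches):
    """Remove per-batch mean shifts from one row of (log-scale) signal.

    row: list of values, one per sample.
    batches: list of batch labels, same length and order as row.
    """
    grand_mean = sum(row) / len(row)
    by_batch = {}
    for value, batch in zip(row, batches):
        by_batch.setdefault(batch, []).append(value)
    # How far each batch's mean sits from the mean over all samples.
    batch_shift = {b: sum(v) / len(v) - grand_mean for b, v in by_batch.items()}
    return [value - batch_shift[b] for value, b in zip(row, batches)]


batches = [1, 2, 1, 2, 1, 2]           # batch column from the table in the post
row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # invented values for one 1 kb window
corrected = center_batches(row, batches)
print(corrected)
```

A correction of this kind makes the batch means agree in every window, so it would also shrink a real rep1/rep2 difference that is shared by all three cell lines.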



          • #6
            Looks like your ChIP worked very well and that the enrichment is 1.5x higher in replicate 2. Most regions will be negative and have about the same read numbers, with repeats giving the lower arm in the plot. This is evident from chrY, where there are no true binding sites but only misaligned reads.
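
The two-arm explanation above can be checked numerically by comparing the rep2/rep1 ratio in background windows (for example those on chrY) with the ratio in strongly enriched windows. This is a sketch with invented numbers; `median_ratio` and the two toy window lists are hypothetical.

```python
from statistics import median


def median_ratio(windows):
    """Median rep2/rep1 ratio over windows with non-zero rep1 signal."""
    ratios = [rep2 / rep1 for rep1, rep2 in windows if rep1 > 0]
    return median(ratios)


# Invented (rep1, rep2) counts, for illustration only:
# background windows with similar counts in both replicates, and
# bound windows where replicate 2 is about 1.5x replicate 1.
background = [(10, 10), (20, 21), (40, 39)]
bound = [(100, 150), (200, 310), (400, 590)]

print(median_ratio(background), median_ratio(bound))
```

If the explanation holds for the real data, the background ratio should sit near the lower arm's slope and the bound ratio near the upper arm's slope.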



            • #7
              Good point. chrY shows the negative regions, and they correspond to the lower arm of the plot.
              Thanks!



              • #8
                Hi inesdesantiago,

                I think chipper is correct.

                If you want to investigate it further graphically, you could identify all the windows with a substantial repeat overlap, e.g. using BEDTools intersectBed.

                Then you could colour the repeat-enriched windows differently from the non-repeats in your plot and see whether they segregate into the 2 arms of the plot.
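
The labelling step suggested above could look roughly like the sketch below, which computes the fraction of each window covered by repeat intervals and tags the window for colouring. It is a plain-Python illustration and not a wrapper around BEDTools; the function names and the `min_fraction=0.5` cut-off for "substantial" overlap are assumptions, and the repeat intervals are assumed not to overlap one another.

```python
def overlap_fraction(window, repeats):
    """Fraction of a window covered by repeat intervals.

    window: (start, end), half-open as in BED.
    repeats: list of (start, end) intervals on the same chromosome,
             assumed non-overlapping.
    """
    w_start, w_end = window
    covered = 0
    for r_start, r_end in repeats:
        covered += max(0, min(w_end, r_end) - max(w_start, r_start))
    return covered / (w_end - w_start)


def label_windows(windows, repeats, min_fraction=0.5):
    """Tag each window as 'repeat' or 'non-repeat' for colouring a plot."""
    return [
        "repeat" if overlap_fraction(w, repeats) >= min_fraction else "non-repeat"
        for w in windows
    ]


# Toy intervals, invented for illustration only.
windows = [(0, 1000), (1000, 2000)]
repeats = [(200, 900), (1900, 2100)]
labels = label_windows(windows, repeats)
print(labels)
```

The labels can then be mapped to two colours in the replicate 1 vs. replicate 2 scatter plot.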



                • #9
                  Hello.
                  Looking at different classes of repeats, it seems like satellite repeats are enriched in the lower arm. They are also quite common on the Y chromosome.
                  Thanks!
                  ines
                  Last edited by inesdesantiago; 02-28-2013, 02:44 AM.

