  • ChIP-Seq knockouts vs. inputs - how to analyze my data?

    My data is actually RIP-Seq (RNA immunoprecipitation). However, I don't feel I am getting the answers I need asking around from an RNASeq perspective. I am hoping to get some insight from those familiar with ChIP-Seq.

    I have RNA sequencing from mouse neural tissue where I IPed for a protein of interest (no crosslinking). There are three basic treatments, each in triplicate: Drug 1, Drug 1 + 2, and Vehicle. There is a corresponding knockout sample for each treatment (instead of an IgG/bead control). In addition to this, I also have the inputs.

    From here, there are two basic analyses I want to do:

    Relative counts
    Get relative counts for each gene in each condition. These data could be used for something like clustering or classification analysis - they answer the question: "What pool of RNA bound to my protein of interest was immunoprecipitated in each condition, and how much (relatively) do I have?"

    I can easily normalize using one of the many normalization schemes commonly employed in RNASeq. However, how do I factor in my knockouts for each condition? Or do I just use the input? Can I divide all counts by the totals? Essentially, how do I subtract out my background?

    Differential expression
    Option 1: Treat the data as RNASeq. Then use commonly used software that models the data (DESeq2, edgeR, limma). Here, the knockouts are treated as an interacting factor in my model matrix (sketched below). But what about the inputs? Can they be used as well?

    Option 2: Treat the data as RIP-Seq. Use RIPSeeker to analyze the data. RIPSeeker uses a peak-calling strategy similar to ChIP-Seq. But here it again seems I have to choose between knockouts or input.
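
    (For what it's worth, here is roughly the design I have in mind for Option 1, sketched as a model matrix. The sample names are made up, and I'm only using Python/patsy to illustrate; the actual fit would be done with DESeq2/edgeR/limma in R.)

```python
# Hypothetical sample sheet: 3 treatments x WT/KO, in triplicate.
# patsy is used only to show the model matrix; DESeq2/edgeR/limma
# would build the equivalent design from ~ genotype * treatment in R.
import pandas as pd
from patsy import dmatrix

samples = pd.DataFrame({
    "treatment": (["Vehicle"] * 3 + ["Drug1"] * 3 + ["Drug1_2"] * 3) * 2,
    "genotype":  ["WT"] * 9 + ["KO"] * 9,
})

# genotype * treatment expands to genotype + treatment + genotype:treatment;
# the interaction columns ask whether the treatment effect differs between
# the WT IP (real binding) and the KO IP (background).
design = dmatrix("~ genotype * treatment", samples, return_type="dataframe")
print(design.columns.tolist())
```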

    Then there's this more broad question:
    Creating a consensus list
    Using input or KO in each of the scenarios above may produce lists that are somewhat different. In fact, even options 1 and 2 of the differential expression analysis might yield different sets of genes. How do I know which is the "real" list? I am tempted to just see which list conforms to the literature, but this seems biased. How can I confidently select which list is "correct"?

    I just feel I have so many options in front of me and I want to ensure I am approaching this correctly.

    Other relevant threads imply that input may not be as useful.
    Last edited by syntonicC; 09-13-2016, 06:52 AM.

  • #2
    I don't have an answer, but I am also very interested in this topic. We are doing very similar experiments and there seems to be little out there on the analysis.



    • #3
      Yeah, we've done some differential ChIP analysis but it was also pretty much seat-of-the-pants. We used a combination of your two approaches:
      1) define a set of binding intervals based on a peak caller (MACS2)
      2) adapt RNA-seq methods to detect differences in binding affinity within these intervals (using overall Input read counts within these intervals as a normalization factor)
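
      Something like the following, as a very rough sketch of step 2 (it assumes you've already counted reads per MACS2 interval, e.g. with featureCounts or bedtools multicov; the sample names are made up):

```python
# Minimal sketch: per-interval IP enrichment over Input, given a table of
# read counts per MACS2 interval. Column names are hypothetical.
import numpy as np
import pandas as pd

counts = pd.read_csv("interval_counts.tsv", sep="\t", index_col="interval")

ip_cols = ["IP_rep1", "IP_rep2", "IP_rep3"]
input_cols = ["Input_rep1", "Input_rep2", "Input_rep3"]

# Put every library on a counts-per-million scale so IP and Input are comparable
cpm = counts / counts.sum(axis=0) * 1e6

# Use the mean Input signal per interval as the normalization factor
# (small pseudocount avoids division by zero)
input_baseline = cpm[input_cols].mean(axis=1) + 0.5
log2_enrichment = np.log2(cpm[ip_cols].add(0.5).div(input_baseline, axis=0))

# These per-interval enrichments (or the raw counts with log(input) as an
# offset) can then be fed to an RNA-seq style differential test
print(log2_enrichment.head())
```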

      My intuition is that if you have a true knockout, this can replace your Input samples as a better control right? Because that still accounts for any non-specific binding by the beads, for example.

      I don't think it is surprising that you get different gene lists using your two approaches. Consider the case where a gene has multiple binding sites - how do you handle this with Option 2? Do you take something like the mean binding affinity across all sites in a given gene? What about upstream binding activity?

      You might find this thread of interest - some of the differential ChIP methods out there are essentially wrappers around RNA-seq tools like edgeR.



      • #4
        Thanks for the replies.

        Originally posted by fanli:
        My intuition is that if you have a true knockout, this can replace your Input samples as a better control right? Because that still accounts for any non-specific binding by the beads, for example.
        This is what I have read as well.

        Originally posted by fanli:
        I don't think it is surprising that you get different gene lists using your two approaches. Consider the case where a gene has multiple binding sites - how do you handle this with Option 2? Do you take something like the mean binding affinity across all sites in a given gene? What about upstream binding activity?
        I think you are definitely right here, and this is one of the major reasons I was hesitant to use these tools for RIP-Seq. I wasn't sure what the best approach was.

        From reading into this more in the past few months, this is what I have discovered:

        1) Inputs can be used for normalization (and seem to be more popular). To calculate enrichment that shows the success of the IP, you need input from both WT and KO. Unfortunately, in my case I only had the WT input. I could just sequence KO tissue now, but I was worried about the technical variability that might be introduced by doing so months after the initial experiment. If you have input and KOs, you could try normalizing to the input first (enrichment for WT and KO) and then exclude any genes that show up in the KO (rough sketch below).
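
        Something along these lines, as a very rough sketch (gene-level count tables assumed; sample names and cutoffs are made up, and this is no substitute for a proper statistical model):

```python
# Sketch: enrichment over input for WT and KO, then drop genes that are
# also enriched in the KO IP. Column names and cutoffs are hypothetical.
import pandas as pd

counts = pd.read_csv("gene_counts.tsv", sep="\t", index_col="gene")
cpm = counts / counts.sum(axis=0) * 1e6 + 0.5   # pseudocount

wt_enrich = cpm[["WT_IP_rep1", "WT_IP_rep2", "WT_IP_rep3"]].mean(axis=1) / cpm["WT_input"]
ko_enrich = cpm[["KO_IP_rep1", "KO_IP_rep2", "KO_IP_rep3"]].mean(axis=1) / cpm["KO_input"]

# Keep genes enriched over input in the WT IP but not in the KO IP
candidates = cpm.index[(wt_enrich > 2) & (ko_enrich < 1.5)]
print(len(candidates), "candidate targets")
```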

        2) DESeq2/edgeR and other similar tools are not really designed to handle the case of comparing two conditions that are themselves compared to their KO counterparts. You can analyze the WT and KO conditions separately though. One filtering approach I used was only possible because there is a known target list available for the WT condition and a list of likely "non-targets" from the literature. I checked the WT/KO count ratio for the known targets and likely "non-targets" and found that the ratio was much closer to 1 for the "non-targets" (i.e., high KO background). This allowed me to set a cutoff to filter potential non-targets (sketched below).
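
        Roughly what I did, as a sketch (file names are placeholders and the 95th-percentile cutoff is just an example):

```python
# Sketch: use the WT/KO count ratio of known targets vs likely non-targets
# to pick a cutoff for filtering background. Names are hypothetical.
import pandas as pd

counts = pd.read_csv("gene_counts.tsv", sep="\t", index_col="gene")
cpm = counts / counts.sum(axis=0) * 1e6 + 0.5

wt = cpm[["WT_IP_rep1", "WT_IP_rep2", "WT_IP_rep3"]].mean(axis=1)
ko = cpm[["KO_IP_rep1", "KO_IP_rep2", "KO_IP_rep3"]].mean(axis=1)
ratio = wt / ko

known_targets = pd.read_csv("known_targets.txt", header=None)[0]
non_targets = pd.read_csv("likely_non_targets.txt", header=None)[0]

print("median WT/KO ratio, known targets:", ratio.reindex(known_targets).median())
print("median WT/KO ratio, non-targets:  ", ratio.reindex(non_targets).median())

# e.g. require a gene's WT/KO ratio to exceed the bulk of the non-target
# distribution before treating it as a potential target
cutoff = ratio.reindex(non_targets).quantile(0.95)
candidates = ratio.index[ratio > cutoff]
```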

        RIPSeeker, ASPeak, and Piranha are all designed to analyze RIP-Seq data, but they are all pretty new. Personally, I had some issues running them and getting output that made sense to me. But they do address the issue @fanli pointed out about binding intervals. I think some of the tools recommend setting bins that are the size of the sequenced fragments.

        3) Normalization (such as DESeq2's default size factors) can obliterate the count differences between WT and KO samples: because it assumes most genes are unchanged, it effectively scales the sparser KO libraries up to match the WT libraries. These inflated KO counts are not terribly useful if you are trying to assess background. I found that upper-quartile normalization seemed to work better for the purpose of maintaining this WT-to-KO ratio (sketch below).
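
        Upper-quartile scaling is simple enough to do by hand if you want to check the effect on the WT-to-KO ratio yourself (sketch; sample names are made up):

```python
# Sketch: upper-quartile normalization. Each library is scaled by the 75th
# percentile of its non-zero counts rather than a median-of-ratios size
# factor, which in my hands preserved the WT-to-KO difference better.
import pandas as pd

counts = pd.read_csv("gene_counts.tsv", sep="\t", index_col="gene")

uq = counts.apply(lambda col: col[col > 0].quantile(0.75), axis=0)
norm = counts / uq * uq.mean()   # rescale to a convenient common level
```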

        The best workaround for this issue is to use spike-ins as a normalization factor, so that the ratio between the WT and KO libraries is maintained. Another alternative is to scale the KO counts back by a factor derived from the WT-to-KO concentration ratio measured on the BioAnalyzer, assuming that ratio is maintained through sequencing (sketched below). Not ideal, but it might get you started...
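
        A sketch of that last idea (spike-ins would work the same way, just with the scaling factor computed from spike-in counts instead of BioAnalyzer concentrations; the 0.4 ratio and column names are made up):

```python
# Sketch: shrink the KO libraries by the measured KO:WT concentration ratio
# so the global WT-to-KO difference is restored after per-million scaling.
import pandas as pd

counts = pd.read_csv("gene_counts.tsv", sep="\t", index_col="gene")
cpm = counts / counts.sum(axis=0) * 1e6

ko_to_wt_concentration = 0.4   # from the BioAnalyzer, KO IP relative to WT IP

ko_cols = [c for c in cpm.columns if c.startswith("KO_IP")]
cpm[ko_cols] = cpm[ko_cols] * ko_to_wt_concentration
```
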
        Last edited by syntonicC; 12-26-2016, 05:38 PM.

