|09-12-2016, 11:30 AM||#1|
Join Date: Mar 2013
ChIP-Seq knockouts v. inputs - how to analyze my data?
My data is actually RIP-Seq (RNA immunoprecipitation). However, I don't feel I am getting the answers I need asking around from an RNASeq perspective. I am hoping to get some insight from those familiar with ChIP-Seq.
I have RNA sequencing from mouse neural tissue where I IPed for a protein of interest (no crosslinking). There are three basic treatments. I have: Drug 1, Drug 1 + 2, and Vehicle in triplicate. There is a corresponding knockout sample for each treatment (instead of IgG/bead). In addition to this, I also have the inputs.
From here, there are two basic analyses I want to do:
Get relative counts for each gene in each condition. These data could be used for something like clustering or classification analysis. It answers the question: "Which RNAs bound to my protein of interest were immunoprecipitated in each condition, and how much of each (relatively) do I have?"
I can easily normalize using one of the many normalization schemes commonly employed in RNASeq. However, how do I factor in my knockouts for each condition? Or do I just use the input? Can I divide all counts by the totals? Essentially, how to subtract out my background?
Option 1: Treat data as RNASeq. Then use some commonly used software (DESeq2, edgeR, limma for RNASeq) that models the data. Here, the knockouts are treated as an interacting factor in my model matrix. But what about the inputs? Can they be used as well?
Option 2: Treat data as RIP-Seq. Use RIPSeeker to analyze data. RIPSeeker uses a strategy similar to ChIP-Seq, based on peak-calling. But here it seems I again have to choose between knockouts and input.
Then there's this broader question:
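To make the "interacting factor" idea in Option 1 concrete, here is a minimal sketch of the kind of genotype x treatment design matrix such a model would use. It is only an illustration: the column names, the choice of KO and Vehicle as reference levels, and the assumption that the KO samples are also in triplicate are all mine, and in practice DESeq2/edgeR/limma would build this from a formula.

```python
from itertools import product

# Assumed layout: 2 genotypes x 3 treatments x 3 replicates (KO replicates are an assumption).
GENOTYPES = ["WT", "KO"]
TREATMENTS = ["Vehicle", "Drug1", "Drug1+2"]
REPLICATES = 3

# Hypothetical column names for a "~ genotype * treatment" style model.
COLUMNS = ["Intercept", "WT", "Drug1", "Drug1+2", "WT:Drug1", "WT:Drug1+2"]


def design_row(genotype, treatment):
    """Return one design-matrix row, with KO and Vehicle as reference levels."""
    is_wt = int(genotype == "WT")
    is_d1 = int(treatment == "Drug1")
    is_d12 = int(treatment == "Drug1+2")
    # The last two columns are the interaction terms: drug effects specific to WT.
    return [1, is_wt, is_d1, is_d12, is_wt * is_d1, is_wt * is_d12]


samples = [
    (g, t, r)
    for g, t in product(GENOTYPES, TREATMENTS)
    for r in range(1, REPLICATES + 1)
]
design = [design_row(g, t) for g, t, _ in samples]

for (g, t, r), row in zip(samples, design):
    print(f"{g}_{t}_rep{r}", row)
```

In a design like this, the interaction columns ask whether the drug effect on IP counts differs between WT and KO, which is one way of letting the knockout act as the background. The inputs are not part of this sketch.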
Creating a consensus list
Using input or KO in each of the scenarios above may produce lists that are somewhat different. In fact, even option 1 and option 2 of the differential expression analysis might result in different sets of genes. How do I know which is the "real" list? I am tempted to just look at which list agrees with the literature, but this seems biased. How can I confidently select which list is "correct"?
I just feel I have so many options in front of me and I want to ensure I am approaching this correctly.
Other relevant threads imply that input may not be as useful:
Last edited by syntonicC; 09-13-2016 at 07:52 AM.
|12-20-2016, 02:31 AM||#2|
Join Date: Jul 2012
I don't have an answer, but I am also very interested in this topic. We are doing very similar experiments and there seems to be little out there on the analysis.
|12-21-2016, 10:10 AM||#3|
Join Date: Jul 2014
Yeah, we've done some differential ChIP analysis but it was also pretty much seat-of-the-pants. We used a combination of your two approaches:
1) define a set of binding intervals based on a peak caller (MACS2)
2) adapt RNA-seq methods to detect differences in binding affinity within these intervals (using overall Input read counts within these intervals as a normalization factor)
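One possible reading of step 2, sketched in Python: total the Input reads that fall inside the peak intervals for each sample, turn those totals into per-sample size factors, and divide the IP counts by them before handing the data to an RNA-seq style test. The function names and the toy numbers are hypothetical, and this is not necessarily how the pipeline described above was implemented.

```python
def input_size_factors(input_counts_by_sample):
    """Size factor per sample: its total Input reads in the intervals / mean total."""
    totals = {s: sum(c.values()) for s, c in input_counts_by_sample.items()}
    mean_total = sum(totals.values()) / len(totals)
    return {s: t / mean_total for s, t in totals.items()}


def normalize_ip(ip_counts_by_sample, size_factors):
    """Divide each sample's per-interval IP counts by that sample's size factor."""
    return {
        s: {interval: n / size_factors[s] for interval, n in counts.items()}
        for s, counts in ip_counts_by_sample.items()
    }


# Toy data (made up): reads counted inside two MACS2-style intervals.
input_counts = {
    "A": {"peak1": 60, "peak2": 40},    # 100 Input reads in intervals
    "B": {"peak1": 120, "peak2": 80},   # 200 Input reads in intervals
}
ip_counts = {
    "A": {"peak1": 20, "peak2": 5},
    "B": {"peak1": 40, "peak2": 10},
}

factors = input_size_factors(input_counts)
normalized = normalize_ip(ip_counts, factors)
print(factors)
print(normalized)
```

In this toy case sample B simply has twice the depth of sample A, so the two samples end up with the same normalized counts in each interval.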
My intuition is that if you have a true knockout, it can replace your Input samples as a better control, right? Because it still accounts for any non-specific binding by the beads, for example.
I don't think it is surprising that you get different gene lists using your two approaches. Consider the case where a gene has multiple binding sites - how do you handle this with Option 2? Do you take something like the mean binding affinity across all sites in a given gene? What about upstream binding activity?
You might find this thread of interest - some of the differential ChIP methods out there are essentially wrappers around RNA-seq tools like edgeR:
|12-26-2016, 02:51 PM||#4|
Join Date: Mar 2013
Thanks for the replies.
From reading into this more in the past few months, this is what I have discovered:
1) Inputs can be used for normalization (and seem to be more popular). To calculate enrichment that shows the success of the IP you need input from both WT and KO. Unfortunately, in my case I only had the WT input. I could just sequence KO tissue now, but I was worried about the technical variability of doing this months after the initial experiment. If you have input and KOs you could try normalizing to the input first (enrichment for WT and KO) and then exclude any genes that show up in the KO.
2) DESeq2, edgeR, and other similar tools are not really designed to handle the case of comparing two conditions that are themselves compared to their KO counterparts. You can analyze the WT and KO conditions separately though. One filtering approach I used was only possible because there is a known target list available for the WT condition and a list of likely "non-targets" from the literature. I checked the WT/KO count ratio for the known targets and likely "non-targets" and found that the ratio was much closer to 1 in the "non-targets" case (i.e., high KO background). This allowed me to set a cutoff to filter potential non-targets.
RIPSeeker, ASPeak, and Piranha are all designed to analyze RIP-Seq data but they are all pretty new. Personally, I had some issues running them and getting data that made any sense to me. But they address the issue @fanli pointed out in their post about binding intervals. I think some of the tools recommend setting bins that are the size of the sequenced fragments.
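For anyone wanting to try the WT/KO ratio filter from point 2, a rough sketch follows. The post does not say how the cutoff was chosen, so the midpoint between the two median log2 ratios is purely my assumption, as are the pseudocount, the gene names, and the counts.

```python
import math
from statistics import median


def log2_ratio(wt, ko, pseudocount=1.0):
    """log2(WT/KO) with a pseudocount so zero counts do not blow up."""
    return math.log2((wt + pseudocount) / (ko + pseudocount))


def choose_cutoff(counts, known_targets, non_targets):
    """Assumed rule: midpoint between the median log2 ratios of the two reference sets."""
    target_med = median(log2_ratio(*counts[g]) for g in known_targets)
    non_target_med = median(log2_ratio(*counts[g]) for g in non_targets)
    return (target_med + non_target_med) / 2


def filter_genes(counts, cutoff):
    """Keep genes whose WT/KO log2 ratio is above the cutoff."""
    return sorted(g for g, (wt, ko) in counts.items() if log2_ratio(wt, ko) > cutoff)


# Toy normalized counts as (WT, KO); all names and values are made up.
counts = {
    "geneA": (799, 99),   # known target
    "geneB": (399, 99),   # known target
    "geneC": (99, 99),    # likely non-target, ratio near 1
    "geneD": (119, 99),   # likely non-target
    "geneX": (299, 99),   # unknown
    "geneY": (109, 99),   # unknown
}

cutoff = choose_cutoff(counts, ["geneA", "geneB"], ["geneC", "geneD"])
kept = filter_genes(counts, cutoff)
print(cutoff, kept)
```

With real data it is probably worth plotting the two ratio distributions before settling on a cutoff, since they may overlap much more than in this toy case.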
3) Normalization (such as DESeq2's) can obliterate count differences between WT and KO samples, because the KO counts get scaled up to match the WT libraries. These inflated KO counts are not terribly useful for analysis if you are trying to assess background. I found that upper quartile normalization seemed to work better for the purpose of maintaining this WT-to-KO ratio.
The best workaround for this issue is to use spike-ins as a normalization factor, to ensure the ratio between the WT and KO libraries is maintained. An alternative would be to scale back the KO counts by some factor, if you have the ratio of concentrations between WT and KO from the BioAnalyzer. This assumes the ratios are maintained through sequencing. Not ideal but it might get you started...
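A minimal sketch of the spike-in idea, assuming the same amount of spike-in RNA was added to every IP sample before library prep (that assumption, the function names, and all numbers are mine). Each sample's size factor comes from its spike-in reads alone, so a KO library where spike-ins take up most of the reads gets scaled down rather than up.

```python
def spike_in_factors(spike_counts_by_sample):
    """Size factor per sample: its total spike-in reads / mean total across samples."""
    totals = {s: sum(c) for s, c in spike_counts_by_sample.items()}
    mean_total = sum(totals.values()) / len(totals)
    return {s: t / mean_total for s, t in totals.items()}


def scale_counts(gene_counts, factor):
    """Divide raw gene counts by the sample's spike-in size factor."""
    return {gene: n / factor for gene, n in gene_counts.items()}


# Toy data (made up): the KO IP pulls down little RNA, so the spike-ins
# take up a much larger share of its reads than in the WT library.
spike_counts = {"WT": [600, 400], "KO": [5000, 4000]}
gene_counts = {
    "WT": {"gene1": 500},
    "KO": {"gene1": 450},   # looks similar to WT before scaling
}

factors = spike_in_factors(spike_counts)
wt_scaled = scale_counts(gene_counts["WT"], factors["WT"])
ko_scaled = scale_counts(gene_counts["KO"], factors["KO"])
print(factors)
print(wt_scaled["gene1"] / ko_scaled["gene1"])
```

The BioAnalyzer alternative would follow the same pattern, with the size factors derived from the measured WT and KO concentrations instead of spike-in reads.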
Last edited by syntonicC; 12-26-2016 at 05:38 PM.