SEQanswers

Old 09-12-2016, 11:30 AM   #1
syntonicC
Junior Member
 
Location: USA

Join Date: Mar 2013
Posts: 6
ChIP-Seq knockouts v. inputs - how to analyze my data?

My data is actually RIP-Seq (RNA immunoprecipitation). However, I don't feel I am getting the answers I need asking around from an RNASeq perspective. I am hoping to get some insight from those familiar with ChIP-Seq.

I have RNA sequencing from mouse neural tissue where I IPed for a protein of interest (no crosslinking). There are three basic treatments. I have: Drug 1, Drug 1 + 2, and Vehicle in triplicate. There is a corresponding knockout sample for each treatment (instead of IgG/bead). In addition to this, I also have the inputs.

From here, there are two basic analyses I want to do:

Relative counts
Get relative counts for each gene in each condition. These data could be used for something like clustering or classification analysis. It answers the question: "What pool of RNA bound to my protein of interest was immunoprecipitated in each condition, and how much (relatively) do I have?"

I can easily normalize using one of the many normalization schemes commonly employed in RNASeq. However, how do I factor in my knockouts for each condition? Or do I just use the input? Can I divide all counts by the totals? Essentially, how do I subtract out my background?

Differential expression
Option 1: Treat data as RNASeq. Then use some commonly applicable software (DESeq2, edgeR, limma for RNASeq) that models the data. Here, the knockouts are treated as an interacting factor in my model matrix. But what about the inputs? Can they be used as well?

Option 2: Treat data as RIP-Seq. Use RIPSeeker to analyze data. RIPSeeker uses a similar strategy to ChIP-Seq by using peak-calling. But here it seems again I have to choose between knockouts or input.

Then there's this broader question:
Creating a consensus list
Using input or KO in each of the scenarios above may produce somewhat different lists. In fact, even option 1 and option 2 of the differential expression analysis might produce different sets of genes. How do I know which is the "real" list? I am tempted to just see which list agrees with the literature, but this seems biased. How can I confidently select which list is "correct"?

I just feel I have so many options in front of me and I want to ensure I am approaching this correctly.

Other relevant threads imply that input may not be as useful:
http://seqanswers.com/forums/showthread.php?t=12092
http://seqanswers.com/forums/showthread.php?t=35377
http://seqanswers.com/forums/showthread.php?t=6918
http://seqanswers.com/forums/showthread.php?t=4480
http://seqanswers.com/forums/showthread.php?t=8783

Last edited by syntonicC; 09-13-2016 at 07:52 AM.
Old 12-20-2016, 02:31 AM   #2
krespim
Member
 
Location: Dresden

Join Date: Jul 2012
Posts: 49

I don't have an answer, but I am also very interested in this topic. We are doing very similar experiments and there seems to be little out there on the analysis.
Old 12-21-2016, 10:10 AM   #3
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 197

Yeah, we've done some differential ChIP analysis but it was also pretty much seat-of-the-pants. We used a combination of your two approaches:
1) define a set of binding intervals based on a peak caller (MACS2)
2) adapt RNA-seq methods to detect differences in binding affinity within these intervals (using overall Input read counts within these intervals as a normalization factor)

My intuition is that if you have a true knockout, this can replace your Input samples as a better control right? Because that still accounts for any non-specific binding by the beads, for example.

I don't think it is surprising that you get different gene lists using your two approaches. Consider the case where a gene has multiple binding sites - how do you handle this with Option 2? Do you take something like the mean binding affinity across all sites in a given gene? What about upstream binding activity?

You might find this thread of interest - some of the differential ChIP methods out there are essentially wrappers around RNA-seq tools like edgeR:
https://www.biostars.org/p/195689/
Old 12-26-2016, 02:51 PM   #4
syntonicC
Junior Member
 
Location: USA

Join Date: Mar 2013
Posts: 6

Thanks for the replies.

Quote:
Originally Posted by fanli View Post
My intuition is that if you have a true knockout, this can replace your Input samples as a better control right? Because that still accounts for any non-specific binding by the beads, for example.
This is what I have read as well.

Quote:
Originally Posted by fanli View Post
I don't think it is surprising that you get different gene lists using your two approaches. Consider the case where a gene has multiple binding sites - how do you handle this with Option 2? Do you take something like the mean binding affinity across all sites in a given gene? What about upstream binding activity?
I think you are definitely right here and this is one of the major reasons I was hesitant to use these tools for RIP-Seq. I wasn't sure what the best approach was.

From reading into this more in the past few months, this is what I have discovered:

1) Inputs can be used for normalization (and seem to be more popular). To calculate enrichment that shows the success of the IP you need input from both WT and KO. Unfortunately, in my case I only had the WT input. I could still sequence input from KO tissue, but I was worried about the technical variability of preparing it months after the initial experiment. If you have input and KOs you could try normalizing to the input first (enrichment for WT and KO) and then exclude any genes that show up in the KO.
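
To make point 1 a bit more concrete, here is a minimal Python sketch of the "normalize to input, then exclude anything that also shows up in the KO" idea. Everything specific in it is an assumption for illustration only: the function names, the counts-per-million scaling, the pseudocount, the log2 cutoff of 1, and the toy counts. It is not taken from any published pipeline, and library-size scaling of IP samples has its own problems (see point 3).

```python
import numpy as np

def log2_enrichment(ip_counts, input_counts, pseudocount=1.0):
    """Per-gene log2(IP / input) after scaling both libraries to counts per million."""
    ip = np.asarray(ip_counts, dtype=float)
    inp = np.asarray(input_counts, dtype=float)
    ip_cpm = ip / ip.sum() * 1e6
    inp_cpm = inp / inp.sum() * 1e6
    # The pseudocount avoids division by zero for genes absent from the input.
    return np.log2((ip_cpm + pseudocount) / (inp_cpm + pseudocount))

def specific_genes(genes, wt_enrichment, ko_enrichment, min_log2=1.0):
    """Keep genes enriched in the WT IP but not enriched in the KO IP."""
    return [g for g, wt, ko in zip(genes, wt_enrichment, ko_enrichment)
            if wt >= min_log2 and ko < min_log2]

# Toy counts (hypothetical): gene B is pulled down in the KO as well,
# so it is treated as background even though it looks enriched in WT.
genes = ["A", "B", "C", "D"]
wt_ip = [400, 400, 100, 100]
ko_ip = [100, 400, 400, 100]
wt_input = [100, 100, 400, 400]
ko_input = [100, 100, 400, 400]

wt_enr = log2_enrichment(wt_ip, wt_input)
ko_enr = log2_enrichment(ko_ip, ko_input)
specific = specific_genes(genes, wt_enr, ko_enr)
print(specific)
```

With replicates you would compute the enrichment per replicate (or on pooled counts) before applying the filter; the sketch ignores replicates entirely.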

2) DESeq2, edgeR, and other similar tools are not really designed to handle the case of comparing two conditions that are themselves compared to their KO counterparts. You can analyze the WT and KO conditions separately though. One filtering approach I used was only possible because there is a known target list available for the WT condition and a list of likely "non-targets" from the literature. I checked the WT/KO count ratio for the known targets and likely "non-targets" and found that the ratio was much closer to 1 for the "non-targets" (i.e., high KO background). This allowed me to set a cutoff to filter potential non-targets.

RIPSeeker, ASPeak, and Piranha are all designed to analyze RIP-Seq data but they are all pretty new. Personally, I had some issues running them and getting data that made any sense to me. But they address the issue @fanli pointed out in their post about binding intervals. I think some of the tools recommend setting bins that are the size of the sequenced fragments.
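
The WT/KO ratio filter described in point 2 could be sketched roughly as below. How the cutoff is chosen is my assumption here: the sketch takes a high percentile of the ratios seen for the likely non-targets, which is only one of several reasonable choices. The gene names, counts, pseudocount, and function names are hypothetical.

```python
import numpy as np

def wt_ko_ratio(wt_count, ko_count, pseudocount=1.0):
    """WT/KO count ratio; the pseudocount keeps genes with zero KO reads finite."""
    return (wt_count + pseudocount) / (ko_count + pseudocount)

def choose_cutoff(nontarget_ratios, percentile=95.0):
    """Assumed rule: cutoff = a high percentile of the non-target ratios."""
    return float(np.percentile(nontarget_ratios, percentile))

# Toy (WT, KO) counts, assumed to be already normalized between libraries.
counts = {
    "T1": (99, 9), "T2": (49, 9),               # known targets
    "N1": (9, 9), "N2": (11, 9), "N3": (7, 9),  # likely non-targets
    "X1": (10, 9), "X2": (39, 9),               # genes of unknown status
}
non_targets = ["N1", "N2", "N3"]
candidates = ["T1", "T2", "X1", "X2"]

ratios = {g: wt_ko_ratio(wt, ko) for g, (wt, ko) in counts.items()}
cutoff = choose_cutoff([ratios[g] for g in non_targets])

# Candidates whose ratio exceeds what non-targets typically show are kept.
passing = [g for g in candidates if ratios[g] > cutoff]
print(cutoff, passing)
```

It is worth checking that the known targets actually pass the cutoff before trusting it for the unknown genes; if they do not, the cutoff or the normalization upstream of it probably needs rethinking.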

3) Normalization (such as DESeq2's default size factors, which assume that most genes do not differ between samples) can obliterate count differences between WT and KO samples. The resulting inflated KO counts are not terribly useful for analysis if you are trying to assess background. I found that upper quartile normalization seemed to work better for the purpose of maintaining this WT-to-KO ratio.

The best workaround for this issue is to use spike-ins as a normalization factor, which ensures that the ratio between the WT and KO libraries is maintained. Another alternative would be to scale back the KO counts by some factor if you have the ratio of concentrations between WT and KO from the Bioanalyzer. This assumes the ratios are maintained through sequencing. Not ideal but it might get you started...
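
For anyone who wants to try the two normalization ideas from point 3, here is a minimal sketch. The first function is one common form of upper quartile normalization (75th percentile of each sample after dropping genes with zero counts everywhere); the second derives scaling factors from spike-in totals. Function names and toy numbers are hypothetical, and real implementations (for example in edgeR) differ in details.

```python
import numpy as np

def geometric_mean(values):
    return np.exp(np.mean(np.log(values)))

def upper_quartile_factors(counts):
    """Per-sample factors from the 75th percentile of each column.

    counts: genes x samples array. Genes with zero counts in every sample
    are dropped first. Assumes each sample's upper quartile is above zero.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = counts.sum(axis=1) > 0
    uq = np.percentile(counts[expressed], 75, axis=0)
    return uq / geometric_mean(uq)

def spike_in_factors(spike_counts):
    """Per-sample factors from total spike-in reads (spikes x samples).

    Assumes the same amount of spike-in was added to every sample.
    """
    totals = np.asarray(spike_counts, dtype=float).sum(axis=0)
    return totals / geometric_mean(totals)

# Toy example: columns are (WT IP, KO IP).
gene_counts = np.array([[10, 20], [20, 40], [30, 60], [40, 80]])
uq_factors = upper_quartile_factors(gene_counts)

# The KO library contains little real RNA, so spike-ins take up more of it.
spikes = np.array([[50, 200], [150, 600]])
si_factors = spike_in_factors(spikes)

# Dividing by the factors puts samples on a common scale.
raw = np.array([[100.0, 100.0]])   # same raw count in WT and KO
scaled = raw / si_factors
print(uq_factors, si_factors, scaled)
```

In the toy example a gene with identical raw counts in WT and KO ends up four times higher in WT after spike-in scaling, which is the kind of WT-to-KO difference that library-size style normalization would hide.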

Last edited by syntonicC; 12-26-2016 at 05:38 PM.

Tags
chip-seq analysis, rnaseq analysis