SEQanswers

08-06-2013, 04:17 PM #1
leaskimo
Junior Member
Location: New Zealand
Join Date: Aug 2013
Posts: 2

MACS2 ChIP-seq analysis with biological replicates

I am having problems with the broad mark H3K27me3, my biological replicates, and identifying differential enrichment between treatment groups.

I have performed ChIP-seq for H3K27me3 in honeybee on 3 treatment groups; each treatment has two biological replicates and one input. The reads from each sample were aligned to the reference genome with Bowtie, with 70-85% of reads mapping, giving between 20M and 50M mapped reads per sample. I have fiddled around with lots of different peak callers, including diffReps, PeakSeq, CLC Genomics, MACS and MACS2. MACS2 seems to be the only one that can really deal with broad marks. I have had to keep duplicate reads in the analysis, because when they are removed I get no peaks.

The peak sets I get from MACS2 show a vast difference in the number of peaks, both between biological replicates and between treatment groups. The differences between treatment groups could well be biologically relevant; however, I am worried about how to deal with the differences between biological replicates. I have noticed that people combine their replicates (i.e. concatenate or merge the files) for the analysis, but when I do this the result seems biased towards one of the replicates.
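One plausible reason a merged file looks biased is unequal depth: if one replicate has 50M mapped reads and the other 20M, plain concatenation lets the deeper library dominate. Below is a minimal sketch, assuming reads are already loaded as Python lists (in practice they would come from BAM/BED files), of downsampling each replicate to the depth of the shallowest one before pooling. This is a generic approach offered for illustration, not something MACS2 is known to do for you, and the function names are hypothetical.

```python
import random


def downsample(reads, n, seed=0):
    """Return n reads drawn without replacement, or a copy of all reads if n >= len(reads)."""
    if n >= len(reads):
        return list(reads)
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    return rng.sample(reads, n)


def pool_equal_depth(replicates, seed=0):
    """Pool replicates after downsampling each one to the shallowest replicate's depth,
    so every replicate contributes the same number of reads to the merged set."""
    depth = min(len(r) for r in replicates)
    pooled = []
    for i, reads in enumerate(replicates):
        pooled.extend(downsample(reads, depth, seed=seed + i))
    return pooled


# Toy example: replicate 1 is 2.5x deeper than replicate 2 (mirroring 50M vs 20M).
rep1 = [("rep1", i) for i in range(50)]
rep2 = [("rep2", i) for i in range(20)]
pooled = pool_equal_depth([rep1, rep2])
print(len(pooled), sum(1 for r in pooled if r[0] == "rep1"))  # 40 20
```

Downsampling throws reads away; scaling signal by depth is the usual alternative. Neither fixes replicates that genuinely disagree, which is what the correlation check below is meant to reveal.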
08-06-2013, 09:51 PM #2
Heisman
Senior Member
Location: St. Louis
Join Date: Dec 2010
Posts: 535

What is your duplication rate?

Even if you call no peaks, you can correlate overall signal as described in the link provided here: https://groups.google.com/forum/#!to...t/AO6mldNxIQI/

I would try that and see if your replicates give higher correlations than your non-replicates.
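The linked recipe is not reproduced here, but the general idea of correlating overall signal can be sketched: count reads in fixed-size bins along the genome and compute the Pearson correlation between two samples' bin counts. This is a minimal sketch under stated assumptions: a single toy chromosome, made-up read positions, and an arbitrary 100 bp bin size. Real tools may work from wig signal and apply transformations not shown here.

```python
from math import sqrt


def bin_counts(positions, bin_size, chrom_length):
    """Count read 5' positions (0-based) into fixed-size bins along one chromosome."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant signal: correlation is undefined")
    return cov / (sx * sy)


# Hypothetical read positions for two samples on a 1 kb toy chromosome, 100 bp bins.
sample_a = bin_counts([10, 120, 130, 450, 460, 470], 100, 1000)
sample_b = bin_counts([15, 125, 440, 455, 480, 900], 100, 1000)
print(round(pearson(sample_a, sample_b), 3))  # 0.899
```

Replicates that worked should score clearly higher with each other than with samples from other treatments.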
08-11-2013, 04:09 PM #3
leaskimo
Junior Member
Location: New Zealand
Join Date: Aug 2013
Posts: 2

wigCorrelation results

Hey

To answer your first question: when duplicates are removed from the analysis, MACS reports a redundant rate as high as 0.42 in my treatments. With --keep-dup 5 the redundant rate drops to 0.05.
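MACS2's `--keep-dup` option caps how many reads at an identical location and strand are kept, and the redundant rate it reports is the fraction of reads thrown away. Below is a simplified sketch of that idea (not MACS2's actual code; `apply_keep_dup` is a hypothetical name). It shows why raising the cap sharply lowers the redundant rate for a library with many stacked reads.

```python
from collections import Counter


def apply_keep_dup(reads, max_dup):
    """Keep at most `max_dup` reads sharing the same (chrom, 5' position, strand).

    Returns the kept reads and the redundant rate, i.e. the fraction of reads discarded.
    """
    seen = Counter()
    kept = []
    for read in reads:
        seen[read] += 1
        if seen[read] <= max_dup:  # keep only the first max_dup copies
            kept.append(read)
    redundant_rate = 1 - len(kept) / len(reads) if reads else 0.0
    return kept, redundant_rate


# Toy library: one position stacked 5 times, one stacked twice, one unique read.
reads = [("chr1", 100, "+")] * 5 + [("chr1", 200, "-")] * 2 + [("chr1", 300, "+")]
for max_dup in (1, 2, 5):
    kept, rate = apply_keep_dup(reads, max_dup)
    print(max_dup, len(kept), rate)
# 1 3 0.625
# 2 5 0.375
# 5 8 0.0
```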

I have performed the wigCorrelation, which may have thrown a massive spanner into the works.

Some of my replicates correlate much better with non-replicates (marked with *) than with each other.

Correlation between replicates:
W1 vs W2 = -0.007
A1 vs A2 = 0.314
Q2 vs Q3 = 0.906

Correlation between non-replicates:
W1 vs Q2 = -0.319
W1 vs Q3 = -0.282
W1 vs A1 = 0.082 *
W1 vs A2 = 0.187 *
W2 vs Q2 = 0.642 *
W2 vs Q3 = 0.625 *
W2 vs A1 = 0.602 *
W2 vs A2 = 0.195 *
A1 vs Q2 = 0.603 *
A1 vs Q3 = 0.584 *
A2 vs Q2 = 0.068
A2 vs Q3 = 0.053
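The rule behind the asterisks is not stated. One reading that reproduces all eight marks is: a non-replicate pair is flagged if it correlates better than at least one of its two samples does with its own replicate. Below is a small sketch of that inferred rule, using only the numbers from the lists above.

```python
# Correlations copied from the lists above.
replicate_corr = {("W1", "W2"): -0.007, ("A1", "A2"): 0.314, ("Q2", "Q3"): 0.906}
non_replicate_corr = {
    ("W1", "Q2"): -0.319, ("W1", "Q3"): -0.282, ("W1", "A1"): 0.082,
    ("W1", "A2"): 0.187, ("W2", "Q2"): 0.642, ("W2", "Q3"): 0.625,
    ("W2", "A1"): 0.602, ("W2", "A2"): 0.195, ("A1", "Q2"): 0.603,
    ("A1", "Q3"): 0.584, ("A2", "Q2"): 0.068, ("A2", "Q3"): 0.053,
}

# Map each sample to the correlation with its own replicate.
own_rep = {}
for (a, b), r in replicate_corr.items():
    own_rep[a] = own_rep[b] = r

# Inferred rule: flag a pair that beats at least one sample's own replicate correlation.
flagged = [pair for pair, r in non_replicate_corr.items()
           if r > min(own_rep[pair[0]], own_rep[pair[1]])]
for a, b in flagged:
    print(f"{a} vs {b} = {non_replicate_corr[(a, b)]}")
```

Seen this way, every pair involving W1 or W2 that is not strongly negative gets flagged, which points at the W replicates (and to a lesser extent A) rather than at Q.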

It is obvious that I can't combine my replicates now, but where to from here?

Thanks
Megan
08-11-2013, 06:41 PM #4
Heisman
Senior Member
Location: St. Louis
Join Date: Dec 2010
Posts: 535

I think your next step is to try to figure out what's going on. I'd start with W1 and W2 as they seem to be horribly correlated but should be biological replicates.

So I would do a few things with those two in particular. First, make a list of metrics for each: total reads, total aligned reads, duplication rate, etc. I don't know whether each was sequenced on one lane or several; regardless, I would run all of the raw reads through FastQC and see if that shows anything. I don't know whether they are single- or paired-end, but check whether MACS2 reported a similar d value for each (and look at the Bioanalyzer run for each to see whether the libraries had similar fragment size distributions). Also look at some of the peak regions in a viewer such as IGV: do they look similar at all between the two samples? Check some of the highest-scoring peak regions as well as a broader view.
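MACS2 estimates d with its own model-building step. A different but commonly used way to get at the same quantity is strand cross-correlation: the shift that best lines up + strand read ends with - strand read ends approximates the fragment length. This is a swapped-in technique shown on toy positions, not what MACS2 runs. Comparing such an estimate for W1 and W2 is in the same spirit as comparing their d values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def strand_cross_correlation(plus, minus, chrom_length, max_shift):
    """Correlate + strand 5' ends with - strand 5' ends for shifts 0..max_shift bp.

    plus/minus are 0-based 5' positions of reads on one chromosome.
    """
    p = [0] * chrom_length
    m = [0] * chrom_length
    for pos in plus:
        p[pos] += 1
    for pos in minus:
        m[pos] += 1
    # At shift s, position i on the + strand is compared with position i + s on the - strand.
    return [pearson(p[: chrom_length - s], m[s:]) for s in range(max_shift + 1)]


# Toy data: three fragments whose - strand ends lie 150 bp downstream of the + strand ends.
plus = [100, 330, 720]
minus = [p + 150 for p in plus]
scores = strand_cross_correlation(plus, minus, 1000, 300)
best_shift = max(range(len(scores)), key=scores.__getitem__)
print(best_shift)  # 150
```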
10-01-2013, 12:18 PM #5
apredeus
Senior Member
Location: Bioinformatics Institute, SPb
Join Date: Jul 2012
Posts: 151

Quote:
Originally Posted by leaskimo
I am having problems with the broad mark H3K27me3, my biological replicates and identifying differential enrichment between treatment groups

In my opinion, histone marks like H3K27me3, H3K36me3 are just too broad for MACS2 to effectively capture.

I have tried many (about 10) different peak callers, and I think SICER really stands out (in a good way) in how it performs. It seems to effectively capture both small and large gaps in signal, and unifies peaks where they need to be unified. So far it's by far the best broad peak caller I've tried.
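SICER's central idea is to scan the genome in fixed windows, keep windows with enough reads, and merge them into "islands" when the gaps between them are small. That merging is what lets it bridge short dips inside a broad domain. Below is a simplified sketch that assumes a fixed count threshold; SICER itself derives its thresholds and island scores from a statistical background model, which is not reproduced here. The window and gap sizes are illustrative, and `call_islands` is a hypothetical name.

```python
def call_islands(window_counts, window_size, min_count, max_gap_windows):
    """Merge eligible windows (count >= min_count) into islands, tolerating up to
    max_gap_windows ineligible windows between consecutive eligible ones.

    Returns half-open (start, end) coordinates in bp.
    """
    islands = []
    start = last = None
    for i, count in enumerate(window_counts):
        if count < min_count:
            continue
        if start is None:
            start = last = i                      # open the first island
        elif i - last - 1 <= max_gap_windows:
            last = i                              # gap small enough: extend the island
        else:
            islands.append((start * window_size, (last + 1) * window_size))
            start = last = i                      # gap too large: start a new island
    if start is not None:
        islands.append((start * window_size, (last + 1) * window_size))
    return islands


counts = [0, 5, 5, 0, 0, 5, 0, 0, 0, 0, 5]
print(call_islands(counts, window_size=200, min_count=3, max_gap_windows=2))
# [(200, 1200), (2000, 2200)]
```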
10-07-2013, 08:42 PM #6
gene_x
Senior Member
Location: MO
Join Date: May 2010
Posts: 108

Quote:
Originally Posted by apredeus
In my opinion, histone marks like H3K27me3, H3K36me3 are just too broad for MACS2 to effectively capture.

I have tried many (about 10) different peak callers, and I think SICER really stands out (in a good way) in how it performs. It seems to effectively capture both small and large gaps in signal, and unifies peaks where they need to be unified. So far it's by far the best broad peak caller I've tried.

What are the different peak callers you have tried? What metrics did you use to evaluate their performance?
10-07-2013, 09:06 PM #7
apredeus
Senior Member
Location: Bioinformatics Institute, SPb
Join Date: Jul 2012
Posts: 151

I've tried MACS, MACS2, SICER, SISSRs, RSEG, BroadPeak, HotSpot, and I really can't remember what else. I've also experimented quite a bit with the settings of those peak callers, especially MACS2, SICER and RSEG.

As for the metrics, I've found that simple visual inspection of TDF files of the ChIP-seq and input, together with a BED file of the called peaks, makes the differences very obvious. I'll try to find the screenshots I made, but I'm not sure I'll be able to.

At any rate, if anyone has an opinion different from mine, I'd love to hear it.
10-08-2013, 06:58 AM #8
harryzs
Member
Location: Germany
Join Date: Dec 2010
Posts: 29

Just a reminder: if you can wait two months, you will see how the ENCODE people (Anshul) handle broad peaks reliably.

https://groups.google.com/forum/#!to...nt/yG8M8Sx_eTM
10-08-2013, 08:42 AM #9
Wallysb01
Senior Member
Location: San Francisco, CA
Join Date: Feb 2011
Posts: 286

I second SICER for histone marks. MACS2 is the right pick for transcription factor ChIP. The wigCorrelation is still concerning though.

One thing you should check is the input read distribution for both W1 and W2. It kinda looks like one of those replicates may just not have worked at all, since you would expect a near-zero correlation between any successful ChIP-seq and basically nothing.
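One crude way to check whether a ChIP "worked" is to ask how concentrated its reads are compared with its input, in the spirit of fingerprint plots (e.g. deepTools plotFingerprint). Below is a minimal sketch on binned read counts with an arbitrary top-1% cutoff; `fraction_in_top_bins` is a hypothetical name. For a broad mark like H3K27me3 the enrichment is spread over large domains, so the gap between ChIP and input is likely to be smaller than for a transcription factor. Comparing W1 and W2 against their inputs matters more than the absolute value.

```python
def fraction_in_top_bins(bin_counts, top_fraction=0.01):
    """Fraction of all reads that fall in the highest-count `top_fraction` of bins.

    Input-like (failed) samples spread reads evenly, so this stays close to
    top_fraction; enriched samples concentrate reads and score higher.
    """
    total = sum(bin_counts)
    if total == 0:
        return 0.0
    k = max(1, int(len(bin_counts) * top_fraction))
    return sum(sorted(bin_counts, reverse=True)[:k]) / total


input_like = [1] * 100          # reads spread evenly: no enrichment
chip_like = [1] * 99 + [101]    # half of all reads pile into one bin
print(fraction_in_top_bins(input_like), fraction_in_top_bins(chip_like))  # 0.01 0.505
```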

Also, correlation between different treatments could be caused by input bias or sequencing bias. So if you had a poor batch of crosslinking, or the library prep wasn't so good, and certain groups all went through those steps together, that may explain W2 being more highly correlated with Q2, Q3 and A1.

So you might group your samples by the dates they were processed through the various steps and see whether that explains anything.
10-08-2013, 10:42 AM #10
apredeus
Senior Member
Location: Bioinformatics Institute, SPb
Join Date: Jul 2012
Posts: 151

Quote:
Originally Posted by harryzs
Just a reminder, if you can wait for two months, you will know how people (Anshul) from ENCODE do with broad peaks stably.

https://groups.google.com/forum/#!to...nt/yG8M8Sx_eTM

Sweet, thanks for the reminder. I should re-run some of the peak calling I've done in the past and post some screenshots here, should be fun. But maybe I'll wait until they publish their findings and/or recommended software and settings.
10-08-2013, 11:09 AM #11
harryzs
Member
Location: Germany
Join Date: Dec 2010
Posts: 29

Quote:
Originally Posted by apredeus
In my opinion, histone marks like H3K27me3, H3K36me3 are just too broad for MACS2 to effectively capture.

I have tried many (about 10) different peak callers, and I think SICER really stands out (in a good way) in how it performs. It seems to effectively capture both small and large gaps in signal, and unifies peaks where they need to be unified. So far it's by far the best broad peak caller I've tried.

May I ask a question: for H3K27me3 (human/mouse), how many reads (what depth) do we need to get "good" results, in your experience?

Last edited by harryzs; 10-08-2013 at 11:13 AM.
10-08-2013, 11:35 AM #12
apredeus
Senior Member
Location: Bioinformatics Institute, SPb
Join Date: Jul 2012
Posts: 151

It really depends on the quality of the ChIP-seq experiment, i.e. its signal-to-noise ratio. As a general rule, I think ENCODE recommends a higher number of reads for "broad" marks (20M or so). This, however, would not save you at all if your library is bad and has a lot of noise. So I would say 10M aligned unique reads is the lowest you want to go.

As an example of an amazingly clean library, I can point to this sample: GSE38046 (GSM932947 - GSM932951) from the laboratory of M. Busslinger. It has about 23M reads with pretty low duplicate rates (in ChIP-seq analysis I always turn on filtering of identical reads; both MACS and SICER do it by default). In general, the quality of their ChIP-seq data is astounding, the best I've ever seen. Those guys are surely doing something right.

The same experiment done by C. Murre (GSM987809) also shows a pretty good signal-to-noise ratio and correlates very well with the Busslinger lab ChIP-seq. That sample adds up to 16M aligned reads.
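Applying that 10M floor to the numbers from the start of the thread is simple arithmetic: unique reads are roughly mapped reads × (1 - redundant rate). Below is a rough sketch that assumes the worst reported redundant rate (0.42) applies to both the smallest and largest libraries. The thread does not actually say which samples have which depth.

```python
MIN_UNIQUE_READS = 10_000_000  # the suggested floor of aligned unique reads


def unique_aligned_reads(mapped_reads, redundant_rate):
    """Aligned reads left after duplicate filtering removes `redundant_rate` of them."""
    return mapped_reads * (1 - redundant_rate)


# Figures from the first posts: 20M-50M mapped reads per sample, redundant rate up to 0.42.
for mapped in (20_000_000, 50_000_000):
    unique = unique_aligned_reads(mapped, 0.42)
    print(f"{mapped / 1e6:.0f}M mapped -> {unique / 1e6:.1f}M unique, "
          f"meets 10M floor: {unique >= MIN_UNIQUE_READS}")
# 20M mapped -> 11.6M unique, meets 10M floor: True
# 50M mapped -> 29.0M unique, meets 10M floor: True
```

On paper even the worst case clears the floor, which suggests raw depth alone may not explain the W1/W2 disagreement; library quality is the other half of the answer above.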
10-08-2013, 12:10 PM #13
harryzs
Member
Location: Germany
Join Date: Dec 2010
Posts: 29

Quote:
Originally Posted by apredeus
It really depends on the quality of the Chip-Seq experiment, i.e. signal-to-noise ratio. As a general rule, I think ENCODE recommends higher number of reads for "broad" marks (20M or so). This, however, would not save you at all if your library is bad and has a lot of noise. So I would say 10M aligned unique reads is the lowest you want to go.

As an example of an amazingly clean library I can give this sample: GSE38046 (GSM932947 - GSM932951) from laboratory of M. Busslinger. It has about 23M reads with pretty low duplicate rates (in Chip-Seq analysis, I always turn on filtering of identical reads; both MACS and SICER do it by default). In general, the quality of their Chip-Seqs is astounding, best I've ever seen. Those guys are surely doing something right

The same experiment done by C.Murre (GSM987809) also displays a pretty good signal-to-noise ratio and correlates with Busslinger lab Chip-Seq very well. That sample adds up to 16M aligned reads.

Great. Thank you very much for sharing.

Tags
biological replicates, broad peaks, chip seq, differential analysis, histone modifications, macs2





All times are GMT -8.

