SEQanswers

Old 02-08-2012, 04:37 AM   #1
vanbug
Member
 
Location: Dresden

Join Date: Aug 2011
Posts: 11
Peak Calling: Using Control of one experiment with other & data scaling.

Hi,
I was wondering how easily one can pair a control that was sequenced earlier with a sample from the same type of experiment that is being sequenced now.

The control (mock IP) was sequenced on a Genome Analyzer IIx and the samples were sequenced on an Illumina HiSeq 2000. The experiment was done with the same cells using the same protocol.

Number of reads in control: ~5.5 million
Number of reads in sample: ~45 million


Does this large difference in depth (roughly 10X) have a strong or a mild effect when calling peaks with MACS? Is there a limiting factor or threshold for peak calling, for example that the sample should not have more than twice as many reads as the control, or is there a better peak caller for this purpose?


Thanks for your time.
Sukhi
Old 02-08-2012, 05:45 AM   #2
harryzs
Member
 
Location: Germany

Join Date: Dec 2010
Posts: 29
Default

From the README of MACS 1.4:

--to-small    When set, scale the larger dataset down to the smaller dataset, by default, the smaller dataset will be scaled towards the larger dataset. DEFAULT: False
Old 02-08-2012, 07:44 AM   #3
vanbug
Member
 
Location: Dresden

Join Date: Aug 2011
Posts: 11
Default

Hey harryzs,
Thanks for the reply.

It's there in the docs, but which one would be better? (I can test this here by comparing the number of peaks called.) As I see it, scaling the smaller dataset up to the larger one virtually adds reads, which might introduce bias, whereas scaling down might remove real binding sites.

Again, is there an implication or threshold for read-density differences this large (10X) or not?

Cheers
Old 02-08-2012, 07:51 AM   #4
harryzs
Member
 
Location: Germany

Join Date: Dec 2010
Posts: 29
Default

Please see this:
https://groups.google.com/group/macs...65a20b720a437c
"....
By default, MACS will now scale the smaller dataset to the bigger
dataset. For instance, if IP has 10 million reads, and Input has 5
million, MACS will double the lambda value calculated from Input
reads while calling BOTH the positive peaks and negative
peaks. This will address the issue caused by unbalanced numbers of
reads from IP and Input. If --to-small is turned on, MACS will
scale the larger dataset to the smaller one. So from now on, if d
is fixed, then the peaks from a MACS call for A vs B should be
identical to the negative peaks from a B vs A. ...."
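
To make the scaling described in the quote concrete, here is a minimal Python sketch. It is not MACS's actual code: the function names `lambda_multipliers` and `poisson_tail` are made up for illustration, and the Poisson tail is only the generic way a local lambda is usually turned into a p-value. The sketch assumes the only thing that changes between the two modes is which side's expected count gets multiplied.

```python
import math
from typing import Tuple


def lambda_multipliers(treatment_reads: int, control_reads: int,
                       to_small: bool = False) -> Tuple[float, float]:
    """Return (treatment_multiplier, control_multiplier).

    Default: the smaller dataset is scaled up to the larger one.
    to_small=True: the larger dataset is scaled down to the smaller one.
    Hypothetical helper, written only to illustrate the quoted behaviour.
    """
    if treatment_reads <= 0 or control_reads <= 0:
        raise ValueError("read counts must be positive")
    treatment_is_larger = treatment_reads > control_reads
    if to_small:
        if treatment_is_larger:
            return control_reads / treatment_reads, 1.0
        return 1.0, treatment_reads / control_reads
    if treatment_is_larger:
        return 1.0, treatment_reads / control_reads
    return control_reads / treatment_reads, 1.0


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X < k).

    Fine for illustration; it loses precision for extremely small tails.
    """
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    # Read totals from this thread: ~45 million IP, ~5.5 million control.
    t_mult, c_mult = lambda_multipliers(45_000_000, 5_500_000)
    print("control lambda multiplier:", round(c_mult, 2))

    # Made-up window: 30 IP reads, control lambda of 2 reads before scaling.
    print("p-value, unscaled lambda:", poisson_tail(30, 2.0))
    print("p-value, scaled lambda:  ", poisson_tail(30, 2.0 * c_mult))
```

With the numbers from this thread, the default mode multiplies the control lambda by 45 / 5.5, about 8.2, so any noise in a shallow control is amplified by the same factor. That is one reason a 5.5 million read control is a weak background estimate for a 45 million read sample, whichever direction the scaling goes.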


Old 02-08-2012, 07:59 AM   #5
vanbug
Member
 
Location: Dresden

Join Date: Aug 2011
Posts: 11
Default

Thanks a lot,

That answers the second half. For the first half (using a control and a sample from different Illumina machines), I think the answer is that it depends on the signal-to-noise ratio, which would differ between machines even though the base-calling algorithms are more or less the same. There might also be differences in the biases even though the protocol is the same. So, although it is not the ideal case, one could use it.

Thanks
Sukhi
Old 02-08-2012, 09:02 AM   #6
ETHANol
Senior Member
 
Location: Western Australia

Join Date: Feb 2010
Posts: 308
Default

I would recommend against using the old 'control'. It could work, but it could send you down some misleading paths. You are probably using a 35% formaldehyde stock, which loses strength over time, so your cross-linking is probably not very reproducible. How reproducible is your fragmentation? Unless you are using the Covaris, there is probably significant variability. It certainly is not publishable, which makes it a pilot experiment. So, as a pilot experiment, you should just barcode your samples and go with fewer reads for each. Instead of 45 million reads for your ChIP sample, you could have done 20 million reads for your ChIP and 20 million reads for your input control, which is the same cost but will produce a properly controlled data set and is still probably plenty of reads. Cutting corners like this works sometimes, but overall you will end up wasting time and money. Sloppy science + poorly designed experiments = a waste of time and money.

Also, it is much better to use input chromatin for your control and not non-specific IgG.
__________________
--------------
Ethan
Old 02-08-2012, 11:30 PM   #7
mudshark
Senior Member
 
Location: Munich

Join Date: Jan 2009
Posts: 138
Default

a) Better than using no control is using a control from a different experiment performed at a different time (there is even literature discussing this; unfortunately I don't remember the refs).

b) Scaling and many other normalization procedures are based on rather simple assumptions, all of which are most likely wrong (if you ever happen to sequence the same sample twice at different depths and manage to normalize them using simple scaling, we can go on arguing about that).

c) If your IP has a good signal-to-noise ratio, you are on the safe side anyway. If not, you have to consider (d).

d) Validate your results!

e) Use different peak callers to check the robustness of your peak calling (e.g. SISSRs, CisGenome).
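
For point (e), raw peak counts from two callers are hard to compare directly, so one rough robustness check is the fraction of peaks from one caller that overlap any peak from the other. The following is a minimal Python sketch under stated assumptions: peaks are already parsed into `(chrom, start, end)` tuples with half-open coordinates, and the names `Peak`, `overlaps` and `fraction_overlapping` are hypothetical, not part of any peak caller.

```python
from typing import Dict, List, Tuple

# Assumed representation: (chromosome, start, end), half-open [start, end).
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(peaks_a: List[Peak], peaks_b: List[Peak]) -> float:
    """Fraction of peaks in A that overlap at least one peak in B.

    Simple per-chromosome scan; adequate for a few thousand peaks,
    an interval index would be better for very large sets.
    """
    if not peaks_a:
        return 0.0
    by_chrom: Dict[str, List[Peak]] = {}
    for peak in peaks_b:
        by_chrom.setdefault(peak[0], []).append(peak)
    hits = sum(
        1 for peak in peaks_a
        if any(overlaps(peak, other) for other in by_chrom.get(peak[0], []))
    )
    return hits / len(peaks_a)


if __name__ == "__main__":
    # Toy peak sets, invented for illustration only.
    caller_one = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    caller_two = [("chr1", 150, 250), ("chr2", 300, 400)]
    print(fraction_overlapping(caller_one, caller_two))
    print(fraction_overlapping(caller_two, caller_one))
```

The measure is asymmetric on purpose: if most peaks from the more conservative caller are recovered by the more permissive one, the shared set is a reasonable starting point for validation, even when the total counts differ a lot.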
Old 02-09-2012, 12:47 AM   #8
vanbug
Member
 
Location: Dresden

Join Date: Aug 2011
Posts: 11
Default

Thanks Ethan and mudshark for your comments.
We are using 37% formaldehyde and a Covaris as well. We sequenced both input and mock in multiplexing mode, but the input failed during sequencing and the mock is very bad (random peaks all over the place, with more peaks in the control than in the sample for most of the samples).

Using the control from the previous experiment versus the newly sequenced one, both prepared with the same protocol, gives a lot of variability in terms of peaks: the ratio of +ve/-ve peaks went from 0.14 to 73.27. So I was a bit curious whether we could use it or not; ideally not, but maybe in a few cases. How can we check the signal-to-noise ratio for a sample? I think the number of peaks or the ratio of +ve/-ve peaks might be an indicator.
I will use different peak callers for the samples in question, but the cross-comparisons might sometimes be a problem (e.g. MACS returning 5000 peaks but SISSRs 2000).

Thanks a lot
Tags
chip-seq, macs, mock, peakcalling

All times are GMT -8.

