
**harryzs** · 02-08-2012, 06:45 AM

From the README of MACS 1.4:

--to-small When set, scale the larger dataset down to the smaller
dataset, by default, the smaller dataset will be
scaled towards the larger dataset. DEFAULT: False
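A minimal sketch of the two scaling directions, assuming scaling is a plain ratio of total read counts. The helper `scale_factors` is hypothetical and not part of MACS; it only returns the factor each dataset would be multiplied by.

```python
def scale_factors(treat_reads: int, control_reads: int, to_small: bool = False):
    """Return (treatment_factor, control_factor) under simple linear scaling.

    Hypothetical helper, not MACS code: models only the direction described
    for --to-small, assuming scaling is the ratio of total read counts.
    """
    if treat_reads <= 0 or control_reads <= 0:
        raise ValueError("read counts must be positive")
    ratio = treat_reads / control_reads
    if to_small:
        # --to-small: shrink whichever dataset is larger down to the smaller one
        return (1 / ratio, 1.0) if ratio > 1 else (1.0, ratio)
    # default: stretch whichever dataset is smaller up to the larger one
    return (1.0, ratio) if ratio > 1 else (1 / ratio, 1.0)


print(scale_factors(10_000_000, 5_000_000))                 # (1.0, 2.0)
print(scale_factors(10_000_000, 5_000_000, to_small=True))  # (0.5, 1.0)
```

In both modes the relative depth of the two datasets ends up equal; the difference is only whether the smaller one is stretched or the larger one is shrunk.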

**vanbug** · 02-08-2012, 08:44 AM

Hey harryzs,
Thanks for the reply.

It's there in the docs, but which one would be better? (I can test this here by comparing the number of peaks called.) If I think about it, scaling the smaller dataset up to the larger one means virtually adding reads, which might introduce bias, whereas scaling down might remove real binding sites.
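To make the trade-off concrete, here is a simplified sketch, not MACS's actual algorithm (which uses several local lambdas). It assumes a single window, a single Poisson lambda taken directly from the control, and made-up counts: 45 million IP reads vs 4.5 million control reads (a 10X difference), with 30 IP reads and 2 control reads in the window. Scaling up multiplies the control count into lambda; scaling down shrinks the observed IP count instead. The function names are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam), summed term by term in log space."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        raise ValueError("lambda must be positive")
    # Terms beyond k + lam + 200 are negligible for the small counts used here.
    return sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k, k + int(lam) + 200)
    )


def window_pvalue(treat_count, control_count, treat_reads, control_reads, to_small=False):
    """Simplified, hypothetical model: one window, lambda = (scaled) control count.
    Assumes the IP library is the deeper one."""
    ratio = treat_reads / control_reads
    if ratio <= 1:
        raise ValueError("this sketch assumes the IP library is deeper than the control")
    if to_small:
        # scale the IP down: shrink the observed count, keep the control as lambda
        observed, lam = round(treat_count / ratio), control_count
    else:
        # default: scale the control up, i.e. multiply its count into lambda
        observed, lam = treat_count, control_count * ratio
    return poisson_sf(observed, lam)


p_up = window_pvalue(30, 2, 45_000_000, 4_500_000)
p_down = window_pvalue(30, 2, 45_000_000, 4_500_000, to_small=True)
print(f"scale control up: p = {p_up:.3g}; scale IP down: p = {p_down:.3g}")
```

With these invented numbers, the same window drops from roughly p = 0.02 to about p = 0.32 after scaling down, which is the "losing real sites" risk; scaling up instead builds lambda from only 2 control reads multiplied tenfold, which is where the bias concern comes from.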

Also, does a read-density difference this large (10X) have any implications, or is there a threshold beyond which scaling should not be used?

Cheers

**harryzs** · 02-08-2012, 08:51 AM

Please see this:

https://groups.google.com/group/macs-announcement/browse_thread/thread/9e401eeced7ba235/fd65a20b720a437c?lnk=gst&q=to-small#fd65a20b720a437c

"....
By default, MACS will now scale the smaller dataset to the bigger
dataset. For instance, if IP has 10 million reads, and Input has 5
million, MACS will double the lambda value calculated from Input
reads while calling BOTH the positive peaks and negative
peaks. This will address the issue caused by unbalanced numbers of
reads from IP and Input. If --to-small is turned on, MACS will
scale the larger dataset to the smaller one. So from now on, if d
is fixed, then the peaks from a MACS call for A vs B should be
identical to the negative peaks from a B vs A. ...."
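Read literally, the announcement's example amounts to the arithmetic below. This is a sketch under stated assumptions: both window counts are scaled to the deeper library, the fragment size d is taken as already fixed, and a plain fold-change rule (the hypothetical `fold` parameter) stands in for MACS's Poisson test. It reproduces the doubled lambda for 10 million IP vs 5 million Input reads and checks the stated symmetry between A vs B and B vs A.

```python
def scaled_pair(treat_count, control_count, treat_reads, control_reads):
    """Scale both window counts to the deeper library (default direction).
    Returns (scaled_treatment, scaled_control). Hypothetical helper."""
    target = max(treat_reads, control_reads)
    return (treat_count * target / treat_reads,
            control_count * target / control_reads)


def classify(treat_count, control_count, treat_reads, control_reads, fold=2.0):
    """Stand-in peak rule (NOT the MACS test): 'positive' if the scaled
    treatment is >= fold x the scaled control, 'negative' for the reverse."""
    t, c = scaled_pair(treat_count, control_count, treat_reads, control_reads)
    if t == c == 0:
        return "none"
    if t >= fold * c:
        return "positive"
    if c >= fold * t:
        return "negative"
    return "none"


# IP has 10 million reads, Input 5 million: 4 Input reads become a lambda of 8.
print(scaled_pair(12, 4, 10_000_000, 5_000_000))  # (12.0, 8.0)

depth_a, depth_b = 10_000_000, 5_000_000
windows = [(30, 5), (2, 6), (10, 4)]  # made-up (count in A, count in B)
a_vs_b_pos = {i for i, (a, b) in enumerate(windows)
              if classify(a, b, depth_a, depth_b) == "positive"}
b_vs_a_neg = {i for i, (a, b) in enumerate(windows)
              if classify(b, a, depth_b, depth_a) == "negative"}
print(a_vs_b_pos == b_vs_a_neg)  # True
```

Because both calls scale to the same depth, swapping the roles of A and B only swaps the two numbers being compared, so a positive call in one direction is a negative call in the other.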


**vanbug** · 02-08-2012, 08:59 AM

Thanks a lot,

That answers the second half. For the first half (using a control sequenced on a different Illumina machine from the sample), I think the answer is that it depends on the signal-to-noise ratio, which will differ between machines even though the base-calling algorithms are more or less the same. There might also be differences in biases even though the protocol is the same. So, although it is not the ideal case, one could use it.

Thanks
Sukhi

**ETHANol** · 02-08-2012, 10:02 AM

I would recommend against using the old 'control'. It could work, but it could send you down some misleading paths. You are probably using a 35% formaldehyde stock, which loses strength over time, so your cross-linking is probably not very reproducible. And how reproducible is your fragmentation? Unless you are using the Covaris, there is probably significant variability. It certainly is not publishable, which makes it a pilot experiment. So, as a pilot experiment, you should just barcode your samples and go with fewer reads for each. Instead of your 45 million reads for your ChIP sample, you could have done 20 million reads for your ChIP and 20 million reads for your input control, which costs about the same but produces a properly controlled data set that still probably has plenty of reads. Cutting corners like this works sometimes, but overall you will end up wasting time and money. Sloppy science + poorly designed experiments = a waste of time and money.

Also, it is much better to use input chromatin for your control and not non-specific IgG.

**mudshark** · 02-09-2012, 12:30 AM

a) Using a control from a different experiment performed at a different time is better than using no control at all (there is even literature discussing this; unfortunately, I don't remember the references).

b) Scaling and many other normalization procedures are based on rather simple assumptions, all of which are most likely wrong (if you ever happen to sequence the same sample twice at different depths and manage to normalize them using simple scaling, we can go on arguing about that).

c) If your IP has a good signal-to-noise ratio, you are on the safe side anyway. If not, you have to consider (d).

d) Validate your results!

e) Use different peak callers to check the robustness of your peak calling (e.g., SISSRs, CisGenome).
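One way to compare callers that return different numbers of peaks is to ask what fraction of each caller's peaks overlaps the other caller's peaks, in both directions. A minimal sketch, assuming BED-style half-open (chrom, start, end) intervals; `overlap_fraction` is a hypothetical helper, and on real peak files a tool such as BEDTools `intersect` does the same job.

```python
from bisect import bisect_left
from collections import defaultdict


def overlap_fraction(peaks_a, peaks_b):
    """Fraction of intervals in peaks_a overlapping at least one interval in
    peaks_b. Intervals are (chrom, start, end), half-open [start, end)."""
    if not peaks_a:
        return 0.0
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        # prefix_max_end[i] = largest end among the first i+1 intervals by start
        prefix_max_end, running = [], float("-inf")
        for _, e in ivs:
            running = max(running, e)
            prefix_max_end.append(running)
        index[chrom] = (starts, prefix_max_end)
    hits = 0
    for chrom, start, end in peaks_a:
        if chrom not in index:
            continue
        starts, prefix_max_end = index[chrom]
        k = bisect_left(starts, end)  # intervals 0..k-1 start before `end`
        if k > 0 and prefix_max_end[k - 1] > start:
            hits += 1
    return hits / len(peaks_a)


macs = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]  # made-up
sissrs = [("chr1", 150, 160), ("chr2", 80, 90)]                    # made-up
print(overlap_fraction(macs, sissrs), overlap_fraction(sissrs, macs))
```

Reporting both directions matters when one caller returns many more peaks: the smaller set can be almost fully contained in the larger one even though most of the larger set has no counterpart.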

**vanbug** · 02-09-2012, 01:47 AM

Thanks, Ethan and mudshark, for your comments.
We are using 37% formaldehyde and the Covaris as well. We sequenced both the input and a mock in multiplexing mode, but the input failed during sequencing, and the mock is very bad (random peaks all over the place, with more peaks in the control than in the sample for most of the samples).

Using the control from a previous experiment (done with the same protocol) together with the newly sequenced samples (also done with the same protocol) gives a lot of variability in terms of peaks: the ratio of positive to negative peaks went from 0.14 to 73.27. So I was a bit curious whether we could use it or not; ideally not, but maybe in a few cases. How can we check the signal-to-noise ratio of a sample? I think the number of peaks or the ratio of positive to negative peaks might be a determinant.
I will use different peak callers for the samples in question, but cross-comparisons might sometimes be a problem (e.g., MACS returning 5000 peaks but SISSRs 2000).
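The positive/negative peak ratio is closely related to the empirical FDR that MACS 1.x reports, which, as far as I know, divides the number of negative peaks (control over IP) by the number of positive peaks at the same p-value cutoff. A sketch, assuming you already have the -log10(p) scores of both peak sets; the helper name and the toy scores are hypothetical.

```python
def peak_ratio_table(pos_scores, neg_scores, cutoffs):
    """For each -log10(p) cutoff, count positive peaks (IP over control) and
    negative peaks (control over IP) passing it. Returns rows of
    (cutoff, n_pos, n_neg, pos/neg ratio, rough empirical FDR = neg/pos)."""
    rows = []
    for cutoff in cutoffs:
        n_pos = sum(1 for s in pos_scores if s >= cutoff)
        n_neg = sum(1 for s in neg_scores if s >= cutoff)
        if n_neg:
            ratio = n_pos / n_neg
        else:
            ratio = float("inf") if n_pos else None  # undefined with no peaks at all
        fdr = min(1.0, n_neg / n_pos) if n_pos else None
        rows.append((cutoff, n_pos, n_neg, ratio, fdr))
    return rows


pos = [3.0, 5.0, 8.0, 12.0, 20.0]  # made-up -log10(p) of positive peaks
neg = [3.2, 5.5]                   # made-up -log10(p) of negative peaks
for row in peak_ratio_table(pos, neg, [5.0, 10.0]):
    print(row)
```

A ratio well below 1, such as the 0.14 above, means the control "wins" more often than the IP at that cutoff, which suggests weak enrichment or a mismatched control rather than a usable comparison.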

Thanks a lot


Peak Calling : Using Control of one experiment with other & data scaling.
