  • Peak calling: using the control from one experiment with another, and data scaling

    Hi,
    I was wondering whether a control from the same type of experiment, sequenced earlier, can be used with a sample that is being sequenced now.

    The control (mock IP) was sequenced on a Genome Analyzer IIx and the samples were sequenced on an Illumina HiSeq 2000. The experiments were done with the same cells using the same protocol.

    Number of reads in control: ~5.5 million
    Number of reads in sample: ~45 million


    Does this large difference in depth (roughly 8X) have a strong or a mild effect when calling peaks with MACS? Is there a limiting factor or threshold in peak calling, for example that the sample should not have more than twice the reads of the control, or is there a better peak caller for this purpose?


    Thanks for your time.
    Sukhi

  • #2
    From the README of MACS 1.4:

    --to-small    When set, scale the larger dataset down to the smaller
                  dataset, by default, the smaller dataset will be
                  scaled towards the larger dataset. DEFAULT: False



    • #3
      Hey harryzs,
      Thanks for the reply.

      It's there in the docs, but which option would be better? (I can test this here by comparing the number of peaks called.) As I see it, scaling the smaller dataset up to the larger one virtually adds reads, which might introduce bias, whereas scaling down might remove real binding sites.

      Again, is there an implication or threshold for the read-density difference when the difference is this large (roughly 8X)?

      Cheers



      • #4
        Please see this:

        "....
        By default, MACS will now scale the smaller dataset to the bigger
        dataset. For instance, if IP has 10 million reads, and Input has 5
        million, MACS will double the lambda value calculated from Input
        reads while calling BOTH the positive peaks and negative
        peaks. This will address the issue caused by unbalanced numbers of
        reads from IP and Input. If --to-small is turned on, MACS will
        scale the larger dataset to the smaller one. So from now on, if d
        is fixed, then the peaks from a MACS call for A vs B should be
        identical to the negative peaks from a B vs A. ...."
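To make the arithmetic in the quoted passage concrete, here is a minimal Python sketch. It is not MACS code: the function name `scaling_factors` and its return convention are assumptions made for illustration. It only computes the multiplier each dataset would receive under the default behaviour (smaller scaled up to larger) and under `--to-small`.

```python
def scaling_factors(n_treatment, n_control, to_small=False):
    """Return (treatment_scale, control_scale) for read-depth balancing.

    Hypothetical helper, not part of MACS. By default the smaller dataset
    is scaled up to the larger one; with to_small=True the larger dataset
    is scaled down to the smaller one.
    """
    if n_treatment <= 0 or n_control <= 0:
        raise ValueError("read counts must be positive")
    larger = max(n_treatment, n_control)
    smaller = min(n_treatment, n_control)
    target = smaller if to_small else larger
    return target / n_treatment, target / n_control


# Example from the quoted text: IP has 10 million reads, Input has 5 million.
print(scaling_factors(10_000_000, 5_000_000))  # (1.0, 2.0): control lambda is doubled

# Read counts from the first post: ~45 million sample, ~5.5 million control.
t_scale, c_scale = scaling_factors(45_000_000, 5_500_000)
print(round(c_scale, 2))  # 8.18
```

With the read counts from the first post, the default behaviour would multiply the control-derived lambda by roughly 8.2, which also scales up any sampling noise in the small control by the same factor.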




        • #5
          Thanks a lot,

          That answers the second half. For the first half (using a control and a sample from different Illumina machines), I think the answer is that it depends on the signal-to-noise ratio, which may differ between machines even though the base-calling algorithms are more or less the same. There might also be differences in biases even though the protocol is the same. So, although it is not the ideal case, one could use it.

          Thanks
          Sukhi



          • #6
            I would recommend against using the old 'control'. It could work, but it could send you down some misleading paths. You are probably using a 35% formaldehyde stock, which loses strength over time, so your cross-linking is probably not very reproducible. How reproducible is your fragmentation? Unless you are using a Covaris, there is probably significant variability. It certainly is not publishable, which makes it a pilot experiment. As a pilot experiment, you should just barcode your samples and accept fewer reads for each. Instead of 45 million reads for your ChIP sample, you could have sequenced 20 million reads for your ChIP and 20 million reads for your input control, which is the same cost but produces a properly controlled data set and is still probably plenty of reads. Cutting corners like this works sometimes, but overall you will end up wasting time and money. Sloppy science + poorly designed experiments = a waste of time and money.

            Also, it is much better to use input chromatin for your control than non-specific IgG.
            --------------
            Ethan



            • #7
              a) Using a control from a different experiment performed at a different time is better than using no control (there is even literature discussing this; unfortunately I don't remember the references).

              b) Scaling and many other normalization procedures rest on rather simple assumptions, all of which are most likely wrong. (If you ever sequence the same sample twice at different depths and manage to normalize the two runs using simple scaling, we can go on arguing about that.)

              c) If your IP has a good signal-to-noise ratio, you are on the safe side anyway. If not, you have to consider (d).

              d) Validate your results!

              e) Use different peak callers to check the robustness of your peak calling (e.g. SISSRs, CisGenome).
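One simple way to act on point (e) is to measure how many peaks from one caller are also found by another. The Python sketch below assumes peaks are already loaded as `(chrom, start, end)` tuples with half-open coordinates; the function name `fraction_overlapping` and the example coordinates are made up for illustration and do not come from any peak caller's API.

```python
from collections import defaultdict


def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    Hypothetical helper; a plain scan per chromosome, fine for small sets.
    """
    if not peaks_a:
        return 0.0
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))
    hits = 0
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < b_end and b_start < end
               for b_start, b_end in by_chrom.get(chrom, ())):
            hits += 1
    return hits / len(peaks_a)


# Made-up coordinates standing in for the output of two different callers.
caller_1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
caller_2 = [("chr1", 150, 250), ("chr2", 300, 400)]
print(fraction_overlapping(caller_1, caller_2))  # one of three peaks is shared
```

Because the measure is asymmetric, it is worth computing it in both directions: a caller that reports fewer, stronger peaks can be almost fully contained in a larger peak set while the reverse fraction stays low.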



              • #8
                Thanks, Ethan and mudshark, for your comments.
                We are using 37% formaldehyde and a Covaris as well. We sequenced both input and mock in multiplexed mode, but the input failed during sequencing and the mock is very bad: random peaks all over the place, with more peaks in the control than in the sample for most of the samples.

                Using the control from the previous experiment versus the newly sequenced one (both prepared with the same protocol) gives a lot of variability in terms of peaks: the ratio of +ve/-ve peaks went from 0.14 to 73.27. So I was curious whether we could use it or not; ideally not, but maybe in a few cases. How can we check the signal-to-noise ratio of a sample? I think the number of peaks or the ratio of +ve/-ve peaks might be an indicator.
                I will use different peak callers for the samples in question, but cross-comparisons can sometimes be a problem (e.g. MACS returning 5000 peaks but SISSRs 2000).
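As a rough illustration of the +ve/-ve ratio mentioned above, the Python sketch below turns two peak counts into that ratio and into a simple swap-based false discovery estimate (negative peaks divided by positive peaks, capped at 1). The function name `peak_ratio_summary` and the counts are hypothetical; the thread reports only the ratios, not the underlying peak numbers.

```python
def peak_ratio_summary(n_positive, n_negative):
    """Return (positive/negative ratio, rough FDR estimate) from peak counts.

    n_positive: peaks called for sample vs control.
    n_negative: peaks called with sample and control swapped.
    Hypothetical helper; the FDR here is simply negative/positive, capped at 1.
    """
    if n_positive <= 0 or n_negative < 0:
        raise ValueError("need n_positive > 0 and n_negative >= 0")
    ratio = n_positive / n_negative if n_negative else float("inf")
    fdr = min(1.0, n_negative / n_positive)
    return ratio, fdr


# Made-up counts: 2000 positive peaks and 500 negative peaks.
print(peak_ratio_summary(2000, 500))  # (4.0, 0.25)
```

A ratio below 1, such as the 0.14 reported above, means the swapped comparison yields more peaks than the real one, so this estimate saturates at 1 and the peak list carries little usable signal against that control.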

                Thanks a lot
