  • vanbug
    Member
    • Aug 2011
    • 11

    Peak calling: using the control from one experiment with another, and data scaling

    Hi,
    I was wondering how easily one can use a control from the same type of experiment, sequenced earlier, with samples that were sequenced now.

    The control (mock IP) was sequenced on an Illumina Genome Analyzer IIx and the samples on an Illumina HiSeq 2000. The experiment was done with the same cells using the same protocol.

    Number of reads in control: ~5.5 million
    Number of reads in sample: ~45 million


    Does this large (roughly 8-10X) difference in depth have a strong or weak effect when calling peaks with MACS? Is there any limiting factor or threshold for peak calling, e.g. that the sample should not have more than 2 times the reads of the control? Or is there a better peak caller for this purpose?


    Thanks for your time.
    Sukhi
  • harryzs
    Member
    • Dec 2010
    • 30

    #2
    From the README of MACS 1.4:

    --to-small When set, scale the larger dataset down to the smaller
    dataset, by default, the smaller dataset will be
    scaled towards the larger dataset. DEFAULT: False
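    A minimal sketch of the two scaling directions described in that README entry, using the read counts from the original post. The function name scale_factor is hypothetical and this is not MACS source code; it only shows which dataset gets rescaled and by what factor (MACS applies the factor to its expected-count estimates rather than literally adding or deleting reads).

```python
def scale_factor(ip_reads: int, control_reads: int, to_small: bool = False) -> tuple:
    """Return (dataset_to_rescale, factor) for the two scaling modes.

    Illustrative only: mirrors the README wording, not MACS internals.
    """
    if ip_reads == control_reads:
        return ("none", 1.0)
    larger = max(ip_reads, control_reads)
    smaller = min(ip_reads, control_reads)
    if to_small:
        # --to-small: shrink the larger dataset down to the smaller one.
        target = "IP" if ip_reads > control_reads else "control"
        return (target, smaller / larger)
    # Default: scale the smaller dataset up towards the larger one.
    target = "control" if control_reads < ip_reads else "IP"
    return (target, larger / smaller)


if __name__ == "__main__":
    ip, ctrl = 45_000_000, 5_500_000  # ~45 M sample reads, ~5.5 M control reads
    target, factor = scale_factor(ip, ctrl)
    print(target, f"{factor:.2f}")  # control 8.18
    target, factor = scale_factor(ip, ctrl, to_small=True)
    print(target, f"{factor:.2f}")  # IP 0.12
```

    So the depth difference in this thread is about 8.2-fold: by default the control is multiplied up by ~8.2, with --to-small the sample is multiplied down by ~0.12.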


    • vanbug
      Member
      • Aug 2011
      • 11

      #3
      Hey harryzs,
      Thanks for the reply.

      It's in the docs, but which one would be better? (I can test it here by comparing the number of peaks called.) My intuition is that scaling the smaller dataset up to the larger one virtually adds reads, which might introduce bias, whereas scaling down might remove real binding sites.

      Again, is there any implication of, or threshold for, such a large (10X) difference in read depth?

      Cheers


      • harryzs
        Member
        • Dec 2010
        • 30

        #4
        Please see this:

        "....
        By default, MACS will now scale the smaller dataset to the bigger
        dataset. For instance, if IP has 10 million reads, and Input has 5
        million, MACS will double the lambda value calculated from Input
        reads while calling BOTH the positive peaks and negative
        peaks. This will address the issue caused by unbalanced numbers of
        reads from IP and Input. If --to-small is turned on, MACS will
        scale the larger dataset to the smaller one. So from now on, if d
        is fixed, then the peaks from a MACS call for A vs B should be
        identical to the negative peaks from a B vs A. ...."
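        To see why an ~8x depth gap matters even with this lambda scaling, here is a simplified Poisson sketch. Assumptions (not MACS's exact computation): one control window per site, a hypothetical genome-wide floor lambda_bg, and made-up IP/control counts; MACS itself combines a genome-wide background with several local-window estimates. Function names are hypothetical.

```python
import math


def poisson_pmf(i: int, lam: float) -> float:
    """P(X = i) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam); intended for modest, window-level counts."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # Upper tail is large here, so use the complement of the lower tail.
        return max(0.0, 1.0 - sum(poisson_pmf(i, lam) for i in range(k)))
    # Past the mode the terms shrink, so sum until they stop contributing.
    total, i = 0.0, k
    while True:
        term = poisson_pmf(i, lam)
        total += term
        if term <= total * 1e-17:
            break
        i += 1
    return min(1.0, total)


def scaled_lambda(control_count: int, ratio: float, lambda_bg: float) -> float:
    """Expected IP reads at a site: control reads scaled by depth ratio, floored."""
    return max(lambda_bg, control_count * ratio)


if __name__ == "__main__":
    ratio = 45_000_000 / 5_500_000  # IP/control depth ratio from the first post
    lambda_bg = 5.0                 # hypothetical background floor
    ip_count = 30                   # hypothetical IP reads at one candidate site
    for control_count in range(5):
        lam = scaled_lambda(control_count, ratio, lambda_bg)
        p = poisson_upper_tail(ip_count, lam)
        print(f"control reads={control_count}  lambda={lam:5.1f}  p={p:.2e}")
```

        Each extra control read shifts the expected count by ~8.2, so the p-value at a fixed IP count jumps by orders of magnitude: the sampling noise of a shallow control is amplified by the scaling factor.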



        • vanbug
          Member
          • Aug 2011
          • 11

          #5
          Thanks a lot,

          That answers the second half. For the first half (using a control and a sample sequenced on different Illumina machines), I think the answer depends on the signal-to-noise ratio, which would differ between machines even though the base-calling algorithms are more or less the same. There might also be differences in biases even though the protocol is the same. So, though not ideal, one could use it.

          Thanks
          Sukhi


          • ETHANol
            Senior Member
            • Feb 2010
            • 308

            #6
            I would recommend against using the old 'control'. It could work, but it could send you down some misleading paths. You are probably using a 35% formaldehyde stock, which loses strength over time, so your cross-linking is probably not very reproducible. And how reproducible is your fragmentation? Unless you are using a Covaris, there is probably significant variability. It certainly is not publishable, which makes it a pilot experiment. As a pilot experiment, you should just barcode your samples and go with fewer reads for each: instead of 45 million reads for your ChIP sample, you could have done 20 million reads for your ChIP and 20 million reads for your input control. That is about the same cost, produces a properly controlled data set, and is still probably plenty of reads. Cutting corners like this works sometimes, but overall you will end up wasting time and money. Sloppy science + poorly designed experiments = a waste of time and money.

            Also, it is much better to use input chromatin for your control rather than non-specific IgG.
            --------------
            Ethan


            • mudshark
              Senior Member
              • Jan 2009
              • 138

              #7
              a) Using a control from a different experiment performed at a different time is better than using no control (there is even literature discussing this; unfortunately I don't remember the refs).

              b) Scaling and many other normalization procedures are based on rather simple assumptions, all of which are most likely wrong (if you ever happen to sequence the same sample twice at different depths and manage to normalize them using simple scaling, we can go on arguing about that).

              c) If your IP has a good signal-to-noise ratio, you are on the safe side anyway. If not, you have to consider (d).

              d) validate your results!

              e) Use different peak callers to check the robustness of your peak calling (e.g. SISSRs, CisGenome).


              • vanbug
                Member
                • Aug 2011
                • 11

                #8
                Thanks Ethan and mudshark for your comments.
                We are using 37% formaldehyde and a Covaris as well. We sequenced both the input and the mock in multiplexing mode, but the input failed during sequencing and the mock is very bad (random peaks all over the place, with more peaks in the control than in the sample for most of the samples).

                Using the control from the previous experiment (same protocol) with the newly sequenced samples (same protocol) gives a lot of variability in terms of peaks: the ratio of positive to negative peaks ranged from 0.14 to 73.27. So I was a bit curious whether we could use it or not; ideally not, but maybe in a few cases. How can we check the signal-to-noise ratio for a sample? I think the number of peaks or the ratio of positive to negative peaks might be a determinant.
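                The positive/negative peak ratio mentioned above can be tracked across p-value cutoffs. MACS 1.4 derives its empirical FDR from the same idea (negative peaks from the swapped comparison are treated as false positives); the sketch below is a simplified illustration of that idea, not MACS's own code, and the function name and p-values are hypothetical.

```python
from typing import Sequence, Tuple


def peak_balance(pos_pvals: Sequence[float], neg_pvals: Sequence[float],
                 threshold: float) -> Tuple[int, int, float, float]:
    """Count positive (IP over control) and negative (control over IP) peaks
    passing a p-value threshold; return counts, their ratio and an empirical FDR."""
    n_pos = sum(1 for p in pos_pvals if p <= threshold)
    n_neg = sum(1 for p in neg_pvals if p <= threshold)
    # Ratio below 1 means the control yields more "peaks" than the sample,
    # i.e. the comparison is dominated by noise.
    ratio = n_pos / n_neg if n_neg else float("inf")
    # Negative peaks passing the same cutoff are counted as false positives.
    fdr = min(1.0, n_neg / n_pos) if n_pos else float("nan")
    return n_pos, n_neg, ratio, fdr


if __name__ == "__main__":
    # Hypothetical p-values; real ones would come from the peak caller's
    # positive-peak and negative-peak output.
    pos = [1e-10, 1e-6, 1e-3, 0.2]
    neg = [1e-4, 0.5]
    for t in (1e-5, 1e-3):
        n_pos, n_neg, ratio, fdr = peak_balance(pos, neg, t)
        print(f"p<={t:g}: pos={n_pos} neg={n_neg} ratio={ratio:.2f} FDR={fdr:.2f}")
```

                A ratio like 0.14 would correspond to an FDR capped at 1.0 here, which is one way to flag the sample/control pairings that should not be trusted.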
                I will use different peak callers for the samples in question, but cross-comparisons might sometimes be a problem (e.g. MACS returning 5000 peaks but SISSRs 2000).
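                When callers return different numbers of peaks, one simple cross-comparison is the fraction of each caller's peaks that overlap the other's, computed in both directions. Minimal sketch assuming BED-style half-open (chrom, start, end) intervals; the function name and example intervals are hypothetical, and for genome-scale peak sets a dedicated tool such as bedtools intersect is the usual choice.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), half-open [start, end) as in BED


def overlapping_fraction(a: List[Peak], b: List[Peak]) -> float:
    """Fraction of peaks in `a` that overlap at least one peak in `b` by >= 1 bp."""
    if not a:
        return 0.0
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in b:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in a:
        # Two half-open intervals overlap iff each starts before the other ends.
        if any(bs < end and start < be for bs, be in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(a)


if __name__ == "__main__":
    caller_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
    caller_b = [("chr1", 150, 160), ("chr2", 60, 70)]
    print(f"A in B: {overlapping_fraction(caller_a, caller_b):.2f}")
    print(f"B in A: {overlapping_fraction(caller_b, caller_a):.2f}")
```

                The two directions generally differ, so reporting both is more informative than a single overlap count when one caller returns far more peaks than the other.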

                Thanks a lot
