
  • Peak Calling : Using Control of one experiment with other & data scaling.

    Hi,
    I was wondering how safely one can pair a control that was sequenced earlier, from the same type of experiment, with a sample that is being sequenced now.

    Control (Mock IP) was sequenced using Genome Analyzer IIx and the samples are sequenced using HiSeq-2000 from Illumina. The experiment was done with the same cells using the same protocol.

    Number of reads in Control is ~5.5 million
    Number of reads in Sample is ~45 million.


    Does this large difference in depth (about 10X) have a strong or only a mild effect when calling peaks with MACS? Is there a limiting factor or threshold in peak calling, for example that the sample should not have more than twice as many reads as the control, or is there a better peak caller for this purpose?


    Thanks for your time.
    Sukhi

  • #2
    In the README of MACS 1.4:

    --to-small  When set, scale the larger dataset down to the smaller
                dataset. By default, the smaller dataset will be
                scaled towards the larger dataset. DEFAULT: False



    • #3
      Hey harryzs,
      Thanks for the reply.

      It's there in the docs, but which one would be better? (I can test both here and compare the number of peaks called.) Thinking about it, scaling the smaller dataset up to the larger one virtually adds reads, which might introduce bias, whereas scaling down might remove real binding sites.

      Again, is there an implication or a threshold for the read-depth difference when it is this large (10X)?

      Cheers



      • #4
        Please see this:

        "....
        By default, MACS will now scale the smaller dataset to the bigger
        dataset. For instance, if IP has 10 million reads, and Input has 5
        million, MACS will double the lambda value calculated from Input
        reads while calling BOTH the positive peaks and negative
        peaks. This will address the issue caused by unbalanced numbers of
        reads from IP and Input. If --to-small is turned on, MACS will
        scale the larger dataset to the smaller one. So from now on, if d
        is fixed, then the peaks from a MACS call for A vs B should be
        identical to the negative peaks from a B vs A. ...."
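
The scaling rule quoted above can be sketched in a few lines of Python. This is only an illustration of the linear depth scaling the README describes, not MACS's own code: the function name `scaling_factors` and its arguments are hypothetical, and the real implementation applies the factor to its lambda estimates rather than returning a pair of numbers.

```python
def scaling_factors(n_treatment, n_control, to_small=False):
    """Return (treatment_factor, control_factor) for linear depth scaling.

    Hypothetical helper, not part of MACS. By default the smaller dataset
    is scaled up to the larger one; with to_small=True the larger dataset
    is scaled down to the smaller one.
    """
    if n_treatment <= 0 or n_control <= 0:
        raise ValueError("read counts must be positive")
    larger = max(n_treatment, n_control)
    smaller = min(n_treatment, n_control)
    target = smaller if to_small else larger
    return target / n_treatment, target / n_control


# README example: IP has 10 million reads, Input has 5 million.
ip_factor, input_factor = scaling_factors(10_000_000, 5_000_000)

# Read counts from this thread: ~45 million sample, ~5.5 million control.
_, control_factor = scaling_factors(45_000_000, 5_500_000)
```

With the README numbers the control factor is 2, which matches "MACS will double the lambda value calculated from Input". With the read counts from the first post, the control would be scaled by roughly 8.2, so any noise in the shallow control is multiplied by the same factor.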




        • #5
          Thanks a lot,

          That answers the second half. For the first half (using a control and a sample from different Illumina machines), I think the answer is that it depends on the signal-to-noise ratio, which would differ between machines even though the base-calling algorithms are more or less the same. There might also be differences in the biases even though the protocol is the same. So it is not the ideal case, but one could use it.

          Thanks
          Sukhi



          • #6
            I would recommend against using the old 'control'. It could work, but it could send you down some misleading paths. You are probably using a 35% formaldehyde stock, which loses strength over time, so your cross-linking is probably not very reproducible. How reproducible is your fragmentation? Unless you are using the Covaris there is probably significant variability. It certainly is not publishable, which makes it a pilot experiment.

            As a pilot experiment, you should just barcode your samples and go with fewer reads for each. Instead of 45 million reads for your ChIP sample you could have done 20 million reads for your ChIP and 20 million reads for your input control, which is the same cost but produces a properly controlled data set and is still probably plenty of reads. Cutting corners like this works sometimes, but overall you will end up wasting time and money. Sloppy science + poorly designed experiments = a waste of time and money.

            Also, it is much better to use input chromatin for your control and not non-specific IgG.
            --------------
            Ethan



            • #7
              a) Using a control from a different experiment performed at a different time is better than using no control (there is even literature discussing this; unfortunately I don't remember the refs).

              b) Scaling and many other normalization procedures are based on rather simple assumptions, all of which are most likely wrong (if you ever happen to sequence the same sample twice at different depths and manage to normalize them using simple scaling, we can go on arguing about that).

              c) If your IP has a good signal-to-noise ratio you are on the safe side anyway. If not, you have to consider (d).

              d) Validate your results!

              e) Use different peak callers to check the robustness of your peak calling (e.g. SISSRs, CisGenome).
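
One way to make point (b) concrete is to compare a scaled shallow control with a control that was really sequenced deeper. The sketch below assumes window read counts are Poisson distributed, which is itself one of the simple assumptions being criticized; the function names and the example numbers are hypothetical.

```python
import math


def cv_scaled_control(mu, s):
    """Coefficient of variation of s * X, where X ~ Poisson(mu).

    Multiplying a shallow count by s multiplies the mean by s and the
    variance by s**2, so the relative noise stays at 1 / sqrt(mu).
    """
    mean = s * mu
    variance = s * s * mu
    return math.sqrt(variance) / mean


def cv_deep_control(mu, s):
    """Coefficient of variation of Y ~ Poisson(s * mu).

    A control truly sequenced s times deeper has mean and variance
    both equal to s * mu, so the relative noise drops to 1 / sqrt(s * mu).
    """
    return 1.0 / math.sqrt(s * mu)


# Hypothetical window with 2 expected control reads and an 8-fold depth gap.
scaled = cv_scaled_control(mu=2, s=8)
deep = cv_deep_control(mu=2, s=8)
```

Under this assumption, scaling fixes the mean but not the noise: the scaled shallow control stays as noisy, in relative terms, as the original low-coverage data, while a genuinely deeper control would be much smoother.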



              • #8
                Thanks Ethan and mudshark for your comments.
                We are using 37% formaldehyde and the Covaris as well. We sequenced the Input as well as the mock in multiplexing mode, but the Input failed during sequencing and the mock is very bad (random peaks all over the place, with more peaks in the control than in the sample for most of the samples).

                Using the control from the previous experiment versus the newly sequenced one (both made with the same protocol) gives a lot of variability in terms of peaks: the ratio of +ve/-ve peaks went from 0.14 to 73.27. So I was a bit curious whether we could use it or not; ideally not, but maybe in a few cases. How can we check the signal-to-noise ratio for a sample? I think the number of peaks or the ratio of +ve/-ve peaks might be a determinant.
                I will use different peak callers for the samples in question, but the cross-comparisons might sometimes be a problem (e.g. MACS returning 5000 peaks but SISSRs 2000).
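
For the cross-comparison problem, one simple option is to ask what fraction of one caller's peaks overlap any peak from the other caller, in both directions, instead of comparing raw peak counts. The sketch below is a minimal illustration: `overlap_fraction` is a hypothetical helper, and peaks are assumed to be already parsed into `(chrom, start, end)` tuples with half-open coordinates.

```python
def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b.

    Each peak is a (chrom, start, end) tuple with half-open coordinates.
    Hypothetical helper for illustration; a quadratic scan per chromosome
    is fine for a few thousand peaks.
    """
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in peaks_a:
        candidates = by_chrom.get(chrom, [])
        # Two intervals overlap when each one starts before the other ends.
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            hits += 1
    return hits / len(peaks_a) if peaks_a else 0.0


# Toy peak sets standing in for the output of two different callers.
caller_one = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 50)]
caller_two = [("chr1", 150, 250), ("chr2", 60, 90)]

one_in_two = overlap_fraction(caller_one, caller_two)
two_in_one = overlap_fraction(caller_two, caller_one)
```

If most of the shorter list is recovered by the longer one, the difference in peak counts mostly reflects different stringency between the callers rather than disagreement about where the binding sites are.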

                Thanks a lot
