  • [ChIP-seq] MACS 1.4: assertion error

    Hi,

    I just started working with ChIP-seq and I am using MACS to predict binding sites in a fungal genome (12 Mb).
    Like some previous users, I get an assertion error when MACS calculates negative peaks. The only way I have managed to avoid it is by reducing the data to 75% of the reads; if I keep 80% or more of the original reads, the error comes back.

    I have a control file with 31M 36 bp single-end reads (4.1 GB) and a sample file with 28M 36 bp single-end reads (3.6 GB).
    I tried it on both a laptop (4 GB RAM) and a server (16 GB RAM); in both cases it used 1.6 GB of memory and behaved the same way. I also tried it on Cistrome, with the same result.
    Changing the mfold parameter didn't help.
    With 75% of the data I did get sensible results, but I am not sure how to justify discarding 25% of a dataset; it just adds another layer of complexity to the analysis.

    Does anybody have an idea what causes the problem, and why reducing the amount of data works?
    Also, if anybody knows an alternative, valid tool, feel free to suggest it.
    Thanks!

  • #2
    I just answered this question in the MACS user group; since you also asked on SEQanswers, I am reposting the answer here.


    This error normally happens when you have too many reads on a very small genome. In your case, you used a whole GA2 lane to sequence a single factor in a small genome comparable to E. coli's. Because of the extremely high coverage, this overflow error occurs, since my function doesn't expect a Poisson rate higher than 740...
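The reply attributes the error to a Poisson rate above roughly 740. One plausible reason for a limit in that range (an assumption; it has not been checked against the MACS 1.4 source): in IEEE double precision, exp(-λ) underflows to 0.0 once λ exceeds about 745, so any Poisson formula that starts from e^(-λ) breaks down at these depths. The sketch below computes the Poisson upper tail entirely in log space, one standard way to stay finite at large λ. The function names are hypothetical and are not part of the MACS API.

```python
import math


def log_poisson_pmf(k: int, lam: float) -> float:
    """log P(X = k) for X ~ Poisson(lam), computed entirely in log space."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def log_poisson_sf(k: int, lam: float) -> float:
    """log P(X >= k) for X ~ Poisson(lam), with lam > 0.

    Sums pmf terms from k upward using log-sum-exp, stopping once we are past
    the mode and the current term is below e^-40 times the largest term seen.
    """
    log_terms = []
    peak = -math.inf
    i = k
    log_term = log_poisson_pmf(k, lam)
    while True:
        log_terms.append(log_term)
        peak = max(peak, log_term)
        if i > lam and log_term < peak - 40.0:
            break
        # pmf(i + 1) = pmf(i) * lam / (i + 1), so add the log of that ratio
        log_term += math.log(lam) - math.log(i + 1)
        i += 1
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


lam = 750.0  # a rate just above the ~740 limit mentioned in the reply
print(math.exp(-lam))          # 0.0: the naive starting term e^-lam underflows
print(log_poisson_sf(0, lam))  # close to 0.0, i.e. P(X >= 0) is ~1 as it should be
print(-log_poisson_sf(900, lam) / math.log(10))  # a finite -log10 p-value
```

Working with log probabilities also matches how peak callers usually report significance (as -log10 p-values), so nothing needs to be exponentiated back into the underflow range.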

    In practice, you should consider multiplexing to make full use of a single lane, sequencing multiple factors, or a single factor under multiple conditions/time points. 30 million reads for a single experiment on a 4-million-bp genome is a big waste; you could even assemble the genome of this species with that...
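The "extremely high coverage" and "big waste" points come down to simple arithmetic: depth is total sequenced bases divided by genome size, and a uniform background rate in a window grows linearly with the read count. The sketch below uses the numbers from this thread; `window_bp` is a hypothetical parameter for illustration, not MACS's exact definition of its background lambda.

```python
def fold_coverage(n_reads: int, read_len_bp: int, genome_bp: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_len_bp / genome_bp


def expected_tags_in_window(n_reads: int, window_bp: int, genome_bp: int) -> float:
    """Mean number of tags in a window if tags were spread uniformly.

    This is the kind of background Poisson rate a peak caller compares against;
    window_bp is a hypothetical parameter, not MACS's exact lambda definition.
    """
    return n_reads * window_bp / genome_bp


# Numbers from this thread: 28M and 31M 36 bp reads on a ~12 Mb genome,
# and the reply's example of 30M reads on a ~4 Mb genome.
print(fold_coverage(28_000_000, 36, 12_000_000))  # 84.0
print(fold_coverage(31_000_000, 36, 12_000_000))  # 93.0
print(fold_coverage(30_000_000, 36, 4_000_000))   # 270.0

# The rate is linear in the read count, so keeping a fraction f of the reads
# scales it by f; this is why subsampling can pull it back under a fixed cap.
full = expected_tags_in_window(28_000_000, 1_000, 12_000_000)
kept = expected_tags_in_window(int(0.75 * 28_000_000), 1_000, 12_000_000)
print(kept / full)  # ~0.75
```

The linear scaling also suggests why the original post saw a sharp threshold between 75% and 80% of the reads: the rate crosses a fixed limit at some read count, though the exact cut-off depends on how MACS defines its window.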

    Anyway, since you have already got your 30 million reads, what you can do (instead of waiting for me to fix it :-) ) is subsample your sequencing reads. My impression for human ChIP-seq is that to reach saturation for peak detection you need about 300 million reads (from our unpublished Nat Methods paper), which is equivalent to 0.5 million reads in E. coli. You can use "samtools view -s" to subsample a portion of your BAM file.
    梦蝶
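The subsampling advice can be turned into a concrete `samtools view -s` value. `-s` takes a number of the form SEED.FRACTION: the integer part seeds the random number generator and the fractional part is the proportion of reads (templates) kept. The sketch below scales the reply's ~300M-read human estimate to another genome by size alone. The human genome size (~3.1 Gb), the pure size-based scaling, and the helper names are all assumptions for illustration.

```python
HUMAN_GENOME_BP = 3_100_000_000  # assumption: approximate human genome size


def target_reads(genome_bp: int,
                 human_saturation_reads: int = 300_000_000,
                 human_genome_bp: int = HUMAN_GENOME_BP) -> float:
    """Scale the reply's ~300M-read human estimate to another genome by size."""
    return human_saturation_reads * genome_bp / human_genome_bp


def subsample_fraction(current_reads: int, wanted_reads: float) -> float:
    """Fraction of reads to keep; 1.0 means no subsampling is needed."""
    return min(1.0, wanted_reads / current_reads)


def samtools_s_arg(fraction: float, seed: int = 42) -> str:
    """Format a `samtools view -s` value: integer part = seed, fractional part = fraction kept."""
    keep = round(fraction, 6)
    if not 0.0 < keep < 1.0:
        raise ValueError("fraction must round to a value strictly between 0 and 1")
    digits = f"{keep:.6f}"[2:]  # "0.041475" -> "041475"
    return f"{seed}.{digits}"


fungal = target_reads(12_000_000)              # roughly 1.16 million reads
frac = subsample_fraction(28_000_000, fungal)  # keep about 4% of the sample reads
print(samtools_s_arg(frac))                    # 42.041475
```

The resulting value would be used as, for example, `samtools view -b -s 42.041475 sample.bam > sample.sub.bam` (file names hypothetical). The same scaling for a ~4.6 Mb E. coli genome gives about 0.45 million reads, in line with the reply's "0.5 million". Subsampling the control with the same fraction keeps treatment and control depths comparable.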



    • #3
      Thanks for the explanation! I'll try downsampling, then.
