SEQanswers

04-12-2012, 10:59 AM   #1
rikkomba
[ChIP-seq] MACS 1.4: assertion error

Hi,

I just started working with ChIP-seq and I am using MACS to predict binding sites in a fungal genome (12 Mb).
As mentioned here, I am running into the same problem as some previous users: MACS fails with an assertion error when calculating negative peaks. The only workaround I have found is to reduce the data to 75% of the reads; with 80% or more of the original reads, the error comes back.

I have a control file with 31M 36 bp single-end reads (4.1 GB) and a sample file with 28M 36 bp single-end reads (3.6 GB).
I tried it on both a laptop (4 GB RAM) and a server (16 GB RAM); in both cases MACS used 1.6 GB of memory and behaved the same way. I also tried it on Cistrome, with the same result.
Changing the mfold parameter didn't help.
With 75% of the data I did get sensible results, but I am not comfortable discarding 25% of a dataset; it just adds another layer of complexity to the analysis.

Does anybody have an idea of what causes the problem, and why reducing the amount of data works?
Also, if anybody knows of a valid alternative tool, feel free to suggest it.
Thanks!
04-13-2012, 12:39 PM   #2
taoliu

I just answered this question in the MACS user group, but since you also asked on SEQanswers, I am re-posting the answer here.


This error normally happens when you have too many reads for a very small genome. In your case, you used a whole GA2 lane to sequence a single factor in a genome like E. coli. Because of the extremely high coverage, an overflow error occurs: my function doesn't expect a Poisson rate higher than 740...
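
One plausible reason for a limit near 740 (an assumption for illustration, not taken from the MACS source) is double-precision arithmetic: the `exp(-lambda)` term of a Poisson probability shrinks to exactly zero once lambda passes roughly 745. The sketch below shows that effect with Python's standard `math` module, and a log-space form that stays finite. The function name `log_poisson_pmf` is hypothetical.

```python
import math


def log_poisson_pmf(k: int, lam: float) -> float:
    """Natural log of the Poisson probability P(X = k) for rate lam.

    Working in log space avoids evaluating exp(-lam) directly.
    """
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


# exp(-lam) gets smaller and smaller, then becomes exactly 0.0.
for lam in (700.0, 740.0, 750.0):
    print(lam, math.exp(-lam))

# The log-space value at the same rate is still an ordinary finite number.
print(log_poisson_pmf(750, 750.0))
```

Whether MACS 1.4 fails for exactly this reason is not stated in the thread; the sketch only shows why very large Poisson rates are numerically awkward.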

In practice, you'd better consider multiplexing, so that a single lane is fully used to sequence multiple factors, or a single factor under multiple conditions or time points. 30 million reads for a single experiment on a 4-million-base genome is a big waste; you could even assemble the genome for this species with that much data...
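
To put numbers on "extremely high coverage", here is a rough fold-coverage calculation. It assumes the 36 bp read length from the original post; `fold_coverage` is a hypothetical helper, and the result ignores unmapped and duplicate reads.

```python
def fold_coverage(n_reads: int, read_len_bp: int, genome_size_bp: int) -> float:
    """Average number of times each base is covered, assuming all reads map."""
    return n_reads * read_len_bp / genome_size_bp


# The example in this reply: 30 million reads on a 4 Mb genome.
print(fold_coverage(30_000_000, 36, 4_000_000))

# The sample file from the original post: 28 million reads on a 12 Mb genome.
print(fold_coverage(28_000_000, 36, 12_000_000))
```

Under these assumptions the two cases come out at 270x and 84x respectively.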

Anyway, since you already have your 30 million reads, what you can do (instead of waiting for me to fix it (: ) is subsample your sequencing reads. My impression for human ChIP-seq is that you need about 300 million reads to reach saturation for peak detection (from our unpublished Nat Methods paper), which is equivalent to about 0.5 million reads in E. coli. You can use "samtools view -s" to subsample a portion of your BAM file.
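
The scaling above can be turned into a subsampling fraction. This is a minimal sketch, assuming the read target scales linearly with genome size and a round human genome size of 3 Gb (my assumption, not a figure from this thread). In `samtools view -s`, the integer part of the value seeds the random number generator and the part after the decimal point is the fraction of reads to keep. The helper names, the seed of 42, and the file names are hypothetical.

```python
def scaled_read_target(ref_reads: int, ref_genome_bp: int, genome_bp: int) -> float:
    """Scale a read-depth target linearly with genome size."""
    return ref_reads * genome_bp / ref_genome_bp


def samtools_s_arg(fraction: float, seed: int = 42) -> str:
    """Format the value for `samtools view -s` as SEED.FRACTION."""
    frac = f"{fraction:.3f}"  # e.g. "0.043"
    if not frac.startswith("0.") or frac == "0.000":
        raise ValueError("fraction must round to a value strictly between 0 and 1")
    return f"{seed}{frac[1:]}"  # e.g. "42.043"


HUMAN_GENOME_BP = 3_000_000_000  # assumed round figure

# 300 million reads in human, scaled to the 12 Mb genome from the original post.
target = scaled_read_target(300_000_000, HUMAN_GENOME_BP, 12_000_000)

# Fraction of the 28 million sample reads needed to reach that target.
fraction = target / 28_000_000
arg = samtools_s_arg(fraction)

print(f"target reads: {target:,.0f}")
print(f"samtools view -b -s {arg} sample.bam > sample.subsampled.bam")
```

With these assumptions the target is about 1.2 million reads, roughly 4% of the sample file. The control file would need its own fraction computed the same way.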
__________________
梦蝶
04-13-2012, 01:11 PM   #3
rikkomba

Thanks for the explanation! I'll try downsampling, then.