#1
Junior Member
Location: andorra
Join Date: Oct 2009
Posts: 4
Hi,

I just started working with ChIP-seq, and I am using MACS to predict binding sites in a fungal genome (12 Mb). I am running into the same problem that some previous users have reported here: I get an assertion error when MACS calculates negative peaks. The only way I have managed to avoid it is by reducing the data to 75% of the reads; if I keep 80% or more of the original reads, the error comes back.

My control file has 31M single-end 36 bp reads (4.1 GB) and my sample file has 28M single-end 36 bp reads (3.6 GB). I tried it on both a laptop (4 GB RAM) and a server (16 GB RAM); in both cases MACS used 1.6 GB of memory and behaved the same way. I also tried it on Cistrome, with the same result. Changing the mfold parameter didn't help.

With 75% of the data I did get sensible results, but I am not sure how to justify discarding 25% of a dataset; it just adds another layer of complexity to the analysis. Does anybody have an idea what causes the problem, and why reducing the amount of data works? Also, if anybody knows of a valid alternative tool, feel free to suggest it. Thanks!
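To put the numbers in this post into perspective, here is a minimal sketch of the average sequencing depth they imply. It assumes that "12MB" means 12 million bp and that reads spread uniformly over the genome; `mean_depth` is a hypothetical helper, not part of MACS.

```python
def mean_depth(n_reads: int, read_len: int, genome_bp: int) -> float:
    """Average per-base sequencing depth: total sequenced bases / genome length."""
    return n_reads * read_len / genome_bp


GENOME_BP = 12_000_000  # "12MB" read as 12 million bp (assumption)

for label, n_reads in [("control", 31_000_000), ("sample", 28_000_000)]:
    # 36 bp single-end reads, as stated in the post
    print(f"{label}: {mean_depth(n_reads, 36, GENOME_BP):.0f}x")
```

This prints `control: 93x` and `sample: 84x`, so on average every base is covered by dozens of reads even before any enrichment at binding sites.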
#2
Junior Member
Location: Boston MA
Join Date: Sep 2009
Posts: 3
I just answered this question in the MACS user group, but since you asked on SEQanswers, I am reposting the answer here.

This error normally happens when you have too many reads on a very small genome. In your case, you used a whole GA2 lane to sequence a single factor in a genome like E. coli's. Because of the extremely high coverage, this overflow error occurs: my function doesn't expect a Poisson rate higher than 740...

In practice, you'd be better off multiplexing, so that a single lane is fully used to sequence multiple factors, or a single factor under multiple conditions/time points. 30 million reads for a single experiment on a 4-million-bp genome is a big waste; you could even assemble the genome of this species with that many reads.

Anyway, since you already have your 30 million reads, what you can do (instead of waiting for me to fix it (: ) is subsample your sequencing reads. My impression is that for human ChIP-seq, reaching saturation for peak detection takes about 300 million reads (from our unpublished Nat Methods paper), which is equivalent to 0.5 million reads in E. coli. You can use "samtools view -s" to subsample a portion of your BAM file.
__________________
梦蝶
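The reply does not say how the Poisson function is implemented. One plausible reading of the 740 limit (our assumption, not taken from the MACS source) is double-precision underflow: a textbook Poisson CDF starts from e^(-λ), which loses precision once λ passes about 708 and rounds to exactly 0.0 above about 745. The sketch below contrasts that naive sum with a log-space version built on `math.lgamma`; all function names are hypothetical.

```python
import math


def poisson_cdf_naive(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term from e^-lam.

    e^-lam becomes subnormal (lossy) past lam ~ 708 and rounds to 0.0
    past lam ~ 745, after which every term, and the sum, is 0.0.
    """
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i          # P(X = i) = P(X = i - 1) * lam / i
        total += term
    return total


def poisson_log_pmf(i: int, lam: float) -> float:
    """log P(X = i) for lam > 0, never forming e^-lam on its own."""
    return i * math.log(lam) - lam - math.lgamma(i + 1)


def poisson_log_cdf(k: int, lam: float) -> float:
    """log P(X <= k), combining log-pmf terms with the log-sum-exp trick."""
    logs = [poisson_log_pmf(i, lam) for i in range(k + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(x - m) for x in logs))


print(poisson_cdf_naive(800, 800.0))           # underflows to 0.0
print(math.exp(poisson_log_cdf(800, 800.0)))   # close to 0.5
```

For λ = 800 and k = 800 the naive sum returns 0.0, while the log-space version gives roughly one half, as expected for a Poisson variable evaluated at its mean. Subsampling, as suggested in the reply, sidesteps the issue by pulling λ back into the range where the naive formula still works.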
#3
Junior Member
Location: andorra
Join Date: Oct 2009
Posts: 4
Thanks for the explanation! I'll try downsampling, then.
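For the downsampling step, here is a rough sketch of how one might pick the value for `samtools view -s` by scaling the human saturation figure from the reply linearly by genome size. The linear scaling, the ~3 Gb human genome size, the seed of 42, the file names and the helper names are all assumptions for illustration. `samtools view -s` takes an `INT.FRAC` value: the integer part seeds the random number generator and the fractional part is the proportion of reads kept.

```python
# Assumptions for illustration: ~3 Gb human genome, linear scaling by size.
HUMAN_GENOME_BP = 3_000_000_000
HUMAN_SATURATION_READS = 300_000_000  # figure quoted in the reply


def equivalent_reads(genome_bp: int) -> float:
    """Scale the human saturation depth linearly to another genome size."""
    return HUMAN_SATURATION_READS * genome_bp / HUMAN_GENOME_BP


def keep_fraction(target_reads: float, available_reads: int) -> float:
    """Share of reads to keep, capped at 1.0 when no subsampling is needed."""
    return min(1.0, target_reads / available_reads)


def samtools_s_value(seed: int, fraction: float) -> str:
    """Build the INT.FRAC argument for `samtools view -s`.

    The fraction is truncated to six decimals so rounding can never carry
    into (and silently change) the integer seed. A fraction of 1.0 means
    no subsampling is needed, so it is rejected here.
    """
    digits = int(fraction * 10**6)
    if seed < 0 or not 0 < digits < 10**6:
        raise ValueError("need seed >= 0 and 1e-6 <= fraction < 1")
    return f"{seed}.{digits:06d}"


target = equivalent_reads(12_000_000)      # 12 Mb fungal genome
frac = keep_fraction(target, 28_000_000)   # 28M sample reads
# Hypothetical file names; -b asks for BAM output.
print(f"samtools view -b -s {samtools_s_value(42, frac)} -o sample.sub.bam sample.bam")
```

Under these assumptions the target is 1.2 million reads, so about 4.3% of the 28M sample reads would be kept (`-s 42.042857`), and the same helpers can be applied to the 31M-read control. For a 4.6 Mb E. coli genome the scaling gives about 460,000 reads, in line with the 0.5 million quoted in the reply.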