SEQanswers

Old 08-27-2010, 02:33 AM   #1
sonja
Member
 
Location: spain

Join Date: Aug 2010
Posts: 10
ERANGE to call peaks from ChIP-Seq data

Hello!
I am trying to call peaks from a fairly large data set (~6 million reads) with ERANGE.
Running it with the default parameters works but yields only ~7,000 peaks. For my factor I expect around 20,000 binding sites, and I am trying to get 40,000 so that I can analyze them in a less restrictive way.
However, I cannot get that many peaks out of ERANGE: either I get too few peaks or the server freezes. The latter happened with the following command:


python ERANGE3.2/commoncode/findall.py er chip.rds ErangePeaks_min1_minPeak0.000001_spacing200_fold 1_shift -control mock.rds -minimum 1 -ratio 2 -minPeak 0.000001 -revbackground -shift 80 -spacing 200

I set -minPeak very low to get a lot of peaks...

Can anyone help???
Old 08-30-2010, 03:12 AM   #2
GKM
Member
 
Location: Pasadena, CA

Join Date: May 2009
Posts: 45

If letting it learn the shift value instead of specifying it to be 80 doesn't help, then 7000 is probably what you have in that library.

Where does the number 20000 come from? Another peak caller or simply expectations?
Old 08-31-2010, 07:48 AM   #3
sonja
Member
 
Location: spain

Join Date: Aug 2010
Posts: 10

Thanks for replying!
I tried what you suggested and I still do not get more peaks.
Also, it doesn't matter whether I set the -minPeak parameter to 0.00001 or to 0.01.

The 20,000 is what I get with the default options of other peak callers. In total, this data set contains more than 100,000 clusters (that is what I can reach with very unrestrictive settings).

Does anyone have experience with this? Is there anything I could try?
Thank you!
Old 08-31-2010, 09:47 AM   #4
GKM
Member
 
Location: Pasadena, CA

Join Date: May 2009
Posts: 45

In general, the settings you are using are very liberal, and I personally wouldn't feel comfortable working with such peak calls. In my view it's better to have a few false negatives than a lot of false positives when looking at ChIP-Seq data, and "the more peaks the better" isn't exactly the best approach to understanding the biology of the protein.

That said, if you don't have any high-confidence prior expectation of the number of calls you should get (say you're working with something that has never been ChIP-ed before, rather than with NRSF or CTCF), a few things would be helpful to know:

- How many of the peaks contain an explanatory motif? (This wouldn't help if you're not dealing with a sequence-specific transcription factor, of course.) How many of the peaks you are getting with a less conservative peak caller contain it?

- What is the size of the genome you're working with?

- What is your IP efficiency as calculated by ERANGE? (It is on the second-to-last line of the output file.)

- How many peaks do you get with more conservative settings? (-minimum 2, -ratio 3, -minimum 4 -ratio 4, etc.)
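
One way to set up such a comparison is to generate one findall.py command per setting and run them one at a time. The sketch below only builds and prints the command lines; it reuses the script path, label, file names and flag spelling from the command in post #1, reads the settings above as (minimum, ratio) pairs, which is an assumption, and invents the output file names for illustration.

```python
import shlex


def build_findall_command(label, chip_rds, out_file, control_rds, minimum, ratio):
    """Return the findall.py argument list for one (minimum, ratio) setting.

    The flag spelling is copied from the command in post #1 and is not
    checked against the ERANGE documentation.
    """
    return [
        "python", "ERANGE3.2/commoncode/findall.py",
        label, chip_rds, out_file,
        "-control", control_rds,
        "-minimum", str(minimum),
        "-ratio", str(ratio),
    ]


# The liberal run from post #1 first, then stricter settings.
# Pairing the values as (minimum, ratio) is an assumed reading of the list above.
settings = [(1, 2), (2, 3), (4, 4)]

for minimum, ratio in settings:
    out_file = f"peaks_min{minimum}_ratio{ratio}.txt"  # hypothetical output name
    cmd = build_findall_command("er", "chip.rds", out_file, "mock.rds", minimum, ratio)
    print(shlex.join(cmd))
```

Counting the regions in each output file then shows how quickly the number of calls drops as the thresholds tighten.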
Old 09-01-2010, 02:24 AM   #5
sonja
Member
 
Location: spain

Join Date: Aug 2010
Posts: 10

I need so many peaks because I want to compare the top 500, 1000, ..., 40000 peaks of different methods in terms of %motif (the fraction of peaks containing the motif).

Basically, I am interested in the scoring systems of the different methods. If there is no way to increase the number of peaks, I will have to use another method for my comparisons.
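
A minimal sketch of that comparison, assuming each method's peaks have already been sorted by that method's own score (best first) and scanned for the motif, so that each peak is reduced to True or False. The function name and the toy data are made up for illustration.

```python
def motif_fraction_at_ranks(has_motif_ranked, cutoffs):
    """Fraction of the top-N peaks that contain the motif, for each N.

    has_motif_ranked: list of booleans, ordered best score first.
    cutoffs: rank cutoffs N; any N larger than the number of peaks is skipped,
    which is what happens when a caller cannot produce enough peaks.
    """
    fractions = {}
    for n in cutoffs:
        if n <= 0 or n > len(has_motif_ranked):
            continue
        fractions[n] = sum(has_motif_ranked[:n]) / n
    return fractions


# Toy example with six ranked peaks from one method.
toy_peaks = [True, True, False, True, False, False]
print(motif_fraction_at_ranks(toy_peaks, [2, 4, 6, 10]))
# → {2: 1.0, 4: 0.75, 6: 0.5}
```

Running this once per peak caller gives one %motif curve per method, and the curves can only be compared up to the smallest peak count any method reaches.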

I am working on hg18.

The IP efficiency:
#stats: 77936.5 RPM in 7313 regions
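
For reference, this line can be turned into a fraction of reads with a few lines of Python. The sketch assumes that RPM here means reads per million of the total reads, so dividing by one million gives the share of reads falling inside the called regions; that reading is an assumption, not something taken from the ERANGE documentation.

```python
import re


def parse_stats_line(line):
    """Extract (rpm, regions) from a '#stats: X RPM in N regions' line."""
    match = re.match(r"#stats:\s*([\d.]+)\s+RPM in (\d+) regions", line.strip())
    if match is None:
        raise ValueError(f"not a stats line: {line!r}")
    return float(match.group(1)), int(match.group(2))


rpm, regions = parse_stats_line("#stats: 77936.5 RPM in 7313 regions")
fraction = rpm / 1_000_000  # assumes RPM = reads per million of all reads
print(f"{fraction:.1%} of reads in {regions} regions")
# → 7.8% of reads in 7313 regions
```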

I didn't try more conservative settings because I want to be as unrestrictive as possible. As I said, I want to explore the scoring system.

BTW, may I ask who you are? Did you help develop the program, or are you a user?
Thanks!

All times are GMT -8.