SEQanswers

08-28-2008, 07:29 AM   #1
seqfast
Member
 
Location: SF Bay Area

Join Date: Aug 2008
Posts: 16
ChIP-Seq reads correlated with / distance to TSS/promoter etc.

Hi all,

I'm interested in producing some of those plots that illustrate read density as a function of distance from the transcription start site (TSS), or from other known regions/features.

I understand basic intersects and things like that, but running all the read locations against said features is my main goal.

I see a site that can produce these for you (http://www.isrec.isb-sib.ch/chipseq/chip_cor.html), and this works for their resident data, but I can't get their ELAND2SGA tool to produce a file for me, and when trying to make my own SGA format things don't go well.

I realize that, given a file of TSS positions or other features, you could write a script that tallies the average read density in windows of some size X at each distance from the feature and reports it. However, I lack the requisite programming skills for this. If there's a simple, free tool, or something I am missing at UCSC or Galaxy, that would be great. I'm comfortable in Linux environments, just not with pure programming.

Thanks!
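
For reference, the kind of script described above could look roughly like the sketch below. It assumes read positions have already been loaded per chromosome (for example from a BED file) and that the TSS list carries a strand; the function name `tss_profile` and the parameters `flank` and `bin_size` are made up for illustration and do not come from any of the tools mentioned in this thread.

```python
import bisect


def tss_profile(read_positions, tss_sites, flank=2000, bin_size=100):
    """Average read count per TSS, binned by distance from the TSS.

    read_positions: dict mapping chromosome -> sorted list of read positions
    tss_sites: list of (chromosome, position, strand), strand is '+' or '-'
    Returns a list of (bin_start_offset, mean_reads_per_tss).
    """
    n_bins = 2 * flank // bin_size
    counts = [0] * n_bins
    for chrom, tss, strand in tss_sites:
        positions = read_positions.get(chrom, [])
        # Binary search for the reads lying within +/- flank of this TSS.
        lo = bisect.bisect_left(positions, tss - flank)
        hi = bisect.bisect_right(positions, tss + flank)
        for pos in positions[lo:hi]:
            offset = pos - tss
            if strand == '-':
                # Flip so that negative offsets always mean "upstream".
                offset = -offset
            if -flank <= offset < flank:
                counts[(offset + flank) // bin_size] += 1
    n_sites = max(len(tss_sites), 1)
    return [(-flank + i * bin_size, c / n_sites) for i, c in enumerate(counts)]


# Toy data: four reads and one TSS on the forward strand.
reads = {"chr1": [900, 1000, 1050, 1150]}
sites = [("chr1", 1000, "+")]
for offset, mean_reads in tss_profile(reads, sites, flank=200, bin_size=100):
    print(offset, mean_reads)
```

The resulting (offset, mean) pairs can be pasted into any plotting tool to get the density-versus-distance curve.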
09-03-2008, 08:44 AM   #2
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

Hi Seqfast,

There are TONS of tools out there for doing the first part of this: aligning the reads against the genome. Most of them are designed for ChIP-Seq: FindPeaks, Peakfinder, USeq, MACS.... etc etc etc.

The trick is then interpreting the data they return. We (the people at the BC Genome Sciences Centre) have developed lots of tools for this particular application, but the interpretation isn't necessarily straightforward: it really depends on the signal you're looking at (e.g. transcription factor vs. histone modification). I only work on the first part, processing the reads, but there are several people here working full time on interpreting results and writing software that performs the tasks you require. (I don't think any of those tools have officially been released yet, though I believe it's in the works.)

To get started, you might want to pick one of the tools out there for ChIP-Seq, and play with it for a while. It probably won't get you all the way to the results you require, but it probably will get you pretty far.

I'm the author of FindPeaks, so I'm a little biased towards it, but the others are all good too. (-:

Anthony
__________________
The more you know, the more you know you don't know. —Aristotle

Last edited by apfejes; 09-03-2008 at 08:46 AM. Reason: clarity
09-05-2008, 07:40 AM   #3
seqfast
Member
 
Location: SF Bay Area

Join Date: Aug 2008
Posts: 16

Thanks Anthony,

I have all the upstream portions and have used a lot of the peak finders - I like yours quite a bit! BED and WIG tracks are fine, and the intersects can give some of the info I'm after. I have data for both histone variants and TFs, totally different applications indeed. I'll be on the lookout for ways to make some plots. Thanks and keep up the good work!

sf
09-05-2008, 09:01 AM   #4
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

Good to hear you've found a tool you like.... (-:

and good luck with the experiment!

Anthony
10-06-2008, 10:49 PM   #5
seqing
Junior Member
 
Location: los angeles

Join Date: Oct 2008
Posts: 3

Hi Anthony
Do you know which ChIP-seq peak finder works well for widespread histone marks? I am trying MACS but am not getting satisfying results.
Thanks
HS
10-06-2008, 10:53 PM   #6
seqing
Junior Member
 
Location: los angeles

Join Date: Oct 2008
Posts: 3

One thing that keeps me from trying FindPeaks is that it does not seem to integrate control data to find the peaks...

Last edited by seqing; 10-06-2008 at 10:56 PM.
10-06-2008, 10:54 PM   #7
seqing
Junior Member
 
Location: los angeles

Join Date: Oct 2008
Posts: 3

it's tough choosing the right peak finder!

Last edited by seqing; 10-06-2008 at 10:59 PM.
10-07-2008, 06:34 AM   #8
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

QuEST does use a control lane, but I could not interpret its output as well as I would like.
http://mendel.stanford.edu/SidowLab/...est/index.html
10-07-2008, 06:41 AM   #9
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

I saw a couple good presentations by this group, and others who used their tool:

http://woldlab.caltech.edu/html/chipseq_peak_finder
10-07-2008, 12:25 PM   #10
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

I figure I should respond to the points mentioned here as best as I can.

The "integrated control" feature is coming up soon for FindPeaks. However, I think that this has been WAY overblown. Integrating it into the peak finder itself is a relatively poor solution from many angles. E.g., some implementations require that you have identical numbers of reads in both your control and your sample, which is never a great precondition.

With any peak finder, you can get a list of peaks from your control and from your sample; it's a simple matter of scripting to compare the two peak lists. The trick is then using this information wisely, which I'm not sure any of the peak finders currently do. I've been sketching out ideas for how to improve this for the past couple of days, and finally think I have a winning solution - I just need to find the time to do that, and still write up my thesis proposal. (-;
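
To illustrate the "simple matter of scripting" part, here is a minimal sketch of comparing a sample peak list against a control peak list. It assumes each peak is a `(chrom, start, end, height)` tuple with half-open coordinates as in BED; the function names, the `min_ratio` cutoff and the `control_scale` factor (to compensate for unequal read numbers) are assumptions for illustration, not options of FindPeaks or any other tool.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def filter_against_control(sample_peaks, control_peaks, min_ratio=2.0,
                           control_scale=1.0):
    """Keep sample peaks that clearly exceed any overlapping control peak.

    Peaks are (chrom, start, end, height) tuples. control_scale rescales
    control heights, e.g. sample_reads / control_reads, so the two lanes
    do not need identical read counts.
    """
    control_by_chrom = {}
    for chrom, start, end, height in control_peaks:
        control_by_chrom.setdefault(chrom, []).append((start, end, height))

    kept = []
    for chrom, start, end, height in sample_peaks:
        # Tallest control peak overlapping this sample peak (0 if none).
        ctrl_height = max(
            (h * control_scale
             for s, e, h in control_by_chrom.get(chrom, [])
             if overlaps(start, end, s, e)),
            default=0.0,
        )
        if ctrl_height == 0 or height / ctrl_height >= min_ratio:
            kept.append((chrom, start, end, height))
    return kept


sample = [("chr1", 100, 200, 10.0), ("chr1", 500, 600, 4.0), ("chr2", 50, 80, 3.0)]
control = [("chr1", 150, 250, 8.0), ("chr1", 550, 580, 1.0)]
print(filter_against_control(sample, control))
```

A simple height-ratio cutoff like this is exactly the kind of filter that needs to be used with care, since an aggressive threshold will also discard real peaks.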

Anyhow, if you have feature requests like this for findpeaks, feel free to file a request or a bug report for it -- or better yet, write a patch. (-: I do read the bug reports, and try my best to reply to all FindPeaks related email.

For the question of which peak finder should be used for histones - the honest answer is that each peak finder has its strong and weak points. I personally believe that the triangle weighted distribution in FindPeaks is a major advantage over the other peak finders, and that for this application, you'll absolutely require a sub-peak function. Both FindPeaks and MACS are probably your best bets. (The Wold lab and SISSR versions don't do sub-peaks, if I recall correctly, but that may have changed.)

I believe I'm about 2 weeks away from tagging a FindPeaks 3.2 beta release, if all goes smoothly - and hopefully this will address the points above.

Last edited by apfejes; 10-07-2008 at 12:31 PM. Reason: clarity
10-07-2008, 12:53 PM   #11
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324

SISSRs' way of identifying peak locations makes it unnecessary to search for subpeaks, since it does not cluster reads into peaks in the first place. But I agree that you have to use the control carefully - otherwise you may end up filtering away a large proportion of your true positives.

Seqing, did I understand you correctly that you are studying histone marks like K27me3 or K36, where you would expect large regions to be enriched but with relatively few reads obtained per histone? Then I guess you would be better off trying a window-based scanning method using large windows, as opposed to identifying peaks from individual nucleosomes, which is what FindPeaks/SISSRs/MACS will do.
10-07-2008, 01:05 PM   #12
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

Hi Chipper

SISSR does do "subpeaks", in a sense; however, it's based entirely on finding areas bracketed by reads facing opposite directions. From personal experience (we had implemented a version of this in FindPeaks at one point), it isn't a particularly reliable method: peaks which appear in low-sequenceability regions will disappear completely (whether they're real or not is a different story), and small peaks don't always have reads in both directions even when they are real.

In any case, as for the windows, I can't think of a valid reason for using them - you'd lose resolution, and a large window would give you "blurs" instead of positions for nucleosomes, where they're available. You'd be throwing away a lot of valuable information, while peak finders will still find the blurry regions just as well as a windowed method.
10-07-2008, 01:26 PM   #13
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324

Hi,

sorry if I was not clear enough. If your FindPeaks identifies subpeaks, it will (or at least should) have opposing reads, otherwise it is not going to form a subpeak, so what SISSR does is more or less the same. But it does not require a window (was it 20 bp?) to have reads from both strands, just that you go from + to -; there could be readless windows in between. Whether it is a good method or not is another story.
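
As a toy illustration of the "go from + to -" idea only (this is not the SISSRs implementation, and the function name, the 20 bp default and the midpoint rule are all assumptions), one can bin reads by strand and report the points where a window dominated by forward reads is followed by one dominated by reverse reads, skipping empty windows in between:

```python
def strand_transitions(reads, length, window=20):
    """Report positions where the net strand signal flips from + to -.

    reads: iterable of (position, strand) with strand '+' or '-'
    length: length of the region being scanned
    Returns the midpoint between each +-dominated window and the next
    --dominated window; read-less windows in between are ignored.
    """
    n_windows = (length + window - 1) // window
    net = [0] * n_windows
    for pos, strand in reads:
        if 0 <= pos < length:
            net[pos // window] += 1 if strand == '+' else -1

    transitions = []
    last_plus = None  # index of the latest window with more + than - reads
    for i, value in enumerate(net):
        if value > 0:
            last_plus = i
        elif value < 0 and last_plus is not None:
            gap_start = (last_plus + 1) * window  # end of the + window
            gap_end = i * window                  # start of the - window
            transitions.append((gap_start + gap_end) // 2)
            last_plus = None
        # value == 0: empty or balanced window, keep scanning
    return transitions


print(strand_transitions([(5, '+'), (8, '+'), (70, '-'), (75, '-')], 100))
```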

My interpretation of the histone question was that he wanted to find regions that are enriched, not the histone positions. But that may be totally wrong. Anyway, if each histone gives only a few reads, or if the nucleosomes are not well positioned, there is really not much valuable information to throw away. If you, for example, take the average read density over the gene body, it can still be significantly enriched.
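
A minimal sketch of such a window-based scan, assuming read positions for one chromosome in the sample and the control; the function names, the fold-change cutoff and the pseudocount are arbitrary illustrative choices, not taken from any published tool:

```python
def window_counts(positions, chrom_length, window=10000):
    """Count reads falling into consecutive fixed-size windows."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in positions:
        if 0 <= pos < chrom_length:
            counts[pos // window] += 1
    return counts


def enriched_windows(sample_pos, control_pos, chrom_length, window=10000,
                     min_fold=2.0, pseudocount=1.0):
    """Return (start, end, fold) for windows enriched over the control.

    Control counts are rescaled by the ratio of total reads so the two
    lanes do not need the same sequencing depth.
    """
    sample = window_counts(sample_pos, chrom_length, window)
    control = window_counts(control_pos, chrom_length, window)
    scale = len(sample_pos) / len(control_pos) if control_pos else 1.0

    hits = []
    for i, (s, c) in enumerate(zip(sample, control)):
        fold = (s + pseudocount) / (c * scale + pseudocount)
        if fold >= min_fold:
            end = min((i + 1) * window, chrom_length)
            hits.append((i * window, end, fold))
    return hits


sample_reads = [5, 15, 25, 35, 1005, 2500]
control_reads = [10, 1010, 2010]
print(enriched_windows(sample_reads, control_reads, 3000, window=1000, min_fold=1.5))
```

Replacing the fixed windows with gene-body intervals gives the "average read density over the gene body" variant of the same calculation.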
10-07-2008, 03:50 PM   #14
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

Hi Chipper,

Thanks for clarifying. I'm not sure we're talking about the same things, when it comes to sub-peaks. When you have a good model of the length distribution of the reads, you often see complex regions which don't necessarily switch from forward to reverse - but are made up of distinct "clusters" of reads. FindPeaks does this without worrying about the forward/reverse orientation of the reads by simply building in the model of the read lengths. Thus, the "peaks" themselves are probabilities of the number of fragments overlapping at a given point.

For both TF and histone, you will see clear enrichments at certain locations, using these models because the contribution of any given read is clearly directional, based upon the location and strand of the tag sequenced.
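
Since this is hard to draw here, a rough sketch in code may help. It extends every read in the direction of its strand and weights each covered base by how likely a fragment is to reach that far. The weighting used below (full weight up to `min_len`, then a linear taper to zero at `max_len`) is only an assumed stand-in for a fragment-length model, not the actual FindPeaks distribution, and all names are hypothetical.

```python
def directional_coverage(reads, length, min_len=100, max_len=300):
    """Weighted fragment coverage from strand-aware read extension.

    reads: iterable of (position, strand); position is the 5' end of the tag
    length: length of the region (coverage is computed for 0..length-1)
    Each read contributes weight 1.0 for the first min_len bases downstream
    of its 5' end, then tapers linearly to 0 at max_len (assumed model).
    """
    coverage = [0.0] * length
    for pos, strand in reads:
        step = 1 if strand == '+' else -1  # '-' reads extend leftwards
        for dist in range(max_len):
            x = pos + step * dist
            if not 0 <= x < length:
                break
            if dist < min_len:
                weight = 1.0
            else:
                weight = (max_len - dist) / (max_len - min_len)
            coverage[x] += weight
    return coverage


# Two tags facing each other: the signal piles up between them.
cov = directional_coverage([(10, '+'), (25, '-')], 40, min_len=5, max_len=15)
print(max(cov), cov.index(max(cov)))
```

Because each tag only contributes downstream of its own 5' end, clusters of tags produce distinct maxima in the coverage without any explicit test for a forward-to-reverse switch.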

This is much easier to draw than to explain in text!

Anyhow, you could be right that seqing is looking only for areas of enrichment, but a good peak finder should handle those areas just as well as those with clear TF-like enrichment.

Cheers!
Tags
bioinformatics, chip-seq, chip-seq analysis, tss
