  • ChIP-Seq reads correlated with/distance to TSS/promoter etc.

    Hi all,

    Interested in producing some of those plots that illustrate the read density at some distance from the transcription start site, or from other known regions/features.

    I understand basic intersects and things like that, but running all the read locations against said features is my main goal.

    I see a site that can produce these for you (http://www.isrec.isb-sib.ch/chipseq/chip_cor.html), and this works for their resident data, but I can't get their ELAND2SGA tool to produce a file for me, and when trying to make my own SGA format things don't go well.

    I realize that, given a file with TSS sites or other features, you could write a script that would catalog the average read density in a window of size X at some distance and report this; however, I lack the requisite programming skills. If there's a simple, free tool, or something I am missing at UCSC or Galaxy, that would be great. I'm comfortable in Linux environments, just not pure programming.

    Thanks!
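The script described in this post could look roughly like the Python sketch below. It is only an illustration under stated assumptions: reads are reduced to one position each, TSS entries carry a strand, and the function name `tss_profile` and its parameters `flank` and `bin_size` are made up for this example and do not come from any existing tool.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def tss_profile(reads, tss_sites, flank=2000, bin_size=100):
    """Average read count per TSS in fixed-size bins around the TSS.

    reads:     iterable of (chrom, position), one position per read (assumption)
    tss_sites: iterable of (chrom, position, strand), strand is '+' or '-'
    Returns a list of (bin_start_offset, mean_reads_per_tss).
    """
    if flank % bin_size != 0:
        raise ValueError("flank must be a multiple of bin_size")

    # Index read positions per chromosome so each TSS lookup is a binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in reads:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    n_bins = (2 * flank) // bin_size
    counts = [0] * n_bins
    n_sites = 0
    for chrom, tss, strand in tss_sites:
        n_sites += 1
        positions = by_chrom.get(chrom, [])
        lo = bisect_left(positions, tss - flank)
        hi = bisect_right(positions, tss + flank)
        for pos in positions[lo:hi]:
            # Flip the offset for minus-strand genes so negative means upstream.
            offset = pos - tss if strand == "+" else tss - pos
            if -flank <= offset < flank:
                counts[(offset + flank) // bin_size] += 1

    return [
        (-flank + i * bin_size, counts[i] / n_sites if n_sites else 0.0)
        for i in range(n_bins)
    ]
```

The read positions would typically be taken from the start column of a BED file, and the returned list can be plotted directly as offset versus mean read count.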

  • #2
    Hi Seqfast,

    There are TONS of tools out there for doing the first part of this: aligning the reads against the genome. Most of them are designed for ChIP-Seq: FindPeaks, Peakfinder, USeq, MACS.... etc etc etc.

    The trick is then interpreting the data they return. We (the people at the BC Genome Sciences Centre) have developed lots of tools for this particular application, but it's not necessarily a straightforward interpretation - it really depends on the signal you're looking at (e.g. transcription factor vs histone modification, etc). I only work on the first part, processing the reads, but there are several people here working full time on interpreting results and writing software that performs the tasks you require. (I just don't think any of the tools have officially been released, though it's in the works, I believe.)

    To get started, you might want to pick one of the tools out there for ChIP-Seq, and play with it for a while. It probably won't get you all the way to the results you require, but it probably will get you pretty far.

    I'm the author of FindPeaks, so I'm a little biased towards it, but the others are all good too. (-:

    Anthony
    Last edited by apfejes; 09-03-2008, 08:46 AM. Reason: clarity
    The more you know, the more you know you don't know. —Aristotle



    • #3
      Thanks Anthony,

      I have all the upstream portions and have used a lot of the peakfinders - I like yours quite a bit! BED and WIG tracks are fine, and the intersects can give some of the info I'm after. I have data for both histone variants and TFs, totally different applications indeed. I'll be on the lookout for ways to make some plots. Thanks and keep up the good work!

      sf



      • #4
        Good to hear you've found a tool you like.... (-:

        and good luck with the experiment!

        Anthony
        The more you know, the more you know you don't know. —Aristotle



        • #5
          Hi Anthony
          Do you know which ChIP-seq peak finder works well for widespread histone marks? I am trying MACS but am not getting satisfying results.
          Thanks
          HS



          • #6
            One thing that keeps me from trying FindPeaks is that it does not seem to integrate control data to find the peaks...
            Last edited by seqing; 10-06-2008, 10:56 PM.



            • #7
              it's tough choosing the right peak finder!
              Last edited by seqing; 10-06-2008, 10:59 PM.



              • #8
                QuEST does use a control lane, but I could not interpret it as well as I would like to..
                --
                bioinfosm



                • #9
                  I saw a couple good presentations by this group, and others who used their tool:



                  • #10
                    I figure I should respond to the points mentioned here as best as I can.

                    The "integrated control" feature is coming up soon for FindPeaks. However, I think that this has been WAY overblown. Integrating it into the peak finder itself is a relatively poor solution from many angles. For example, some implementations require that you have identical numbers of reads in both your control and your sample - which is never a great precondition.

                    With any peak finder, you can get a list of peaks from your control and your sample - it's a simple matter of scripting to compare your peak lists. The trick is then using this information wisely, which I'm not sure any of the peak finders currently do. I've been sketching out ideas for how to improve this for the past couple of days, and finally think I have a winning solution - I just need to find the time to do that, and still write up my thesis proposal. (-;
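The "simple matter of scripting" could be as small as the Python sketch below, which keeps sample peaks that do not overlap any control peak. This is a deliberately naive illustration, not what FindPeaks or any other peak finder does; the function name and the half-open `(chrom, start, end)` tuple layout are assumptions made for the example. As the post warns, a plain overlap filter is only the starting point, and applying it blindly may discard real peaks.

```python
def peaks_without_control_overlap(sample_peaks, control_peaks):
    """Return the sample peaks that overlap no control peak.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    """
    # Group control intervals by chromosome for quick lookup.
    control_by_chrom = {}
    for chrom, start, end in control_peaks:
        control_by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, start, end in sample_peaks:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(
            c_start < end and start < c_end
            for c_start, c_end in control_by_chrom.get(chrom, [])
        )
        if not overlaps:
            kept.append((chrom, start, end))
    return kept
```

A less blunt variant would compare peak heights between sample and control rather than dropping every overlapping peak, but that needs a scoring rule that this thread does not specify.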

                    Anyhow, if you have feature requests like this for findpeaks, feel free to file a request or a bug report for it -- or better yet, write a patch. (-: I do read the bug reports, and try my best to reply to all FindPeaks related email.

                    For the question of which peak finder should be used for histones - the honest answer is that each peak finder has its strong and weak points. I personally believe that the triangle weighted distribution in FindPeaks is a major advantage over the other peak finders, and that for this application, you'll absolutely require a sub-peak function. Both FindPeaks and MACS are probably your best bets. (The Wold lab and SISSR versions don't do sub-peaks, if I recall correctly, but that may have changed.)

                    I believe I'm about 2 weeks away from tagging a FindPeaks 3.2 beta release, if all goes smoothly - and hopefully this will address the points above.
                    Last edited by apfejes; 10-07-2008, 12:31 PM. Reason: clarity
                    The more you know, the more you know you don't know. —Aristotle



                    • #11
                      SISSRs' way of identifying peak locations makes it unnecessary to search for subpeaks, since it does not cluster reads into peaks in the first place. But I agree that you have to use the control carefully - otherwise you may end up filtering away a large proportion of your true positives.

                      Seqing, did I understand you correctly that you are studying histone marks like k27me3 or k36, where you would expect large regions to be enriched but with relatively few reads obtained per histone? Then I guess you would be better off trying a window-based scanning method using large windows, as opposed to identifying peaks from individual nucleosomes, which is what FindPeaks/SISSRs/MACS will do.
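A window-based scan of the kind suggested here might look like the following Python sketch. It is only an illustration: it handles a single chromosome, uses a fixed read-count threshold instead of a statistical test, and the names `enriched_windows`, `merge_adjacent`, `window` and `min_reads` are invented for this example.

```python
from collections import Counter


def enriched_windows(read_positions, window=5000, min_reads=10):
    """Count reads in non-overlapping windows and keep the well-covered ones.

    read_positions: read start positions on one chromosome (assumption).
    Returns sorted (window_start, window_end, read_count) tuples.
    """
    counts = Counter(pos // window for pos in read_positions)
    return sorted(
        (idx * window, (idx + 1) * window, n)
        for idx, n in counts.items()
        if n >= min_reads
    )


def merge_adjacent(windows):
    """Join directly neighbouring enriched windows into broader regions."""
    merged = []
    for start, end, n in windows:
        if merged and merged[-1][1] == start:
            prev_start, _, prev_n = merged[-1]
            merged[-1] = (prev_start, end, prev_n + n)
        else:
            merged.append((start, end, n))
    return merged
```

Merging neighbouring windows is what turns a series of modestly enriched bins into one broad domain, which is the pattern expected for widespread histone marks.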



                      • #12
                        Hi Chipper

                        SISSR does do "subpeaks", in a sense; however, it's based entirely on finding areas bracketed by reads facing opposite directions. From personal experience (we had implemented a version of this in FindPeaks at one point), it isn't a particularly reliable method: peaks which appear in low-sequenceability regions will disappear completely (whether they're real or not is a different story), and small peaks don't always have reads in both directions even when they are real.

                        In any case, as for the windows, I can't think of a valid reason for using them - you'd lose resolution, and a large window would give you "blurs" instead of positions for nucleosomes, where they're available. You'd be throwing away a lot of valuable information, while peak finders will still find the blurry regions just as well as a windowed method.
                        The more you know, the more you know you don't know. —Aristotle



                        • #13
                          Hi,

                          sorry if I was not clear enough. If your FindPeaks identifies subpeaks, it will (or at least should) have opposing reads, otherwise it is not going to form a subpeak, so what SISSR will do is more or less the same. But it does not require a window (was it 20 bp?) to have reads from both strands, just that you go from + to -; there could be readless windows in between. Whether it is a good method or not is another story.
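The "go from + to -" idea can be illustrated with a short Python sketch. This is not SISSR's actual algorithm, only a bare-bones version of the strand-transition idea as described in this post; it assumes each read is given as a `(position, strand)` pair on one chromosome, and the function name is made up.

```python
def strand_transitions(reads):
    """Return midpoints where a '+' read is directly followed by a '-' read.

    reads: list of (position, strand) on one chromosome, strand '+' or '-'.
    Empty stretches between the two reads are allowed, as in the description.
    """
    ordered = sorted(reads)
    sites = []
    for (pos_a, strand_a), (pos_b, strand_b) in zip(ordered, ordered[1:]):
        if strand_a == "+" and strand_b == "-":
            # A candidate site lies between the last forward read
            # and the first reverse read that follows it.
            sites.append((pos_a + pos_b) // 2)
    return sites
```

A small peak sequenced from one strand only produces no transition at all, which is the weakness raised earlier in the thread.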

                          My interpretation of the histone question was that he wanted to find regions that are enriched, not the histone positions. But that may be totally wrong. Anyway, if each histone gives only a few reads, or if the nucleosomes are not well positioned, there is really not much valuable information to throw away. If you, for example, take the average read density over the gene body, it can still be significantly enriched.



                          • #14
                            Hi Chipper,

                            Thanks for clarifying. I'm not sure we're talking about the same things, when it comes to sub-peaks. When you have a good model of the length distribution of the reads, you often see complex regions which don't necessarily switch from forward to reverse - but are made up of distinct "clusters" of reads. FindPeaks does this without worrying about the forward/reverse orientation of the reads by simply building in the model of the read lengths. Thus, the "peaks" themselves are probabilities of the number of fragments overlapping at a given point.

                            For both TF and histone, you will see clear enrichments at certain locations using these models, because the contribution of any given read is clearly directional, based upon the location and strand of the tag sequenced.
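Since this is hard to explain in text, a small Python sketch of the directional idea may help. It is not FindPeaks' implementation: it extends every read by one fixed, assumed fragment length in the direction it was sequenced, whereas a real model of the fragment length distribution (such as the triangle weighting mentioned earlier) would weight positions differently. The function name and parameters are hypothetical.

```python
def fragment_coverage(reads, fragment_length, region_length):
    """Per-base count of extended fragments overlapping each position.

    reads: list of (five_prime_position, strand) within [0, region_length).
    Forward reads extend to the right, reverse reads extend to the left.
    """
    # Difference array: +1 where a fragment starts, -1 just past its end.
    diff = [0] * (region_length + 1)
    for pos, strand in reads:
        if strand == "+":
            start, end = pos, pos + fragment_length
        else:
            start, end = pos - fragment_length + 1, pos + 1
        start = max(start, 0)
        end = min(end, region_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum turns the difference array into coverage.
    coverage = []
    running = 0
    for delta in diff[:region_length]:
        running += delta
        coverage.append(running)
    return coverage
```

With a forward read at position 2 and a reverse read at position 7, each extended by 4 bases, the two fragments pile up on positions 4 and 5, so the summit falls between the reads rather than on either read start.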

                            This is much easier to draw than to explain in text!

                            Anyhow, you could be right that seqing is looking only for areas of enrichment, but a good peak finder should handle those areas just as well as those with clear TF-like enrichment.

                            Cheers!
                            The more you know, the more you know you don't know. —Aristotle
