  • Visualizing ChIP-seq peaks

    I'm interested in visualizing some ChIP-seq peaks at specific loci. I've got an image that plots the read depth at each position across an interval, and I'm fairly satisfied with it. But is this the right approach for plotting peak data? As I understand it, there's a forward-strand peak and a reverse-strand peak. Should I try to merge the two peaks when visualizing, or is an image like mine an acceptable way to show peaks?

  • #2
    Coverage isn't really meaningful for ChIP-seq since you're counting reads, not bases. It would be clearer to count each read as a hit once at its 5'-most position, strand-specifically. Then you'll see the strand peaks nice and clear.

    If it's too bumpy, use kernel smoothing - which is basically what you're doing when you show coverage instead of read counts, except you're using a rectangular kernel that's centered at an arbitrary position in the fragment.
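As a rough sketch of the counting suggested above: the snippet below tallies each read once at its 5'-most base, separately per strand. The input format, (start, end, strand) tuples with 0-based half-open coordinates, and the function name `five_prime_counts` are assumptions made for illustration; they are not part of any tool mentioned in this thread.

```python
from collections import Counter


def five_prime_counts(reads):
    """Count reads by the position of their 5'-most base, per strand.

    `reads` is an iterable of (start, end, strand) tuples using 0-based,
    half-open coordinates (an assumed, BED-like convention).
    """
    counts = {"+": Counter(), "-": Counter()}
    for start, end, strand in reads:
        # A forward read begins at its leftmost base; a reverse read begins
        # at its rightmost base, because it is read right to left.
        five_prime = start if strand == "+" else end - 1
        counts[strand][five_prime] += 1
    return counts


# Hypothetical reads: two lengths per strand, same 5' end on each strand.
reads = [(100, 150, "+"), (100, 136, "+"), (250, 300, "-"), (264, 300, "-")]
counts = five_prime_counts(reads)
print(counts["+"][100], counts["-"][299])  # prints: 2 2
```

Note that the two forward reads have different lengths but contribute equally, which is the point of counting reads rather than bases.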



    • #3
      Thank you for the input. Are you suggesting that I make density curves? When I do that, some of the peaks end up with different proportions than when I examine the reads in something like IGV. I also lose the ability to plot the input data along with the IP data, since the scales are completely different.

      If you don't mean density curves, could you tell me what the axes would be? I assume that the x axis would be genome position, but what about the y axis?



      • #4
        I'm proposing figure 1A here: http://www.biomedcentral.com/1471-2164/14/720/figure/F1

        The vertical axis is the number of reads that start at the given position, i.e. the number of reads whose 5'-most base is at that coordinate. I'm not sure whether IGV can do this, but you can get the 5' ends via the SAMtools API and make a wiggle file, or you could simply use the convert_align tool provided with UniPeak.
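For the wiggle-file route, a minimal sketch might look like the following. It assumes the 5'-end counts for one strand are already in a dictionary keyed by 0-based position; it does not use the SAMtools API and is not the convert_align tool. The function name `to_variable_step_wiggle` is hypothetical.

```python
def to_variable_step_wiggle(chrom, counts):
    """Format {0-based position: count} as variableStep wiggle text."""
    lines = [f"variableStep chrom={chrom}"]
    for pos in sorted(counts):
        # Wiggle coordinates are 1-based, so shift each 0-based position.
        lines.append(f"{pos + 1} {counts[pos]}")
    return "\n".join(lines) + "\n"


# Hypothetical counts for a single strand of chr1.
wig = to_variable_step_wiggle("chr1", {299: 2, 100: 3})
print(wig, end="")
```

Writing one such block per strand (for example as two separate tracks) keeps the forward and reverse profiles distinguishable in a genome browser.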



        • #5
          Originally posted by jwfoley
          Coverage isn't really meaningful for ChIP-seq since you're counting reads, not bases. It would be clearer to count each read as a hit once at its 5'-most position, strand-specifically.
          Why is coverage not meaningful for ChIP-seq, why would you only count each read once at its 5' position, and why would you plot strand specifically (strand specificity isn't really relevant for ChIP-seq like it is for RNA-seq)? I think each read should be extended in the appropriate direction (depending on which strand it aligns to) to its estimated fragment length (usually around 200 bp), and then the coverage at each position should be plotted. I don't know what the best software is for doing all this, though; I use my own scripts.
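The extension approach described in this post could be sketched roughly as below. This is an illustration under assumptions, not the poster's own scripts: reads are (start, end, strand) tuples with 0-based half-open coordinates, the fragment length is a single fixed estimate, and the function name `extended_coverage` is made up.

```python
def extended_coverage(reads, fragment_length, region_start, region_end):
    """Per-base coverage after extending each read to `fragment_length`."""
    size = region_end - region_start
    # Difference array: +1 where a fragment starts, -1 just past its end.
    diff = [0] * (size + 1)
    for start, end, strand in reads:
        if strand == "+":
            # Forward reads extend rightward from their 5' end.
            frag_start, frag_end = start, start + fragment_length
        else:
            # Reverse reads extend leftward from their 5' end.
            frag_start, frag_end = end - fragment_length, end
        # Clip the fragment to the plotted region.
        lo = max(frag_start, region_start) - region_start
        hi = min(frag_end, region_end) - region_start
        if lo < hi:
            diff[lo] += 1
            diff[hi] -= 1
    coverage, depth = [], 0
    for delta in diff[:size]:
        depth += delta
        coverage.append(depth)
    return coverage


# Hypothetical reads 50 bp long, extended to an assumed 200 bp fragment.
reads = [(100, 150, "+"), (350, 400, "-")]
coverage = extended_coverage(reads, 200, 0, 500)
print(coverage[150], coverage[250], coverage[350])  # prints: 1 2 1
```

Because both reads are extended toward each other, the depth is highest in the overlap between them, which is where the two strand signals are expected to converge.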



          • #6
            Why is coverage not meaningful for ChIP-seq, why would you only count each read once at its 5' position
            As I said, it's because you're counting reads, not bases. A 100 nt read is not twice as much evidence for DNA-protein interaction as a 50 nt read. The read length is arbitrary, and may vary if you're doing quality trimming.

            and why would you plot strand specifically (strand specificity isn't really relevant for ChIP-seq like it is for RNA-seq)?
            If you have already corrected for the strand shift, then by all means, do sum the strands into one profile - that's indeed the most meaningful view. If not, you'll get your strand-specific peaks shifted in the 5' direction from each other, so when you combine the two strands without shifting you'll get an unnecessarily low peak or even a bimodal one. Even the shift between the peaks has nothing to do with read length; it's determined by insert size. If you happen to have done paired-end sequencing, you can disregard my previous advice and use the known center of each fragment instead of the 5' end of the read (to be perfectly precise, you're counting fragments, not bases or reads); otherwise, there are lots of different ways to estimate the average strand shift and apply a uniform correction with very good resolution. This is explained in more detail in the QuEST paper (see figure 1).
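To make the strand-shift correction concrete, here is one simple way it might be done; it is only one of the many possible estimators and is not claimed to be the method used in the QuEST paper. It assumes per-strand 5'-end counts stored as {position: count} dictionaries, picks the offset that maximizes the overlap between the forward and reverse profiles, and then moves each strand half of that offset toward the other before summing. Both function names are hypothetical.

```python
def estimate_strand_shift(plus, minus, max_shift):
    """Return the offset d in 0..max_shift maximizing sum(plus[x] * minus[x + d])."""
    best_shift, best_score = 0, -1
    for d in range(max_shift + 1):
        score = sum(n * minus.get(pos + d, 0) for pos, n in plus.items())
        if score > best_score:
            best_shift, best_score = d, score
    return best_shift


def combine_strands(plus, minus, shift):
    """Move each strand's counts about half of `shift` toward the other, then sum."""
    half = shift // 2
    combined = {}
    for pos, n in plus.items():
        # Forward 5' ends lie upstream of the fragment center: move right.
        combined[pos + half] = combined.get(pos + half, 0) + n
    for pos, n in minus.items():
        # Reverse 5' ends lie downstream of the fragment center: move left.
        new_pos = pos - (shift - half)
        combined[new_pos] = combined.get(new_pos, 0) + n
    return combined


# Hypothetical 5'-end counts: the reverse peak sits 200 bp right of the forward peak.
plus = {100: 5, 110: 2}
minus = {300: 5, 310: 2}
shift = estimate_strand_shift(plus, minus, max_shift=300)
combined = combine_strands(plus, minus, shift)
print(shift, combined)  # prints: 200 {200: 10, 210: 4}
```

With paired-end data this estimation step would be unnecessary, since each fragment's center is known directly from the two mate positions.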

            It's worth noting that a coverage plot is basically a lazy shortcut to get a roughly similar, but worse, result: you're doing kernel smoothing except the kernel function is a rectangle instead of something more mathematically efficient like a bell curve or parabola, and the kernel bandwidth is determined by the read length instead of some more meaningful value related to fragment size variation, and the kernels are centered at the middle of reads instead of the middle of fragments (so forward and reverse won't line up correctly unless insert length = read length, which is generally avoided in library construction so that you don't sequence into the adapter). It might be okay for informally browsing your data, but don't use this lazy shortcut to make a figure for publication.
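The kernel comparison above can be illustrated with a small sketch. It assumes the input is a {position: count} dictionary of shift-corrected fragment centers (or 5' ends), and it treats the bandwidth as a free parameter chosen by the user. The function names are made up for this example, and "parabola" is read here as the Epanechnikov kernel, which is an interpretation rather than something stated in the post.

```python
import math


def rectangular(u):
    """Flat kernel: what a coverage plot implicitly applies."""
    return 0.5 if abs(u) <= 1 else 0.0


def gaussian(u):
    """Bell-curve kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def epanechnikov(u):
    """Parabolic kernel."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def kernel_smooth(counts, positions, kernel, bandwidth):
    """Evaluate the kernel-smoothed profile of `counts` at each of `positions`."""
    smoothed = []
    for x in positions:
        total = sum(n * kernel((x - pos) / bandwidth) for pos, n in counts.items())
        smoothed.append(total / bandwidth)
    return smoothed


# Hypothetical counts around one binding site, smoothed two different ways.
counts = {200: 10, 210: 4}
positions = list(range(150, 261))
flat_profile = kernel_smooth(counts, positions, rectangular, 20)
bell_profile = kernel_smooth(counts, positions, gaussian, 20)
```

Plotting `flat_profile` against `bell_profile` shows the blocky steps of the rectangular kernel next to the smoother bell-shaped result; unlike read-based coverage, the bandwidth here is chosen independently of read length.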
            Last edited by jwfoley; 10-22-2014, 02:29 PM.



            • #7
              Thank you, jwfoley. I've thought about what you've said and it makes a good amount of sense. I'm using ggplot2 in R to make these visualizations. It may not be the best approach, but I'm most familiar with it so it seemed like a good place to start. I think there will be some issues with adjusting the height of the input data if I visualize it with a density kernel as well, but I think I have a good idea about how to adjust the y axis. Thanks again!
