**jwfoley** · 10-17-2014, 05:21 AM

Coverage isn't really meaningful for ChIP-seq since you're counting reads, not bases. It would be clearer to count each read as a hit once at its 5'-most position, strand-specifically. Then you'll see the strand peaks nice and clear.

If it's too bumpy, use kernel smoothing - which is basically what you're doing when you show coverage instead of read counts, except you're using a rectangular kernel that's centered at an arbitrary position in the fragment.
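To make that concrete, here is a minimal sketch in plain Python of counting each read once at its 5'-most position per strand and then smoothing the counts with a Gaussian kernel. The input format (tuples of `start`, `end`, `strand` with 0-based, half-open coordinates), the function names, and the bandwidth value are all assumptions for illustration, not part of any particular tool.

```python
import math
from collections import Counter


def five_prime_counts(reads):
    """Count reads by their 5'-most position, separately for each strand.

    `reads` is an iterable of (start, end, strand) tuples with 0-based,
    half-open coordinates (a hypothetical input format).
    """
    counts = {"+": Counter(), "-": Counter()}
    for start, end, strand in reads:
        # Forward read: the 5' end is the leftmost base.
        # Reverse read: the 5' end is the rightmost base.
        pos = start if strand == "+" else end - 1
        counts[strand][pos] += 1
    return counts


def gaussian_smooth(counts, length, bandwidth):
    """Smooth per-position counts with a Gaussian kernel truncated at 3 sd."""
    radius = int(3 * bandwidth)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (d / bandwidth) ** 2) for d in offsets]
    total = sum(weights)
    weights = [w / total for w in weights]  # kernel sums to 1
    profile = [0.0] * length
    for pos, n in counts.items():
        for d, w in zip(offsets, weights):
            if 0 <= pos + d < length:
                profile[pos + d] += n * w
    return profile


reads = [(10, 60, "+"), (10, 46, "+"), (100, 150, "-")]
counts = five_prime_counts(reads)
reverse_profile = gaussian_smooth(counts["-"], length=200, bandwidth=5.0)
```

Note that the two forward reads have different lengths but each contributes exactly one count at position 10, which is the point of counting reads rather than bases.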

**blakeoft** · 10-22-2014, 08:56 AM

Thank you for the input. Are you suggesting that I make density curves? When I do that, some of the peaks end up with different proportions than when I examine the reads in something like IGV. I also lose the ability to plot the input data along with the IP data, since the scales are completely different.

If you don't mean density curves, could you tell me what the axes would be? I assume that the x axis would be genome position, but what about the y axis?

**jwfoley** · 10-22-2014, 11:14 AM

I'm proposing figure 1A here: http://www.biomedcentral.com/1471-2164/14/720/figure/F1

The vertical axis is the number of reads that start at the given position, i.e. the number of reads whose 5'-most base is at that coordinate. I'm not sure if there's a way to do this in IGV, but you can get the 5' ends via the SAMtools API and make a wiggle file, or you could simply use the convert_align tool provided with UniPeak.
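As a rough sketch of the wiggle step, assuming you have already pulled the 5' positions out of the alignments and tallied them into a dictionary of 0-based position to read count for one chromosome and one strand (the function name and input layout here are hypothetical):

```python
def to_variable_step_wiggle(chrom, counts):
    """Format 5'-end counts as a variableStep wiggle block.

    `counts` maps 0-based positions to read counts (assumed input).
    Wiggle data lines use 1-based coordinates, hence the `+ 1`.
    """
    lines = [f"variableStep chrom={chrom}"]
    for pos in sorted(counts):
        lines.append(f"{pos + 1} {counts[pos]}")
    return "\n".join(lines) + "\n"


wiggle_text = to_variable_step_wiggle("chr1", {9: 2, 4: 1})
```

You would write one such track per strand so the forward and reverse profiles can be viewed separately in a genome browser.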

**biocomputer** · 10-22-2014, 01:16 PM

Originally posted by jwfoley:

> Coverage isn't really meaningful for ChIP-seq since you're counting reads, not bases. It would be clearer to count each read as a hit once at its 5'-most position, strand-specifically.

Why is coverage not meaningful for ChIP-seq, why would you only count each read once at its 5' position, and why would you plot strand specifically (strand specificity isn't really relevant for ChIP-seq like it is for RNA-seq)? I think each read should be extended in the appropriate direction (depending on which strand it aligns to) to its estimated fragment length (usually around 200 bp), and then the coverage at each position plotted. I don't know what the best software is for doing all this, though; I use my own scripts.
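For reference, the read-extension approach described here could look something like the following. This is only a sketch under assumptions: reads are (start, end, strand) tuples with 0-based, half-open coordinates, and the 200 bp default is just the ballpark figure mentioned above, not an estimate from real data.

```python
def extended_coverage(reads, chrom_length, fragment_length=200):
    """Extend each read to `fragment_length` in its 3' direction and
    return per-base coverage of the extended fragments."""
    diff = [0] * (chrom_length + 1)  # difference array: +1 at start, -1 at end
    for start, end, strand in reads:
        if strand == "+":
            frag_start, frag_end = start, start + fragment_length
        else:
            frag_start, frag_end = end - fragment_length, end
        # Clip fragments that run off either end of the chromosome.
        frag_start = max(frag_start, 0)
        frag_end = min(frag_end, chrom_length)
        if frag_start < frag_end:
            diff[frag_start] += 1
            diff[frag_end] -= 1
    coverage, running = [], 0
    for delta in diff[:chrom_length]:
        running += delta
        coverage.append(running)
    return coverage


# A forward and a reverse read from the same hypothetical 200 bp fragment.
cov = extended_coverage([(100, 150, "+"), (250, 300, "-")], chrom_length=500)
```

With extension, the result depends on the assumed fragment length rather than on the read length, so both reads in the example cover the same interval.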

**jwfoley** · 10-22-2014, 02:23 PM

> Why is coverage not meaningful for ChIP-seq, why would you only count each read once at its 5' position

As I said, it's because you're counting reads, not bases. A 100 nt read is not twice as much evidence for DNA-protein interaction as a 50 nt read. The read length is arbitrary, and may vary if you're doing quality trimming.

> and why would you plot strand specifically (strand specificity isn't really relevant for ChIP-seq like it is for RNA-seq)?

If you have already corrected for the strand shift, then by all means, do sum the strands into one profile - that's indeed the most meaningful view. If not, you'll get your strand-specific peaks shifted in the 5' direction from each other, so when you combine the two strands without shifting you'll get an unnecessarily low peak or even a bimodal one. Even the shift between the peaks has nothing to do with read length; it's determined by insert size. If you happen to have done paired-end sequencing, you can disregard my previous advice and use the known center of each fragment instead of the 5' end of the read (to be perfectly precise, you're counting fragments, not bases or reads); otherwise, there are lots of different ways to estimate the average strand shift and apply a uniform correction with very good resolution. This is explained in more detail in the QuEST paper (see figure 1).
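One simple way to estimate a uniform strand shift is to scan candidate offsets and keep the one where the forward and reverse 5'-end counts overlap best. The sketch below does that with a brute-force cross-correlation and then moves each strand halfway toward the other before summing. It is an illustration under assumptions (dictionaries of 0-based position to count per strand, hypothetical function names, an arbitrary 500 bp search limit) and is not the procedure used in the QuEST paper.

```python
from collections import Counter


def estimate_strand_offset(fwd, rev, max_offset=500):
    """Return the offset d that maximizes sum(fwd[p] * rev[p + d])."""
    best_offset, best_score = 0, -1
    for offset in range(max_offset + 1):
        score = sum(n * rev.get(pos + offset, 0) for pos, n in fwd.items())
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


def combined_profile(fwd, rev, offset):
    """Shift each strand by about half the offset toward the other and sum."""
    half = offset // 2
    combined = Counter()
    for pos, n in fwd.items():
        combined[pos + half] += n  # forward counts move in the 3' direction
    for pos, n in rev.items():
        combined[pos - (offset - half)] += n  # reverse counts move the other way
    return combined


fwd = {100: 5, 101: 3}
rev = {250: 5, 251: 3}
offset = estimate_strand_offset(fwd, rev)
profile = combined_profile(fwd, rev, offset)
```

Without the shift, these two strands would give two separate peaks 150 bp apart; after it, they stack into a single peak.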

It's worth noting that a coverage plot is basically a lazy shortcut to get a roughly similar, but worse, result. You're doing kernel smoothing, except that:

- the kernel function is a rectangle instead of something more mathematically efficient like a bell curve or parabola;
- the kernel bandwidth is determined by the read length instead of some more meaningful value related to fragment size variation;
- the kernels are centered at the middle of reads instead of the middle of fragments (so forward and reverse won't line up correctly unless insert length = read length, which is generally avoided in library construction so that you don't sequence into the adapter).

It might be okay for informally browsing your data, but don't use this lazy shortcut to make a figure for publication.
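To see why coverage amounts to rectangular-kernel smoothing, here is a small check in plain Python. It assumes forward-strand reads of one fixed length given as 0-based, half-open (start, end) pairs; the names and the toy reads are made up for illustration.

```python
def coverage_from_reads(reads, length):
    """Ordinary per-base coverage: add 1 for every base each read spans."""
    cov = [0] * length
    for start, end in reads:
        for pos in range(start, min(end, length)):
            cov[pos] += 1
    return cov


def boxcar_from_starts(reads, length, read_length):
    """Count reads at their start, then spread each count over a
    rectangular kernel that is `read_length` wide."""
    starts = [0] * length
    for start, _ in reads:
        starts[start] += 1
    out = [0] * length
    for pos, n in enumerate(starts):
        if n:
            for k in range(pos, min(pos + read_length, length)):
                out[k] += n
    return out


reads = [(10, 60), (20, 70), (20, 70)]  # all reads are 50 nt long
same = coverage_from_reads(reads, 100) == boxcar_from_starts(reads, 100, 50)
```

The two profiles are identical, so the only thing coverage adds over start counts is a rectangle whose width is the read length. Swapping that rectangle for a kernel centered on the estimated fragment midpoint is the improvement being argued for.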

**blakeoft** · 11-11-2014, 12:56 PM

Thank you, jwfoley. I've thought about what you've said and it makes a good amount of sense. I'm using ggplot2 in R to make these visualizations. It may not be the best approach, but I'm most familiar with it so it seemed like a good place to start. I think there will be some issues with adjusting the height of the input data if I visualize it with a density kernel as well, but I think I have a good idea about how to adjust the y axis. Thanks again!

Visualizing ChIP-seq peaks