I'm interested in visualizing some ChIP-seq peaks for specific loci. I've got an image that plots the read depth at each position throughout an interval, and I'm fairly satisfied with it. But is this the right idea when plotting peak data? As I understand it, there's a forward-strand peak and a reverse-strand peak. Should I worry about trying to merge the two peaks when visualizing, or would an image like mine be an acceptable way to visualize peaks?
-
Coverage isn't really meaningful for ChIP-seq since you're counting reads, not bases. It would be clearer to count each read as a hit once at its 5'-most position, strand-specifically. Then you'll see the strand peaks nice and clear.
If it's too bumpy, use kernel smoothing - which is basically what you're doing when you show coverage instead of read counts, except you're using a rectangular kernel that's centered at an arbitrary position in the fragment.
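The counting suggested above can be sketched roughly as follows. This is only an illustration, not anyone's actual pipeline: the `(start, end, strand)` tuple layout with 0-based, half-open coordinates and the function name `five_prime_counts` are assumptions made for the example.

```python
from collections import Counter

def five_prime_counts(reads):
    """Count reads by their 5'-most position, separately per strand.

    `reads` is an iterable of (start, end, strand) tuples with 0-based,
    half-open coordinates (an assumed layout, similar to BED).
    """
    counts = {"+": Counter(), "-": Counter()}
    for start, end, strand in reads:
        # A forward read begins at its leftmost base; a reverse read begins
        # at its rightmost base, which is end - 1 in half-open coordinates.
        five_prime = start if strand == "+" else end - 1
        counts[strand][five_prime] += 1
    return counts

# Made-up reads: three on the forward strand, two on the reverse strand.
reads = [(100, 136, "+"), (100, 136, "+"), (105, 141, "+"),
         (260, 296, "-"), (264, 300, "-")]
counts = five_prime_counts(reads)
print(dict(counts["+"]))
print(dict(counts["-"]))
```

Each read contributes exactly one hit, so the y axis is a read count rather than a base count, and the two strands stay in separate tracks.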
-
Thank you for the input. Are you suggesting that I make density curves? When I do that, the image changes so that some of the peaks have different proportions than when I examine the reads in something like IGV. I also lose the ability to plot the input data along with the IP data, since the scales are completely different.
If you don't mean density curves, could you tell me what the axes would be? I assume that the x axis would be genome position, but what about the y axis?
-
I'm proposing figure 1A here: http://www.biomedcentral.com/1471-2164/14/720/figure/F1
The vertical axis is the number of reads that start at the given position, i.e. the number of reads whose 5'-most base is at that coordinate. I'm not sure if there's a way to do this in IGV, but you can get the 5' ends via the SAMtools API and make a wiggle file, or you could just simply use the convert_align tool provided with UniPeak.
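For the wiggle-file route, a minimal sketch of the output step might look like this. It assumes the 5' counts have already been collected into a `{0-based position: count}` mapping (extracting them from a BAM file is outside this sketch); `write_wiggle` and the track name are hypothetical.

```python
import io

def write_wiggle(counts, chrom, out, name="5prime_starts"):
    """Write {0-based position: count} as a variableStep wiggle track."""
    out.write(f'track type=wiggle_0 name="{name}"\n')
    out.write(f"variableStep chrom={chrom}\n")
    for pos in sorted(counts):
        # Wiggle positions are 1-based, so shift the 0-based coordinate.
        out.write(f"{pos + 1} {counts[pos]}\n")

# Made-up forward-strand counts written to an in-memory buffer.
buffer = io.StringIO()
write_wiggle({100: 2, 105: 1}, "chr1", buffer, name="forward_5prime")
wiggle_text = buffer.getvalue()
print(wiggle_text)
```

Writing one track per strand keeps the forward and reverse peaks distinguishable in a genome browser.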
-
Originally posted by jwfoley: "Coverage isn't really meaningful for ChIP-seq since you're counting reads, not bases. It would be clearer to count each read as a hit once at its 5'-most position, strand-specifically."
-
Why is coverage not meaningful for ChIP-seq? Why would you only count each read once at its 5' position? And why would you plot strand-specifically (strand specificity isn't really relevant for ChIP-seq like it is for RNA-seq)?
It's worth noting that a coverage plot is basically a lazy shortcut to get a roughly similar, but worse, result. You're doing kernel smoothing, except that the kernel function is a rectangle instead of something more mathematically efficient like a bell curve or parabola; the kernel bandwidth is determined by the read length instead of some more meaningful value related to fragment size variation; and the kernels are centered at the middle of reads instead of the middle of fragments (so forward and reverse won't line up correctly unless insert length = read length, which is generally avoided in library construction so that you don't sequence into the adapter). It might be okay for informally browsing your data, but don't use this lazy shortcut to make a figure for publication.
Last edited by jwfoley; 10-22-2014, 02:29 PM.
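The comparison above can be illustrated with a small sketch. It first checks that forward-strand coverage equals the 5' counts smoothed with a rectangular kernel as wide as the read, then smooths the same counts with a parabolic (Epanechnikov) kernel shifted toward the fragment center. The read length of 36, bandwidth of 50, and shift of 100 (half of an assumed 200 bp fragment) are arbitrary illustration values, and all function names are hypothetical.

```python
from collections import Counter

def coverage(starts, read_length, size):
    """Per-base depth of forward-strand reads, given their 5' starts."""
    depth = [0] * size
    for s in starts:
        for pos in range(s, min(s + read_length, size)):
            depth[pos] += 1
    return depth

def smooth(counts, kernel, size):
    """Sum kernel(position - 5' start), weighted by the count at each start."""
    return [sum(n * kernel(pos - s) for s, n in counts.items())
            for pos in range(size)]

def rectangle(read_length):
    # Flat kernel covering exactly the bases of a forward-strand read.
    return lambda d: 1.0 if 0 <= d < read_length else 0.0

def epanechnikov(bandwidth, shift):
    # Parabolic kernel centered `shift` bases downstream of the 5' end.
    # For reverse-strand reads the shift would be negative.
    def kernel(d):
        u = (d - shift) / bandwidth
        return 0.75 * (1 - u * u) / bandwidth if abs(u) < 1 else 0.0
    return kernel

starts = [100, 100, 105]   # made-up forward-strand 5' positions
size = 400
counts = Counter(starts)

depth = coverage(starts, 36, size)
rect = smooth(counts, rectangle(36), size)
same = depth == rect       # coverage is rectangular kernel smoothing
print(same)

smoothed = smooth(counts, epanechnikov(bandwidth=50, shift=100), size)
peak = max(range(size), key=smoothed.__getitem__)
print(peak)
```

With the shift applied, the smoothed forward-strand signal peaks near the assumed fragment center rather than at the read itself, which is what allows the forward and reverse tracks to line up.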
-
Thank you, jwfoley. I've thought about what you've said and it makes a good amount of sense. I'm using ggplot2 in R to make these visualizations. It may not be the best approach, but I'm most familiar with it so it seemed like a good place to start. I think there will be some issues with adjusting the height of the input data if I visualize it with a density kernel as well, but I think I have a good idea about how to adjust the y axis. Thanks again!