Dear all,
I have been using methylKit for a while on one of my projects, but I suspect it has a serious issue: it generates a lot of false positives.
If I understand correctly, it uses logistic regression, which assumes each C count is independent, and pools the CpG reads from all samples together to test for a group effect at a given CpG site/region. According to the paper, the effective sample size here is not, say, 6 vs. 6 (the number of samples), but the total C counts at that site/region across all samples.
In this case, samples with higher coverage at a given site/region will dominate the hypothesis test, and many of the detected DMRs have extreme beta-values in only one or two samples.
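To make the concern above concrete, here is a small illustration (not methylKit itself, and the counts are made up): when a test treats pooled read counts as independent observations, the effective n is the total coverage, so the same modest difference in methylation proportion becomes "significant" simply by sequencing deeper.

```python
# Illustration only: a 2x2 test on counts pooled across all samples in a
# group, at two coverage depths. The counts are hypothetical.
from scipy.stats import fisher_exact

def pooled_test(meth1, unmeth1, meth2, unmeth2):
    """Two-sided Fisher test on methylated/unmethylated counts pooled per group."""
    _, p = fisher_exact([[meth1, unmeth1], [meth2, unmeth2]])
    return p

# 55% vs. 45% methylation in the two groups, pooled over samples
p_shallow = pooled_test(55, 45, 45, 55)      # ~100 reads per group
p_deep = pooled_test(550, 450, 450, 550)     # same proportions, 10x the depth

print(f"shallow coverage: p = {p_shallow:.3f}")
print(f"deep coverage:    p = {p_deep:.2e}")
# The effect size is identical; only the read depth changed the verdict.
# This is why one or two deeply covered samples can push a site to
# significance even when the other samples disagree.
```

This is the mechanism behind the extreme-beta-value DMRs: the per-read model has no notion of between-sample variability, so coverage stands in for replication.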
Considering that unequal depth across samples is a potential cause, I normalized the samples using "normalizeCoverage", but it didn't solve the problem. In fact, I think this function is of little use, since it has to be applied before "unite". That means that after normalization the total read counts should be roughly equal across samples, but each sample still covers a different set of CpG sites/regions, and "unite" then removes any sites/regions not covered in all samples by default.
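A quick sketch of why median-based scaling (which is roughly what I understand "normalizeCoverage" to do; the sample data and target here are hypothetical) cannot fix this: it equalizes each sample's overall coverage level, but the per-site imbalance between samples survives untouched.

```python
# Hypothetical per-site coverages for two samples; scaling each sample so
# its median coverage matches a common target equalizes the medians but
# leaves individual sites as unbalanced as before.
import statistics

def scale_to_target(coverage, target):
    """Multiply a sample's per-site coverage so its median equals `target`."""
    factor = target / statistics.median(coverage)
    return [c * factor for c in coverage]

sample_a = [10, 100, 20]   # uneven per-site coverage, low median
sample_b = [50, 50, 50]    # flat coverage, higher median

target = statistics.median([statistics.median(sample_a),
                            statistics.median(sample_b)])
a_norm = scale_to_target(sample_a, target)
b_norm = scale_to_target(sample_b, target)

print("medians after scaling:", statistics.median(a_norm), statistics.median(b_norm))
print("site 2 after scaling:", a_norm[1], "vs", b_norm[1])
# The medians now agree, yet at site 2 sample A still has 5x the reads of
# sample B, so a pooled test at that site is still dominated by sample A.
```

So even with perfectly matched overall depth, any single site can still be dominated by whichever sample happens to be deeply covered there.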
I am wondering if anyone has any ideas on how to deal with this problem?