
03-23-2014, 06:29 PM  #1
Junior Member
Location: Australia Join Date: Mar 2014
Posts: 5

statistical analysis of 454 bisulfite sequence data  small sample size
Hello everyone,
I am trying to figure out how to analyse some bisulfite sequencing data and am hoping that someone will have suggestions on how to go about it. I have looked online and in statistics textbooks, but am totally stumped.

I have performed 454 BS sequencing of a number of PCR amplicons. I have two different treatment groups, with n=3 biological replicates in each (six sets of read data in total). I want to use two types of statistical analysis to assess differences in methylation between the treatment groups: (1) at individual CpG sites within an amplicon and (2) across each amplicon as a whole.

I think that I will need to analyse my results as count data rather than % methylation values, as I have a small sample size and the % methylation values probably do not conform to normality or homogeneity of variance assumptions. Similar studies that I have seen in the literature have used a Fisher's exact test for (1) and a negative binomial generalised linear model for (2). However, these studies analysed unreplicated data (where biological replicates were pooled prior to PCR), and as far as I know these tests are unable to accommodate my replicated data.

In another post, somebody suggested that the program DESeq could be used for (1). After trying to use DESeq on my data, I realised that this is not possible, as the relatively small number of CpG sites that I have to analyse results in inaccurate mean/dispersion estimates.

If anyone has any idea as to which statistical tests would be appropriate for my data, I would be very grateful. Thank you in advance.
03-26-2014, 03:43 AM  #2
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480

I'd be a bit hesitant to try to shoehorn this into DESeq or one of the other RNA-seq tools, since the negative binomial distribution doesn't really fit bisulfite sequencing well. This sort of data is generally handled in one of a few ways:
(1) Logistic regression (e.g., in methylKit), which you can do easily enough in R.
(2) Smoothing followed by either a t-test or Wilcoxon test, similar to how bsseq/BSmooth works.
(3) Beta-binomial regression (e.g., in BiSeq).
I would say that the beta-binomial methods will win out long term, since they're actually able to model the underlying biology. You can just use the betareg package from CRAN in R to do this.

The next thing to think about is whether you're interested in single CpGs or whole regions. Most of the packages actually try to find regions, but if you're looking at a small number of amplicons then you're likely to be more interested in single CpGs, so you might just ignore the packages and use betareg.

I should note that none of these methods is all that ideal as of yet. There are new variants every month, it seems, and I actually have a tweaked version of beta-binomial regression in mind to implement if no one else has already (the downside to new packages appearing every couple of weeks...), so you'll likely find something that works nicely in the not too distant future.
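To make the beta-binomial idea concrete for a single CpG, here is a minimal sketch in plain Python (standard library only). It is not the betareg or BiSeq route mentioned above. It assumes each replicate contributes a (methylated reads, total reads) pair, fits a beta-binomial with a mean `mu` per group and one shared overdispersion `rho` by a coarse grid search, and compares that to a single shared mean with a likelihood ratio test. The function names, the grids, and the example counts are all made up for illustration, and with n=3 per group the chi-square approximation behind the p-value is rough at best.

```python
import math

def betabinom_logpmf(k, n, mu, rho):
    """Log-probability of k methylated reads out of n under a beta-binomial
    with mean mu and overdispersion rho (both strictly between 0 and 1)."""
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_beta_num = math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
    log_beta_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_choose + log_beta_num - log_beta_den

def loglik(counts, mu, rho):
    """Sum the log-likelihood over replicates; counts is a list of (k, n)."""
    return sum(betabinom_logpmf(k, n, mu, rho) for k, n in counts)

# Hypothetical search grids (an assumption of this sketch, not a recommendation).
MU_GRID = [i / 200 for i in range(1, 200)]
RHO_GRID = [10 ** (-4 + 0.1 * j) for j in range(40)]

def lrt_single_cpg(group_a, group_b):
    """Likelihood ratio test: one shared mean vs. one mean per group,
    with a single overdispersion shared by both groups in each model."""
    best_null = best_alt = -math.inf
    for rho in RHO_GRID:
        ll_a = [loglik(group_a, mu, rho) for mu in MU_GRID]
        ll_b = [loglik(group_b, mu, rho) for mu in MU_GRID]
        # Null: both groups must use the same mu.
        best_null = max(best_null, max(x + y for x, y in zip(ll_a, ll_b)))
        # Alternative: each group picks its own best mu.
        best_alt = max(best_alt, max(ll_a) + max(ll_b))
    stat = max(0.0, 2 * (best_alt - best_null))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

if __name__ == "__main__":
    # Made-up counts for one CpG: (methylated reads, total reads) per replicate.
    control = [(12, 40), (15, 45), (10, 38)]
    treated = [(30, 42), (33, 44), (29, 40)]
    stat, p = lrt_single_cpg(control, treated)
    print(f"LR statistic = {stat:.2f}, p = {p:.3g}")
```

Running this per CpG gives one p-value per site, so you would still want to correct for multiple testing across the sites in an amplicon. A proper optimiser and a package-based fit would be the better choice for real data; the grid search is only there to keep the sketch dependency-free.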
04-02-2014, 09:12 PM  #4
Junior Member
Location: Australia Join Date: Mar 2014
Posts: 5

Thank you for your helpful advice, dpryan.

Tags 
454, bisulfite sequencing, methylation, replicates, small sample size 

