  • statistical analysis of 454 bisulfite sequence data - small sample size

    Hello everyone,

    I am trying to figure out how to analyse some bisulfite sequencing data that I have, and I am hoping that someone will have some suggestions as to how I should go about it. I have looked online and in statistics textbooks, but I am totally stumped.

    I have performed 454 BS sequencing of a number of PCR amplicons. I have two different treatment groups, with n=3 biological replicates in each (six sets of read data in total). I want to use two types of statistical analysis to assess differences in methylation between the treatment groups. I would like to test for differences in methylation (1) at individual CpG sites within an amplicon and (2) across each amplicon as a whole. I think that it will be necessary for me to analyse my results as count data rather than % methylation values, as I have a small sample size and the % methylation values probably do not conform to normality or homogeneity of variance assumptions. Similar studies that I have seen in the literature have used a Fisher's exact test for (1) and a negative binomial generalised linear model for (2). However, these studies analysed unreplicated data (where biological replicates were pooled prior to PCR), and as far as I know these tests are unable to accommodate my replicated data. In another post, somebody suggested that the program DESeq could be used for (1). After trying to use DESeq to analyse my data, I realised that this is not possible, as the relatively small number of CpG sites that I have to analyse results in inaccurate mean/dispersion estimates.
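    For reference, the pooled Fisher's exact test used in those unreplicated studies can be sketched as follows. This is only an illustration: the read counts are made up, the function name is hypothetical, and summing the replicates into a single 2x2 table discards the between-replicate variation, which is exactly the limitation described above.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x methylated reads in the first group
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical (methylated, unmethylated) read counts at one CpG site,
# one tuple per biological replicate.
control = [(30, 70), (25, 75), (35, 65)]
treated = [(55, 45), (60, 40), (50, 50)]

# Pooling: sum the replicates within each group (this ignores replicate structure).
pooled_control = (sum(m for m, _ in control), sum(u for _, u in control))
pooled_treated = (sum(m for m, _ in treated), sum(u for _, u in treated))

p_value = fisher_exact_two_sided(*pooled_control, *pooled_treated)
print(f"pooled Fisher's exact test p-value: {p_value:.3g}")
```

    Because the pooled test treats every read as independent, it will tend to give over-confident p-values when replicates within a group differ from one another.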

    If anyone has any idea as to which statistical tests would be appropriate for my data I would be very grateful.

    Thank you in advance

  • #2
    I'd be a bit hesitant to try to shoe-horn this into DESeq or one of the other RNAseq tools, since the negative binomial distribution doesn't really fit bisulfite sequencing well. This sort of data is generally handled in one of a few ways:

    (1) Logistic regression (e.g., in methylKit), which you can do easily enough in R.
    (2) Smoothing followed by either a t-test or Wilcoxon test, similar to how bsseq/BSmooth works.
    (3) Beta-binomial regression (e.g., in BiSeq).

    I would say that the beta-binomial methods will win out long term, since they're actually able to model the underlying biology. You can just use the betareg package from CRAN in R to do this. The next thing to think about is whether you're interested in single CpGs or whole regions. Most of the packages actually try to find regions, but if you're looking at a small number of amplicons then you're likely to be more interested in single CpGs, so you might just ignore the packages and use betareg. I should note that none of these methods is ideal yet. There are new variants every month, it seems, and I actually have a tweaked version of beta-binomial regression in mind to implement if no one else has already (the downside to new packages appearing every couple of weeks...), so you'll likely find something that works nicely in the not too distant future.
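    To make the beta-binomial idea concrete, here is a rough single-CpG sketch: a likelihood-ratio test comparing one shared methylation level against group-specific levels, with a shared precision parameter. It is not what BiSeq, betareg, or any other package does internally. The counts, function names, grid ranges, and the crude grid-search fit are all assumptions for illustration, and with n=3 per group the chi-square approximation for the p-value is shaky at best.

```python
from math import erfc, lgamma, sqrt

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bb_loglik(counts, mu, phi):
    """Beta-binomial log-likelihood for (methylated, total) pairs.

    mu is the mean methylation level and phi the precision (larger phi means
    less replicate-to-replicate variation). The binomial coefficient is
    omitted because it cancels in the likelihood ratio.
    """
    a, b = mu * phi, (1 - mu) * phi
    return sum(log_beta(k + a, n - k + b) - log_beta(a, b) for k, n in counts)

# Crude parameter grids (an assumption; a real fit would use an optimiser).
MU_GRID = [i / 200 for i in range(1, 200)]
PHI_GRID = [2 ** (j / 2) for j in range(25)]  # 1 up to 4096

def best_over_mu(counts, phi):
    return max(bb_loglik(counts, mu, phi) for mu in MU_GRID)

def lrt(group_a, group_b):
    """Return (statistic, p-value) for H0: same mean methylation in both groups."""
    null = max(best_over_mu(group_a + group_b, phi) for phi in PHI_GRID)
    alt = max(best_over_mu(group_a, phi) + best_over_mu(group_b, phi)
              for phi in PHI_GRID)
    stat = max(0.0, 2 * (alt - null))
    # Survival function of a chi-square distribution with 1 degree of freedom
    return stat, erfc(sqrt(stat / 2))

# Hypothetical (methylated reads, total reads) per biological replicate.
control = [(30, 100), (25, 100), (35, 100)]
treated = [(55, 100), (60, 100), (50, 100)]

stat, p_value = lrt(control, treated)
print(f"LRT statistic = {stat:.2f}, approximate p-value = {p_value:.3g}")
```

    Unlike a pooled test, the replicates stay separate here, so variation between replicates within a group widens the fitted distribution and makes the test more conservative.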


    • #3
      In fact, it turns out that MOABS, which just came out, already implements what I had in mind. You'll have to figure out how to get your data into it, but it's likely to give nice results.


      • #4
        Thank you for your helpful advice, dpryan.
