  • statistical analysis of 454 bisulfite sequence data - small sample size

    Hello everyone,

    I am trying to figure out how to analyse some bisulfite sequencing data and am hoping that someone will have suggestions as to how I should go about it. I have looked online and in statistics textbooks, but am totally stumped.

    I have performed 454 BS sequencing of a number of PCR amplicons. I have two different treatment groups, with n=3 biological replicates in each (six sets of read data in total). I want to use two types of statistical analysis to assess differences in methylation between the treatment groups: (1) at individual CpG sites within an amplicon and (2) across each amplicon as a whole. I think that it will be necessary to analyse my results as count data rather than %methylation values, as I have a small sample size and the %methylation values probably do not conform to the assumptions of normality or homogeneity of variance. Similar studies that I have seen in the literature have used a Fisher's exact test for (1) and a negative binomial generalised linear model for (2). However, these studies analysed unreplicated data (where biological replicates were pooled prior to PCR), and as far as I know these tests are unable to accommodate my replicated data. In another post, somebody suggested that the program DESeq could be used for (1). After trying to use DESeq on my data, I realised that this is not possible, as the relatively small number of CpG sites that I have to analyse results in inaccurate mean/dispersion estimates.
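
To make the limitation concrete, here is a minimal Python sketch of a pooled-count Fisher's exact test at a single CpG site. The read counts are invented for illustration and the function name is hypothetical. Note that the three replicates per group have to be summed before the test, so any variability between biological replicates is ignored.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are treatment groups; columns are methylated / unmethylated reads.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x methylated reads in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical (methylated, unmethylated) read counts per biological replicate
control = [(12, 28), (15, 30), (9, 26)]
treated = [(25, 17), (22, 16), (27, 15)]

# Pooling step: replicate structure is lost here
pooled = [tuple(map(sum, zip(*group))) for group in (control, treated)]
(a, b), (c, d) = pooled
p_value = fisher_exact_two_sided(a, b, c, d)
print(f"pooled table: {pooled}, two-sided p = {p_value:.3g}")
```

Because the pooled test treats every read as independent, it will tend to overstate significance whenever replicates within a group differ from one another.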

    If anyone has any idea as to which statistical tests would be appropriate for my data I would be very grateful.

    Thank you in advance

  • #2
    I'd be a bit hesitant to try to shoehorn this into DESeq or one of the other RNA-seq tools, since the negative binomial distribution doesn't really fit bisulfite sequencing data well. This sort of data is generally handled in one of a few ways:

    (1) Logistic regression (e.g., in methylKit), which you can do easily enough in R.
    (2) Smoothing followed by either a t-test or Wilcoxon test, similar to how bsseq/BSmooth works.
    (3) Beta-binomial regression (e.g., in BiSeq).

    I would say that the beta-binomial methods will win out long term, since they're actually able to model the underlying biology. You can just use the betareg package from CRAN in R to do this.

    The next thing to think about is whether you're interested in single CpGs or whole regions. Most of the packages actually try to find regions, but if you're looking at a small number of amplicons then you're likely to be more interested in single CpGs, so you might just ignore the packages and use betareg.

    I should note that none of these methods is ideal as of yet. There seem to be new variants every month, and I actually have a tweaked version of beta-binomial regression in mind to implement if no one else has already (the downside to new packages appearing every couple of weeks...), so you'll likely find something that works nicely in the not too distant future.
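
As a rough illustration of the beta-binomial idea for a single CpG with replicates, here is a minimal Python sketch. It is not the algorithm used by BiSeq, MOABS, or betareg. The counts, function names, and the crude grid-search fit are all assumptions made for the example, and with n=3 per group the chi-square approximation for the likelihood ratio statistic is only a rough guide.

```python
from math import erfc, lgamma, sqrt


def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_loglik(counts, mu, rho):
    """Log-likelihood of (methylated, total) read counts under a beta-binomial
    with mean methylation mu and overdispersion rho (both in (0, 1))."""
    scale = (1.0 - rho) / rho
    alpha, beta = mu * scale, (1.0 - mu) * scale
    ll = 0.0
    for k, n in counts:
        log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        ll += log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    return ll


def lrt_single_cpg(group_a, group_b, mu_steps=199, rho_steps=40):
    """Likelihood ratio test: one shared mean (null) versus one mean per
    group (alternative), with a shared dispersion. Fitted by grid search."""
    mus = [(i + 1) / (mu_steps + 1) for i in range(mu_steps)]
    # Log-spaced dispersion grid from 0.001 up to about 0.8
    rhos = [10 ** (-3 + 2.9 * j / (rho_steps - 1)) for j in range(rho_steps)]
    best_null = best_alt = float("-inf")
    for rho in rhos:
        ll_a = [betabinom_loglik(group_a, mu, rho) for mu in mus]
        ll_b = [betabinom_loglik(group_b, mu, rho) for mu in mus]
        # Null: both groups share the same mu; alternative: each picks its own
        best_null = max(best_null, max(x + y for x, y in zip(ll_a, ll_b)))
        best_alt = max(best_alt, max(ll_a) + max(ll_b))
    stat = max(0.0, 2.0 * (best_alt - best_null))
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical (methylated, total) read counts, one tuple per biological replicate
control = [(12, 40), (15, 45), (9, 35)]
treated = [(25, 42), (22, 38), (27, 42)]

stat, p_value = lrt_single_cpg(control, treated)
print(f"LRT statistic = {stat:.2f}, approximate p = {p_value:.3g}")
```

Unlike a test on pooled counts, each replicate contributes its own term to the likelihood, so the dispersion parameter can absorb replicate-to-replicate variation. If you run this across every CpG in an amplicon, the resulting p-values would still need a multiple-testing correction.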

    • #3
      In fact, it turns out that MOABS, which just came out, already implements what I had in mind. You'll have to figure out how to get your data into it, but it's likely to give nice results.

      • #4
        Thank you for your helpful advice, dpryan.
