  • getzabeth
    Junior Member
    • Apr 2010
    • 2

    About negative binomial distribution fit

    Hi everybody!

    How do we test the adjustment of RNA-seq data (reads per gene) to the negative binomial distribution? We are currently using the following approach:

    a) We used the R function goodfit() to find the parameters of the negative binomial curve closest to our data.
    b) We used the R function ks.test() (the Kolmogorov-Smirnov test) to compare our data with the negative binomial curve estimated by goodfit().

    Do we need to use the original reads per gene count as input? D:

    We weren't able to prove the adjustment of the reads per gene (as an integer vector) to the negative binomial curve. Here, goodfit() was able to estimate the closest negative binomial parameters, but the p-value of the KS test was too low (the H0 that the distribution was negative binomial was rejected). However, when using categorized data (we imposed 20 bins, each one representing a range of reads per gene), the KS test proved the adjustment of the data to the negative binomial distribution (p > 0.8).

    Are our adjustments for categorized data valid? And if they are, why are we unable to prove the adjustment of the original reads-per-gene vector to the negative binomial distribution?

    Thanks a lot
  • timydaley
    Member
    • Jun 2010
    • 26

    #2
    First off, a statistical test doesn't prove anything. It only tells you how probable data at least as extreme as yours would be if the null hypothesis were true. If that probability is sufficiently low, you can reject the null hypothesis.

    Secondly, a p-value greater than 0.8 is not necessarily meaningful. The negative binomial may not be a good fit for the data, depending on the application. Are you including zero-count genes? Are you looking at all genes? Or are you only looking at a subset, or locally? For small numbers of different categories the negative binomial is probably a good assumption, but for large numbers it may not be sufficient. Additionally, there are other considerations, such as sequencing bias. I think most tools for differential expression will do the renormalization and account for these factors.
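
To illustrate why a high p-value from a test on few values says little, here is a minimal, hypothetical Python sketch (not the R workflow from the first post). It assumes a normal distribution in place of the negative binomial, a made-up shift of 0.1 between the true and the hypothesised distribution, and evenly spaced quantiles standing in for random draws so that the result is reproducible. The helper names `ks_pvalue` and `idealised_sample` are invented for this example.

```python
import math
from statistics import NormalDist


def ks_pvalue(sample, cdf):
    """One-sample KS statistic and asymptotic p-value against `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    lam = math.sqrt(n) * d
    # Kolmogorov's limiting distribution, written as an alternating series.
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(1.0, max(0.0, q))


null = NormalDist(0.0, 1.0)    # hypothesised distribution (assumption)
actual = NormalDist(0.1, 1.0)  # what the data really follow (assumption)


def idealised_sample(n):
    """Evenly spaced quantiles of `actual`, standing in for n random draws."""
    return [actual.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


d_small, p_small = ks_pvalue(idealised_sample(20), null.cdf)
d_large, p_large = ks_pvalue(idealised_sample(5000), null.cdf)
print(f"n=20:   D={d_small:.3f}, p={p_small:.3g}")
print(f"n=5000: D={d_large:.3f}, p={p_large:.3g}")
```

Under these assumptions the same mismatch gives a p-value close to 1 with 20 values and a very small one with 5000, so a p-value above 0.8 from 20 bins may mainly reflect how little data the test saw.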

    • Simon Anders
      Senior Member
      • Feb 2010
      • 995

      #3
      I am quite puzzled about what you are trying to achieve. What do you mean by "adjustment"? What exactly do you want to fit and why?

      I hope you are not trying to take all the per-gene count values from a sample and fit an NB distribution to them. (Sorry if I make you sound overly naive, but some people have misunderstood the whole NB stuff to mean that these values were NB distributed. Of course they are not. The values for one gene, across samples, are postulated to be NB distributed*, but this is hard to check unless you have dozens of samples.)

      * but only out of convenience, not because we really believe they are; see here: http://seqanswers.com/forums/showpos...49&postcount=5
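
A rough way to see the difference between the two readings of the NB assumption is a small simulation. The following Python sketch is only an illustration under made-up assumptions: a true dispersion of 0.1, one gene with mean 100 seen in 500 hypothetical replicates, and 2000 genes whose true means are spread log-uniformly between 1 and 1000. `nb_sample` and `moment_dispersion` are hypothetical helper names, and the dispersion is estimated by the method of moments from Var = mu + alpha * mu^2, not by the method DESeq uses.

```python
import math
import random
import statistics


def nb_sample(rng, mu, size):
    """Draw one NB count with mean `mu` and integer size r (alpha = 1/r).

    An NB(r, p) variable is the sum of r independent geometric variables
    (number of failures before the first success).
    """
    p = size / (size + mu)       # success probability that gives mean mu
    log_q = math.log(1.0 - p)    # log of the failure probability
    total = 0
    for _ in range(size):
        u = 1.0 - rng.random()   # uniform on (0, 1]
        total += int(math.log(u) / log_q)
    return total


def moment_dispersion(counts):
    """Method-of-moments alpha from Var = mu + alpha * mu^2."""
    m = statistics.fmean(counts)
    v = statistics.variance(counts)
    return (v - m) / (m * m)


rng = random.Random(1)
SIZE = 10  # r = 10, i.e. simulated dispersion alpha = 0.1 (illustrative)

# One gene (mean 100) observed across 500 hypothetical replicate samples.
one_gene = [nb_sample(rng, 100.0, SIZE) for _ in range(500)]

# 2000 genes in a single sample, true means log-uniform between 1 and 1000.
gene_means = [10 ** rng.uniform(0, 3) for _ in range(2000)]
one_sample = [nb_sample(rng, mu, SIZE) for mu in gene_means]

alpha_gene = moment_dispersion(one_gene)
alpha_pooled = moment_dispersion(one_sample)
print(f"one gene across samples: alpha ~ {alpha_gene:.2f}")
print(f"all genes in one sample: alpha ~ {alpha_pooled:.2f}")
```

In this setup the dispersion recovered from one gene across samples should land near the simulated value, while the value from pooling all genes of one sample is much larger, because it mostly reflects the differences in expression between genes.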

      • getzabeth
        Junior Member
        • Apr 2010
        • 2

        #4
        Thanks to both of you for the replies

        Simon:

        You were right; in fact we were trying to fit "all the per-gene count values from the same sample" to the NB distribution. Until we read your answer, everyone in our lab (and maybe in other groups) thought that was the meaning of the statistical assumption made by DESeq.

        Considering your answer, everything is OK with our analysis (or the opposite can't be tested) :P

        Thank you,
