  • t-test FPKM values

    I have two sets of genes, and I'd like to have a boxplot and do a t-test in order to know if they have significantly different expressions or not.

    However, my t-test p-value changes when using log10(FPKM+1) values or just FPKM values. Why? What should I choose?

    Thanks.

  • #2
    A t-test depends on the effect size, and that obviously changes if you log-transform the values.
    The general rule is to test on the data you measure - in this case, this would be the un-logged reads per million.

    Either way: you should not be testing on the FPKM values, because you lose the information about the number of reads actually behind each value (more reads give a better estimate).

    Consider using a testing method specifically for RNAseq data such as DESeq.
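As a rough illustration of the first point in this reply, the sketch below computes Welch's t statistic on a set of FPKM values and again on log10(FPKM+1). The two transcript sets and their values are made up for illustration, and `welch_t` and `log_fpkm` are hypothetical helper names, not part of any RNA-seq package. It only shows that the statistic moves under the transform; it does not make either test appropriate for FPKM data.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def log_fpkm(values):
    """Apply the log10(FPKM + 1) transform from the question."""
    return [math.log10(v + 1) for v in values]


# Made-up FPKM values, for illustration only (not real data).
coding = [0.5, 2.0, 8.0, 15.0, 40.0, 300.0]
lncrna = [0.1, 0.4, 1.0, 2.5, 3.0, 6.0]

t_raw, df_raw = welch_t(coding, lncrna)
t_log, df_log = welch_t(log_fpkm(coding), log_fpkm(lncrna))

print(f"raw FPKM:        t = {t_raw:.3f}, df = {df_raw:.1f}")
print(f"log10(FPKM + 1): t = {t_log:.3f}, df = {df_log:.1f}")
```

The log compresses the few very large values that dominate the raw means and variances, so the standardized difference, and with it the p-value, differs between the two scales.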


    • #3
      FPKM is just an intuitive transformation of fragment counts and is not suitable to be used in statistics.

      Fortunately, the software package that probably gave you the FPKM values, Cufflinks, also includes a program called cuffdiff that will do the test you want to do in a statistically rigorous way based on modeling the actual fragment counts. Use that instead; don't try to use statistical tests that are unsuited for your data type on data that are unsuited for statistics.


      • #4
        I do not need specific RNA-seq normalization here for what I want. Both sets of genes (actually I have transcripts) come from the same RNA-seq dataset (the same fasta). One dataset is made up of coding transcripts and the second one is made up of putative lncRNAs. I just wanna know which set or group of transcripts is more expressed.

        What is your final conclusion?
        Last edited by int11ap1; 07-17-2014, 11:14 AM.


        • #5
          My final conclusion is the same as before: you should use a valid hypothesis test on the count data, like cuffdiff, DESeq2, or edgeR, all of which are quite rigorous, commonly used, and well documented. Do not use an invalid hypothesis test on FPKMs. FPKM is a crude normalization and cannot be used in a meaningful statistical test. Asking us again is not going to change the way numbers work.


          • #6
            But those methods that you say (edgeR and DESeq) are for normalization between different samples or RNA-seq datasets...


            • #7
              No, you have it backwards: those methods are all for statistical hypothesis testing, and FPKM is a (crude, statistically inappropriate) normalization for comparing different samples.


              • #8
                I do not follow you, sorry for asking again.

                For example, I have 1000 FPKM values (from 1 RNA-seq sample) from 1000 transcripts. If I want to compare the first 500 with the second 500 transcripts (to see which set is more expressed), do I need to use edgeR or DESeq? For what?


                • #9
                  Ah, I see: you're comparing some genes with other genes in the same experiment, not same gene different experiment.

                  You can use FPKM values for this if you use a distribution-free test like Mann-Whitney-Wilcoxon, but that won't be very powerful. Otherwise you could use a more effective normalization like the variance-stabilizing transformation or regularized log in DESeq2 and then use a regular t-test.
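A minimal sketch of the distribution-free option mentioned above, using only the Python standard library. The FPKM values are made up, `mann_whitney_u` and `mwu_normal_pvalue` are hypothetical helper names, and the p-value uses a plain normal approximation (no tie or continuity correction), which is rough for samples this small. Because the test depends only on the ordering of the values, it gives the same result on FPKM and on log10(FPKM+1).

```python
import math


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs (x, y) with x > y count 1, ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def mwu_normal_pvalue(a, b):
    """Two-sided p-value from the normal approximation to U (assumes no ties)."""
    n1, n2 = len(a), len(b)
    u = mann_whitney_u(a, b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return u, math.erfc(abs(z) / math.sqrt(2))


# Made-up FPKM values, for illustration only (not real data).
coding = [0.5, 2.0, 8.0, 15.0, 40.0, 300.0]
lncrna = [0.1, 0.4, 1.0, 2.5, 3.0, 6.0]

u_raw, p_raw = mwu_normal_pvalue(coding, lncrna)
u_log, p_log = mwu_normal_pvalue(
    [math.log10(v + 1) for v in coding],
    [math.log10(v + 1) for v in lncrna],
)

print(f"raw FPKM:        U = {u_raw}, p = {p_raw:.3f}")
print(f"log10(FPKM + 1): U = {u_log}, p = {p_log:.3f}")
```

For real analyses with hundreds of transcripts per set, an established implementation with tie handling would be the safer choice.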


                  • #10
                    There you are, thanks!
                    Why not apply the t-test directly? Where can I learn about it?


                    • #11
                      The t-test assumes the populations are normally distributed. FPKMs are not. http://en.wikipedia.org/wiki/Student's_t-test

                      A log transformation may seem to help but it is still inappropriate because it fails to account for the heteroskedastic mean-variance dependency of read counts. DOI: 10.1111/j.2041-210X.2010.00021.x
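The mean-variance dependency can be illustrated with a small simulation. This is a sketch under a strong simplifying assumption: counts are drawn from a Poisson distribution, whereas real RNA-seq counts are usually more variable than that. `poisson_sample` is a hypothetical helper implementing Knuth's multiplication method, and the two means (5 and 100) are arbitrary.

```python
import math
import random
import statistics


def poisson_sample(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (fine for modest means)."""
    limit = math.exp(-mean)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


rng = random.Random(0)  # fixed seed so the run is repeatable
raw_var = {}
log_var = {}

for mean in (5, 100):  # a lowly and a highly expressed transcript (arbitrary values)
    counts = [poisson_sample(rng, mean) for _ in range(2000)]
    raw_var[mean] = statistics.variance(counts)
    log_var[mean] = statistics.variance([math.log10(c + 1) for c in counts])
    print(f"mean {mean:>3}: var(count) = {raw_var[mean]:.2f}, "
          f"var(log10(count + 1)) = {log_var[mean]:.4f}")
```

On the raw scale the variance grows with the mean; after the log transform it shrinks with the mean instead. Either way the variance is not constant across expression levels, which is the heteroskedasticity referred to above.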


                      • #12
                        But the arithmetic mean of my FPKM values will be normally distributed according to the central limit theorem. In large samples such as mine, t.test for skewed distributions should be fine: http://stats.stackexchange.com/quest...ormal-when-n50
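The central limit theorem argument can be sketched with a simulation. The assumptions are loud ones: the skewed population is a lognormal distribution chosen arbitrarily, not real FPKM data, and the group size of 500 simply mirrors the example earlier in the thread. `skewness` is a hypothetical helper.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: third central moment divided by the cubed population SD."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)


rng = random.Random(1)  # fixed seed so the run is repeatable


def draw():
    """One value from an arbitrary right-skewed population (lognormal, mu=0, sigma=1)."""
    return rng.lognormvariate(0.0, 1.0)


# Skewness of individual values versus skewness of means of 500 values.
raw_values = [draw() for _ in range(5000)]
group_means = [statistics.fmean([draw() for _ in range(500)]) for _ in range(400)]

skew_raw = skewness(raw_values)
skew_means = skewness(group_means)

print(f"skewness of individual values: {skew_raw:.2f}")
print(f"skewness of means (n = 500):   {skew_means:.2f}")
```

The means are much closer to symmetric than the individual values, which is what the argument relies on. The simulation says nothing about unequal variances between the two groups or about how heavy the tails of real FPKM distributions are.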


                        • #13
                          Okay, you could do a normality test to verify that the t-test assumptions are met, but it would be more straightforward and rigorous to just use a better normalization.
