SEQanswers

Old 07-15-2014, 01:12 AM   #1
int11ap1
Member
 
Location: Barcelona

Join Date: Jan 2014
Posts: 16
t-test FPKM values

I have two sets of genes, and I'd like to draw a boxplot and run a t-test to find out whether their expression levels differ significantly.

However, my t-test p-value changes when using log10(FPKM+1) values or just FPKM values. Why? What should I choose?

Thanks.
Old 07-15-2014, 02:53 AM   #2
ffinkernagel
Senior Member
 
Location: Marburg, Germany

Join Date: Oct 2009
Posts: 110

A t-test depends on the effect size, and that obviously changes if you log-transform the values.
The general rule is to test on the data you measure - in this case, this would be the un-logged reads per million.
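As a rough illustration of why the result moves when you take logs, here is a minimal sketch that computes Welch's t statistic on raw values and on log10(FPKM+1). The FPKM values are made up, and `welch_t` and `log10p1` are helper names chosen for this example only. A single large value inflates the variance on the raw scale, so the statistic there is smaller than on the log scale.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def log10p1(values):
    """Apply the log10(x + 1) transformation mentioned in the question."""
    return [math.log10(v + 1.0) for v in values]


# Hypothetical FPKM values for two groups of transcripts (not real data).
group_a = [0.5, 1.2, 3.0, 8.0, 150.0, 2.2, 0.9, 40.0]
group_b = [0.1, 0.4, 0.8, 1.5, 2.0, 0.3, 0.6, 1.1]

t_raw = welch_t(group_a, group_b)
t_log = welch_t(log10p1(group_a), log10p1(group_b))

print(f"t on raw FPKM:        {t_raw:.3f}")
print(f"t on log10(FPKM + 1): {t_log:.3f}")
```

The two statistics differ, so the p-values differ too. Neither number says which scale is the right one to test on.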

Either way: you should not be testing on the FPKM values. In short, you lose the information about the number of reads actually behind each value, and more reads mean a better estimate.

Consider using a testing method specifically for RNAseq data such as DESeq.
Old 07-15-2014, 05:10 AM   #3
jwfoley
Senior Member
 
Location: Stanford

Join Date: Jun 2009
Posts: 181

FPKM is just an intuitive transformation of fragment counts and is not suitable to be used in statistics.
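For reference, the usual definition is fragments per kilobase of transcript per million mapped fragments. The sketch below uses that definition with invented numbers (the function and variable names are only for this example) to show how two transcripts with very different fragment counts can end up with the same FPKM, which is why the count information is gone once you only have FPKMs.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical library with 20 million mapped fragments.
# A short transcript supported by 2 fragments ...
short_low = fpkm(fragments=2, length_bp=200, total_mapped_fragments=20_000_000)
# ... and a long transcript supported by 200 fragments.
long_high = fpkm(fragments=200, length_bp=20_000, total_mapped_fragments=20_000_000)

print(short_low, long_high)
```

Both calls give the same value, although the second estimate rests on a hundred times as many fragments as the first.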

Fortunately, the software package that probably gave you the FPKM values, Cufflinks, also includes a program called cuffdiff that will do the test you want in a statistically rigorous way, based on modeling the actual fragment counts. Use that instead; don't try to use statistical tests that are unsuited to your data type on data that are unsuited to statistics.
Old 07-17-2014, 12:09 PM   #4
int11ap1
Member
 
Location: Barcelona

Join Date: Jan 2014
Posts: 16

I do not need RNA-seq-specific normalization for what I want here. Both sets of genes (actually, transcripts) come from the same RNA-seq dataset (the same fasta). One set is made up of coding transcripts and the other of putative lncRNAs. I just want to know which group of transcripts is more highly expressed.

What is your final conclusion?

Last edited by int11ap1; 07-17-2014 at 12:14 PM.
Old 07-17-2014, 12:14 PM   #5
jwfoley
Senior Member
 
Location: Stanford

Join Date: Jun 2009
Posts: 181

My final conclusion is the same as before: you should use a valid hypothesis test on the count data, like cuffdiff, DESeq2, or edgeR, all of which are quite rigorous, commonly used, and well documented. Do not use an invalid hypothesis test on FPKMs. FPKM is a crude normalization and cannot be used in a meaningful statistical test. Asking us again is not going to change the way numbers work.
Old 07-17-2014, 12:17 PM   #6
int11ap1
Member
 
Location: Barcelona

Join Date: Jan 2014
Posts: 16

But the methods you mention (edgeR and DESeq) are for normalization between different samples or RNA-seq datasets...
Old 07-17-2014, 12:18 PM   #7
jwfoley
Senior Member
 
Location: Stanford

Join Date: Jun 2009
Posts: 181

No, you have it backwards: those methods are all for statistical hypothesis testing, and FPKM is a (crude, statistically inappropriate) normalization for comparing different samples.
Old 07-17-2014, 12:28 PM   #8
int11ap1
Member
 
Location: Barcelona

Join Date: Jan 2014
Posts: 16

I do not follow you, sorry for asking again.

For example, I have 1000 FPKM values (from 1 RNA-seq sample) for 1000 transcripts. If I want to compare the first 500 transcripts with the second 500 (to see which set is more highly expressed), do I need to use edgeR or DESeq? What for?
Old 07-17-2014, 12:32 PM   #9
jwfoley
Senior Member
 
Location: Stanford

Join Date: Jun 2009
Posts: 181

Ah, I see: you're comparing some genes with other genes in the same experiment, not the same gene across different experiments.

You can use FPKM values for this if you use a distribution-free test like Mann-Whitney-Wilcoxon, but that won't be very powerful. Otherwise you could use a more effective normalization like the variance-stabilizing transformation or regularized log in DESeq2 and then use a regular t-test.
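A minimal sketch of the distribution-free option, written with the standard library only. The FPKM values are invented, `average_ranks` and `mann_whitney_u` are names made up for this example, and the p-value uses the plain normal approximation without tie or continuity correction, so treat it as approximate for samples this small. Because the test only looks at ranks, it gives the same answer on FPKM and on log10(FPKM+1).

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistic for sample a and a two-sided normal-approximation p-value."""
    ranks = average_ranks(list(a) + list(b))
    n_a, n_b = len(a), len(b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, p_two_sided


# Hypothetical FPKM values from one sample (not real data).
coding_fpkm = [12.0, 3.4, 55.1, 0.8, 7.7, 21.3, 2.9, 140.2]
lncrna_fpkm = [0.2, 1.1, 0.0, 4.5, 0.6, 0.3, 2.0, 0.9]

u_raw, p_raw = mann_whitney_u(coding_fpkm, lncrna_fpkm)
u_log, p_log = mann_whitney_u(
    [math.log10(v + 1.0) for v in coding_fpkm],
    [math.log10(v + 1.0) for v in lncrna_fpkm],
)

print(f"raw: U = {u_raw}, p = {p_raw:.4f}")
print(f"log: U = {u_log}, p = {p_log:.4f}")
```

The log transformation is monotonic, so the ranks, U, and the p-value are unchanged. That removes the "which scale?" question from the original post for this test.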
Old 07-17-2014, 12:36 PM   #10
int11ap1
Member
 
Location: Barcelona

Join Date: Jan 2014
Posts: 16

There we go, thanks!
Why not apply the t-test directly? Where can I learn about it?
Old 07-17-2014, 12:39 PM   #11
jwfoley
Senior Member
 
Location: Stanford

Join Date: Jun 2009
Posts: 181

The t-test assumes the populations are normally distributed. FPKMs are not. http://en.wikipedia.org/wiki/Student's_t-test

A log transformation may seem to help but it is still inappropriate because it fails to account for the heteroskedastic mean-variance dependency of read counts. DOI: 10.1111/j.2041-210X.2010.00021.x
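To make the mean-variance point concrete, here is a small simulation sketch. It assumes Poisson-distributed counts purely for simplicity (real RNA-seq counts are usually more variable than Poisson), and `poisson_sample` is a helper written for this example. On the raw scale the variance grows with the mean; after log2(count+1) it is the low-count genes that are noisier, so the variance still depends on the mean.

```python
import math
import random
import statistics


def poisson_sample(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(0)  # fixed seed so the run is repeatable
raw_var = {}
log_var = {}

# Two hypothetical genes: one with few reads, one with many.
for mean in (5, 200):
    counts = [poisson_sample(mean, rng) for _ in range(2000)]
    logged = [math.log2(c + 1) for c in counts]
    raw_var[mean] = statistics.variance(counts)
    log_var[mean] = statistics.variance(logged)
    print(f"mean {mean:>3}: var(raw) = {raw_var[mean]:.2f}, "
          f"var(log2) = {log_var[mean]:.4f}")
```

Neither scale gives equal variances across expression levels in this toy setup, which is the problem that variance-stabilizing transformations are designed to address.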
Old 07-17-2014, 12:52 PM   #12
int11ap1
Member
 
Location: Barcelona

Join Date: Jan 2014
Posts: 16

But the arithmetic mean of my FPKM values will be normally distributed according to the central limit theorem. In large samples such as mine, t.test for skewed distributions should be fine: http://stats.stackexchange.com/quest...ormal-when-n50
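The central limit theorem argument can be checked with a quick simulation. This sketch assumes a log-normal distribution as a stand-in for skewed FPKM values (an assumption, not something measured from real data) and looks at how skewed the distribution of the sample mean still is for different sample sizes. `skewness` and `mean_of_skewed_sample` are names made up for this example.

```python
import random
import statistics


def skewness(values):
    """Population skewness: mean of the cubed standardized values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(1)  # fixed seed so the run is repeatable


def mean_of_skewed_sample(n):
    """Mean of n draws from a log-normal(0, 1) distribution."""
    return statistics.fmean(rng.lognormvariate(0.0, 1.0) for _ in range(n))


skew_by_n = {}
for n in (5, 50, 500):
    sample_means = [mean_of_skewed_sample(n) for _ in range(2000)]
    skew_by_n[n] = skewness(sample_means)
    print(f"n = {n:>3}: skewness of the sample mean = {skew_by_n[n]:.2f}")
```

The skewness of the mean shrinks as n grows, which supports the argument for large groups. How large n needs to be depends on how heavy the tail is, so it is worth checking on the actual values.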
Old 07-17-2014, 12:57 PM   #13
jwfoley
Senior Member
 
Location: Stanford

Join Date: Jun 2009
Posts: 181

Okay, you could do a normality test to verify that the t-test assumptions are met, but it would be more straightforward and rigorous to just use a better normalization.