SEQanswers

Old 04-24-2015, 04:10 AM   #1
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 148
DESeq with FPKM values

Hi all,

I was reading this paper today:
Transcriptomes of germinal zones of human and mouse fetal neocortex suggest a role of extracellular matrix in progenitor self-renewal

The paper is about the comparison of fetal human and embryonic mouse cells of different brain tissues (RNASeq). As a result, they suggest a list of (up- or down-regulated) genes which are responsible for the regulation and control of cell adhesion and cell–extracellular matrix interactions.

But my question is not about the biological part, but instead about the analysis of the reads.

As the paper is from 2012, they used cufflinks v.
In the methods section they mention using cufflinks to quantify expression per gene as FPKM values.
But after that they use DESeq for the differential expression analysis.

Since DESeq expects integer counts, they multiplied the FPKM values by 10 and rounded them to integers.

This was followed by the normal DESeq analysis.
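
To make the concern concrete, here is a minimal Python sketch (not the paper's code; the function names and all numbers are made up for illustration). It assumes the usual FPKM definition, fragments × 10^9 / (gene length in bp × total mapped fragments), and shows that two genes with very different raw fragment counts can collapse to the same "10×FPKM" pseudo-count:

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def pseudo_count(fpkm_value, factor=10):
    """The procedure described above: scale the FPKM and round to an integer."""
    return round(fpkm_value * factor)


# Hypothetical library of 40 million mapped fragments.
total = 40_000_000

# Hypothetical short gene: 20 fragments over 500 bp.
short_gene_fpkm = fpkm(20, 500, total)
# Hypothetical long gene: 400 fragments over 10,000 bp.
long_gene_fpkm = fpkm(400, 10_000, total)

short_pseudo = pseudo_count(short_gene_fpkm)
long_pseudo = pseudo_count(long_gene_fpkm)

print("short gene: raw = 20,  pseudo-count =", short_pseudo)
print("long gene:  raw = 400, pseudo-count =", long_pseudo)
```

Both genes end up with the same pseudo-count even though one is backed by 20 fragments and the other by 400, so a count-based model would treat them as equally well measured.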

My question is: does it make sense to use cufflinks to calculate the FPKM values and then "reassign" them as if they were counts, so that DESeq can work with them?

There are many threads with exactly this question/problem (e.g. 1) and most of them suggest not doing so.

Does this kind of analysis make sense?

thanks for the information

Assa
Old 04-24-2015, 05:11 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

tldr: No one should ever do what they did.

Longer reply:
What they did makes no sense. It's a sad critique of peer review that this even got accepted, since likely no one that knew anything about data analysis actually reviewed the paper...only pure wet-lab people.

So people often have cases where they need to use some sort of expected counts rather than pure integer values, often due to only having assembled transcriptomes or needing to do transcript-level analyses. The better method to deal with this is to get expected counts (e.g., with eXpress, or rsem, or ...) and then use things like limma/voom or even edgeR with those (you could use DESeq2 in theory, but it'll throw an error).

Edit: Heck, you're even better off with rounded expected counts than rounded 10xFPKMs. The former has less precision loss.
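
A rough way to see the precision point, as a Python sketch with made-up numbers (the gene length, library size, and expected count are all hypothetical, and the FPKM formula is the standard one, not anything taken from the paper). It compares the relative error introduced by rounding an expected count directly against rounding 10×FPKM for the same gene:

```python
def relative_rounding_error(value):
    """Relative error introduced by rounding a positive value to an integer."""
    return abs(round(value) - value) / value


# Hypothetical gene: 8 kb long, in a library of 50 million mapped fragments,
# with an expected (fractional) count as a tool like eXpress or RSEM might report.
expected_count = 1234.6
gene_length_bp = 8_000
total_mapped_fragments = 50_000_000

fpkm_value = expected_count * 1e9 / (gene_length_bp * total_mapped_fragments)

error_expected_count = relative_rounding_error(expected_count)
error_scaled_fpkm = relative_rounding_error(10 * fpkm_value)

print(f"FPKM:                       {fpkm_value:.4f}")
print(f"rounding error, count:      {error_expected_count:.4%}")
print(f"rounding error, 10 x FPKM:  {error_scaled_fpkm:.4%}")
```

For a gene like this one, rounding 10×FPKM loses more than rounding the expected count. The size of the gap depends on gene length and sequencing depth, so treat this as an illustration of the direction of the effect rather than a general bound.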

Edit2: Is it sad that I quickly checked to ensure that I don't work directly with any of the authors before posting?

Last edited by dpryan; 04-24-2015 at 05:20 AM.
Old 04-24-2015, 06:40 AM   #3
SylvainL
Senior Member
 
Location: Geneva

Join Date: Feb 2012
Posts: 175

Quote:
Originally Posted by dpryan View Post
Edit2: Is it sad that I quickly checked to ensure that I don't work directly with any of the authors before posting?
No, I had the same first reflex... I think this kind of paper will not be accepted in the near future. Personally, I have already been asked twice in a month to specifically review the data analysis part, at the second stage of revision... Hope it will soon be automatic!!
Old 04-27-2015, 04:38 AM   #4
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 148

Quote:
Originally Posted by dpryan View Post
tldr: No one should ever do what they did.
Yes, this is exactly what I thought.
The paper is "relatively" old and I don't think something like that will be accepted nowadays (I hope so).

Quote:
Originally Posted by dpryan View Post
So people often have cases where they need to use some sort of expected counts rather than pure integer values, often due to only having assembled transcriptomes or needing to do transcript-level analyses. The better method to deal with this is to get expected counts (e.g., with eXpress, or rsem, or ...) and then use things like limma/voom or even edgeR with those (you could use DESeq2 in theory, but it'll throw an error).
This I don't understand.
Why can't I just use htseq-count or featureCounts to get the read counts and then run DESeq as in a normal workflow?
Why can I run edgeR but not DESeq?

thanks
Assa
Old 04-27-2015, 04:43 AM   #5
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

DESeq2 is explicitly written to throw an error if you try to do this. That's the only reason. You could change the code to allow this and it'll be just as reliable as edgeR.
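
To illustrate the kind of guard meant here, a small Python sketch. This is not DESeq2's code (DESeq2 is an R package); `require_integer_counts` is a hypothetical helper that only mimics the idea of refusing non-integer input, and the matrix values are made up:

```python
def require_integer_counts(matrix):
    """Raise ValueError if any entry is negative or not a whole number.

    Hypothetical stand-in for an input check in a count-based tool.
    """
    for row in matrix:
        for value in row:
            if value < 0 or value != int(value):
                raise ValueError(f"non-integer or negative count: {value}")
    return matrix


# Hypothetical expected counts (genes x samples) from a quantification tool.
expected_counts = [[10.4, 3.0], [250.7, 0.0]]

try:
    require_integer_counts(expected_counts)
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)

# Rounding the expected counts first lets them pass the same check.
rounded_counts = [[round(v) for v in row] for row in expected_counts]
require_integer_counts(rounded_counts)
print("accepted after rounding:", rounded_counts)
```

The check is purely about the input type; it says nothing about whether the statistical model itself could cope with fractional values.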
Old 04-28-2015, 04:39 AM   #6
frymor
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 148

Quote:
Originally Posted by dpryan View Post
DESeq2 is explicitly written to throw an error if you try to do this.
Do you mean here "working with expected counts"?

Can edgeR work with them?
Old 04-28-2015, 04:46 AM   #7
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

Yes, or anything else that isn't an integer.

Yes, edgeR doesn't throw an error (at least the last time I looked), so it'll work. I'm personally a bit more comfortable with limma/voom for this sort of thing, but that's personal preference.
Old 07-16-2015, 11:03 AM   #8
hartmaier
Member
 
Location: Pittsburgh

Join Date: Dec 2012
Posts: 12

Quote:
Originally Posted by dpryan View Post
tldr: No one should ever do what they did.

Longer reply:
What they did makes no sense. It's a sad critique of peer review that this even got accepted, since likely no one that knew anything about data analysis actually reviewed the paper...only pure wet-lab people.
Wow! So now that we (i.e. those on this forum) know there is a likely catastrophic flaw in the RNAseq analysis in this paper (which is a major focus of the study), is there a responsibility to notify the journal? This is in PNAS. After my quick read of the paper, it looks like 3/4 figures directly use the results from this flawed analysis, so it likely has more than a trivial impact on the study's conclusions.
Old 07-16-2015, 02:14 PM   #9
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 198

From the paper:
Quote:
...analyzed using state-of-the-art methods.
Old 07-17-2015, 05:08 AM   #10
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

Quote:
Originally Posted by hartmaier View Post
Wow! So now that we (i.e. those on this forum) know there is a likely catastrophic flaw in the RNAseq analysis in this paper (which is a major focus of the study), is there a responsibility to notify the journal? This is in PNAS. After my quick read of the paper, it looks like 3/4 figures directly use the results from this flawed analysis, so it likely has more than a trivial impact on the study's conclusions.
I suppose that one could try, but I wouldn't hold my breath that that would get a reply. What might be more worthwhile is to redo the analysis properly and see if the results change drastically. If so, then it'd be useful to notify the authors/journal. If not, maybe post a comment on pubmed central noting that so others don't need to redo the analysis to see if the results actually hold up.
Old 07-17-2015, 08:52 AM   #11
hartmaier
Member
 
Location: Pittsburgh

Join Date: Dec 2012
Posts: 12

Quote:
Originally Posted by dpryan View Post
What might be more worthwhile is to redo the analysis properly and see if the results change drastically. If so, then it'd be useful to notify the authors/journal.
Yeah, that's what I was thinking as well. Something to do on a rainy weekend I guess.
Old 04-29-2016, 01:25 PM   #12
pashu912
Junior Member
 
Location: Stockholm

Join Date: Mar 2014
Posts: 1

Hi,

I am doing a cross-species study and found a paper about similar work. I think the data analysis in the paper is not appropriate, so I decided to ask here!
They performed differential gene expression analysis on FPKM data from different species as follows:

1. They generate FPKM data with Trinity.
2. Then they normalize the FPKM data to account for length differences between orthologs.
3. They scale the normalized FPKM data by a common factor such that the lowest expressed gene’s value becomes 1.
4. Then they round the values to the nearest integer and use edgeR.

Will the above approach give sensible results? I doubt it, because I don't think scaling the FPKM data makes it any more similar to raw count data in terms of the mean-variance relationship!
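
The mean-variance worry can be sketched in a few lines of Python. The replicate values and the scale factor below are invented for illustration and are not from the paper. Multiplying values by a constant c multiplies the mean by c but the variance by c², so the variance-to-mean ratio that count models rely on shifts by a factor of c:

```python
from statistics import mean, variance


def variance_to_mean_ratio(values):
    """Sample variance divided by the mean (equals 1 on average for Poisson data)."""
    return variance(values) / mean(values)


# Hypothetical replicate values for a single gene.
counts = [8, 12, 9, 11, 10, 13, 7, 10]

# Hypothetical common scale factor, followed by rounding as in step 4 above.
scale = 25
scaled = [round(c * scale) for c in counts]

ratio_before = variance_to_mean_ratio(counts)
ratio_after = variance_to_mean_ratio(scaled)

print("variance/mean before scaling:", ratio_before)
print("variance/mean after scaling: ", ratio_after)
print("ratio of ratios:             ", ratio_after / ratio_before)
```

So an arbitrary scale factor changes how over- or under-dispersed the data look to a count model, and with it the estimated dispersions and p-values.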
Old 04-29-2016, 01:29 PM   #13
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

They may have gotten lucky and gotten sensible results with that method, but I suspect that they got mostly gibberish results.
Tags
deseq, fpkm, rnaseq

All times are GMT -8.