  • DESeq with FPKM values

    Hi all,

    I was reading this paper today:
    Transcriptomes of germinal zones of human and mouse fetal neocortex suggest a role of extracellular matrix in progenitor self-renewal

    The paper compares fetal human and embryonic mouse cells from different brain tissues using RNA-seq. As a result, they propose a list of (up- or down-regulated) genes that they suggest are responsible for regulating cell adhesion and cell–extracellular matrix interactions.

    But my question is not about the biological part, but instead about the analysis of the reads.

    As the paper is from 2012, they used Cufflinks v.
    In the Methods section, they describe using Cufflinks to quantify expression per gene as FPKM values.
    After that, however, they used DESeq for the differential expression analysis.

    Because DESeq works only with integer counts, they multiplied the FPKM values by 10 and rounded them to integers.

    This was followed by a standard DESeq analysis.
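For anyone wondering why round(10 × FPKM) is not a count, here is a minimal Python sketch using the standard FPKM definition (reads × 10^9 / (gene length in bp × total mapped reads)). The read numbers, gene lengths, and library sizes are made up for illustration, and `fpkm` and `paper_pseudo_count` are hypothetical helper names, not code from the paper.

```python
def fpkm(count, length_bp, total_mapped_reads):
    """Standard FPKM: count * 1e9 / (gene length in bp * total mapped reads)."""
    return count * 1e9 / (length_bp * total_mapped_reads)


def paper_pseudo_count(fpkm_value, factor=10):
    """Hypothetical helper mimicking the paper's step: multiply FPKM by 10, round."""
    return round(fpkm_value * factor)


# Made-up examples: (reads on the gene, gene length in bp, total mapped reads)
examples = {
    "1 kb gene, 20M-read library": (200, 1000, 20_000_000),
    "5 kb gene, 20M-read library": (200, 5000, 20_000_000),
    "1 kb gene, 40M-read library": (400, 1000, 40_000_000),
}

for label, (reads, length_bp, depth) in examples.items():
    value = fpkm(reads, length_bp, depth)
    print(f"{label}: reads={reads}, FPKM={value:g}, pseudo-count={paper_pseudo_count(value)}")
```

With these numbers, 200 reads become a pseudo-count of 100 on the 1 kb gene but 20 on the 5 kb gene, while 200 reads and 400 reads on the same gene in libraries of different depth both become 100. As I understand it, DESeq's noise model is tied to the number of reads actually observed, so values that have already been divided by gene length and library size mis-state how noisy each gene is.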

    My question is: does it make sense to use Cufflinks to calculate FPKM values and then "reassign" them as if they were counts, so that DESeq can work with them?

    There are many threads with exactly this question/problem (e.g. 1), and most of them advise against doing so.

    Does this kind of analysis make sense?

    thanks for the information

    Assa

  • #2
    tldr: No one should ever do what they did.

    Longer reply:
    What they did makes no sense. It's a sad critique of peer review that this even got accepted, since likely no one that knew anything about data analysis actually reviewed the paper...only pure wet-lab people.

    So people often have cases where they need to use some sort of expected counts rather than pure integer values, often due to only having assembled transcriptomes or needing to do transcript-level analyses. The better method to deal with this is to get expected counts (e.g., with eXpress, or rsem, or ...) and then use things like limma/voom or even edgeR with those (you could use DESeq2 in theory, but it'll throw an error).
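Nothing in the first step of voom requires integers, which is part of why it copes with expected counts. As a rough Python illustration (not limma's code; limma/voom is an R/Bioconductor package), voom starts by converting each count r in a library of size R to log2 counts per million, log2((r + 0.5) / (R + 1) × 10^6), and then models the mean-variance trend of those values empirically. The sketch shows only that transform, with a hypothetical `log_cpm` helper and made-up expected counts.

```python
import math


def log_cpm(counts, lib_size):
    """voom-style log2 counts per million for one sample.

    counts may be integers or fractional expected counts (e.g. from RSEM or eXpress);
    lib_size is the total count for that sample.
    """
    return [math.log2((r + 0.5) / (lib_size + 1) * 1e6) for r in counts]


# Hypothetical expected counts for three genes in one sample
expected = [0.0, 12.37, 250.8]
print(log_cpm(expected, lib_size=1_000_000))
```

The precision weights voom then derives from the fitted trend are estimated from the data rather than assumed from integer count statistics, which is roughly why non-integer input is less of a concern there.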

    Edit: Heck, you're even better off with rounded expected counts than with rounded 10xFPKMs. The former has less precision loss.
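A quick way to see the precision point, as a minimal Python sketch with made-up gene lengths and library sizes (the helper names are hypothetical): one unit of round(10 × FPKM) stands for length × total mapped reads / 10^10 reads, so that rounding is coarser than rounding an expected count (at most 0.5 reads of error) whenever the product exceeds 10^10, which holds for most genes at typical sequencing depths. For very short genes in shallow libraries it can go the other way.

```python
def fpkm(count, length_bp, total_mapped_reads):
    """Standard FPKM: count * 1e9 / (gene length in bp * total mapped reads)."""
    return count * 1e9 / (length_bp * total_mapped_reads)


def reads_per_pseudo_count(length_bp, total_mapped_reads, factor=10):
    """Number of reads that one unit of round(factor * FPKM) stands for."""
    return length_bp * total_mapped_reads / (1e9 * factor)


# A 2 kb gene in a 20M-read library: each pseudo-count unit is 4 reads,
# so rounding can move the value by up to 2 reads (vs. 0.5 for an expected count).
print(reads_per_pseudo_count(2000, 20_000_000))

# A weakly expressed 5 kb gene with 3 reads in a 40M-read library rounds to zero
# after the 10x FPKM step, whereas a hypothetical expected count of 2.8 rounds to 3.
print(round(10 * fpkm(3, 5000, 40_000_000)), round(2.8))
```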

    Edit2: Is it sad that I quickly checked to ensure that I don't work directly with any of the authors before posting?
    Last edited by dpryan; 04-24-2015, 04:20 AM.

    • #3
      Originally posted by dpryan
      Edit2: Is it sad that I quickly checked to ensure that I don't work directly with any of the authors before posting?
      No, I had the same first reflex... I think this kind of paper will not be accepted in the near future. Personally, I have already been asked twice within a month to specifically review the data analysis part at the second stage of revision... I hope this will soon be automatic!!

      • #4
        Originally posted by dpryan
        tldr: No one should ever do what they did.
        Yes, this is exactly what I thought.
        The paper is "relatively" old and I don't think something like that will be accepted nowadays (I hope so).

        Originally posted by dpryan
        So people often have cases where they need to use some sort of expected counts rather than pure integer values, often due to only having assembled transcriptomes or needing to do transcript-level analyses. The better method to deal with this is to get expected counts (e.g., with eXpress, or rsem, or ...) and then use things like limma/voom or even edgeR with those (you could use DESeq2 in theory, but it'll throw an error).
        This I don't understand.
        Why can't I just use htseq-count or featureCounts to get the read counts and then run DESeq as in a normal workflow?
        Why can I run edgeR but not DESeq?

        thanks
        Assa

        • #5
          DESeq2 is explicitly written to throw an error if you try to do this. That's the only reason. You could change the code to allow this and it'll be just as reliable as edgeR.
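To make the "that's the only reason" point concrete: the refusal is an input check, not something the statistics needs in order to run. Here is a hypothetical Python guard in the same spirit (the function name and message are made up; this is not DESeq2's code, which is R). It also shows why the paper's trick got past such a check: once 10 × FPKM is rounded, the values are integers, even though they are not read counts.

```python
def require_integer_counts(matrix):
    """Hypothetical input guard: reject negative or non-whole-number entries."""
    for row in matrix:
        for value in row:
            if value < 0 or value != int(value):
                raise ValueError(f"expected non-negative integer counts, got {value!r}")
    return matrix


expected_counts = [[12.4, 0.0], [7.0, 3.6]]  # hypothetical fractional expected counts
try:
    require_integer_counts(expected_counts)
except ValueError as err:
    print("rejected:", err)

# Rounded 10x FPKM values pass: rounding makes them integers without making them counts
print(require_integer_counts([[round(10 * 2.37), round(10 * 0.81)]]))
```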

          • #6
            Originally posted by dpryan
            DESeq2 is explicitly written to throw an error if you try to do this.
            Do you mean "working with expected counts" here?

            Can edgeR work with them?

            • #7
              Yes, or anything else that isn't an integer.

              Yes, edgeR doesn't throw an error (at least the last time I looked), so it'll work. I'm personally a bit more comfortable with limma/voom for this sort of thing, but that's personal preference.

              • #8
                Originally posted by dpryan
                tldr: No one should ever do what they did.

                Longer reply:
                What they did makes no sense. It's a sad critique of peer review that this even got accepted, since likely no one that knew anything about data analysis actually reviewed the paper...only pure wet-lab people.
                Wow! So now that we (i.e. those on this forum) know there is a likely catastrophic flaw in the RNAseq analysis in this paper (which is a major focus of the study), is there a responsibility to notify the journal? This is in PNAS. After my quick read of the paper, it looks like 3 of the 4 figures directly use the results from this flawed analysis, so it likely has more than a trivial impact on the study's conclusions.

                • #9
                  From the paper:
                  ...analyzed using state-of-the-art methods.

                  • #10
                    Originally posted by hartmaier
                    Wow! So now that we (i.e. those on this forum) know there is a likely catastrophic flaw in the RNAseq analysis in this paper (which is a major focus of the study), is there a responsibility to notify the journal? This is in PNAS. After my quick read of the paper, it looks like 3/4 figures directly use the results from this flawed analysis, so it likely has more than a trivial impact on the study's conclusions.
                    I suppose one could try, but I wouldn't hold my breath waiting for a reply. What might be more worthwhile is to redo the analysis properly and see if the results change drastically. If so, then it'd be useful to notify the authors/journal. If not, maybe post a comment on PubMed Central noting that, so others don't need to redo the analysis to see whether the results actually hold up.
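If anyone does redo it, one simple way to judge whether the results "change drastically" is to compare the sets of genes called differentially expressed in the paper and in the re-analysis. A minimal Python sketch, with hypothetical gene names and a Jaccard index as the (assumed) similarity measure:

```python
def compare_de_calls(original, reanalysis):
    """Summarize overlap between two sets of differentially expressed genes."""
    original, reanalysis = set(original), set(reanalysis)
    union = original | reanalysis
    return {
        "shared": sorted(original & reanalysis),
        "only_original": sorted(original - reanalysis),
        "only_reanalysis": sorted(reanalysis - original),
        # Jaccard index: 1.0 means identical calls, 0.0 means no overlap
        "jaccard": len(original & reanalysis) / len(union) if union else 1.0,
    }


paper_calls = {"GENE_A", "GENE_B", "GENE_C"}  # hypothetical
redo_calls = {"GENE_B", "GENE_C", "GENE_D"}   # hypothetical
print(compare_de_calls(paper_calls, redo_calls))
```

Direction of change and gene ranking matter too, so set overlap is only a first check.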

                    • #11
                      Originally posted by dpryan
                      What might be more worthwhile is to redo the analysis properly and see if the results change drastically. If so, then it'd be useful to notify the authors/journal.
                      Yeah, that's what I was thinking as well. Something to do on a rainy weekend I guess.

                      • #12
                        Hi,

                        I am doing a cross-species study and found a paper on similar work. I don't think the data analysis in that paper is appropriate, so I decided to ask here!
                        They performed differential gene expression analysis on FPKM data from different species as follows:

                        1. They generated FPKM data with Trinity.
                        2. They then normalized the FPKM data to account for length differences between orthologs.
                        3. They scaled the normalized FPKM data by a common factor such that the lowest expressed gene’s value became 1.
                        4. They then rounded the values to the nearest integer and used edgeR.
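Steps 3 and 4 can be sketched in a few lines of Python. The helper is hypothetical; step 2 is left out because the post doesn't say how the ortholog length correction was done, and I've assumed "lowest expressed gene" means the lowest non-zero value. The sketch shows that the resulting "counts" have no fixed scale: adding one more weakly expressed gene changes every other gene's value.

```python
def scale_and_round(values):
    """Scale so the lowest non-zero value becomes 1, then round to integers."""
    lowest = min(v for v in values if v > 0)
    return [round(v / lowest) for v in values]


print(scale_and_round([0.5, 5.0, 50.0]))
print(scale_and_round([0.25, 0.5, 5.0, 50.0]))  # one extra low gene doubles the others
```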

                        Will the above approach give sensible results? I doubt it, because I don't think scaling the FPKM data makes it any more similar to raw count data in terms of the mean-variance relationship!
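The mean-variance doubt can be checked with a small simulation. This is a minimal Python sketch under the simplifying assumption that reads for one gene follow a Poisson distribution (real replicates are overdispersed, which is why edgeR uses a negative binomial). Raw counts have variance close to their mean, but multiplying by any factor k multiplies the variance-to-mean ratio by k. FPKM already differs from counts by a gene-specific factor (gene length and library size), and the common scale factor here is picked from the data, so the noise level a count model would infer ends up arbitrary.

```python
import math
import random
import statistics


def poisson_draw(rng, mu):
    """Knuth's multiplication method for a Poisson draw; fine for small mu."""
    limit = math.exp(-mu)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def variance_to_mean(values):
    """Ratio of population variance to mean (about 1 for Poisson counts)."""
    return statistics.pvariance(values) / statistics.fmean(values)


rng = random.Random(2015)  # fixed seed so the run is reproducible
counts = [poisson_draw(rng, 20.0) for _ in range(20_000)]
scaled = [10 * c for c in counts]  # an arbitrary scale factor of 10
print(round(variance_to_mean(counts), 2), round(variance_to_mean(scaled), 2))
```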

                        • #13
                          They may have gotten lucky and gotten sensible results with that method, but I suspect that they got mostly gibberish results.
