  • farri
    Junior Member
    • Nov 2015
    • 4

    Cuffnorm's FPKM - advisable to perform TPM conversion?

    Hi,

    I performed a detailed time-course mRNA-seq experiment without any sequencing replicates (the input RNA, however, was a pool of equal amounts of total RNA from biological replicates). As I don't have any replicate information for my sequencing data, I refrained from doing any statistical testing.

    I used the Tuxedo protocol for data analysis (Trapnell et al., 2012): read mapping with TopHat2, transcript assembly with Cufflinks for each individual time point, then Cuffmerge. Because I have no replicates, I ran Cuffquant and Cuffnorm (Cuffnorm with default settings) instead of Cuffdiff.

    As a result, Cuffnorm outputs sample-normalized FPKM values. Wagner et al. (2012) brought to my attention that TPM values are preferable to FPKM values. Converting my FPKM data to TPM values according to https://haroldpimentel.wordpress.com...ression-units/ changes parts of the downstream analysis considerably.

    I wonder if it is "allowed" to convert cuffnorm-normalized FPKM values to TPM?
    And does it make any sense to convert cuffnorm-FPKM values to TPM?
    Or is the cuffnorm-normalization already sufficient to enable robust sample-wise comparison of changes in transcript abundances, hence FPKM to TPM conversion would not be necessary?
  • Dario1984
    Senior Member
    • Jun 2011
    • 166

    #2
    Of course it makes a substantial difference. TPM doesn't normalise by the length of the gene, FPKM does. Are you comparing between genes, such as by plotting the gene expression values in a heatmap? Then, you need to use FPKM. Otherwise, the heatmap may be misleading. Or, are you doing a differential expression analysis, such as calculating the fold changes of each gene between two timepoints? Then, TPM is fine, since you are comparing the same gene which has the same length between two timepoints (assuming that no alternative splicing is occurring).


    • farri
      Junior Member
      • Nov 2015
      • 4

      #3
      The focus of my study is on fold change (FC)-based evaluation of transcriptional changes, that is, intra-gene comparison. I am interested in genes which are strongly differentially regulated between two conditions. However, I also do some inter-gene comparison, such as comparing the abundance of transcript A with that of transcript B, in order to determine which transcript is more abundant. For instance, if transcript A is up-regulated 10-fold, but from FPKM 1 to FPKM 10, while transcript B is up-regulated 3-fold from FPKM 1000 to FPKM 3000, I would conclude that the up-regulation of transcript B might be metabolically more relevant. In most cases, however, I stick to FC values only.

      My key concern is that the use of TPM is advocated over FPKM. FPKM can easily be converted to TPM by dividing each FPKM value by the sum of all FPKM values of the respective sample and multiplying the result by 1e6.

      However, Cufflinks uses complicated statistics internally to compute FPKM values. Therefore I wonder whether I violate any rules by converting FPKM to TPM. Does this apply in particular because I ran Cuffnorm, which already performs a normalization?

      What do you recommend? Stick to Cuffnorm's FPKM values? Or perform the simple FPKM-to-TPM conversion from above and use TPM values instead?
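The conversion described in this post can be sketched in a few lines of Python. This is only an illustration with made-up FPKM values and a hypothetical helper name (`fpkm_to_tpm`), not part of any Cufflinks tool. It shows two things: within one sample the ratio between transcripts is unchanged, while between samples a fold change is rescaled by the ratio of the two FPKM sums.

```python
def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM values so that they sum to one million."""
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("sample has no expression signal")
    return {tid: value / total * 1e6 for tid, value in fpkm.items()}


# Hypothetical FPKM values for two time points (illustration only).
t0 = {"A": 1.0, "B": 1000.0, "C": 500.0}
t1 = {"A": 10.0, "B": 3000.0, "C": 500.0}

tpm_t0 = fpkm_to_tpm(t0)
tpm_t1 = fpkm_to_tpm(t1)

# Within a sample every value is multiplied by the same factor, so the
# ratio between two transcripts (B vs A) is the same in FPKM and TPM.
ratio_fpkm = t0["B"] / t0["A"]
ratio_tpm = tpm_t0["B"] / tpm_t0["A"]

# Between samples the factors differ, so a fold change computed from TPM
# equals the FPKM fold change times the ratio of the two FPKM sums.
fc_fpkm = t1["A"] / t0["A"]
fc_tpm = tpm_t1["A"] / tpm_t0["A"]
scale = sum(t0.values()) / sum(t1.values())

print(f"B/A ratio: FPKM {ratio_fpkm:.1f}, TPM {ratio_tpm:.1f}")
print(f"A fold change: FPKM {fc_fpkm:.2f}, TPM {fc_tpm:.2f}")
```

If the per-sample FPKM sums are nearly equal, the two fold changes nearly coincide; the more the sums differ, the more the conversion shifts every fold change.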


      • Dario1984
        Senior Member
        • Jun 2011
        • 166

        #4
        I would use the FPKM you obtained from Cufflinks. Before you do, check the range of FPKM values it calculated. We find that some values are theoretically impossible: take the number of reads in the sample, assume that all of them mapped to the single gene with the highest FPKM reported by Cufflinks, and ask what the theoretical maximum FPKM would be. Cufflinks sometimes calculates FPKMs that are unrealistically large. If, however, the FPKMs for your dataset seem reasonable, then just use those. Although TPM is more popular these days, there's no reason why it's more valid than FPKM, provided you are confident that the software you're using calculates the FPKM values correctly.
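A rough version of this sanity check might look like the sketch below. It assumes the plain textbook definition FPKM = fragments * 1e9 / (length in bp * total mapped fragments); under that definition, if every fragment came from one transcript, the depth cancels and the bound is simply 1e9 / length. The function names and the example values are hypothetical. Cufflinks divides by an effective length rather than the full length, so a flagged value is a reason to look closer, not proof of an error.

```python
def max_fpkm(length_bp):
    """Upper bound on FPKM for a transcript of the given length.

    Assumes FPKM = fragments * 1e9 / (length_bp * total_mapped_fragments)
    and that every mapped fragment in the sample came from this one
    transcript, so the sequencing depth cancels out.
    """
    if length_bp <= 0:
        raise ValueError("length must be positive")
    return 1e9 / length_bp


def flag_implausible(records):
    """Return IDs whose reported FPKM exceeds the bound for their length.

    `records` maps transcript ID -> (length_bp, reported_fpkm).
    """
    return [tid for tid, (length_bp, fpkm) in records.items()
            if fpkm > max_fpkm(length_bp)]


# Hypothetical values, for illustration only.
records = {
    "tx_short": (200, 8.0e6),   # bound is 5.0e6, so this is flagged
    "tx_long": (2000, 1.2e4),   # bound is 5.0e5, so this passes
}
suspicious = flag_implausible(records)
print(suspicious)
```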


        • sdriscoll
          I like code
          • Sep 2009
          • 436

          #5
          Originally posted by Dario1984:
          Of course it makes a substantial difference. TPM doesn't normalise by the length of the gene, FPKM does. Are you comparing between genes, such as by plotting the gene expression values in a heatmap? Then, you need to use FPKM. Otherwise, the heatmap may be misleading. Or, are you doing a differential expression analysis, such as calculating the fold changes of each gene between two timepoints? Then, TPM is fine, since you are comparing the same gene which has the same length between two timepoints (assuming that no alternative splicing is occurring).
          I know this was cleared up in the post following it, but I should correct you: TPM is normalized by transcript length just as FPKM is. As pointed out, the two are related by a constant factor, X: TPM = FPKM*X, where X = 1e6/[sum of all FPKM of a sample].
          /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
          Salk Institute for Biological Studies, La Jolla, CA, USA */


          • sdriscoll
            I like code
            • Sep 2009
            • 436

            #6
            farri-

            Converting FPKM to TPM has zero effect on whatever careful way cufflinks calculated those FPKM values. It's actually not the FPKM that's carefully calculated within cufflinks but the estimated number of fragments assigned to a transcript. They also calculate an effective length for each transcript which should end up being roughly the transcript length less the median fragment length from your alignments.

            As a side note, which may explain the behavior that Dario1984 sees: the so-called "effective" length scaling of the actual read counts assigned to transcripts does change the final total read count of the experiment, because:
            Code:
            effective_reads = (reads)*(length/effective_length)
            which means shorter transcripts will be inflated more than longer ones, and ALL will be inflated to some extent.
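To put rough numbers on that inflation, here is a small sketch. It takes the approximation from this post at face value (effective length is about the transcript length minus the typical fragment length); the clamp at 1 bp and the 200 bp fragment length are assumptions of the example, not values taken from Cufflinks.

```python
def effective_length(length_bp, fragment_bp):
    """Approximate effective length: transcript length minus the typical
    fragment length, clamped at 1 bp so it never reaches zero (the clamp
    is an assumption of this sketch)."""
    return max(length_bp - fragment_bp, 1)


def inflation_factor(length_bp, fragment_bp):
    """Factor length / effective_length from the formula in the post."""
    return length_bp / effective_length(length_bp, fragment_bp)


fragment_bp = 200  # hypothetical typical fragment length
factors = {length_bp: inflation_factor(length_bp, fragment_bp)
           for length_bp in (300, 1000, 5000)}

for length_bp, factor in factors.items():
    print(f"{length_bp:>5} bp transcript: reads scaled by {factor:.3f}")
```

With these inputs a 300 bp transcript is scaled by 3, a 1000 bp transcript by 1.25, and a 5000 bp transcript by about 1.04, so every factor is above 1 and the short transcript dominates.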

            Back to my point...

            It's really only advisable to "trust" fold changes between samples or groups when they are calculated from normalized read counts, or the fold-change values calculated by cuffdiff. While FPKM and TPM values can give you a general idea, I can guarantee that fold changes calculated from those values will be misleading when the change is less than 2-fold. I have seen many times a negative fold change calculated from TPM that is actually a positive fold change when calculated from normalized counts. It has long been established that count-data normalization procedures are the most reliable for detecting differential expression (which is fold change) and, in fact, cuffdiff uses those normalizations internally when performing differential testing and computing fold-change values.
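The sign flip mentioned here is easy to reproduce on toy data. The sketch below uses a median-of-ratios size factor (the DESeq-style approach) as one example of a count normalization; that choice, the function names, and the counts are all assumptions made for illustration. All five genes are given the same length, so TPM reduces to each gene's share of the sample total. Gene A rises 1.5-fold in raw terms, but gene E rises 10-fold and takes up most of the second sample.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a {sample: {gene: count}} table.

    Genes with a zero count in any sample are skipped, because their
    geometric mean would be zero.
    """
    samples = list(counts)
    genes = [g for g in counts[samples[0]]
             if all(counts[s][g] > 0 for s in samples)]
    geo_mean = {
        g: math.exp(sum(math.log(counts[s][g]) for s in samples) / len(samples))
        for g in genes
    }
    return {s: median([counts[s][g] / geo_mean[g] for g in genes])
            for s in samples}


def share_per_million(sample_counts):
    """TPM for the special case where all genes have the same length."""
    total = sum(sample_counts.values())
    return {g: c / total * 1e6 for g, c in sample_counts.items()}


# Hypothetical counts, for illustration only.
counts = {
    "t0": {"A": 100, "B": 100, "C": 100, "D": 100, "E": 100},
    "t1": {"A": 150, "B": 100, "C": 100, "D": 100, "E": 1000},
}

sf = size_factors(counts)
norm_a = {s: counts[s]["A"] / sf[s] for s in counts}
tpm_a = {s: share_per_million(counts[s])["A"] for s in counts}

log2fc_counts = math.log2(norm_a["t1"] / norm_a["t0"])
log2fc_tpm = math.log2(tpm_a["t1"] / tpm_a["t0"])

print(f"log2 FC of A from normalized counts: {log2fc_counts:+.2f}")
print(f"log2 FC of A from TPM:               {log2fc_tpm:+.2f}")
```

The normalized counts report A as up, while TPM reports it as down, because TPM is a proportion and the surge in E shrinks every other gene's share.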


            • Dario1984
              Senior Member
              • Jun 2011
              • 166

              #7
              It depends on what TPM is an abbreviation for. It sometimes means Tags Per Million and other times means Transcripts Per Million. I was thinking about Tags Per Million but the question asked was probably about Transcripts Per Million, so my recommendation may not be ideal.
