  • Correlation FPKM and RPKM

    Hi everyone!
    I'm trying to evaluate gene expression differences in breast cancer cells before and after treatment, so I'm working with RNA-seq data (single-end reads).

    I tried to correlate FPKM (Cuffdiff output) with RPKM (counts from htseq-count, then the classic Mortazavi et al. calculation).
    From the Cufflinks website, some papers and other forums, it seems that these values should be the same for single-end data!

    I also filtered out miRNAs and other genes shorter than 300 bp (they could give falsely high FPKM values).
    I hope that someone can help me!

    Thanks in advance

  • #2
    You did not include a question or any findings in your post, so what exactly do you want to know?



    • #3
      For single-end reads, FPRKM==RPKM. Keep in mind that, because of how Cufflinks works, the RPKM values it produces will almost never match those you compute by hand. Firstly, htseq-count only counts uniquely mapped reads, whereas Cufflinks distributes fractional read counts over transcripts and genes. Secondly, you likely have a single pre-defined length for each gene, presumably computed by just summing the length of each exon in a "union gene model". As I recall, Cufflinks tries to determine the actual lengths and distribution of the transcripts and then uses those.

      BTW, unless you really want to discover new isoforms or genes, you might just directly use the counts from HTSeq-count in DESeq2 (or edgeR or limma).
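
To make the "counts and lengths" point concrete, here is a minimal sketch of the by-hand calculation: a union-exon length followed by the usual reads-per-kilobase-per-million formula. The exon coordinates, counts, library size, and the "isoform-aware" numbers are all made up for illustration, and the function names are hypothetical; this is not how Cufflinks computes its values internally.

```python
def union_exon_length(exons):
    """Total bases covered by a gene's exons, counting overlaps once.

    `exons` is a list of (start, end) pairs, 0-based and half-open.
    """
    total = 0
    current_start, current_end = None, None
    for start, end in sorted(exons):
        if current_end is None or start > current_end:
            # Gap before this exon: close the previous merged block.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent exon: extend the merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def rpkm(count, length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical gene: two overlapping exons plus one separate exon.
exons = [(100, 600), (400, 900), (1500, 2000)]
length = union_exon_length(exons)  # 800 + 500 = 1300 bp

# By-hand value: integer count of uniquely mapped reads, union length.
by_hand = rpkm(count=260, length_bp=length, total_mapped_reads=20_000_000)

# Same gene with a fractional count and a different length, standing in
# for what an isoform-aware tool might estimate (numbers are invented).
isoform_aware = rpkm(count=273.5, length_bp=1000, total_mapped_reads=20_000_000)

print(by_hand, isoform_aware)
```

Even with the same formula, a modest change in the count and the length moves the value noticeably, which is why the two pipelines rarely agree exactly.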



      • #4
        Sorry, you're right! My question is:
        Is it possible to obtain equal FPKM and RPKM values? Everyone says "yes", but after searching the literature in depth, I didn't find a protocol for doing it or any correlation analysis.

        The best match I've found is this:

        but he tried to correlate raw counts with FPKM (without much success...)

        Thanks!
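
For the correlation analysis itself, one possible sketch is a Pearson correlation on log-transformed values for the genes present in both tables. The gene names and numbers below are invented, the `log2(x + 1)` transform is just one common choice, and `pearson` is a hand-written helper rather than a function from any RNA-seq package.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log2p1(values):
    """log2(x + 1), so that zero expression stays finite."""
    return [math.log2(v + 1.0) for v in values]


# Hypothetical per-gene values from the two pipelines.
fpkm_values = {"geneA": 12.4, "geneB": 0.8, "geneC": 150.2, "geneD": 33.0}
rpkm_values = {"geneA": 10.1, "geneB": 1.1, "geneC": 171.9, "geneD": 29.5}

# Compare only genes reported by both tools, in a fixed order.
shared = sorted(fpkm_values.keys() & rpkm_values.keys())
r = pearson(
    log2p1([fpkm_values[g] for g in shared]),
    log2p1([rpkm_values[g] for g in shared]),
)
print(f"Pearson r on log2(x + 1): {r:.3f}")
```

A high correlation here only says the two sets of values rank and scale genes similarly; it does not require them to be numerically equal.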



        • #5
          By definition, an FPKM value computed with single-end reads is also the RPKM value (in fact, this is also true for paired-end reads if you only use reads where both ends map to the same feature). As I mentioned above, the reason that you're getting different values by hand than by cufflinks is that you're using vastly different methods to arrive at both counts and lengths.
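
The "by definition" part can be checked with a few lines of arithmetic. This is only a sketch with invented numbers; it assumes that in the paired-end case every counted fragment contributes exactly two reads, both on the same feature, and that the library total is counted the same way.

```python
def per_kb_per_million(feature_count, feature_length_bp, library_total):
    """Generic 'X per kilobase per million' normalisation."""
    return feature_count * 1e9 / (feature_length_bp * library_total)


gene_length_bp = 2000

# Single-end: every read is its own fragment, so the inputs are identical.
reads_on_gene = 500
total_reads = 25_000_000
rpkm_se = per_kb_per_million(reads_on_gene, gene_length_bp, total_reads)
fpkm_se = per_kb_per_million(reads_on_gene, gene_length_bp, total_reads)

# Paired-end, keeping only pairs where both mates hit the same feature:
# reads = 2 * fragments, both on the gene and in the library total,
# so the factor of two cancels.
fragments_on_gene = 500
total_fragments = 25_000_000
fpkm_pe = per_kb_per_million(fragments_on_gene, gene_length_bp, total_fragments)
rpkm_pe = per_kb_per_million(
    2 * fragments_on_gene, gene_length_bp, 2 * total_fragments
)

print(rpkm_se, fpkm_se, fpkm_pe, rpkm_pe)
```

So any difference seen in practice comes from how each tool arrives at the counts and lengths, not from the FPKM/RPKM formula.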



          • #6
            Originally posted by dpryan:
            For single-end reads, FPRKM==RPKM. Keep in mind that, because of how Cufflinks works, the RPKM values it produces will almost never match those you compute by hand. Firstly, htseq-count only counts uniquely mapped reads, whereas Cufflinks distributes fractional read counts over transcripts and genes. Secondly, you likely have a single pre-defined length for each gene, presumably computed by just summing the length of each exon in a "union gene model". As I recall, Cufflinks tries to determine the actual lengths and distribution of the transcripts and then uses those.

            BTW, unless you really want to discover new isoforms or genes, you might just directly use the counts from HTSeq-count in DESeq2 (or edgeR or limma).

            Sorry for my ignorance, but what does FPRKM mean? Thanks for your answer!
            The only reason I tried to compare FPKM and RPKM was to have a control value!
            I believe I should follow only one line of analysis...



            • #7
              Originally posted by dpryan:
              By definition, an FPKM value computed with single-end reads is also the RPKM value (in fact, this is also true for paired-end reads if you only use reads where both ends map to the same feature). As I mentioned above, the reason that you're getting different values by hand than by cufflinks is that you're using vastly different methods to arrive at both counts and lengths.

              Thank you very much for your explanation!



              • #8
                FPRKM was just a typo; I meant FPKM.



                • #9
                  Originally posted by dpryan:
                  FPRKM was just a typo; I meant FPKM.

                  Ah ok! I was afraid I missed something important!



                  • #10
                    I should also point you to figure 3D in this paper.



                    • #11
                      Originally posted by dpryan:
                      I should also point you to figure 3D in this paper.

                      I've requested the article through my university, I'm very interested! Figure 3D (I can't see it in high resolution for now...) looks really similar to my correlation curve between FPKM and RPKM.

                      Thanks a lot!
