Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • ThePresident
    Member
    • Jun 2012
    • 72

    Calculating RPKM value manually

    Hello,

    I'm dealing with bacterial RNA-seq data. I would like to show absolute gene expression by calculating RPKM value for every gene. Initially, I've been using Cufflinks for that, but I don't like the way Cufflinks deals with it, so I decided to manually calculate my iwn RPKM values.

    The equation I want to use is:

    RPKM = (10^9 * C)/(N * L), with

    C = Number of reads mapped to a gene
    N = Total mapped reads in the experiment
    L = gene length in base-pairs for a gene

    Now, my question is the following: I want to use raw counts (obtained from HTSeq count table) as the number of reads that actually mapped to a gene (C). I think that should be okay. However, for the total mapped reads in the experiment (N) I thought about simply adding all reads from the HTSeq table.

    For exemple:

    If my gene length (L) is : 200pb
    Number of reads mapped (as from HTSeq-count) (C) : 400
    Total mapped reads (sum for all genes from HTSeq-count) (N) : 10^8

    RPKM = (10^9 * 400)/(10^8 * 200) = 20

    Would that be the right way of calculating it? My aim is to do the same thing for each of my sequencing libraries and then simply compare.

    Thanks you guys,

    TP
  • TiborNagy
    Senior Member
    • Mar 2010
    • 329

    #2
    The calculation looks OK.

    Comment

    • ThePresident
      Member
      • Jun 2012
      • 72

      #3
      Yeah, the more I think about it, the more it makes sense. HTSeq-count uses alignment files from bowtie and DESeq (used for DE analysis) uses HTSeq-count tables... Thus, there seems to be no reason why I couldn't use raw reads from HTSeq-count table for RPKM calculations.

      Comment

      • AdrianP
        Senior Member
        • Apr 2011
        • 130

        #4
        Is your data paired end? In that case, you need FPKM counts, not RPKM.

        Comment

        • ThePresident
          Member
          • Jun 2012
          • 72

          #5
          Originally posted by AdrianP View Post
          Is your data paired end? In that case, you need FPKM counts, not RPKM.
          Nope, single-end... That's why I don't like using Cufflinks who gives FPKM values.

          Comment

          • AdrianP
            Senior Member
            • Apr 2011
            • 130

            #6
            Originally posted by ThePresident View Post
            Nope, single-end... That's why I don't like using Cufflinks who gives FPKM values.
            In case of single-end reads, FPKM=RPKM. But check and see if the values match, I am curious.

            Comment

            • ThePresident
              Member
              • Jun 2012
              • 72

              #7
              Nope, it does not match for one obvious reason (and I'm again banging my head against the wall):

              - I can't just sum all the reads from HTseq table: the number is way over the total number of reads from the library. I'm trying to see why... damn!

              Comment

              • AdrianP
                Senior Member
                • Apr 2011
                • 130

                #8
                Originally posted by ThePresident View Post
                Nope, it does not match for one obvious reason (and I'm again banging my head against the wall):

                - I can't just sum all the reads from HTseq table: the number is way over the total number of reads from the library. I'm trying to see why... damn!
                Use the .fastq file with all of your reads. When you do
                Code:
                head file.fastq
                what do you get?

                Comment

                • dpryan
                  Devon Ryan
                  • Jul 2011
                  • 3478

                  #9
                  Are you counting multimappers? Also, cufflinks calulated FPKMs will almost never be the same as those calculated by hand (partly due to using a different gene length and partly due to cufflinks using fractional read counts).

                  Comment

                  • ThePresident
                    Member
                    • Jun 2012
                    • 72

                    #10
                    @AdrianP : I'm at work now and I don't have access to my linux station now. But, from bowtie alignment I know how much reads I have in my libraries (fastq). For example, in one of my libraries I have 61,402,323 reads, and when counted with HTSeq-count (sum of all reads across all genes) I get 116,898,233 which is almost double.

                    @dpryan: My understanding of HTSeq-count is that it does not count multimappers. I used intersection-nonempty mode. Here is the final output:

                    no_feature 9034352
                    ambiguous 958299
                    too_low_aQual 0
                    not_aligned 2097930
                    alignment_not_unique 0

                    Fastq file contained 61,402,323 reads and 96.58% of that number have been aligned with bowtie. So, the SAM file has to contain around 59,302,363. So, if there are multimapped reads in mu HTSeq-count, how to get rid of them?

                    Comment

                    • ThePresident
                      Member
                      • Jun 2012
                      • 72

                      #11
                      But on the other side, the total number of mapped reads in a the RPKM formula is quite arbitrary. It will be the same for every gene inside one single library, so as long as keep that in mind, the formula still makes sense.

                      On the other side, I wonder if I could compare RPKM values calculated by this manner in between libraries containing replicates? Ex. If I have three libraries of the same condition (biological replicates) but with different sequencing depth, could I calculate RPKM as explained above and then just average them...?

                      Comment

                      • Simon Anders
                        Senior Member
                        • Feb 2010
                        • 995

                        #12
                        [deleted] .
                        Last edited by Simon Anders; 01-24-2014, 10:36 AM. Reason: post was meant to appear in another thread, but I confused my browser tabs

                        Comment

                        • ThePresident
                          Member
                          • Jun 2012
                          • 72

                          #13
                          It might seems that I'm flooding this thread, but I just realized that I already had the same problem. I think that the right way of dealing with all this is to extract only uniquely-mapped reads from bowtie-generated BAM files and use them in HTSeq-count, DESeq and RPKM calculations.

                          My only concern is to lose too much reads du to the multimapping...

                          Comment

                          • AdrianP
                            Senior Member
                            • Apr 2011
                            • 130

                            #14
                            Originally posted by ThePresident View Post
                            It might seems that I'm flooding this thread, but I just realized that I already had the same problem. I think that the right way of dealing with all this is to extract only uniquely-mapped reads from bowtie-generated BAM files and use them in HTSeq-count, DESeq and RPKM calculations.

                            My only concern is to lose too much reads du to the multimapping...
                            If you use 61,402,323 as total reads, what value of RPKM do you get for any given gene compared to FPKM by cufflinks?

                            Comment

                            • ThePresident
                              Member
                              • Jun 2012
                              • 72

                              #15
                              Originally posted by AdrianP View Post
                              If you use 61,402,323 as total reads, what value of RPKM do you get for any given gene compared to FPKM by cufflinks?
                              It's somewhat near but not exactly the same thing. I did it with just a couple of values but still: 8vs9 , 146vs300, 13vs18.

                              Comment

                              Latest Articles

                              Collapse

                              • SEQadmin2
                                Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                by SEQadmin2


                                I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                                Here are nine questions we think about, in roughly the order they matter, before...
                                06-18-2026, 07:11 AM
                              • SEQadmin2
                                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                by SEQadmin2


                                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                ...
                                06-02-2026, 10:05 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, Yesterday, 11:10 AM
                              0 responses
                              8 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-17-2026, 06:09 AM
                              0 responses
                              43 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-09-2026, 11:58 AM
                              0 responses
                              104 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-05-2026, 10:09 AM
                              0 responses
                              125 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...