  • what's the reason for RPKM to be -1?

    Hi,
    In the processed dataset from this Nature paper, I found more than 2,000 genes with RPKM = -1. Some of them are -1 in only one genotype, but most are -1 across all 18 genotypes. Is this a problem for DESeq? I'm planning to use one genotype as the standard to calculate fold changes across the 18 genotypes, and I'm not sure whether the -1 values would be a problem. Thanks a lot!

    This is the link for this dataset.
    NCBI's Gene Expression Omnibus (GEO) is a public archive and resource for gene expression data.

  • #2
    I don't understand anything about your question.

    RPKM values can never be negative obviously.
    That would be nonsensical.
    If negative RPKM values are ever reported, it would be a glaring mistake on the part of the person reporting the data.

    Yes, it would be a problem for DESeq and anyone trying to make heads or tails of the data.

    I checked the dataset, and I don't see any negative RPKM values.
    I also see only 6 samples, not 18 genotypes.

    I don't know how to answer your question.
    Perhaps -1 is the fold change that you saw reported somewhere else.



    • #3
      Hi, sorry, that was only a subset. This is the full dataset:
      NCBI's Gene Expression Omnibus (GEO) is a public archive and resource for gene expression data.



      I extracted a few genes as an example; hopefully this makes the question clearer.


      Gene bur_0_seedling_br1 bur_0_seedling_br2 can_0_seedling_br1 can_0_seedling_br2 col_0_seedling_br1 col_0_seedling_br2 ct_1_seedling_br1 ct_1_seedling_br2 edi_0_seedling_br1 edi_0_seedling_br2 hi_0_seedling_br1 hi_0_seedling_br2 kn_0_seedling_br1 kn_0_seedling_br2 ler_0_seedling_br1 ler_0_seedling_br2 mt_0_seedling_br1 mt_0_seedling_br2 no_0_seedling_br1 no_0_seedling_br2 oy_0_seedling_br1 oy_0_seedling_br2 po_0_seedling_br1 po_0_seedling_br2 rsch_4_seedling_br1 rsch_4_seedling_br2 sf_2_seedling_br1 sf_2_seedling_br2 tsu_0_seedling_br1 tsu_0_seedling_br2 wil_2_seedling_br1 wil_2_seedling_br2 ws_0_seedling_br1 ws_0_seedling_br2 wu_0_seedling_br1 wu_0_seedling_br2 zu_0_seedling_br1 zu_0_seedling_br2
      AT1G02228 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
      AT1G03420 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
      AT1G05760 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
      AT5G66052 5.787552 6.727041 8.162033 6.428824 10.675012 3.631021 -1 -1 1.909225 4.677603 0 0 0 0 0 0 0 0 0 0 6.833976 8.834185 7.036451 10.997121 0 0.473364 19.791401 17.542688 3.971615 1.886927 0.782854 0.548066 0 0 14.767606 22.903755 0.698969 0.453234










      • #4
        Given the low RPKM values reported, and the negative values, really the only possibility is that the values are log-transformed.
        I'd have to read the methods section in the article to be sure, but I have to get back to work.
        Sorry for my previous curt answer, the explanation is actually simple.
        If you don't want to work with log values, you can always transform them.

        2^-1 = 0.5
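
If the values really were log-transformed, undoing the transform would be a one-liner. The following is a minimal Python sketch, assuming base 2 (the thread does not confirm the base, or that any transform was applied at all); `unlog2` is a hypothetical helper name.

```python
def unlog2(value: float) -> float:
    """Undo a log2 transform by returning 2 ** value."""
    return 2.0 ** value


# A reported -1 would correspond to an RPKM of 0.5 on the linear scale.
print(unlog2(-1))  # 0.5
# A reported 0 would correspond to an RPKM of 1, not to "no expression".
print(unlog2(0))   # 1.0
```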



        • #5
          Hi, thanks anyway, but I don't think they log-transformed their RPKM values; otherwise the values for the other genes would be enormous. Here are some examples of other genes:

          Gene bur_0_seedling_br1 bur_0_seedling_br2 can_0_seedling_br1 can_0_seedling_br2 col_0_seedling_br1 col_0_seedling_br2 ct_1_seedling_br1 ct_1_seedling_br2 edi_0_seedling_br1 edi_0_seedling_br2 hi_0_seedling_br1 hi_0_seedling_br2 kn_0_seedling_br1 kn_0_seedling_br2 ler_0_seedling_br1 ler_0_seedling_br2 mt_0_seedling_br1 mt_0_seedling_br2 no_0_seedling_br1 no_0_seedling_br2 oy_0_seedling_br1 oy_0_seedling_br2 po_0_seedling_br1 po_0_seedling_br2 rsch_4_seedling_br1 rsch_4_seedling_br2 sf_2_seedling_br1 sf_2_seedling_br2 tsu_0_seedling_br1 tsu_0_seedling_br2 wil_2_seedling_br1 wil_2_seedling_br2 ws_0_seedling_br1 ws_0_seedling_br2 wu_0_seedling_br1 wu_0_seedling_br2 zu_0_seedling_br1 zu_0_seedling_br2 sum
          AT1G67090 23474.67957 18506.76837 22558.49703 17963.88129 22261.51152 18081.79123 22864.65719 22578.59519 17712.27585 18506.9085 24195.08945 19083.98343 19867.70052 20505.78261 17510.17006 18376.21321 20868.89077 19066.31867 21073.02528 18487.86585 15744.30777 18168.18385 18588.96703 17941.06439 15221.48229 18695.12078 19099.2088 17731.46444 15555.84946 15340.75638 16646.76701 17455.03738 21415.18619 23309.2689 22300.23254 22895.71065 19666.7896 21345.76211 740665.7651
          AT5G38410 9476.749984 10113.15016 8172.071911 8642.868269 10104.84867 12461.82764 9903.355086 8560.007604 8113.037999 8472.048131 9324.890326 9291.633553 11648.2395 10350.76241 8811.560124 9436.55317 10039.0284 10108.20401 11571.54871 11031.09579 11455.4779 12152.45515 13410.04149 11773.69429 9321.60361 8758.465733 8679.371609 7493.50997 9270.940118 9529.423576 10466.42709 9999.786288 8457.714017 8249.465793 10412.21066 9403.823194 8491.003373 8876.581284 371835.4766
          AT1G79040 7377.856559 7026.800846 6529.524302 5560.600326 7286.377394 8555.898458 8385.740856 7063.791206 6264.424598 7212.806257 6523.142472 6457.388671 7188.488005 7293.568726 5584.46979 5834.04984 6597.860554 7343.098565 6052.682862 6230.979273 6287.967178 7512.839036 7506.727945 6473.571507 6511.226459 6414.141294 6019.717412 5939.857885 5635.330803 7119.00976 7243.638736 7847.612015 5985.395426 5804.339689 5808.082415 5352.37889 5337.314774 5062.679209 250231.38
          AT5G42530 7197.075699 8003.598566 7.71253 14.686264 2062.836337 3537.904828 4881.487311 3940.322224 1115.307811 1414.884252 761.18122 649.928474 2649.869629 2343.793289 898.400389 950.611909 5938.278518 4990.999365 401.256941 501.117921 1720.492291 2603.467064 2419.920666 1672.533769 1417.490495 1261.370128 1313.166287 1002.791317 190.325046 499.879371 2464.391696 1703.833834 1412.873171 1596.998013 667.41509 687.918792 0 0 74896.12051
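
This objection can be checked numerically. The sketch below assumes, hypothetically, that the table is on a log2 scale and estimates how many decimal digits the back-transformed value would have; `decimal_digits_if_log2` is an illustrative name, not something taken from the dataset or the paper.

```python
import math


def decimal_digits_if_log2(value: float) -> int:
    """Digits in the integer part of 2 ** value, for value >= 0."""
    return math.floor(value * math.log10(2)) + 1


# First AT1G67090 value from the table above.
digits = decimal_digits_if_log2(23474.67957)
print(digits)
```

The result is on the order of 7,000 digits, far beyond any plausible RPKM, which supports reading these numbers as untransformed values.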



          • #6
            Good point.
            It's perplexing then.

            I don't have any other ideas or time to check the paper.
            I would just write to the authors to ask them, after having checked the methods section in the paper.

            Negative RPKM values make no sense unless they are log transformed.
            But, you are correct to point out that these values are extremely large if they are log transformed.

            I'm going to leave it at that. Someone else can have a shot at trying to explain the -1 RPKM values. It could be a bug or a feature of the program, where -1 has a special significance (not found in the reference genome ???).
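
If -1 does turn out to be a placeholder rather than a measurement, one cautious way to handle it is to mask it before any downstream calculation. This is a minimal Python sketch under that assumption, which nothing in this thread confirms; `SENTINEL`, `mask_sentinel`, and `count_missing` are hypothetical names, and the row is only the first ten AT5G66052 values from post #3.

```python
import math

SENTINEL = -1.0  # assumption: -1 marks "no value" rather than a real RPKM


def mask_sentinel(values, sentinel=SENTINEL):
    """Replace sentinel entries with NaN so they are not read as expression values."""
    return [math.nan if v == sentinel else v for v in values]


def count_missing(values):
    """Count the NaN entries in a masked row."""
    return sum(1 for v in values if math.isnan(v))


# First ten AT5G66052 values from post #3 (truncated excerpt of the full row).
row = [5.787552, 6.727041, 8.162033, 6.428824, 10.675012,
       3.631021, -1, -1, 1.909225, 4.677603]
masked = mask_sentinel(row)
print(count_missing(masked))  # 2
```

Masking keeps the true zeros in the table distinct from the -1 entries, so the two cases can be treated differently once the authors clarify what -1 means.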
