
  • How to use RNA-seq data from samples in different conditions without replicates

    Hi,
    I have RNA-seq data for seven samples, none of which has a replicate. The raw data have been converted into read counts for each sample. Here are my problems:
    Is there a good method to normalize this kind of data (without any replication)?
    Besides RPKM, is there any other way to estimate expression level? I want to perform network analysis next based on the expression-level information, and I'm not sure that RPKM values would be acceptable.
    I tried edgeR with its GLM method, but it reported that replicated data are needed. I have no idea how to solve this now.

  • #2
    Well, without replicates, my opinion is that running through a full statistical analysis is a waste of time. Even for those algorithms that will run to completion with such data, the fact is that you simply cannot compute meaningful statistical significance without replicates.

    That said, you can certainly still normalize them, using RPKM, or you could use DESeq (estimating dispersion with a blind model, for example). But all I would do is normalize them, then take the table of normalized feature values and rank them by magnitude of estimated expression for comparison(s). Without replicates that really is about the best you can do, to my mind, and in order to publish you'll need some independent verification of the relative expression differences you do see.
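To make the "normalize, then rank" suggestion concrete, here is a minimal sketch in plain Python. It assumes the usual RPKM definition (reads per kilobase of feature length per million mapped reads); the gene names, lengths, and counts are made up for illustration and the helper name `rpkm` is hypothetical.

```python
def rpkm(counts, lengths_bp):
    """Return RPKM per gene for one sample.

    counts:     dict of gene -> raw read count (hypothetical data)
    lengths_bp: dict of gene -> feature length in base pairs (hypothetical data)
    """
    total_reads = sum(counts.values())
    per_million = total_reads / 1e6  # library size in millions of reads
    return {
        gene: counts[gene] / (lengths_bp[gene] / 1e3) / per_million
        for gene in counts
    }


# Toy input, invented for this example only.
lengths_bp = {"geneA": 2000, "geneB": 1000, "geneC": 500}
sample_counts = {"geneA": 500, "geneB": 400, "geneC": 100}

values = rpkm(sample_counts, lengths_bp)

# Rank features by magnitude of estimated expression, highest first.
ranked = sorted(values, key=values.get, reverse=True)
print(ranked)
```

The ranking only orders features within this one sample; it says nothing about significance.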
    Last edited by mbblack; 11-29-2012, 06:51 AM.
    Michael Black, Ph.D.
    ScitoVation LLC. RTP, N.C.


    • #3
      Well, thanks for your reply; this has bothered me for a long time. I'll try normalizing my data and simply compare the samples by their normalized feature values. If that shows a good result, I will add some experiments, such as qPCR, or run some replicates. It seems to be the only thing I can do for now.


      • #4
        You can certainly do that - just compare them based on the estimated magnitude of their normalized expression.

        DESeq with a blind model may work well for your normalization (the blind model simply treats all samples as if they were replicates of one condition). It will give you fairly conservative estimates of transcript abundance, but that is the preferable approach in your case anyway. If you have not used DESeq before, the vignette is pretty good. Just remember that you will need raw feature counts as input (i.e., not RPKM or any other summary metric, just whole-number counts for each feature).
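As a rough illustration of count-based normalization, the following pure-Python sketch computes median-of-ratios size factors, the scaling approach that DESeq's documentation describes. It is not DESeq's own code and it does not estimate dispersion; the function names and the toy count table are assumptions made for illustration.

```python
import math
from statistics import median


def size_factors(count_table):
    """Median-of-ratios size factors, one per sample.

    count_table: dict of gene -> list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(next(iter(count_table.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for counts in count_table.values():
        if min(counts) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in counts) / n_samples
        for j, c in enumerate(counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(count_table):
    """Divide every raw count by its sample's size factor."""
    factors = size_factors(count_table)
    return {
        gene: [c / f for c, f in zip(counts, factors)]
        for gene, counts in count_table.items()
    }


# Toy raw counts (invented): sample 2 was sequenced twice as deeply as sample 1.
raw = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 3]}

factors = size_factors(raw)
normalized = normalize(raw)
print(factors)
print(normalized["g1"])
```

In this toy table the second size factor comes out twice the first, so the normalized values for g1, g2, and g3 match across the two samples. The input is whole-number counts, as the post stresses.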

        Statistical significance is about determining if a particular sample value reflects something other than chance. When you only have a single value per sample, there simply is no way to make any significance statement about whether that particular value may or may not have occurred purely by chance. Thus there really is no substitute or work-around for a lack of replicate sampling.
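A small sketch of why a single value per sample leaves nothing to test against: the sample variance, which any significance calculation depends on, is undefined for one observation. The helper name and the numbers are hypothetical; `statistics.variance` is from the Python standard library.

```python
from statistics import StatisticsError, variance


def sample_variance_or_none(values):
    """Return the sample variance, or None when it cannot be estimated."""
    try:
        return variance(values)
    except StatisticsError:
        # Raised when fewer than two data points are supplied.
        return None


single = sample_variance_or_none([42.0])                  # one value, no replicates
triplicate = sample_variance_or_none([40.0, 42.0, 44.0])  # three replicates

print(single, triplicate)
```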
        Last edited by mbblack; 11-30-2012, 05:16 AM.
        Michael Black, Ph.D.
        ScitoVation LLC. RTP, N.C.


        • #5
          Rather than open a new topic: I have a similar issue. I just computed RPKM values for my 3 samples (no replicates) and started comparing them, computing fold changes and log2 values.

          I have 2 questions:
          1. Should I discard the entries that have 0 (zero) in one of the samples?
          2. I will then rank by magnitude (fold change or log value), but I have problems with small RPKMs. For example, with 0.03 and 0.3 I get a fold change of 10, which is high, but the RPKM values themselves are really small and insignificant, so the fold change is misleading. What method should I use to fix this issue?


          Thanks!
          ------------
          SMART - bioinfo.uni-plovdiv.bg


          • #6
            Originally posted by vebaev
            I'll just give you my solution. In the first situation, I would not discard the zero entries but replace them with 0.1 (or whatever you consider to be null expression), which is called imputation. In the second case, because an RPKM value is also an estimate, I think the fold change between them should be taken into consideration even when both values are very low.
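A minimal sketch of the imputation idea described in this reply, assuming zeros are replaced by 0.1 before taking a log2 fold change. The function name and the `null_value` parameter are hypothetical, and the inputs are the example values from the question.

```python
import math


def log2_fold_change(rpkm_a, rpkm_b, null_value=0.1):
    """log2(b / a) after replacing zero RPKM values with null_value."""
    a = rpkm_a if rpkm_a > 0 else null_value
    b = rpkm_b if rpkm_b > 0 else null_value
    return math.log2(b / a)


# A zero in one sample no longer breaks the ratio: 5.0 is compared to 0.1.
lfc_zero = log2_fold_change(0.0, 5.0)

# Two very low values still yield a large fold change (about log2 of 10).
lfc_low = log2_fold_change(0.03, 0.3)

print(lfc_zero, lfc_low)
```

The second call shows the concern raised in the question: the tenfold change between 0.03 and 0.3 is kept as is, so any ranking by fold change alone will place it high.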


            • #7
              In some cases people add an offset value so they can filter out the noise from low counts. But if I have RPKM values, how and at which point can I add an offset value to each condition?
              ------------
              SMART - bioinfo.uni-plovdiv.bg


              • #8
                Originally posted by vebaev
                My opinion would be to discard any and all genes or features (however you've summarized things) that do not have an RPKM greater than 0.1 in all samples (or whatever cutoff you decide - the point being that for those low values, you simply have far too little data to reliably quantify those features). I would then log2 transform the genes that passed my minimum inclusion cutoff, and compare the log2-transformed transcript abundance estimates.
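The filter-then-transform step could be sketched like this in plain Python. It assumes the cutoff is applied as "keep a feature only if every sample exceeds it"; the function name, the `min_rpkm` parameter, and the toy table are hypothetical.

```python
import math


def filter_and_log2(rpkm_table, min_rpkm=0.1):
    """Keep features whose RPKM exceeds min_rpkm in every sample, then log2 them.

    rpkm_table: dict of gene -> list of RPKM values, one per sample.
    """
    kept = {}
    for gene, values in rpkm_table.items():
        if all(v > min_rpkm for v in values):
            kept[gene] = [math.log2(v) for v in values]
    return kept


# Toy RPKM table for 3 samples (invented values).
table = {
    "geneA": [8.0, 32.0, 16.0],
    "geneB": [0.03, 0.3, 4.0],  # dropped: one sample is below the cutoff
    "geneC": [0.0, 12.0, 7.0],  # dropped: zero in one sample
}

kept = filter_and_log2(table)
print(kept)
```

Because low and zero values are removed before the transform, no offset is needed to avoid taking the log of zero.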

                My reading of the current literature is that using an offset for null values is considered to introduce a bias in results and has pretty much fallen out of favor with statisticians as a valid means of dealing with null or extremely low count values. The bio-statisticians I work with also agree that very low count features are simply too unreliable as estimates of abundance to be included with the rest of the dataset. You could think of it as an issue of signal-to-noise ratio: at very low transcript abundance, the noise is very high and the signal very low, making those values highly suspect.
                Michael Black, Ph.D.
                ScitoVation LLC. RTP, N.C.
