  • Input data structure for SAMseq

    I know this might be a silly question, but we have just started using the SAMseq method from the CRAN R package samr to search for differentially expressed genes between two groups, and I cannot find a definitive description of the input structure in the manual.

    Should I use raw gene counts (e.g. generated by HTSeq) or normalized gene counts (e.g. RPKM values) as the input data for SAMseq?

    Can anyone familiar with the method give me a quick answer?

    Thanks a lot

  • #2
    Based on reading the paper (http://www-stat.stanford.edu/~tibs/ftp/Li_Tibs.pdf), they definitely developed the method with raw counts in mind, and I have used it with some success in that way. However, since it appears to be based strictly on rank (nonparametric) statistics, it should in principle work on RPKM too, I think.
    Last edited by kopi-o; 06-01-2012, 01:01 PM.


    • #3
      Thanks very much. I tried SAMseq on our two-class unpaired comparison. It returned far more significant genes than I get from edgeR or DESeq. Did you experience the same problem?
      Then I realized that the default FDR cutoff is 0.2, which seems odd. I changed it to 0.05, which is what most people use. SAMseq then gave me thousands of up-regulated genes but 0 down-regulated genes. That is very different from the edgeR and DESeq results.

      What FDR cutoff do you normally use in your practice?


      Thanks a lot

      PS:
      The R code I used for SAMseq is:
      samfit <- SAMseq(g.counts, group, resp.type = "Two class unpaired", fdr.output = 0.05)
      # the default setting is fdr.output = 0.2
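
For anyone unsure what g.counts and group should look like, here is a minimal base R sketch. It assumes a genes-by-samples matrix of raw integer counts and a group vector coded 1 and 2 for the two classes. The simulated data and names such as n_genes are made up for illustration, and the SAMseq call is left as a comment so the block runs without samr installed.

```r
set.seed(1)  # reproducible toy data

n_genes <- 200                  # hypothetical number of genes
group <- c(1, 1, 1, 2, 2, 2)    # two-class unpaired labels: 1 = first group, 2 = second group

# Simulated raw counts: rows are genes, columns are samples.
g.counts <- matrix(rnbinom(n_genes * length(group), mu = 50, size = 5),
                   nrow = n_genes, ncol = length(group),
                   dimnames = list(paste0("gene", seq_len(n_genes)),
                                   paste0("sample", seq_along(group))))

# Sanity checks before handing the matrix to SAMseq.
input_ok <- is.matrix(g.counts) &&
  all(g.counts >= 0) &&
  all(g.counts == round(g.counts)) &&   # raw counts are whole numbers
  ncol(g.counts) == length(group) &&    # one label per sample column
  all(group %in% c(1, 2))
print(input_ok)

# With samr installed, the call from this post would then be:
# library(samr)
# samfit <- SAMseq(g.counts, group, resp.type = "Two class unpaired", fdr.output = 0.05)
```

The checks are deliberately simple; they only catch the most common mix-ups, such as a transposed matrix or non-integer input.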
      Last edited by slowsmile; 06-01-2012, 12:57 PM.


      • #4
        Interesting - I also see only up-regulated genes in the data set I am using SAMseq on - maybe it's some kind of bug?

        No, I don't get too many significantly DE genes - for my data set SAMseq is far more conservative than edgeR and baySeq (the two others I've tried). But its results make more sense when I look at them on a case-by-case basis.

        I usually use FDR < 0.05.


        • #5
          In my case, edgeR and DESeq gave me 3,000 to 5,000 up-regulated genes and ~1,000 down-regulated genes, while SAMseq gave me 9,000 up-regulated genes and no down-regulated ones at an FDR cutoff of 0.05.
          I used raw gene counts from HTSeq as input data. To my mind, the SAMseq result is way off track...

          I have 3 biological replicates in each group. Do you think the sample size plays a role in the discrepancy between the parametric and nonparametric methods?

          Also, do you combine your DE gene selection with a fold-change cutoff?


          • #6
            OK, sounds strange - as far as I know, non-parametric methods usually need more replicates than parametric ones to achieve significance. In my case, I have dozens of replicates per group.
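
A rough way to see why replicate number matters for rank-based tests: with 3 versus 3 samples there are only choose(6, 3) = 20 ways to assign the group labels, so an exact two-sided rank-sum test on a single gene can never give a p-value below 2/20 = 0.1. As far as I understand, SAMseq estimates its FDR from permutations pooled across genes rather than from per-gene p-values, so the base R sketch below is only an analogy. The 12 vs 12 design is a hypothetical comparison point.

```r
# Number of distinct ways to split the samples into two equal-sized groups.
n_labelings_3v3   <- choose(6, 3)    # 3 vs 3 replicates
n_labelings_12v12 <- choose(24, 12)  # hypothetical 12 vs 12 design

# Smallest achievable two-sided exact p-value: the two most extreme labelings.
min_p_3v3   <- 2 / n_labelings_3v3
min_p_12v12 <- 2 / n_labelings_12v12

# Cross-check with a perfectly separated toy gene (no ties, so the test is exact).
p_separated <- wilcox.test(c(1, 2, 3), c(4, 5, 6))$p.value

print(c(min_p_3v3 = min_p_3v3, p_separated = p_separated))
print(min_p_12v12)
```

Even a perfectly separated gene cannot look very significant on its own with three replicates per group, which is why rank-based methods lean so heavily on replicate number.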

            I usually don't use a fold change cutoff but many people do.
            Last edited by kopi-o; 06-02-2012, 12:22 AM.


            • #7
              npSeq

              Three biological replicates seem too few for SAMseq.

              If you have problems with the distribution of up- and down-regulated genes, you could try npSeq (a very similar algorithm, but npSeq uses symmetric cutoffs for the nonparametric statistic, while SAMseq uses asymmetric cutoffs):

              http://www.stanford.edu/~junli07/npSeq/
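
To picture the difference between the two kinds of cutoff, here is a toy base R sketch. It is not the actual SAMseq or npSeq code. It assumes that the asymmetric rule works like the SAM delta rule (each tail is called once the sorted observed scores depart from their expected null values by more than delta) and that the symmetric rule applies a single cutoff to the absolute score. The score vectors, delta, and cutoff are invented for illustration.

```r
# Toy scores: expected (null) order statistics and observed scores for 10 genes.
d_expected <- seq(-4.5, 4.5, by = 1)
d_observed <- d_expected
d_observed[8:10] <- d_observed[8:10] + 3   # strong shift in the upper tail
d_observed[1]    <- d_observed[1] - 0.5    # only a slight shift in the lower tail

# SAM-style rule: a tail is called once observed and expected differ by more than delta.
sam_style_calls <- function(d, d_exp, delta) {
  d_sorted <- sort(d)
  gap <- d_sorted - sort(d_exp)
  n <- length(d_sorted)
  up_start <- which(d_sorted > 0 & gap > delta)    # candidates moving right from zero
  down_end <- which(d_sorted < 0 & -gap > delta)   # candidates moving left from zero
  c(up   = if (length(up_start) > 0) n - min(up_start) + 1 else 0,
    down = if (length(down_end) > 0) max(down_end) else 0)
}

# Symmetric rule: one cutoff applied to the absolute score.
symmetric_calls <- function(d, cutoff) {
  c(up = sum(d > cutoff), down = sum(d < -cutoff))
}

asym <- sam_style_calls(d_observed, d_expected, delta = 1)
sym  <- symmetric_calls(d_observed, cutoff = 4.75)
print(rbind(asymmetric = asym, symmetric = sym))
```

In this toy data the delta rule calls three up genes and no down genes, while the symmetric cutoff also picks up one gene in the lower tail, so the two rules can split the same scores differently.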

              I had no problem with the distribution of up- and down-regulated genes (12 vs 12 matched pairs) and got more significant genes than with all other methods, and the resulting gene list looked meaningful (e.g. in pathway analysis) with regard to the biological question.


              • #8
                Originally posted by kopi-o:
                Based on reading the paper (http://www-stat.stanford.edu/~tibs/ftp/Li_Tibs.pdf), they definitely developed the method with raw counts in mind, and I have used it with some success in that way. However, since it appears to be based strictly on rank (nonparametric) statistics, it should in principle work on RPKM too, I think.
                I think SAMseq can work with a range of data. Section 10.1 of the SAM manual (http://www-stat.stanford.edu/~tibs/SAM/sam.pdf) lists several response formats, including Quantitative, Two Class, Paired, etc., each with its own coding format and suited to different data types and experimental setups. help(SAMseq) in R shows that it has a corresponding resp.type argument.

                The 'Two class unpaired' and 'Paired' options here look typical for common formats of RNA-Seq experiments.

                Section 10.5 also states that "the user is required to normalize the data from the different experiments before running SAM".


                • #9
                  Don't use RPKM or other normalized counts for SAMseq

                  Although SAMseq is a nonparametric method, it nevertheless expects original counts, not normalized counts. This is evident from a careful reading of the SAMseq article, and it is also stated explicitly in the "npSeq" instructions:

                  "The normalization will be done by npSeq. RPKM cannot be used as the input data matrix"
                  --Page 3


                  npSeq is a variant of SAMSeq, also written by Li, and using the same resampling algorithm.
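
To make the raw-versus-normalized distinction concrete, here is a small base R sketch that converts counts to RPKM using the usual formula, reads * 1e9 / (gene length in bp * total mapped reads). The counts, gene lengths, and library sizes are all invented for illustration. It only shows that RPKM values are already scaled by length and depth and are generally not whole numbers, so they are no longer counts in the sense described above.

```r
# Hypothetical raw counts for 3 genes (rows) in 2 samples (columns).
counts <- matrix(c(100, 250,
                    40,  90,
                     7,  12),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

gene_length_bp <- c(geneA = 2000, geneB = 1500, geneC = 800)  # assumed gene lengths
lib_size <- c(s1 = 1e6, s2 = 2.5e6)                           # assumed total mapped reads

# RPKM = reads * 1e9 / (gene length in bp * total mapped reads)
rpkm <- sweep(counts * 1e9, 1, gene_length_bp, "/")  # divide each row by its gene length
rpkm <- sweep(rpkm, 2, lib_size, "/")                # divide each column by its library size
print(round(rpkm, 2))

# Raw counts are whole numbers; RPKM values generally are not.
is_whole <- function(m) all(m == round(m))
print(c(counts = is_whole(counts), rpkm = is_whole(rpkm)))
```

A check like is_whole() can serve as a cheap guard against accidentally passing a normalized matrix where raw counts are expected.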

                  Note that an earlier comment citing a suggestion in the SAM manual is irrelevant, because it refers to the microarray method in SAM. NGS sequencing counts are different.
                  Last edited by Paul_McMurdie; 12-11-2015, 03:17 PM. Reason: typo in package name
