  • Input data structure for SAMseq

    I know this might be a silly question, but we have just started using the SAMseq method from the CRAN R package samr to search for differentially expressed genes between two groups, and I cannot find a definite description of the input structure in the manual.

    Should I use raw gene counts (e.g. generated from HTseq) or normalized gene counts (e.g. RPKM values) as the input data for SAMseq?

    Can anyone who is familiar with the method give me a quick answer?

    Thanks a lot

  • #2
    Based on reading the paper (http://www-stat.stanford.edu/~tibs/ftp/Li_Tibs.pdf), they definitely developed the method with raw counts in mind, and I have used it with some success in that way. However since it appears to be strictly based on rank (non parametric) statistics, it should in principle work on RPKM too, I think.
    Last edited by kopi-o; 06-01-2012, 01:01 PM.


    • #3
      Thanks very much. I tried SAMseq on our two-class unpaired comparison. It returned many more significant genes than I get from edgeR or DESeq. Did you experience the same problem?
      Then I realized that their default FDR cutoff is 0.2, which is weird. I changed it to 0.05, which is what most people normally use. The method then gave me thousands of up-regulated genes but 0 down-regulated genes... That is very different from the edgeR and DESeq results.

      What FDR cutoff do you normally use in your practice?


      Thanks a lot

      PS:
      The R code I used on SAMseq is:
      samfit <- SAMseq(g.counts, group, resp.type = "Two class unpaired", fdr.output = 0.05);
      #their default setting is fdr.output = 0.2
      Last edited by slowsmile; 06-01-2012, 12:57 PM.


      • #4
        Interesting - I also see only up-regulated genes in the data set I am using SAMSeq on - maybe it's some kind of bug?

        No, I don't get too many significantly DE genes - for my data set SAMSeq is far more conservative than edgeR and baySeq (the two others I've tried). But its results make more sense when I look at them on a case by case basis.

        I usually use FDR < 0.05.


        • #5
          In my case, edgeR and DESeq gave me 3000~5000 up-regulated genes and ~1000 down-regulated genes, while SAMseq got me 9000 up-regulated but no down-regulated ones based on FDR cutoff of 0.05.
          I used raw gene counts from HTseq as input data. In my mind, the SAMSeq result is way off track...

          I have 3 biological replicates in each group. Do you think the sample size plays a role in the discrepancy between the parametric and nonparametric methods?

          Also, do you combine your DE gene selection with a fold change cutoff?


          • #6
            OK, sounds strange - as far as I know, non-parametric methods usually need more replicates than parametric ones to achieve significance. In my case, I have dozens of replicates per group.

            I usually don't use a fold change cutoff but many people do.
            Last edited by kopi-o; 06-02-2012, 12:22 AM.
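
To put a rough number on the replicate-count point above, here is a minimal sketch in Python. It is not what SAMseq does internally (SAMseq estimates FDR from permutations pooled across genes); it only counts how many distinct group labelings exist for a single gene and what the smallest exact two-sided rank-sum p-value can therefore be. The function name `exact_rank_sum_p` and the 10 vs 10 design are made up for illustration, and ties are assumed absent.

```python
from itertools import combinations


def exact_rank_sum_p(group1_ranks, n_total):
    """Exact two-sided rank-sum p-value for one gene, assuming no tied values.

    group1_ranks: ranks (1..n_total) held by the samples in group 1.
    """
    n1 = len(group1_ranks)
    expected = n1 * (n_total + 1) / 2  # mean rank sum under the null
    observed_dev = abs(sum(group1_ranks) - expected)
    # Enumerate every way of assigning n1 of the n_total ranks to group 1.
    sums = [sum(c) for c in combinations(range(1, n_total + 1), n1)]
    as_extreme = sum(1 for s in sums if abs(s - expected) >= observed_dev)
    return as_extreme / len(sums)


# Best case for 3 vs 3: group 1 holds the three highest ranks.
p_3v3 = exact_rank_sum_p([4, 5, 6], 6)
# Best case for a hypothetical 10 vs 10 design.
p_10v10 = exact_rank_sum_p(list(range(11, 21)), 20)

print(p_3v3)
print(p_10v10)
```

With 3 vs 3 there are only 20 possible labelings, so even a perfectly separated gene cannot get below 2/20 = 0.1 on its own, whereas 10 vs 10 allows 184,756 labelings. That is one reason a rank-based method may behave oddly with very few replicates.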


            • #7
              npSeq

              Three biological replicates seem too few for SAMseq.

              If you have problems with the distribution of up- and down-regulated genes, you could try npSeq (a very similar algorithm, but npSeq uses symmetric cutoffs for the nonparametric statistic, while SAMseq uses asymmetric cutoffs):

              http://www.stanford.edu/~junli07/npSeq/

              I had no problem with the distribution of up- and down-regulated genes (12 vs 12 matched pairs) and got more significant genes than with all the other methods, and the resulting gene list looked meaningful (e.g. in pathway analysis) with regard to the biological question.


              • #8
                Originally posted by kopi-o
                Based on reading the paper (http://www-stat.stanford.edu/~tibs/ftp/Li_Tibs.pdf), they definitely developed the method with raw counts in mind, and I have used it with some success in that way. However since it appears to be strictly based on rank (non parametric) statistics, it should in principle work on RPKM too, I think.
                I think SAMSeq can work with a range of data. Section 10.1 of the SAM manual (http://www-stat.stanford.edu/~tibs/SAM/sam.pdf) lists several response formats, including Quantitative, Two Class, Paired, etc., all of which have different coding formats and apply to different data types from different experimental setups. help(SAMseq) in R shows that it has a corresponding resp.type argument.

                The 'Two class unpaired' and 'Paired' options here look typical for common formats of RNA-Seq experiments.

                Section 10.5 also states that "the user is required to normalize the data from the different experiments before running SAM".


                • #9
                  Don't use RPKM or other normalized counts for SAMSeq

                  Although SAMSeq is a nonparametric method, it nevertheless expects original counts, not normalized counts. This is evident from a careful reading of the SAMSeq article, and it is also stated explicitly in the "npSeq" instructions:

                  "The normalization will be done by npSeq. RPKM cannot be used as the input data matrix"
                  --Page 3


                  npSeq is a variant of SAMSeq, also written by Li, and using the same resampling algorithm.

                  Note that the earlier comment citing the SAM manual's normalization requirement is not relevant here, because that passage refers to the microarray method in SAM. NGS count data are different.
                  Last edited by Paul_McMurdie; 12-11-2015, 03:17 PM. Reason: typo in package name
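
For anyone wondering why the input type matters even for a rank-based statistic, here is a toy sketch in Python. It is not the SAMseq or npSeq procedure, and all counts and sequencing depths are invented. It only shows that dividing each sample by its own depth (the per-sample part of an RPKM-style normalization) can reorder the samples for a given gene, so the ranks computed from raw and from normalized values need not agree. Gene length is constant across samples for one gene, so it is left out here.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


# Hypothetical raw counts for one gene in four samples.
raw_counts = [100, 150, 200, 260]
# Hypothetical sequencing depths (total mapped reads, in millions).
depth_millions = [10, 20, 40, 20]

# Depth-scaled values (counts per million mapped reads).
scaled = [c / d for c, d in zip(raw_counts, depth_millions)]

raw_ranks = ranks(raw_counts)
scaled_ranks = ranks(scaled)

print(raw_ranks)     # [1, 2, 3, 4]
print(scaled_ranks)  # [3, 2, 1, 4]
```

Since the two rankings differ, a method that applies its own depth correction to raw counts would be working from a different ordering if it were handed values that had already been normalized.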
