**kopi-o** · 06-01-2012, 12:47 PM

Based on reading the paper (http://www-stat.stanford.edu/~tibs/ftp/Li_Tibs.pdf), they definitely developed the method with raw counts in mind, and I have used it that way with some success. However, since it appears to be based strictly on rank (nonparametric) statistics, I think it should in principle work on RPKM too.
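A toy illustration of the rank argument, with made-up numbers for a single gene. A per-gene rank statistic compares one gene's values across samples, so dividing by a per-gene constant such as gene length leaves the ranks unchanged, but dividing by a per-sample library size (as RPKM does) can reorder them. So "rank-based" alone does not settle whether RPKM is a suitable input.

```r
# Made-up values for one gene in four samples (not real data).
counts   <- c(100, 120, 150, 130)        # raw read counts
lib_size <- c(1e6, 1e6, 3e6, 1e6)        # per-sample sequencing depth
gene_len <- 2000                         # gene length in bp

# RPKM: reads per kilobase of gene per million mapped reads.
rpkm <- counts / (gene_len / 1e3) / (lib_size / 1e6)

rank(counts)                 # 1 2 4 3  (ordering on raw counts)
rank(counts / gene_len)      # 1 2 4 3  (per-gene constant: same ordering)
rank(rpkm)                   # 2 3 1 4  (per-sample scaling: reordered)
```

SAMseq accounts for depth differences itself, which is why it wants the raw counts rather than values that were already rescaled per sample.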

**slowsmile** · 06-01-2012, 12:55 PM

Thanks very much. I tried SAMseq on our two-class unpaired comparison. It gave a lot more significant genes than I get from edgeR or DESeq. Did you experience the same problem?
Then I realized that the default FDR cutoff is 0.2, which is weird. I changed it to 0.05, which is what most people normally use. With that setting the method gave me thousands of up-regulated genes but 0 down-regulated genes... That's very different from the edgeR and DESeq results.

What FDR cutoff do you normally use in your practice?

Thanks a lot

PS:
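The R code I used for SAMseq is:
library(samr)  # SAMseq is provided by the samr package
# g.counts: genes x samples matrix of raw counts; group: class labels 1/2
samfit <- SAMseq(g.counts, group, resp.type = "Two class unpaired", fdr.output = 0.05)
# the default setting is fdr.output = 0.2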

**kopi-o** · 06-01-2012, 01:04 PM

Interesting. I also see only up-regulated genes in the data set I am using SAMseq on. Maybe it's some kind of bug?

No, I don't get too many significant DE genes; for my data set SAMseq is far more conservative than edgeR and baySeq (the two other tools I've tried). But its results make more sense when I look at them case by case.

I usually use FDR < 0.05.

**slowsmile** · 06-01-2012, 01:15 PM

In my case, edgeR and DESeq gave me 3,000 to 5,000 up-regulated genes and about 1,000 down-regulated genes, while SAMseq gave me 9,000 up-regulated genes but no down-regulated ones at an FDR cutoff of 0.05.
I used raw gene counts from HTSeq as input. To me, the SAMseq result looks way off track...

I have 3 biological replicates in each group. Do you think sample size plays a role in the discrepancy between the parametric and nonparametric methods?

Also, do you couple your DE gene selection process with fold change cutoff?

**kopi-o** · 06-01-2012, 01:40 PM

OK, that sounds strange. As far as I know, nonparametric methods usually need more replicates than parametric ones to reach significance. In my case, I have dozens of replicates per group.
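To make the replicate-count point concrete: with 3 samples per group there are only choose(6, 3) = 20 ways to split 6 samples into two groups of 3, so an exact two-sided Wilcoxon rank-sum test cannot go below p = 2/20 = 0.1 for a single gene, even at perfect separation. SAMseq estimates FDR by permuting labels across all genes rather than using these per-gene p-values, so this only illustrates how coarse a rank statistic is with so few replicates; it is not a reproduction of SAMseq.

```r
# Perfect separation between two groups of 3 (arbitrary values).
group_a <- c(1, 2, 3)
group_b <- c(4, 5, 6)

# Number of distinct ways to assign 3 of the 6 samples to one group.
n_labelings <- choose(6, 3)
n_labelings                  # 20

# Exact two-sided Wilcoxon rank-sum p-value at perfect separation.
min_p <- wilcox.test(group_b, group_a, exact = TRUE)$p.value
min_p                        # 0.1
```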

I usually don't use a fold change cutoff but many people do.
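One common way to combine an FDR cutoff with a fold-change cutoff, sketched on a hypothetical results table. The column names `fdr` and `log2fc`, the function `call_de`, and all numbers are made up; real DE tools name their output columns differently.

```r
# Count up- and down-regulated calls from a results table with one row
# per gene. lfc_cut = 0 means "FDR only"; lfc_cut > 0 adds a fold-change filter.
call_de <- function(res, fdr_cut = 0.05, lfc_cut = 0) {
  sig  <- res$fdr < fdr_cut & abs(res$log2fc) >= lfc_cut
  up   <- sig & res$log2fc > 0
  down <- sig & res$log2fc < 0
  c(up = sum(up), down = sum(down))
}

# Hypothetical results for six genes.
res <- data.frame(
  gene   = paste0("g", 1:6),
  fdr    = c(0.001, 0.01, 0.04, 0.03, 0.20, 0.002),
  log2fc = c(2.5, 0.3, -1.2, -0.2, 3.0, -2.0)
)

call_de(res)                 # up 2, down 3 (FDR < 0.05 only)
call_de(res, lfc_cut = 1)    # up 1, down 2 (also |log2FC| >= 1)
```

The fold-change filter only ever removes genes from the FDR-based list, so it trades sensitivity for effect size.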

**dietmar13** · 06-01-2012, 09:46 PM

npSeq

Three biological replicates seem too few for SAMseq.

If you have problems with the distribution of up- and down-regulated genes, you could try npSeq. The algorithm is very similar, but npSeq uses symmetric cutoffs for the nonparametric statistic, while SAMseq uses asymmetric cutoffs:
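A toy contrast between the two cutoff styles, using made-up statistics and thresholds. It does not reproduce how SAMseq or npSeq actually choose their cutoffs from the FDR estimates; it only shows how separate per-tail cutoffs can leave one direction with no calls even when a single symmetric cutoff finds genes in both.

```r
# Made-up per-gene statistics (positive = higher in group 2).
d <- c(-4.1, -2.6, -1.0, 0.2, 1.5, 2.7, 3.3, 5.0)

# Symmetric rule: the same threshold on both tails.
symmetric_calls <- function(d, cut) {
  c(up = sum(d >= cut), down = sum(d <= -cut))
}

# Asymmetric rule: each tail has its own threshold.
asymmetric_calls <- function(d, cut_up, cut_down) {
  c(up = sum(d >= cut_up), down = sum(d <= -cut_down))
}

symmetric_calls(d, cut = 2.5)                       # up 3, down 2
asymmetric_calls(d, cut_up = 2.5, cut_down = 4.5)   # up 3, down 0
```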

http://www.stanford.edu/~junli07/npSeq/

I had no problem with the distribution of up- and down-regulated genes (12 vs 12 matched pairs), got more significant genes than with any of the other methods, and the resulting gene list looked meaningful for the biological question (e.g. in pathway analysis).

**nickschurch** · 07-27-2012, 05:32 AM

Originally posted by kopi-o:

> Based on reading the paper (http://www-stat.stanford.edu/~tibs/ftp/Li_Tibs.pdf), they definitely developed the method with raw counts in mind, and I have used it with some success in that way. However since it appears to be strictly based on rank (non parametric) statistics, it should in principle work on RPKM too, I think.

I think SAMseq can work with a range of data. Section 10.1 of the SAM manual (http://www-stat.stanford.edu/~tibs/SAM/sam.pdf) lists several response types, including Quantitative, Two Class, Paired, etc., each with its own coding format and suited to a different experimental setup. help(SAMseq) in R shows that it has a corresponding resp.type argument.

The 'Two class unpaired' and 'Two class paired' options look like the ones that fit common RNA-Seq experimental designs.
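A sketch of how the response vector might be built for these two designs, based on my reading of help(SAMseq) in the samr package: labels 1 and 2 for 'Two class unpaired', and -k / k marking the two members of pair k for 'Two class paired'. The helper names are hypothetical, and the labels must follow the column order of the count matrix; check the help page before relying on this.

```r
# Two class unpaired: one label per sample, 1 for the first class, 2 for the second.
unpaired_labels <- function(n1, n2) {
  c(rep(1L, n1), rep(2L, n2))
}

# Two class paired: the two samples of pair k are labelled -k and k.
# Here the pairs are assumed to sit in adjacent columns.
paired_labels <- function(n_pairs) {
  as.vector(rbind(-seq_len(n_pairs), seq_len(n_pairs)))
}

unpaired_labels(3, 3)   # 1 1 1 2 2 2
paired_labels(3)        # -1 1 -2 2 -3 3
```

The resulting vector is what goes in as the second argument, e.g. `SAMseq(x, y, resp.type = "Two class paired")` with `x` the genes x samples count matrix.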

Section 10.5 also states that "the user is required to normalize the data from the different experiments before running SAM".

**Paul_McMurdie** · 12-11-2015, 03:16 PM

Don't use RPKM or other normalized counts for SAMseq

Although SAMseq is a nonparametric method, it nevertheless expects original counts, not normalized counts. This is evident from a careful reading of the SAMseq article, and it is also stated explicitly in the npSeq instructions:

"The normalization will be done by npSeq. RPKM cannot be used as the input data matrix"
--Page 3

http://www3.nd.edu/~jli9/npSeq/npSeq_instructions.pdf

npSeq is a variant of SAMseq, also written by Li, that uses the same resampling algorithm.
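A simplified sketch of the resampling idea, to show why the input has to be counts: each sample is brought to a common depth by drawing Poisson counts whose means are the observed counts rescaled to that depth, and a per-gene rank statistic is averaged over several draws. This is an approximation under stated assumptions, not the SAMseq or npSeq code: depth is just the column sum, the common depth is the geometric mean of the depths, ties are broken with small uniform noise, and the name `resampled_rank_stat` is hypothetical. Poisson resampling only makes sense when the values are counts; RPKM values are on a different, non-count scale.

```r
# Sketch only: depth-equalizing Poisson resampling followed by a per-gene
# Wilcoxon rank-sum statistic, averaged over resamples.
# Assumptions: depth = column sums; common depth = geometric mean of depths.
resampled_rank_stat <- function(counts, group, n_resamp = 20) {
  # The resampling treats entries as counts, so refuse non-integer input.
  stopifnot(all(counts >= 0), all(counts == round(counts)))
  depth  <- colSums(counts)
  common <- exp(mean(log(depth)))            # geometric mean depth
  scale  <- common / depth                   # one factor per sample
  lambda <- sweep(counts, 2, scale, `*`)     # expected counts at common depth
  stats  <- matrix(NA_real_, nrow(counts), n_resamp)
  for (r in seq_len(n_resamp)) {
    x <- matrix(rpois(length(lambda), lambda), nrow = nrow(counts))
    # Noise below 0.1 only breaks ties between equal counts;
    # it cannot reorder counts that differ by at least 1.
    x <- x + matrix(runif(length(x), 0, 0.1), nrow = nrow(x))
    # Sum of the ranks of the group-2 samples within each gene.
    stats[, r] <- apply(x, 1, function(v) sum(rank(v)[group == 2]))
  }
  rowMeans(stats)
}

# Usage on a tiny made-up matrix: 3 genes x 4 samples, 2 per group.
set.seed(1)
counts <- matrix(c(10, 12, 30, 35,
                    5,  6,  4,  5,
                   50, 40, 20, 25), nrow = 3, byrow = TRUE)
group <- c(1, 1, 2, 2)
resampled_rank_stat(counts, group)
```

With two samples per group the group-2 rank sum lies between 3 and 7, so values near 7 point to higher expression in group 2 and values near 3 to lower.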

Note that the earlier comment citing the SAM manual is irrelevant here, because that advice refers to SAM's microarray method. NGS counts are different.

Input data structure for SAMseq