**cutcopy11** · 11-27-2011, 04:24 PM

Also, I would like to mention that the liver and kidney samples came from the data
used in the edgeR vignette.

I am not sure how their gene count data set was generated.

I used data from these sources:
kidney

Illumina sequencing of Human kidney transcript fragment library - SRA - NCBI

http://www.ncbi.nlm.nih.gov/sra/SRX000605

liver

Illumina sequencing of Human liver transcript fragment library - SRA - NCBI

http://www.ncbi.nlm.nih.gov/sra/SRX000571

Each of these accessions contains 2 sra.lite files.
They might be technical replicates, but I am not sure.

Anyway, in edgeR I used moderated tagwise dispersion estimates. As in the
vignette, I set commonDisp = FALSE in the "exactTest" step and used the default
prior in the "estimateTagwiseDisp" step.

Here is a comparison of the results:
edgeR vignette:
4438 under an adjusted p-value of 0.05
My result:
5977 under an adjusted p-value of 0.05
5727 under an adjusted p-value of 0.01

Although the counts at an adjusted p-value of 0.05 are fairly close, I feel that
the number of differentially expressed genes at 0.01 should be much
lower.
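
For reference, here is a minimal sketch of how the counts at two adjusted p-value cutoffs relate to each other. It assumes the adjustment is Benjamini-Hochberg; the raw p-values are made up for illustration and are not from any of the datasets above. The count at 0.01 can never exceed the count at 0.05, but it does not have to be much smaller when many genes have very small p-values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_below(adjusted, cutoff):
    """Count adjusted p-values strictly below the cutoff."""
    return sum(1 for p in adjusted if p < cutoff)


# Hypothetical raw p-values, for illustration only.
raw = [0.0001, 0.0004, 0.002, 0.009, 0.02, 0.04, 0.3, 0.7]
adj = benjamini_hochberg(raw)
n05 = count_below(adj, 0.05)
n01 = count_below(adj, 0.01)
print(n05, n01)
```

With strong tissue-versus-tissue differences, most significant genes sit far below either cutoff, so tightening from 0.05 to 0.01 may remove only a small fraction of them.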

Thanks again,
Clayton

**cutcopy11** · 11-27-2011, 06:29 PM

The high number of differentially expressed genes in the 4 datasets may be due to the fact that I am essentially comparing apples and oranges.

The four datasets consist of either two different tissues or two different cell lines.
The estrogen dataset is from the same cell line, with one sample treated and the other untreated, so there is likely less differential expression.

The most differential expression occurs between the kidney and liver, the two different prostate cancer cell lines, and the two arabidopsis tissues (over 5000 genes).

Interestingly, between the mouse embryonic stem cells and the mouse embryonic fibroblasts, there were 1000 and 2000 differentially expressed genes. These two tissues are likely more similar than the three comparisons above.

**Simon Anders** · 11-28-2011, 11:18 AM

Maybe concentrate on one of the data sets and tell us a bit more about what you did. For example, post the exact commands you typed into R.

It is also important to figure out whether you have true biological replicates. If you compare, say, two technical replicates from liver with two technical replicates from kidney, you will end up with a huge list of differentially expressed genes, which, however, is biologically completely meaningless.
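
The effect of replicate type can be sketched with a rough calculation. This is an illustration under stated assumptions, not what DESeq or edgeR actually compute: it uses a normal approximation, equal library sizes, a negative-binomial-style variance of `mu + dispersion * mu**2`, and made-up counts for a single gene. The function name, the counts, and the dispersion value of 0.1 are all hypothetical.

```python
import math


def z_statistic(counts_a, counts_b, dispersion):
    """Approximate z-score for the difference between two group totals.

    Each count c is given variance c + dispersion * c**2.
    dispersion = 0 is pure Poisson noise, as expected for technical replicates.
    """
    def variance(counts):
        return sum(c + dispersion * c * c for c in counts)

    diff = sum(counts_a) - sum(counts_b)
    return diff / math.sqrt(variance(counts_a) + variance(counts_b))


# Hypothetical counts for one gene, two replicates per tissue.
liver = [1000, 1020]
kidney = [1150, 1170]

# Technical replicates: only Poisson (shot) noise is modeled.
z_technical = z_statistic(liver, kidney, dispersion=0.0)

# Biological replicates: extra variability between individuals (assumed value).
z_biological = z_statistic(liver, kidney, dispersion=0.1)

print(round(z_technical, 2), round(z_biological, 2))
```

The same 15% difference looks overwhelming when only Poisson noise is assumed and unremarkable once biological variability is allowed for, which is why a dispersion estimated from technical replicates produces such long gene lists.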

**cutcopy11** · 11-28-2011, 03:09 PM

Hi Simon,

I may dig up my commands later, but I just wanted to correct something I said earlier. I said that DESeq reported around 7000 genes for the unfiltered estrogen dataset whereas edgeR reported around 500 to 1000 genes for adjusted p-values between 0.01 and 0.1. This was incorrect. I realized later that many of those genes in the DESeq results had adjusted p-values listed as "NA". For an adjusted p-value cutoff of 0.01, DESeq reported 486 genes and edgeR reported 509, which makes sense as DESeq is more conservative with low count data. Of course, you know that.
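
A small sketch of counting significant genes while keeping missing adjusted p-values out of the total. The values are made up, and `None` stands in for R's `NA`. In R itself, subsetting a data frame with a logical vector that contains `NA` returns extra all-`NA` rows, which is one way such genes can end up inflating a count.

```python
def summarize_padj(padj, cutoff):
    """Return (number significant, number missing) for adjusted p-values.

    Missing values (None) are never counted as significant.
    """
    significant = sum(1 for p in padj if p is not None and p < cutoff)
    missing = sum(1 for p in padj if p is None)
    return significant, missing


# Hypothetical adjusted p-values; None marks a gene with no usable p-value.
padj = [0.001, None, 0.2, None, 0.008, None, 0.03]
n_sig, n_missing = summarize_padj(padj, cutoff=0.01)
print(n_sig, n_missing)
```

Reporting the missing count alongside the significant count makes it obvious when a large share of genes never received an adjusted p-value.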

I agree with you entirely about the technical replicate issue. That makes complete sense.

-Clayton

**sdm** · 12-08-2011, 01:14 AM

edgeR - pvalue NA

Hi !

Though it is not directly related to your question, I thought I would post this in an "edgeR" thread: using edgeR, I usually get a high number of p-values reported as "NA" for genes that look reasonably differentially expressed. Why is that (is it due to TMM normalization?) and is there any way to get around this?

DESeq and EdgeR: too many differentially expressed genes!?!?
