Hi, I am using DES-seq to identify the DE genes from mouse RNA-seq datasets. When I used the estimateDispersions( cds ), only less than 20 genes were identified. I then change the option setting as estimateDispersions( cds, sharingMode=”fit-only” ), I got about 1000 genes. Could someone help me why the results are so different? Are they really meaningful? By the way, I only got about 10 genes when cuffdiff() was used. The mapping files (bam) were generated by tophat2 and processed by htseq-count for the countTable. I had 5 biological replicates for samples and 3 biological replicates for controls. Thanks a lot.
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Read the vignette and other information:
After the empirical dispersion values have been computed for each gene, a dispersion-mean relationship is fitted for sharing information across genes in order to reduce variability of the dispersion estimates. After that, for each gene, we have two values: the empirical value (derived only from this gene's data), and the fitted value (i.e., the dispersion value typical for genes with an average expression similar to those of this gene). The sharingMode argument specifies which of these two values will be written to the featureData's disp_ columns and hence will be used by the functions nbinomTest and fitNbinomGLMs.
fit-only - use only the fitted value, i.e., the empirical value is used only as input to the fitting, and then ignored. Use this only with very few replicates, and when you are not too concerned about false positives from dispersion outliers, i.e. genes with an unusually high variability.
maximum - take the maximum of the two values. This is the conservative or prudent choice, recommended once you have at least three or four replicates and maybe even with only two replicates.
gene-est-only - No fitting or sharing, use only the empirical value. This method is preferable when the number of replicates is large and the empirical dispersion values are sufficiently reliable. If the number of replicates is small, this option may lead to many cases where the dispersion of a gene is accidentally underestimated and a false positive arises in the subsequent testing.
The default I believe is "maximum". Using the "fit-only" argument increases the number of false positives.
You have 3 and 5 replicates. Therefore you should stick to "maximum". The increased number of DE genes you get from "fit-only" are most likely all false positives.Last edited by chadn737; 08-22-2012, 08:09 AM.
-
Originally posted by chadn737 View PostRead the vignette and other information:
After the empirical dispersion values have been computed for each gene, a dispersion-mean relationship is fitted for sharing information across genes in order to reduce variability of the dispersion estimates. After that, for each gene, we have two values: the empirical value (derived only from this gene's data), and the fitted value (i.e., the dispersion value typical for genes with an average expression similar to those of this gene). The sharingMode argument specifies which of these two values will be written to the featureData's disp_ columns and hence will be used by the functions nbinomTest and fitNbinomGLMs.
fit-only - use only the fitted value, i.e., the empirical value is used only as input to the fitting, and then ignored. Use this only with very few replicates, and when you are not too concerned about false positives from dispersion outliers, i.e. genes with an unusually high variability.
maximum - take the maximum of the two values. This is the conservative or prudent choice, recommended once you have at least three or four replicates and maybe even with only two replicates.
gene-est-only - No fitting or sharing, use only the empirical value. This method is preferable when the number of replicates is large and the empirical dispersion values are sufficiently reliable. If the number of replicates is small, this option may lead to many cases where the dispersion of a gene is accidentally underestimated and a false positive arises in the subsequent testing.
The default I believe is "maximum". Using the "fit-only" argument increases the number of false positives.
You have 3 and 5 replicates. Therefore you should stick to "maximum". The increased number of DE genes you get from "fit-only" are most likely all false positives.
I also applied same analysis to those 5 replicated samples with another 5 replicate samples in different biological conditions. With the default setting, I got zero significant DE gene. I got about 700 genes with the setting changed to fit-only. I am attaching a graphic for the estimated dispersions. Is it in very high variability? so, it could justify the fit-only usage? Thanks.Attached Files
Comment
Latest Articles
Collapse
-
by seqadmin
The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...-
Channel: Articles
Yesterday, 07:01 AM -
-
by seqadmin
Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...-
Channel: Articles
04-04-2024, 04:25 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 04-11-2024, 12:08 PM
|
0 responses
55 views
0 likes
|
Last Post
by seqadmin
04-11-2024, 12:08 PM
|
||
Started by seqadmin, 04-10-2024, 10:19 PM
|
0 responses
52 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 10:19 PM
|
||
Started by seqadmin, 04-10-2024, 09:21 AM
|
0 responses
45 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 09:21 AM
|
||
Started by seqadmin, 04-04-2024, 09:00 AM
|
0 responses
55 views
0 likes
|
Last Post
by seqadmin
04-04-2024, 09:00 AM
|
Comment