SEQanswers

Old 08-22-2012, 07:34 AM   #1
bliu1
Junior Member
 
Location: texas

Join Date: Feb 2011
Posts: 3
Very different numbers of differentially expressed genes by DESeq

Hi, I am using DESeq to identify differentially expressed (DE) genes in mouse RNA-seq datasets. With the default call, estimateDispersions( cds ), fewer than 20 genes were identified. When I changed the call to estimateDispersions( cds, sharingMode="fit-only" ), I got about 1000 genes. Could someone help me understand why the results are so different? Are they really meaningful? By the way, I got only about 10 genes with Cuffdiff. The mapping files (BAM) were generated by TopHat2 and processed with htseq-count to produce the count table. I have 5 biological replicates for the samples and 3 biological replicates for the controls. Thanks a lot.
Old 08-22-2012, 08:07 AM   #2
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

Read the vignette and other information:

After the empirical dispersion values have been computed for each gene, a dispersion-mean relationship is fitted for sharing information across genes in order to reduce variability of the dispersion estimates. After that, for each gene, we have two values: the empirical value (derived only from this gene's data), and the fitted value (i.e., the dispersion value typical for genes with an average expression similar to those of this gene). The sharingMode argument specifies which of these two values will be written to the featureData's disp_ columns and hence will be used by the functions nbinomTest and fitNbinomGLMs.

fit-only - use only the fitted value, i.e., the empirical value is used only as input to the fitting, and then ignored. Use this only with very few replicates, and when you are not too concerned about false positives from dispersion outliers, i.e. genes with an unusually high variability.

maximum - take the maximum of the two values. This is the conservative or prudent choice, recommended once you have at least three or four replicates and maybe even with only two replicates.

gene-est-only - No fitting or sharing, use only the empirical value. This method is preferable when the number of replicates is large and the empirical dispersion values are sufficiently reliable. If the number of replicates is small, this option may lead to many cases where the dispersion of a gene is accidentally underestimated and a false positive arises in the subsequent testing.
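The three modes described above can be illustrated with a small sketch. This is plain Python written only for illustration, not DESeq's actual code; the function names and the example dispersion values are hypothetical.

```python
# Illustrative sketch only: mimics the choice that sharingMode describes,
# using made-up numbers. It is not DESeq's implementation.

def select_dispersion(empirical, fitted, sharing_mode="maximum"):
    """Return the dispersion that would be used for testing one gene."""
    if sharing_mode == "fit-only":
        return fitted                      # ignore the per-gene estimate
    if sharing_mode == "gene-est-only":
        return empirical                   # ignore the fitted curve
    if sharing_mode == "maximum":
        return max(empirical, fitted)      # conservative choice
    raise ValueError(f"unknown sharing mode: {sharing_mode!r}")


# Hypothetical (empirical, fitted) dispersion pairs for two genes.
EXAMPLE_GENES = {
    "typical_gene": (0.05, 0.10),   # empirical value below the fitted curve
    "outlier_gene": (0.80, 0.10),   # unusually variable gene
}


def dispersions_by_mode(genes):
    """Map each sharing mode to the dispersion chosen for every gene."""
    modes = ("fit-only", "maximum", "gene-est-only")
    return {
        mode: {
            name: select_dispersion(emp, fit, mode)
            for name, (emp, fit) in genes.items()
        }
        for mode in modes
    }


if __name__ == "__main__":
    for mode, values in dispersions_by_mode(EXAMPLE_GENES).items():
        print(mode, values)
```

In this toy example only the outlier gene is treated differently by "fit-only" and "maximum": "fit-only" assigns it the typical dispersion, while "maximum" keeps its much larger empirical value.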


The default, I believe, is "maximum". Using "fit-only" instead tends to increase the number of false positives, because genes whose empirical dispersion lies far above the fitted curve are then tested as if they had typical variability.

You have 3 and 5 replicates, so you should stick with "maximum". The additional DE genes you get with "fit-only" are most likely false positives.
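To see why the choice matters so much, here is a rough sketch assuming the usual negative binomial variance form, variance = mean + dispersion * mean^2, and ignoring size factors. The function names and numbers are hypothetical and chosen only to illustrate the effect.

```python
# Illustrative sketch only: negative binomial variance under two dispersion
# choices for a hypothetical gene. Size factors are ignored for simplicity.

def nb_variance(mean, dispersion):
    """Variance of a negative binomial count: shot noise plus biological term."""
    return mean + dispersion * mean ** 2


def variance_ratio(mean, empirical, fitted):
    """How much larger the variance assumed by 'maximum' is than by 'fit-only'."""
    var_maximum = nb_variance(mean, max(empirical, fitted))
    var_fit_only = nb_variance(mean, fitted)
    return var_maximum / var_fit_only


if __name__ == "__main__":
    # Hypothetical gene: mean count 100, empirical dispersion 0.8, fitted 0.1.
    print(nb_variance(100, 0.1))          # variance assumed under "fit-only"
    print(nb_variance(100, 0.8))          # variance assumed under "maximum"
    print(variance_ratio(100, 0.8, 0.1))  # ratio between the two
```

For a gene like this, "fit-only" tests against a variance several times smaller than the one "maximum" would use, so a difference between conditions that is ordinary noise for that gene can come out as significant.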

Old 08-22-2012, 08:41 AM   #3
bliu1
Junior Member
 
Location: texas

Join Date: Feb 2011
Posts: 3

Thanks a lot for your reply. You are right that fit-only potentially increases the number of false positives. However, I didn't expect the difference to be so big.
I also applied the same analysis to compare those 5 replicate samples with another 5 replicate samples from a different biological condition. With the default setting, I got zero significant DE genes; with the setting changed to fit-only, I got about 700. I am attaching a plot of the estimated dispersions. Does it show very high variability? If so, could that justify using fit-only? Thanks.
Attached image: cds_default_estDisp.png (38.8 KB)