SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
DESeq2: Continuos variable in the design sindrle RNA Sequencing 14 05-27-2014 07:40 AM
How do I analyze double nested design? (DESeq2) Sciurus Bioinformatics 5 04-15-2014 02:59 AM
non balanced design in DESeq2 libro.s Bioinformatics 1 01-10-2014 05:10 AM
RNA-Seq analysis: Paired design with mutliple time points griffon42 Bioinformatics 0 04-08-2013 07:49 AM
EdgeR design-matrix design extended.wobble RNA Sequencing 3 07-11-2011 06:58 AM

Reply
 
Thread Tools
Old 05-26-2014, 07:01 PM   #1
KristenC
Junior Member
 
Location: New Zealand

Join Date: Nov 2013
Posts: 7
Default Paired design versus unpaired design in DESeq2

Dear all,
I am assessing differential expression between tumour and normal gland samples from the same subjects using DESeq2 and have run into a problem using the paired design.
The dataset consists of 19 matched tumour and gland samples from subjects and 2 gland samples from 1 healthy subject all run on several miseq runs.
For the unpaired design, the conditions “Tumour” and “Gland” were used for the differential expression analysis which resulted in 19 replicates for the Tumour condition and 21 replicates for the Gland condition. Running DESeq2 on this dataset caused refitting of outliers detected using Cook’s distance in 7 genes. The R command output is listed below.

Code:
> Panel2Design
       condition3  condition
S26_T         S26     Tumour
S26_a         S26      Gland
S01_T         S01     Tumour
S28_a         S28      Gland
S02_T         S02     Tumour
S02_a         S02      Gland
S09_T         S09     Tumour
S09_a         S09      Gland
S19_T         S19     Tumour
S19_a         S19      Gland
S20_T         S20     Tumour
S20_a         S20      Gland
S21_T         S21     Tumour
S21_a         S21      Gland
S23_T         S23     Tumour
S23_a         S23      Gland
S24_T         S24     Tumour
S24_a         S24      Gland
S32_T         S32     Tumour
S32_a         S32      Gland
S33_T         S33     Tumour
S33_a         S33      Gland
S34_T         S34     Tumour
S34_a         S34      Gland
S35_T         S35     Tumour
S35_a         S35      Gland
S36_T         S36     Tumour
S36_a         S36      Gland
S37_T         S37     Tumour
S37_a         S37      Gland
S38_T         S38     Tumour
S38_a         S38      Gland
S39_T         S39     Tumour
S39_a         S39      Gland
S40_T         S40     Tumour
S40_a         S40      Gland
S41_T         S41     Tumour
S41_a         S41      Gland
S22_a         S22      Gland
S22_a         S22      Gland

> Panel2CDS=DESeqDataSetFromMatrix(countData=Panel2,colData=Panel2Design,design=~condition)
> Panel2CDS_1=DESeq(Panel2CDS)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 7 genes
-- DESeq argument ‘minReplicatesForReplace’ = 7
-- original counts are preserved in counts(dds))
estimating dispersions
fitting model and testing
For the paired design, an additional condition column (“condition3”) was created in the colData file to indicate which samples were from the same subject. DESeq2 was then run using the second condition in addition to the first condition, as per the R command output listed below. At this point, no outliers were removed/refitted and the GLM fit did not converge.

Code:
> Panel2pCDS=DESeqDataSetFromMatrix(countData=Panel2,colData=Panel2Design,design=~condition3+condition)
> Panel2pCDS2_1=DESeq(Panel2pCDS)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: algorithm did not converge
3: glm.fit: algorithm did not converge
I assume the Cook’s distance outlier detection is not working because under the second condition for each subject there is only 1 replicate for the tumour and 1 replicate for the gland and Cook’s only kicks in from at least 3 replicates. But this now causes the GLM not to fit anymore, therefore should I be worried about the dispersion estimates? Or should I just opt for a different fit-type (local or mean)? How do I know which one is better?
I thought the paired design would improve the power for the differential expression testing. How do I know that it did? I am detecting slightly more differentially expressed genes, but could this be due to the fact that some outliers were not removed via Cook’s Cutoff and a worse dispersion estimation?

FYI Settings used:
Code:
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_New Zealand.1252  LC_CTYPE=English_New Zealand.1252   
[3] LC_MONETARY=English_New Zealand.1252 LC_NUMERIC=C                        
[5] LC_TIME=English_New Zealand.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.4.5            RcppArmadillo_0.4.300.0 Rcpp_0.11.1             GenomicRanges_1.16.3   
 [5] GenomeInfoDb_1.0.2      IRanges_1.22.6          DESeq_1.16.0            lattice_0.20-29        
 [9] locfit_1.5-9.1          Biobase_2.24.0          BiocGenerics_0.10.0    

loaded via a namespace (and not attached):
 [1] annotate_1.42.0      AnnotationDbi_1.26.0 DBI_0.2-7            genefilter_1.46.1   
 [5] geneplotter_1.42.0   grid_3.1.0           RColorBrewer_1.0-5   RSQLite_0.11.4      
 [9] splines_3.1.0        stats4_3.1.0         survival_2.37-7      tools_3.1.0         
[13] XML_3.98-1.1         xtable_1.7-3         XVector_0.4.0
KristenC is offline   Reply With Quote
Old 05-29-2014, 11:05 AM   #2
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

hi Kristen,

Yes, you're right about the reason for the outlier replacement not taking place, because with the design including the subject no longer has replicates for each combination of the variable levels.

The glm.fit warning is not very informative here, and I should catch this warning internally to not confuse users.

This warning is not from the main GLM of the model, but from the parametric fit of the curve through dispersion estimates (the red line in Figure 7 in the main vignette). You can use fitType="local". This is typically very similar to using fitType="parametric". Good to check

plotDispEsts(dds)

to make sure the curve is a good fit to the "cloud" of dispersion estimates.
Michael Love is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:07 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO