SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
DESeq2 error : the model matrix is not full rank NicoBxl Bioinformatics 9 03-14-2017 06:10 AM
model matrix design for multiple siRNAs targeting same gene Brandi Bioinformatics 2 02-27-2014 04:18 AM
EdgeR model matrix interpretation sindrle Bioinformatics 4 10-24-2013 03:28 AM
FPKM formula g781 RNA Sequencing 2 10-18-2013 01:41 AM
SNPdat: Easy annotation of novel and known SNPs from model and non-model organisms d1antho Bioinformatics 0 03-15-2013 08:58 AM

Reply
 
Thread Tools
Old 10-30-2014, 01:29 PM   #21
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

You can use any valid design except ~ 1, this design won't be used because you provide the model matrix at each step.

Last edited by Michael Love; 10-31-2014 at 07:00 AM.
Michael Love is offline   Reply With Quote
Old 10-30-2014, 01:47 PM   #22
lmolokin
Member
 
Location: Beltsville, MD, USA

Join Date: Jul 2012
Posts: 24
Default

Thanks!

Just ran through the workflow without a hitch.

My next question is how do I specify different contrasts since resultsNames(dds) now only gives me just [1] Intercept?
lmolokin is offline   Reply With Quote
Old 10-31-2014, 07:03 AM   #23
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

hi,

I just tried this out and found a bug, associated with using a design of ~ 1. You can use any valid design other than this one (~ tx), and it should work. I plan to support this functionality at a higher level in the next release, and in the process will add unit tests for situations like this!
Michael Love is offline   Reply With Quote
Old 10-31-2014, 07:37 AM   #24
lmolokin
Member
 
Location: Beltsville, MD, USA

Join Date: Jul 2012
Posts: 24
Default

Thanks for the help thus far, Michael.

Is there a way to insert the modified model.matrix into the dds object in order to test contrasts from modelMatrix=m?

In other words, so that when I call resultsNames(dds) I will see all of the interactions terms from ~tx+tx:subj+tx:time instead of just what the formula placeholder (~tx) gives?
lmolokin is offline   Reply With Quote
Old 10-31-2014, 07:45 AM   #25
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

Unfortunately, this trick won't work beyond the simple LRT results table for the current release. Contrasts on user provided model matrices aren't possible with the current release code. Update 2015.01.05 below

It's relatively easy for me to implement, but this has to be done in the development branch. I will likely implement this in Nov or Dec.

Last edited by Michael Love; 01-05-2015 at 02:40 PM.
Michael Love is offline   Reply With Quote
Old 10-31-2014, 08:29 AM   #26
lmolokin
Member
 
Location: Beltsville, MD, USA

Join Date: Jul 2012
Posts: 24
Default

I see.

Is there no way to specify a model matrix within the DESeqDataSetFromMatrix function in lieu of a formula?
lmolokin is offline   Reply With Quote
Old 10-31-2014, 09:50 AM   #27
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

I tried an example, and you can use numeric or list style contrasts after nbinomLRT with a user-supplied model matrix in version 1.6. Like so:

Code:
res2 <- results(dds, test="Wald", contrast=c(0,0,1))
res2 <- results(dds, test="Wald", contrast=list("conditionB"))
Michael Love is offline   Reply With Quote
Old 10-31-2014, 10:49 AM   #28
lmolokin
Member
 
Location: Beltsville, MD, USA

Join Date: Jul 2012
Posts: 24
Default

I'm not sure I understand where the "user specified model matrix" is applied to dds.

I create the dds object with a place-holder formula here:
Code:
dds <- DESeqDataSetFromMatrix(countData = countData,
                              colData = designframe,
                              formula(~tx))
Then create the actual model and supply it for dispersion estimations here:
Code:
m <- model.matrix(~tx+tx:subj+tx:time)
colnames(m)
m <- m[,-24] # get rid of the 0's column
dds <- estimateSizeFactors(dds)
dds <- estimateDispersionsGeneEst(dds, modelMatrix=m)
dds <- estimateDispersionsFit(dds)
dds <- estimateDispersionsMAP(dds, modelMatrix=m)
dds <- nbinomWaldTest(dds)
But then inputting:
Code:
res_ct2.ct1 <- results(dds, test="Wald", contrast=list("txc.time2","txc.time1"))
...isn't valid because resultsNames(dds) doesn't contain interactions from modelMatrix=m, it only has the elements given by the place-holder formula ~tx.
lmolokin is offline   Reply With Quote
Old 10-31-2014, 11:00 AM   #29
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

Ah, now I see the problem. You can't use nbinomWaldTest(dd) in the last step. You need to use all the lines of code from the thread I pointed you to, which was Update 2015.01.05 below

Code:
dds <- estimateSizeFactors(dds)
dds <- estimateDispersionsGeneEst(dds, modelMatrix=m)
dds <- estimateDispersionsFit(dds)
dds <- estimateDispersionsMAP(dds, modelMatrix=m)
dds <- nbinomLRT(dds, full=m, reduced=m[,-idx])
where idx is a numeric vector of columns to remove from the model matrix for the LRT. There is no support with nbinomWaldTest in the current version to supply model matrices, but I've promised to work on this in devel.

Last edited by Michael Love; 01-06-2015 at 05:30 AM.
Michael Love is offline   Reply With Quote
Old 10-31-2014, 11:25 AM   #30
lmolokin
Member
 
Location: Beltsville, MD, USA

Join Date: Jul 2012
Posts: 24
Default

You're right! It appears to have worked now.

I received this message after running the LRT: 32 rows did not converge in beta, labelled in mcols(object)$fullBetaConv. Use larger maxit argument with nbinomLRT.

Does this suggest that I need to rerun it with different nbinomLRT parameters? Not sure what to make of the message.

Thanks.
lmolokin is offline   Reply With Quote
Old 10-31-2014, 12:33 PM   #31
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

Fitting a GLM is an iterative process, and so you need to define a stopping criteria. We look for relative changes to the likelihood to define convergence. Sometimes the GLM doesn't converge for some rows when the counts are mostly zero and with very sparse model matrices (although in the recent releases, typically all rows converge). You can likely ignore this message, or remove these rows from the results if you like: res = res[mcols(dds)$fullBetaConv,]

What package version are you using: packageVersion("DESeq2")?
Michael Love is offline   Reply With Quote
Old 10-31-2014, 12:40 PM   #32
lmolokin
Member
 
Location: Beltsville, MD, USA

Join Date: Jul 2012
Posts: 24
Default

Gotcha. Session info below. Really appreciate your help, Michael!

Aleksey

Code:
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
character(0)

other attached packages:
[1] DESeq2_1.6.1

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3           annotate_1.44.0           AnnotationDbi_1.28.1     
 [4] base_3.1.1                base64enc_0.1-2           BatchJobs_1.4            
 [7] BBmisc_1.7                Biobase_2.26.0            BiocGenerics_0.12.0      
[10] BiocInstaller_1.16.0      BiocParallel_1.0.0        brew_1.0-6               
[13] checkmate_1.5.0           cluster_1.15.2            codetools_0.2-8          
[16] colorspace_1.2-4          datasets_3.1.1            DBI_0.3.1                
[19] digest_0.6.4              edgeR_3.8.2               fail_1.2                 
[22] foreach_1.4.2             foreign_0.8-61            Formula_1.1-2            
[25] genefilter_1.48.1         geneplotter_1.44.0        GenomeInfoDb_1.2.2       
[28] GenomicRanges_1.18.1      ggplot2_1.0.0             graphics_3.1.1           
[31] grDevices_3.1.1           grid_3.1.1                gtable_0.1.2             
[34] Hmisc_3.14-5              IRanges_2.0.0             iterators_1.0.7          
[37] labeling_0.3              lattice_0.20-29           latticeExtra_0.6-26      
[40] limma_3.22.1              locfit_1.5-9.1            MASS_7.3-33              
[43] methods_3.1.1             munsell_0.4.2             nnet_7.3-8               
[46] parallel_3.1.1            plyr_1.8.1                proto_0.3-10             
[49] RColorBrewer_1.0-5        Rcpp_0.11.3               RcppArmadillo_0.4.450.1.0
[52] reshape2_1.4              rpart_4.1-8               RSQLite_1.0.0            
[55] S4Vectors_0.4.0           scales_0.2.4              sendmailR_1.2-1          
[58] splines_3.1.1             stats_3.1.1               stats4_3.1.1             
[61] stringr_0.6.2             survival_2.37-7           tools_3.1.1              
[64] utils_3.1.1               XML_3.98-1.1              xtable_1.7-4             
[67] XVector_0.6.0
lmolokin is offline   Reply With Quote
Old 10-31-2014, 01:07 PM   #33
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

Ok just curious. The convergence of these rows is not something to worry about. You can just ignore these rows or remove them using the above code.
Michael Love is offline   Reply With Quote
Old 11-10-2014, 06:59 AM   #34
lmolokin
Member
 
Location: Beltsville, MD, USA

Join Date: Jul 2012
Posts: 24
Default

Quote:
Originally Posted by Michael Love View Post
Ah, now I see the problem. You can't use nbinomWaldTest(dd) in the last step. You need to use all the lines of code from the thread I pointed you to, which was

Code:
dds <- estimateSizeFactors(dds)
dds <- estimateDispersionsGeneEst(dds, modelMatrix=m)
dds <- estimateDispersionsFit(dds)
dds <- estimateDispersionsMAP(dds, modelMatrix=m)
dds <- nbinomLRT(dds, full=m, reduced=m[,-idx])
where idx is a numeric vector of columns to remove from the model matrix for the LRT. There is no support with nbinomWaldTest in the current version to supply model matrices, but I've promised to work on this in devel.
Hi Michael,
I wanted to clarify the purpose of the reduced model in my code. Since my goal was not to compare deferentially expressed genes between a full and a reduced model, I assume that the purpose of including it above was just a work-around (a place-holder) to allow the workflow to run with a user supplied model matrix?

So in other words, when I run:
Code:
c2 <- results(dds, test="Wald", contrast=list("txc.time2"))
...the results I get are as if I had run a nbinomWaldTest?

Aleksey
lmolokin is offline   Reply With Quote
Old 11-10-2014, 07:06 AM   #35
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

Yes, for version 1.6, this call to results will give Wald test statistics and p-values. Suppling list or numeric arguments to 'contrast' will work, as will using 'name'.
Michael Love is offline   Reply With Quote
Old 01-05-2015, 02:42 PM   #36
Michael Love
Senior Member
 
Location: Boston

Join Date: Jul 2013
Posts: 333
Default

A note: DESeq2 versions >= 1.7 allow user-supplied model matrices to top level functions DESeq() and estimateDispersions(), so the above extra lines of code are no longer necessary.
Michael Love is offline   Reply With Quote
Reply

Tags
deseq2, formula, model.matrix, rna-seq

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 02:20 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO