SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Samtools "is recognized as '*'" "truncated file" error axiom7 Bioinformatics 3 11-26-2014 03:53 AM
"allele balance ratio" and "quality by depth" in VCF files efoss Bioinformatics 2 10-25-2011 12:13 PM
The position file formats ".clocs" and "_pos.txt"? Ist there any difference? elgor Illumina/Solexa 0 06-27-2011 08:55 AM
"Systems biology and administration" & "Genome generation: no engineering allowed" seb567 Bioinformatics 0 05-25-2010 01:19 PM
SEQanswers second "publication": "How to map billions of short reads onto genomes" ECO Literature Watch 0 06-30-2009 12:49 AM

Reply
 
Thread Tools
Old 09-24-2012, 07:03 PM   #1
fpadilla
Francisco Padilla O.
 
Location: Columbus Ohio

Join Date: Jul 2011
Posts: 5
Default DEXSeq error in estimateDispersions: match.arg(start.method, c("log(y)", "mean"))

Hey Guys:

I got the following error many times while calling estimateDispersions:

Error in match.arg(start.method, c("log(y)", "mean")) : 'arg' must be NULL or a character vector

Below is the code I'm runing, the weird thing is execution of estimateDispersions is finished and continue to fitDispersionFunction, when it is stopped because estimateDispersions failed.

Code:
exonStats = read.HTSeqCounts(
                countfiles = file.path("./", paste(rownames(samples),"HTSeqCount.txt",sep=".") ),
		design = samples, flattenedfile = annotationFile)

exonStats <- estimateSizeFactors(exonStats)
exonStats <- estimateDispersions(exonStats, quiet=TRUE) # Error is here
exonStats <- fitDispersionFunction(exonStats) # Execution stop here
Also, I have a few datasets for testing and still I get error. Here is the experiment design

Code:
> samples
       condition replicate        type
31 4cm from base         1 single-read
32 4cm from base         2 single-read
33           tip         1 single-read
34           tip         2 single-read
This is the session info:
Code:
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  tools     stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] multicore_0.1-7 DEXSeq_1.0.2    Biobase_2.14.0 

loaded via a namespace (and not attached):
[1] hwriter_1.3    plyr_1.7.1     statmod_1.4.15 stringr_0.6.1
I'll appreciate your help !

Francisco
fpadilla is offline   Reply With Quote
Old 09-28-2012, 11:25 AM   #2
fpadilla
Francisco Padilla O.
 
Location: Columbus Ohio

Join Date: Jul 2011
Posts: 5
Default

For the record:

I upgraded R version to 2.15.1 and Bioconductor to 2.10 and the error doesn't ocurr any more, but now I get a new error at estimatelog2FoldChanges function:

Error: subscript out of bounds

exonStats <- estimatelog2FoldChanges( exonStats )

Code:
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  tools     stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] multicore_0.1-7    DEXSeq_1.2.1       Biobase_2.16.0     BiocGenerics_0.2.0

loaded via a namespace (and not attached):
[1] biomaRt_2.12.0 hwriter_1.3    plyr_1.7.1     RCurl_1.91-1   statmod_1.4.15
[6] stringr_0.6.1  XML_3.6-2

Last edited by fpadilla; 09-28-2012 at 11:28 AM.
fpadilla is offline   Reply With Quote
Old 10-01-2012, 04:44 AM   #3
areyes
Senior Member
 
Location: Heidelberg

Join Date: Aug 2010
Posts: 165
Default

Hi Francisco,

Can you post the error message that you are getting now?
areyes is offline   Reply With Quote
Old 10-09-2012, 11:49 AM   #4
katzman
Junior Member
 
Location: CA

Join Date: Jun 2011
Posts: 2
Default

I am a longtime DESeq user (and a big fan).

On my first attempt with DEXSeq, I got the same message as fpadilla while calling estimateDispersions():

Error in match.arg(start.method, c("log(y)", "mean")) : 'arg' must be NULL or a character vector

Thanks to fpadilla for the suggestion to use R 2.15.1

That eliminated that problem. Fortunately, I have not seen the subsequent problem with estimatelog2FoldChanges(). I am not (yet) running multicore.

My sessionInfo is below.

/Sol.

Code:
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DEXSeq_1.2.1       Biobase_2.16.0     BiocGenerics_0.2.0

loaded via a namespace (and not attached):
[1] RCurl_1.95-0.1.2 XML_3.95-0.1     biomaRt_2.12.0   hwriter_1.3      plyr_1.7.1       statmod_1.4.16   stringr_0.6.1    tools_2.15.1
katzman is offline   Reply With Quote
Old 10-11-2012, 02:38 PM   #5
fpadilla
Francisco Padilla O.
 
Location: Columbus Ohio

Join Date: Jul 2011
Posts: 5
Default

I figurate out! The error is because still I have the replicate column in the design data.frame. Also I have space characters in the values of the condition column.
fpadilla is offline   Reply With Quote
Old 12-03-2012, 09:22 AM   #6
zimmernv
Junior Member
 
Location: Dallas, Texas

Join Date: Dec 2012
Posts: 4
Default

Hey Francisco,

I have the same "Error: match.arg()" that you had previously. I have the most up-to-date versions of R (2.15.2), DEXSeq (1.4.0), and Bioconductor (2.11) -- or so I think. But the error does not disappear.

My ecs variable has valid count data (not the 0's or NULL values that have been known to occur). So can you explain in greater detail what you mean by "I still have replicate column in the design data.frame" and "space characters in the values of the condition column"? Should you not have a replicate column when you use estimateDispersions()? My design definitely still has that column but I never saw any indication in the latest vignette to remove that information.

Could you show how you wrote your "design"? I have two controls and two transgenics, so I just wrote:

> samples = data.frame(condition = c(rep("control", 2), rep("transgenic", 2)), replicate = c(1:2, 1:2), row.names = countfiles, stringsAsFactors=TRUE, check.names=FALSE)

where 'countfiles' is defined earlier in my script.

I welcome any help. Thanks in advance.
zimmernv is offline   Reply With Quote
Old 12-03-2012, 01:07 PM   #7
areyes
Senior Member
 
Location: Heidelberg

Join Date: Aug 2010
Posts: 165
Default

Dear zimmernv,

To remove the "replicate" column, instead of:

Code:
> samples = data.frame(condition = c(rep("control", 2), rep("transgenic", 2)), replicate = c(1:2, 1:2), row.names = countfiles, stringsAsFactors=TRUE, check.names=FALSE)
just do

Code:
> samples = data.frame(condition = c(rep("control", 2), rep("transgenic", 2)), row.names = countfiles, stringsAsFactors=TRUE, check.names=FALSE)
In fact, you are correct in saying that you did not see any change in the documentation, I should have added that. The exact change is that all the columns of the design data frame should be factors, so it would also solve the problem to convert your "replicate" column into a factor.

Let me know if that solves the problem, if not, could you add the complete (reproducible) code that you used and the complete output of the sessionInfo() ?

Alejandro
areyes is offline   Reply With Quote
Old 12-12-2012, 01:49 PM   #8
zimmernv
Junior Member
 
Location: Dallas, Texas

Join Date: Dec 2012
Posts: 4
Default thanks ... one more question

Hey Alejandro,

Thanks for the response. That fixed the issue and I can now visualize my results.

Somewhat astonishingly, the differential exon usage test does not yield any positive differential splicing results for my dataset (2 replicates for both the controls and the transgenics, with very high count numbers). For a whole-transcriptome 45 million reads (i.e. decent coverage) RNA-seq experiment, not seeing a single alternative splicing event is a bit odd, don't you think?

One area of concern is the use of the non-overlapping exonic regions made via the dexseq_prepare_annotation.py script. I can understand the motivation for creating these regions, but I'm struggling to see how I can then convert the final output (counts assigned to new but non-biological exonic regions) to biologically-meaningful outputs (i.e. deconvolving the counts over these exonic regions into counts over the exons that helped to build the exonic region). To take an example, the Apoer2 gene in M. musculus has 20 exons. But the dexseq script produces a gff file that lists Apoer2 as having 40 exonic regions due to the overlaps of transcripts. The DEXSeq plots then show 40 exons rather than the conventional 20. How can I get an image that would reveal the counts over the 20 exons? Should I manually change the exonCountSet to reflect the biological reality?

To summarize the difficulty I'm having: DEXSeq, as I see it, is testing for Differential Exonic Regions rather than Differential Exons (where only the latter is biologically-meaningful, or so I think ... I come from the computer science side more so than biology).

I eagerly await your critique and insights.
zimmernv is offline   Reply With Quote
Old 12-14-2012, 01:25 AM   #9
areyes
Senior Member
 
Location: Heidelberg

Join Date: Aug 2010
Posts: 165
Default

Hey zimmernv,

Is there any obvious change that you would expect any differential exon usage regulation between the transgenics and the controls? (e.g. the inserted gene being somehow related to the splicing machinery?)

I guess the exon bining is an attempt to test for all possible exon usage regulation according by testing all unique non-overlapping exonic parts from any transcript, but at the end its an optional step. You could input in DEXSeq with "real" exons. My guess is that it won't make a huge difference, but let me know if I am wrong!

Alejandro
areyes is offline   Reply With Quote
Old 12-14-2012, 03:04 AM   #10
areyes
Senior Member
 
Location: Heidelberg

Join Date: Aug 2010
Posts: 165
Default

Also you could check if you have enough counts in the exons to achieve statistical power to detect something, data quality or if you have a lot of variability between replicates!

Last edited by areyes; 12-14-2012 at 03:08 AM. Reason: wording
areyes is offline   Reply With Quote
Old 12-19-2012, 01:41 PM   #11
zimmernv
Junior Member
 
Location: Dallas, Texas

Join Date: Dec 2012
Posts: 4
Default

Quote:
Originally Posted by areyes View Post
Hey zimmernv,

Is there any obvious change that you would expect any differential exon usage regulation between the transgenics and the controls? (e.g. the inserted gene being somehow related to the splicing machinery?)

I guess the exon bining is an attempt to test for all possible exon usage regulation according by testing all unique non-overlapping exonic parts from any transcript, but at the end its an optional step. You could input in DEXSeq with "real" exons. My guess is that it won't make a huge difference, but let me know if I am wrong!

Alejandro
Hey Alejandro,

I tried it again with "real" exons, and I got 1 differentially-expressed exon rather than 0. So, as you implied, the difference is not substantial (though that 1 hit is very interesting as far as the biology is concerned).

I was expecting a more pronounced change in exon regulation -- the gene that is inserted is a known splicing factor, so a single event of differential splicing is a bit unexpected. I'm using BitSeq at the moment to see if I can reproduce the results.

Speaking of comparisons, I ran into a thread on SeqAnswers in which S. Anders commented that a golden standard for gene expression and exon expression would be to have 20 replicates of one condition vs. 20 replicates of another condition and then using inferential statistics to find the significant hits. I'm curious about the number 20 -- is there a statistical reason for that value? If it can spare you a long technical explanation, could you send me the titles of the relevant papers?

Thanks a ton,
Vincent
zimmernv is offline   Reply With Quote
Old 12-26-2012, 11:18 AM   #12
fpadilla
Francisco Padilla O.
 
Location: Columbus Ohio

Join Date: Jul 2011
Posts: 5
Default

Sorry guys. I missed this post, I'm glad you have fixed it!
fpadilla is offline   Reply With Quote
Old 12-27-2012, 02:21 AM   #13
Wolfgang Huber
Senior Member
 
Location: Heidelberg, Germany

Join Date: Aug 2009
Posts: 108
Default

I am pretty sure Simon used '20' just to indicate a sufficiently large number, so that the power of standard tests (e.g. Wilcoxon, t- or permutation tests) is sufficient to detect every relevant differentially expressed feature.
__________________
Wolfgang Huber
EMBL
Wolfgang Huber is offline   Reply With Quote
Old 06-07-2013, 03:22 PM   #14
muthu545
Member
 
Location: san antonio

Join Date: Jul 2011
Posts: 32
Default

Hi,

I'm am novice to using DEXSeq.
@zimmernv and @Alejandro
How could you get the flattened GTF file with 'real' exons instead of the exonic parts?

Thanks
--
Muthu

Quote:
Originally Posted by areyes

I guess the exon bining is an attempt to test for all possible exon usage regulation according by testing all unique non-overlapping exonic parts from any transcript, but at the end its an optional step. You could input in DEXSeq with "real" exons. My guess is that it won't make a huge difference, but let me know if I am wrong!

Quote:
Originally Posted by zimmernv

I tried it again with "real" exons, and I got 1 differentially-expressed exon rather than 0. So, as you implied, the difference is not substantial (though that 1 hit is very interesting as far as the biology is concerned).
muthu545 is offline   Reply With Quote
Old 07-03-2013, 03:11 PM   #15
muthu545
Member
 
Location: san antonio

Join Date: Jul 2011
Posts: 32
Default

Areyes and Zimmernv,

Could you let me know, how could I get flattened GTF file with 'real' exons instead of the exonic parts?

Thanks
--
Muthu
muthu545 is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 06:55 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO