Seqanswers Leaderboard Ad

**fpadilla** · 09-28-2012, 10:25 AM

For the record:

I upgraded R version to 2.15.1 and Bioconductor to 2.10 and the error doesn't ocurr any more, but now I get a new error at estimatelog2FoldChanges function:

Error: subscript out of bounds

exonStats <- estimatelog2FoldChanges( exonStats )

Code:

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  tools     stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] multicore_0.1-7    DEXSeq_1.2.1       Biobase_2.16.0     BiocGenerics_0.2.0

loaded via a namespace (and not attached):
[1] biomaRt_2.12.0 hwriter_1.3    plyr_1.7.1     RCurl_1.91-1   statmod_1.4.15
[6] stringr_0.6.1  XML_3.6-2

**areyes** · 10-01-2012, 03:44 AM

Hi Francisco,

Can you post the error message that you are getting now?

**katzman** · 10-09-2012, 10:49 AM

I am a longtime DESeq user (and a big fan).

On my first attempt with DEXSeq, I got the same message as fpadilla while calling estimateDispersions():

Error in match.arg(start.method, c("log(y)", "mean")) : 'arg' must be NULL or a character vector

Thanks to fpadilla for the suggestion to use R 2.15.1

That eliminated that problem. Fortunately, I have not seen the subsequent problem with estimatelog2FoldChanges(). I am not (yet) running multicore.

My sessionInfo is below.

/Sol.

Code:

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DEXSeq_1.2.1       Biobase_2.16.0     BiocGenerics_0.2.0

loaded via a namespace (and not attached):
[1] RCurl_1.95-0.1.2 XML_3.95-0.1     biomaRt_2.12.0   hwriter_1.3      plyr_1.7.1       statmod_1.4.16   stringr_0.6.1    tools_2.15.1

**fpadilla** · 10-11-2012, 01:38 PM

I figurate out! The error is because still I have the replicate column in the design data.frame. Also I have space characters in the values of the condition column.

**zimmernv** · 12-03-2012, 09:22 AM

Hey Francisco,

I have the same "Error: match.arg()" that you had previously. I have the most up-to-date versions of R (2.15.2), DEXSeq (1.4.0), and Bioconductor (2.11) -- or so I think. But the error does not disappear.

My ecs variable has valid count data (not the 0's or NULL values that have been known to occur). So can you explain in greater detail what you mean by "I still have replicate column in the design data.frame" and "space characters in the values of the condition column"? Should you not have a replicate column when you use estimateDispersions()? My design definitely still has that column but I never saw any indication in the latest vignette to remove that information.

Could you show how you wrote your "design"? I have two controls and two transgenics, so I just wrote:

> samples = data.frame(condition = c(rep("control", 2), rep("transgenic", 2)), replicate = c(1:2, 1:2), row.names = countfiles, stringsAsFactors=TRUE, check.names=FALSE)

where 'countfiles' is defined earlier in my script.

I welcome any help. Thanks in advance.

**areyes** · 12-03-2012, 01:07 PM

Dear zimmernv,

To remove the "replicate" column, instead of:

Code:

> samples = data.frame(condition = c(rep("control", 2), rep("transgenic", 2)), replicate = c(1:2, 1:2), row.names = countfiles, stringsAsFactors=TRUE, check.names=FALSE)

just do

Code:

> samples = data.frame(condition = c(rep("control", 2), rep("transgenic", 2)), row.names = countfiles, stringsAsFactors=TRUE, check.names=FALSE)

In fact, you are correct in saying that you did not see any change in the documentation, I should have added that. The exact change is that all the columns of the design data frame should be factors, so it would also solve the problem to convert your "replicate" column into a factor.

Let me know if that solves the problem, if not, could you add the complete (reproducible) code that you used and the complete output of the sessionInfo() ?

Alejandro

**zimmernv** · 12-12-2012, 01:49 PM

thanks ... one more question

Hey Alejandro,

Thanks for the response. That fixed the issue and I can now visualize my results.

Somewhat astonishingly, the differential exon usage test does not yield any positive differential splicing results for my dataset (2 replicates for both the controls and the transgenics, with very high count numbers). For a whole-transcriptome 45 million reads (i.e. decent coverage) RNA-seq experiment, not seeing a single alternative splicing event is a bit odd, don't you think?

One area of concern is the use of the non-overlapping exonic regions made via the dexseq_prepare_annotation.py script. I can understand the motivation for creating these regions, but I'm struggling to see how I can then convert the final output (counts assigned to new but non-biological exonic regions) to biologically-meaningful outputs (i.e. deconvolving the counts over these exonic regions into counts over the exons that helped to build the exonic region). To take an example, the Apoer2 gene in M. musculus has 20 exons. But the dexseq script produces a gff file that lists Apoer2 as having 40 exonic regions due to the overlaps of transcripts. The DEXSeq plots then show 40 exons rather than the conventional 20. How can I get an image that would reveal the counts over the 20 exons? Should I manually change the exonCountSet to reflect the biological reality?

To summarize the difficulty I'm having: DEXSeq, as I see it, is testing for Differential Exonic Regions rather than Differential Exons (where only the latter is biologically-meaningful, or so I think ... I come from the computer science side more so than biology).

I eagerly await your critique and insights.

**areyes** · 12-14-2012, 01:25 AM

Hey zimmernv,

Is there any obvious change that you would expect any differential exon usage regulation between the transgenics and the controls? (e.g. the inserted gene being somehow related to the splicing machinery?)

I guess the exon bining is an attempt to test for all possible exon usage regulation according by testing all unique non-overlapping exonic parts from any transcript, but at the end its an optional step. You could input in DEXSeq with "real" exons. My guess is that it won't make a huge difference, but let me know if I am wrong!

Alejandro

**areyes** · 12-14-2012, 03:04 AM

Also you could check if you have enough counts in the exons to achieve statistical power to detect something, data quality or if you have a lot of variability between replicates!

**zimmernv** · 12-19-2012, 01:41 PM

Originally posted by areyes View Post

Hey zimmernv,

Is there any obvious change that you would expect any differential exon usage regulation between the transgenics and the controls? (e.g. the inserted gene being somehow related to the splicing machinery?)

I guess the exon bining is an attempt to test for all possible exon usage regulation according by testing all unique non-overlapping exonic parts from any transcript, but at the end its an optional step. You could input in DEXSeq with "real" exons. My guess is that it won't make a huge difference, but let me know if I am wrong!

Alejandro

Hey Alejandro,

I tried it again with "real" exons, and I got 1 differentially-expressed exon rather than 0. So, as you implied, the difference is not substantial (though that 1 hit is very interesting as far as the biology is concerned).

I was expecting a more pronounced change in exon regulation -- the gene that is inserted is a known splicing factor, so a single event of differential splicing is a bit unexpected. I'm using BitSeq at the moment to see if I can reproduce the results.

Speaking of comparisons, I ran into a thread on SeqAnswers in which S. Anders commented that a golden standard for gene expression and exon expression would be to have 20 replicates of one condition vs. 20 replicates of another condition and then using inferential statistics to find the significant hits. I'm curious about the number 20 -- is there a statistical reason for that value? If it can spare you a long technical explanation, could you send me the titles of the relevant papers?

Thanks a ton,
Vincent

**fpadilla** · 12-26-2012, 11:18 AM

Sorry guys. I missed this post, I'm glad you have fixed it!

**Wolfgang Huber** · 12-27-2012, 02:21 AM

I am pretty sure Simon used '20' just to indicate a sufficiently large number, so that the power of standard tests (e.g. Wilcoxon, t- or permutation tests) is sufficient to detect every relevant differentially expressed feature.

**muthu545** · 06-07-2013, 02:22 PM

Hi,

I'm am novice to using DEXSeq.
@zimmernv and @Alejandro
How could you get the flattened GTF file with 'real' exons instead of the exonic parts?

Thanks
--
Muthu

Quote:
Originally Posted by areyes

I guess the exon bining is an attempt to test for all possible exon usage regulation according by testing all unique non-overlapping exonic parts from any transcript, but at the end its an optional step. You could input in DEXSeq with "real" exons. My guess is that it won't make a huge difference, but let me know if I am wrong!

Quote:
Originally Posted by zimmernv

I tried it again with "real" exons, and I got 1 differentially-expressed exon rather than 0. So, as you implied, the difference is not substantial (though that 1 hit is very interesting as far as the biology is concerned).

**muthu545** · 07-03-2013, 02:11 PM

Areyes and Zimmernv,

Could you let me know, how could I get flattened GTF file with 'real' exons instead of the exonic parts?

Thanks
--
Muthu

Topics	Statistics	Last Post
Expanding the Horizons of Cellular Research with the Single Cell Atlas by seqadmin Started by seqadmin, Today, 11:49 AM	0 responses 10 views 0 likes	Last Post by seqadmin Today, 11:49 AM
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, Yesterday, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin Yesterday, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 61 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM

Seqanswers Leaderboard Ad

Announcement

DEXSeq error in estimateDispersions: match.arg(start.method, c("log(y)", "mean"))

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News