Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DEXSeq error in estimateDispersions: match.arg(start.method, c("log(y)", "mean"))

    Hey Guys:

    I got the following error many times while calling estimateDispersions:

    Error in match.arg(start.method, c("log(y)", "mean")) : 'arg' must be NULL or a character vector

    Below is the code I'm runing, the weird thing is execution of estimateDispersions is finished and continue to fitDispersionFunction, when it is stopped because estimateDispersions failed.

    Code:
    exonStats = read.HTSeqCounts(
                    countfiles = file.path("./", paste(rownames(samples),"HTSeqCount.txt",sep=".") ),
    		design = samples, flattenedfile = annotationFile)
    
    exonStats <- estimateSizeFactors(exonStats)
    exonStats <- estimateDispersions(exonStats, quiet=TRUE) # Error is here
    exonStats <- fitDispersionFunction(exonStats) # Execution stop here
    Also, I have a few datasets for testing and still I get error. Here is the experiment design

    Code:
    > samples
           condition replicate        type
    31 4cm from base         1 single-read
    32 4cm from base         2 single-read
    33           tip         1 single-read
    34           tip         2 single-read
    This is the session info:
    Code:
    > sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: x86_64-pc-linux-gnu (64-bit)
    
    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
     [7] LC_PAPER=C                 LC_NAME=C                 
     [9] LC_ADDRESS=C               LC_TELEPHONE=C            
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
    
    attached base packages:
    [1] parallel  tools     stats     graphics  grDevices utils     datasets 
    [8] methods   base     
    
    other attached packages:
    [1] multicore_0.1-7 DEXSeq_1.0.2    Biobase_2.14.0 
    
    loaded via a namespace (and not attached):
    [1] hwriter_1.3    plyr_1.7.1     statmod_1.4.15 stringr_0.6.1
    I'll appreciate your help !

    Francisco

  • #2
    For the record:

    I upgraded R version to 2.15.1 and Bioconductor to 2.10 and the error doesn't ocurr any more, but now I get a new error at estimatelog2FoldChanges function:

    Error: subscript out of bounds

    exonStats <- estimatelog2FoldChanges( exonStats )

    Code:
    > sessionInfo()
    R version 2.15.1 (2012-06-22)
    Platform: x86_64-pc-linux-gnu (64-bit)
    
    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
     [7] LC_PAPER=C                 LC_NAME=C                 
     [9] LC_ADDRESS=C               LC_TELEPHONE=C            
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
    
    attached base packages:
    [1] parallel  tools     stats     graphics  grDevices utils     datasets 
    [8] methods   base     
    
    other attached packages:
    [1] multicore_0.1-7    DEXSeq_1.2.1       Biobase_2.16.0     BiocGenerics_0.2.0
    
    loaded via a namespace (and not attached):
    [1] biomaRt_2.12.0 hwriter_1.3    plyr_1.7.1     RCurl_1.91-1   statmod_1.4.15
    [6] stringr_0.6.1  XML_3.6-2
    Last edited by fpadilla; 09-28-2012, 10:28 AM.

    Comment


    • #3
      Hi Francisco,

      Can you post the error message that you are getting now?

      Comment


      • #4
        I am a longtime DESeq user (and a big fan).

        On my first attempt with DEXSeq, I got the same message as fpadilla while calling estimateDispersions():

        Error in match.arg(start.method, c("log(y)", "mean")) : 'arg' must be NULL or a character vector

        Thanks to fpadilla for the suggestion to use R 2.15.1

        That eliminated that problem. Fortunately, I have not seen the subsequent problem with estimatelog2FoldChanges(). I am not (yet) running multicore.

        My sessionInfo is below.

        /Sol.

        Code:
        > sessionInfo()
        R version 2.15.1 (2012-06-22)
        Platform: x86_64-unknown-linux-gnu (64-bit)
        
        locale:
        [1] C
        
        attached base packages:
        [1] stats     graphics  grDevices utils     datasets  methods   base     
        
        other attached packages:
        [1] DEXSeq_1.2.1       Biobase_2.16.0     BiocGenerics_0.2.0
        
        loaded via a namespace (and not attached):
        [1] RCurl_1.95-0.1.2 XML_3.95-0.1     biomaRt_2.12.0   hwriter_1.3      plyr_1.7.1       statmod_1.4.16   stringr_0.6.1    tools_2.15.1

        Comment


        • #5
          I figurate out! The error is because still I have the replicate column in the design data.frame. Also I have space characters in the values of the condition column.

          Comment


          • #6
            Hey Francisco,

            I have the same "Error: match.arg()" that you had previously. I have the most up-to-date versions of R (2.15.2), DEXSeq (1.4.0), and Bioconductor (2.11) -- or so I think. But the error does not disappear.

            My ecs variable has valid count data (not the 0's or NULL values that have been known to occur). So can you explain in greater detail what you mean by "I still have replicate column in the design data.frame" and "space characters in the values of the condition column"? Should you not have a replicate column when you use estimateDispersions()? My design definitely still has that column but I never saw any indication in the latest vignette to remove that information.

            Could you show how you wrote your "design"? I have two controls and two transgenics, so I just wrote:

            > samples = data.frame(condition = c(rep("control", 2), rep("transgenic", 2)), replicate = c(1:2, 1:2), row.names = countfiles, stringsAsFactors=TRUE, check.names=FALSE)

            where 'countfiles' is defined earlier in my script.

            I welcome any help. Thanks in advance.

            Comment


            • #7
              Dear zimmernv,

              To remove the "replicate" column, instead of:

              Code:
              > samples = data.frame(condition = c(rep("control", 2), rep("transgenic", 2)), replicate = c(1:2, 1:2), row.names = countfiles, stringsAsFactors=TRUE, check.names=FALSE)
              just do

              Code:
              > samples = data.frame(condition = c(rep("control", 2), rep("transgenic", 2)), row.names = countfiles, stringsAsFactors=TRUE, check.names=FALSE)
              In fact, you are correct in saying that you did not see any change in the documentation, I should have added that. The exact change is that all the columns of the design data frame should be factors, so it would also solve the problem to convert your "replicate" column into a factor.

              Let me know if that solves the problem, if not, could you add the complete (reproducible) code that you used and the complete output of the sessionInfo() ?

              Alejandro

              Comment


              • #8
                thanks ... one more question

                Hey Alejandro,

                Thanks for the response. That fixed the issue and I can now visualize my results.

                Somewhat astonishingly, the differential exon usage test does not yield any positive differential splicing results for my dataset (2 replicates for both the controls and the transgenics, with very high count numbers). For a whole-transcriptome 45 million reads (i.e. decent coverage) RNA-seq experiment, not seeing a single alternative splicing event is a bit odd, don't you think?

                One area of concern is the use of the non-overlapping exonic regions made via the dexseq_prepare_annotation.py script. I can understand the motivation for creating these regions, but I'm struggling to see how I can then convert the final output (counts assigned to new but non-biological exonic regions) to biologically-meaningful outputs (i.e. deconvolving the counts over these exonic regions into counts over the exons that helped to build the exonic region). To take an example, the Apoer2 gene in M. musculus has 20 exons. But the dexseq script produces a gff file that lists Apoer2 as having 40 exonic regions due to the overlaps of transcripts. The DEXSeq plots then show 40 exons rather than the conventional 20. How can I get an image that would reveal the counts over the 20 exons? Should I manually change the exonCountSet to reflect the biological reality?

                To summarize the difficulty I'm having: DEXSeq, as I see it, is testing for Differential Exonic Regions rather than Differential Exons (where only the latter is biologically-meaningful, or so I think ... I come from the computer science side more so than biology).

                I eagerly await your critique and insights.

                Comment


                • #9
                  Hey zimmernv,

                  Is there any obvious change that you would expect any differential exon usage regulation between the transgenics and the controls? (e.g. the inserted gene being somehow related to the splicing machinery?)

                  I guess the exon bining is an attempt to test for all possible exon usage regulation according by testing all unique non-overlapping exonic parts from any transcript, but at the end its an optional step. You could input in DEXSeq with "real" exons. My guess is that it won't make a huge difference, but let me know if I am wrong!

                  Alejandro

                  Comment


                  • #10
                    Also you could check if you have enough counts in the exons to achieve statistical power to detect something, data quality or if you have a lot of variability between replicates!
                    Last edited by areyes; 12-14-2012, 03:08 AM. Reason: wording

                    Comment


                    • #11
                      Originally posted by areyes View Post
                      Hey zimmernv,

                      Is there any obvious change that you would expect any differential exon usage regulation between the transgenics and the controls? (e.g. the inserted gene being somehow related to the splicing machinery?)

                      I guess the exon bining is an attempt to test for all possible exon usage regulation according by testing all unique non-overlapping exonic parts from any transcript, but at the end its an optional step. You could input in DEXSeq with "real" exons. My guess is that it won't make a huge difference, but let me know if I am wrong!

                      Alejandro
                      Hey Alejandro,

                      I tried it again with "real" exons, and I got 1 differentially-expressed exon rather than 0. So, as you implied, the difference is not substantial (though that 1 hit is very interesting as far as the biology is concerned).

                      I was expecting a more pronounced change in exon regulation -- the gene that is inserted is a known splicing factor, so a single event of differential splicing is a bit unexpected. I'm using BitSeq at the moment to see if I can reproduce the results.

                      Speaking of comparisons, I ran into a thread on SeqAnswers in which S. Anders commented that a golden standard for gene expression and exon expression would be to have 20 replicates of one condition vs. 20 replicates of another condition and then using inferential statistics to find the significant hits. I'm curious about the number 20 -- is there a statistical reason for that value? If it can spare you a long technical explanation, could you send me the titles of the relevant papers?

                      Thanks a ton,
                      Vincent

                      Comment


                      • #12
                        Sorry guys. I missed this post, I'm glad you have fixed it!

                        Comment


                        • #13
                          I am pretty sure Simon used '20' just to indicate a sufficiently large number, so that the power of standard tests (e.g. Wilcoxon, t- or permutation tests) is sufficient to detect every relevant differentially expressed feature.
                          Wolfgang Huber
                          EMBL

                          Comment


                          • #14
                            Hi,

                            I'm am novice to using DEXSeq.
                            @zimmernv and @Alejandro
                            How could you get the flattened GTF file with 'real' exons instead of the exonic parts?

                            Thanks
                            --
                            Muthu

                            Quote:
                            Originally Posted by areyes

                            I guess the exon bining is an attempt to test for all possible exon usage regulation according by testing all unique non-overlapping exonic parts from any transcript, but at the end its an optional step. You could input in DEXSeq with "real" exons. My guess is that it won't make a huge difference, but let me know if I am wrong!

                            Quote:
                            Originally Posted by zimmernv

                            I tried it again with "real" exons, and I got 1 differentially-expressed exon rather than 0. So, as you implied, the difference is not substantial (though that 1 hit is very interesting as far as the biology is concerned).

                            Comment


                            • #15
                              Areyes and Zimmernv,

                              Could you let me know, how could I get flattened GTF file with 'real' exons instead of the exonic parts?

                              Thanks
                              --
                              Muthu

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM
                              • seqadmin
                                The Impact of AI in Genomic Medicine
                                by seqadmin



                                Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                                02-26-2024, 02:07 PM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 03-14-2024, 06:13 AM
                              0 responses
                              33 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-08-2024, 08:03 AM
                              0 responses
                              72 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-07-2024, 08:13 AM
                              0 responses
                              81 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-06-2024, 09:51 AM
                              0 responses
                              68 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X