![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Samtools "is recognized as '*'" "truncated file" error | axiom7 | Bioinformatics | 3 | 11-26-2014 03:53 AM |
"allele balance ratio" and "quality by depth" in VCF files | efoss | Bioinformatics | 2 | 10-25-2011 12:13 PM |
The position file formats ".clocs" and "_pos.txt"? Ist there any difference? | elgor | Illumina/Solexa | 0 | 06-27-2011 08:55 AM |
"Systems biology and administration" & "Genome generation: no engineering allowed" | seb567 | Bioinformatics | 0 | 05-25-2010 01:19 PM |
SEQanswers second "publication": "How to map billions of short reads onto genomes" | ECO | Literature Watch | 0 | 06-30-2009 12:49 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Francisco Padilla O.
Location: Columbus Ohio Join Date: Jul 2011
Posts: 5
|
![]()
Hey Guys:
I got the following error many times while calling estimateDispersions: Error in match.arg(start.method, c("log(y)", "mean")) : 'arg' must be NULL or a character vector Below is the code I'm runing, the weird thing is execution of estimateDispersions is finished and continue to fitDispersionFunction, when it is stopped because estimateDispersions failed. Code:
exonStats = read.HTSeqCounts( countfiles = file.path("./", paste(rownames(samples),"HTSeqCount.txt",sep=".") ), design = samples, flattenedfile = annotationFile) exonStats <- estimateSizeFactors(exonStats) exonStats <- estimateDispersions(exonStats, quiet=TRUE) # Error is here exonStats <- fitDispersionFunction(exonStats) # Execution stop here Code:
> samples condition replicate type 31 4cm from base 1 single-read 32 4cm from base 2 single-read 33 tip 1 single-read 34 tip 2 single-read Code:
> sessionInfo() R version 2.14.1 (2011-12-22) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel tools stats graphics grDevices utils datasets [8] methods base other attached packages: [1] multicore_0.1-7 DEXSeq_1.0.2 Biobase_2.14.0 loaded via a namespace (and not attached): [1] hwriter_1.3 plyr_1.7.1 statmod_1.4.15 stringr_0.6.1 Francisco |
![]() |
![]() |
![]() |
#2 |
Francisco Padilla O.
Location: Columbus Ohio Join Date: Jul 2011
Posts: 5
|
![]()
For the record:
I upgraded R version to 2.15.1 and Bioconductor to 2.10 and the error doesn't ocurr any more, but now I get a new error at estimatelog2FoldChanges function: Error: subscript out of bounds exonStats <- estimatelog2FoldChanges( exonStats ) Code:
> sessionInfo() R version 2.15.1 (2012-06-22) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel tools stats graphics grDevices utils datasets [8] methods base other attached packages: [1] multicore_0.1-7 DEXSeq_1.2.1 Biobase_2.16.0 BiocGenerics_0.2.0 loaded via a namespace (and not attached): [1] biomaRt_2.12.0 hwriter_1.3 plyr_1.7.1 RCurl_1.91-1 statmod_1.4.15 [6] stringr_0.6.1 XML_3.6-2 Last edited by fpadilla; 09-28-2012 at 11:28 AM. |
![]() |
![]() |
![]() |
#3 |
Senior Member
Location: Heidelberg Join Date: Aug 2010
Posts: 165
|
![]()
Hi Francisco,
Can you post the error message that you are getting now? |
![]() |
![]() |
![]() |
#4 |
Junior Member
Location: CA Join Date: Jun 2011
Posts: 2
|
![]()
I am a longtime DESeq user (and a big fan).
On my first attempt with DEXSeq, I got the same message as fpadilla while calling estimateDispersions(): Error in match.arg(start.method, c("log(y)", "mean")) : 'arg' must be NULL or a character vector Thanks to fpadilla for the suggestion to use R 2.15.1 That eliminated that problem. Fortunately, I have not seen the subsequent problem with estimatelog2FoldChanges(). I am not (yet) running multicore. My sessionInfo is below. /Sol. Code:
> sessionInfo() R version 2.15.1 (2012-06-22) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] DEXSeq_1.2.1 Biobase_2.16.0 BiocGenerics_0.2.0 loaded via a namespace (and not attached): [1] RCurl_1.95-0.1.2 XML_3.95-0.1 biomaRt_2.12.0 hwriter_1.3 plyr_1.7.1 statmod_1.4.16 stringr_0.6.1 tools_2.15.1 |
![]() |
![]() |
![]() |
#5 |
Francisco Padilla O.
Location: Columbus Ohio Join Date: Jul 2011
Posts: 5
|
![]()
I figurate out! The error is because still I have the replicate column in the design data.frame. Also I have space characters in the values of the condition column.
|
![]() |
![]() |
![]() |
#6 |
Junior Member
Location: Dallas, Texas Join Date: Dec 2012
Posts: 4
|
![]()
Hey Francisco,
I have the same "Error: match.arg()" that you had previously. I have the most up-to-date versions of R (2.15.2), DEXSeq (1.4.0), and Bioconductor (2.11) -- or so I think. But the error does not disappear. My ecs variable has valid count data (not the 0's or NULL values that have been known to occur). So can you explain in greater detail what you mean by "I still have replicate column in the design data.frame" and "space characters in the values of the condition column"? Should you not have a replicate column when you use estimateDispersions()? My design definitely still has that column but I never saw any indication in the latest vignette to remove that information. Could you show how you wrote your "design"? I have two controls and two transgenics, so I just wrote: > samples = data.frame(condition = c(rep("control", 2), rep("transgenic", 2)), replicate = c(1:2, 1:2), row.names = countfiles, stringsAsFactors=TRUE, check.names=FALSE) where 'countfiles' is defined earlier in my script. I welcome any help. Thanks in advance. |
![]() |
![]() |
![]() |
#7 |
Senior Member
Location: Heidelberg Join Date: Aug 2010
Posts: 165
|
![]()
Dear zimmernv,
To remove the "replicate" column, instead of: Code:
> samples = data.frame(condition = c(rep("control", 2), rep("transgenic", 2)), replicate = c(1:2, 1:2), row.names = countfiles, stringsAsFactors=TRUE, check.names=FALSE) Code:
> samples = data.frame(condition = c(rep("control", 2), rep("transgenic", 2)), row.names = countfiles, stringsAsFactors=TRUE, check.names=FALSE) Let me know if that solves the problem, if not, could you add the complete (reproducible) code that you used and the complete output of the sessionInfo() ? Alejandro |
![]() |
![]() |
![]() |
#8 |
Junior Member
Location: Dallas, Texas Join Date: Dec 2012
Posts: 4
|
![]()
Hey Alejandro,
Thanks for the response. That fixed the issue and I can now visualize my results. Somewhat astonishingly, the differential exon usage test does not yield any positive differential splicing results for my dataset (2 replicates for both the controls and the transgenics, with very high count numbers). For a whole-transcriptome 45 million reads (i.e. decent coverage) RNA-seq experiment, not seeing a single alternative splicing event is a bit odd, don't you think? One area of concern is the use of the non-overlapping exonic regions made via the dexseq_prepare_annotation.py script. I can understand the motivation for creating these regions, but I'm struggling to see how I can then convert the final output (counts assigned to new but non-biological exonic regions) to biologically-meaningful outputs (i.e. deconvolving the counts over these exonic regions into counts over the exons that helped to build the exonic region). To take an example, the Apoer2 gene in M. musculus has 20 exons. But the dexseq script produces a gff file that lists Apoer2 as having 40 exonic regions due to the overlaps of transcripts. The DEXSeq plots then show 40 exons rather than the conventional 20. How can I get an image that would reveal the counts over the 20 exons? Should I manually change the exonCountSet to reflect the biological reality? To summarize the difficulty I'm having: DEXSeq, as I see it, is testing for Differential Exonic Regions rather than Differential Exons (where only the latter is biologically-meaningful, or so I think ... I come from the computer science side more so than biology). I eagerly await your critique and insights. |
![]() |
![]() |
![]() |
#9 |
Senior Member
Location: Heidelberg Join Date: Aug 2010
Posts: 165
|
![]()
Hey zimmernv,
Is there any obvious change that you would expect any differential exon usage regulation between the transgenics and the controls? (e.g. the inserted gene being somehow related to the splicing machinery?) I guess the exon bining is an attempt to test for all possible exon usage regulation according by testing all unique non-overlapping exonic parts from any transcript, but at the end its an optional step. You could input in DEXSeq with "real" exons. My guess is that it won't make a huge difference, but let me know if I am wrong! Alejandro |
![]() |
![]() |
![]() |
#10 |
Senior Member
Location: Heidelberg Join Date: Aug 2010
Posts: 165
|
![]()
Also you could check if you have enough counts in the exons to achieve statistical power to detect something, data quality or if you have a lot of variability between replicates!
Last edited by areyes; 12-14-2012 at 03:08 AM. Reason: wording |
![]() |
![]() |
![]() |
#11 | |
Junior Member
Location: Dallas, Texas Join Date: Dec 2012
Posts: 4
|
![]() Quote:
I tried it again with "real" exons, and I got 1 differentially-expressed exon rather than 0. So, as you implied, the difference is not substantial (though that 1 hit is very interesting as far as the biology is concerned). I was expecting a more pronounced change in exon regulation -- the gene that is inserted is a known splicing factor, so a single event of differential splicing is a bit unexpected. I'm using BitSeq at the moment to see if I can reproduce the results. Speaking of comparisons, I ran into a thread on SeqAnswers in which S. Anders commented that a golden standard for gene expression and exon expression would be to have 20 replicates of one condition vs. 20 replicates of another condition and then using inferential statistics to find the significant hits. I'm curious about the number 20 -- is there a statistical reason for that value? If it can spare you a long technical explanation, could you send me the titles of the relevant papers? Thanks a ton, Vincent |
|
![]() |
![]() |
![]() |
#12 |
Francisco Padilla O.
Location: Columbus Ohio Join Date: Jul 2011
Posts: 5
|
![]()
Sorry guys. I missed this post, I'm glad you have fixed it!
|
![]() |
![]() |
![]() |
#13 |
Senior Member
Location: Heidelberg, Germany Join Date: Aug 2009
Posts: 109
|
![]()
I am pretty sure Simon used '20' just to indicate a sufficiently large number, so that the power of standard tests (e.g. Wilcoxon, t- or permutation tests) is sufficient to detect every relevant differentially expressed feature.
__________________
Wolfgang Huber EMBL |
![]() |
![]() |
![]() |
#14 |
Member
Location: san antonio Join Date: Jul 2011
Posts: 32
|
![]()
Hi,
I'm am novice to using DEXSeq. @zimmernv and @Alejandro How could you get the flattened GTF file with 'real' exons instead of the exonic parts? Thanks -- Muthu Quote: Originally Posted by areyes I guess the exon bining is an attempt to test for all possible exon usage regulation according by testing all unique non-overlapping exonic parts from any transcript, but at the end its an optional step. You could input in DEXSeq with "real" exons. My guess is that it won't make a huge difference, but let me know if I am wrong! Quote: Originally Posted by zimmernv I tried it again with "real" exons, and I got 1 differentially-expressed exon rather than 0. So, as you implied, the difference is not substantial (though that 1 hit is very interesting as far as the biology is concerned). |
![]() |
![]() |
![]() |
#15 |
Member
Location: san antonio Join Date: Jul 2011
Posts: 32
|
![]()
Areyes and Zimmernv,
Could you let me know, how could I get flattened GTF file with 'real' exons instead of the exonic parts? Thanks -- Muthu |
![]() |
![]() |
![]() |
Thread Tools | |
|
|