Seqanswers Leaderboard Ad

**areyes** · 03-10-2014, 02:46 AM

Hi @Mudita,

The command looks good. If you look close enough to your gtf file you will notice that when using "-r yes", some of the fields "gene_id" from your gtf file have two gene identifiers concatenated by a "+".

In case your file does not contain such instances, it means that your initial gtf file does not contain overlapping exons.

Best regards,
Alejandro

**Mudita** · 03-10-2014, 03:05 AM

Thank you so much for your reply.
However, I just found the difference from non aggregate genes (-r no) to aggregate genes (-r yes).

output with with -r yes:
chr1 dexseq_prepare_annotation.py aggregate_gene 846815 850351 . + . gene_id "ENSG00000230699+ENSG00000241180"

output with -r no:
chr1 dexseq_prepare_annotation.py aggregate_gene 850329 850351 . + . gene_id "ENSG00000241180"

I could find that when I use -r no, i do not find any gene with + sign. This is what I wanted.

Thank you
Mudita

**Mudita** · 03-14-2014, 04:17 AM

Dear Areyes,
I would like to run paired sample analysis in DEXSeq.
I could find from this thred (http://seqanswers.com/forums/showthread.php?t=13299) that following link should be useful.

Bioconductor - Help

http://bioconductor.org/packages/2.9...doc/DEXSeq.pdf

The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.

However, I could not understand.
Could you please let me know which is the formula to modify and how to modify, in order to run a paired analysis.

Thank you for your help.

Regards,
Mudita

**areyes** · 03-14-2014, 04:23 AM

Hi Mudita,

- what is the output of 'sessionInfo()'?
- what is the output of 'design(ecs)', of your DEXSeq object?
- what column of 'design(ecs)' would you like to see if it has an effect on differential exon usage?
- what is the column of 'design(ecs)' that indicates the pairing of the samples?

Alejandro

**Mudita** · 03-24-2014, 02:02 AM

I am sorry for late reply.
Please find the response below.

Output of sessionInfo::

Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
[4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
[7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] DEXSeq_1.8.0 Biobase_2.22.0 BiocGenerics_0.8.0
[4] BiocInstaller_1.12.0

loaded via a namespace (and not attached):
[1] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6
[4] GenomicRanges_1.14.4 hwriter_1.3 IRanges_1.20.6
[7] RCurl_1.95-4.1 Rsamtools_1.14.3 statmod_1.4.18
[10] stats4_3.0.2 stringr_0.6.2 tools_3.0.2
[13] XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0

Output of design(ecs)::

countFile condition libType
19_N.counts control paired-end
19_V.counts knockdown paired-end
22_N.counts control paired-end
22_V.counts knockdown paired-end
33_N.counts control paired-end
33_V.counts knockdown paired-end
39_N.counts control paired-end
39_V.counts knockdown paired-end
65_N.counts control paired-end
65_V.counts knockdown paired-end

- what column of 'design(ecs)' would you like to see if it has an effect on differential exon usage?
Between "control and knockdown"

- what is the column of 'design(ecs)' that indicates the pairing of the samples?

all N-V samples are paired samples. For example: 19_N and 19_V, 22_N and 22_V, 33_N and 33_V ans so on.

**dpryan** · 03-24-2014, 03:57 AM

How about just using a design like:

Code:

countFile   condition group
19_N.counts control   A
19_V.counts knockdown A
22_N.counts control   B
22_V.counts knockdown B
33_N.counts control   C
33_V.counts knockdown C
39_N.counts control   D
39_V.counts knockdown D
65_N.counts control   E
65_V.counts knockdown E

and then add "group" as a factor in the design. That seems to be the normal way this is handled. Make sure to put "condition" at the end of the design formula, since that's then used for graphs.

**Mudita** · 03-25-2014, 03:01 AM

Thank you Devon,

I understand that I should apply this formula while calculating dispersion as well as testing for differential expression:
formula=count ~ sample+group*exon+condition

Regards,
Mudita

**areyes** · 03-25-2014, 07:31 AM

Hi Mudita,

As a general comment, you seem to be using a vignette that does not correspond to the DEXSeq version that you have installed, this is not ideal. If you do browseVignettes("DEXSeq") in your R session you will find the vignette associated to the version of DEXSeq that you are using.

I would follow @dpryan suggestion in specifying the design, and then use the formulas:

formulaFullModel <- ~ sample + exon + group:exon + condition:exon
formulaReducedModel <- ~ sample + exon + group:exon

specify this formulas in estimateDispersions and testForDEU as the vignette indicates and that should be it!

Bests,
Alejandro

**Mudita** · 03-27-2014, 09:11 PM

Dear Areyes,

I have tried as suggested. However, I am getting all NA values for p and p adjusted columns.
I am using the R script below.

#!/usr/bin/env Rscript
library(DEXSeq)

annotationfile = file.path("/media/gadgil/Data1/RNA_Seq/Index/Ensemble_GTF_file/Downloaded_6_Jan_2014/Prepare_annotations_for_DEXseq/for_DEXSeq_annotations_withr-no.gff")

sample <- data.frame(row.names = c( "19_N", "19_V", "22_N", "22_V", "33_N", "33_V", "39_N", "39_V", "65_N", "65_V"), countFile = c( "19_N.counts", "19_V.counts", "22_N.counts", "22_V.counts", "33_N.counts", "33_V.counts", "39_N.counts", "39_V.counts", "65_N.counts", "65_V.counts" ), condition = c("control", "knockdown", "control", "knockdown", "control", "knockdown", "control", "knockdown", "control", "knockdown" ), group=c("A", "A", "B", "B", "C", "C", "D", "D", "E", "E"), libType = c( "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end" ) )

fullFilenames<- list.files("/media/gadgil/Seagate Expansion Drive/Data/OUTPUT/DEXSeq_output/",full.names=TRUE,pattern=".count")

ecs<- read.HTSeqCounts(countfiles = fullFilenames,design = sample,flattenedfile = annotationfile)

formulafullmodel=~sample + exon + group:exon + condition:exon
ecs<- estimateSizeFactors(ecs)
library(parallel)
ecs<- estimateDispersions(ecs,formula=formulafullmodel, nCores=12)

ecs<- fitDispersionFunction(ecs)
ecs<- estimatelog2FoldChanges(ecs)

test<- testForDEU(ecs, formula=formulafullmodel, nCores=12)
res1<- DEUresultTable(test)

write.table(res1, file="/media/gadgil/Seagate Expansion Drive/Data/OUTPUT/DEXSeq_output/result_paired.txt")

Could you please let me know if this is correct.

Thank you,
Regards,
Mudita

**areyes** · 03-28-2014, 12:24 AM

Hi Mudita,

Are you getting an error message? I would almost bet that you are getting this error:

"Error in testForDEU: argument 2 matches multiple formal arguments"

You have to change your line:

test<- testForDEU(ecs, formula=formulafullmodel, nCores=12)

to

test<- testForDEU(ecs, formula0=~sample + exon + group:exon, formula1=formulafullmodel, nCores=12)

I think that should do the trick

**Mudita** · 04-06-2014, 09:33 PM

yes, everything works now.
Thanks a lot for all your help.

**Mudita** · 04-13-2014, 11:12 PM

hello again,

I am trying to generate plot in DEXSeq, but I do not see the distribution of expression.
could you please let me know why?

plot.ppt

Thank you,
Regards,
Mudita

**areyes** · 04-13-2014, 11:26 PM

Hi Mudita,

Could you include the code that you are using and the output of sessionInfo()?
Are you getting an error or a warning?
Does it happen with all the genes?
Does it happen also when your plotting device is a x11, pdf?

**Mudita** · 04-14-2014, 12:15 AM

Thank you,
Please find below the response:

code to plot:
plotDEXSeq(test, "ENSG00000167613")

warning while generating the plot:

Warning message:
In segments(intervals[rango], matr[rango, j], intervals[rango + :
semi-transparency is not supported on this device: reported only once per page

Also, there was a warning while calculating the dispersion which I am trying to reproduce.

sessionInfo:

Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
[4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
[7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] DEXSeq_1.8.0 Biobase_2.22.0 BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6
[4] GenomicRanges_1.14.4 hwriter_1.3 IRanges_1.20.7
[7] RCurl_1.95-4.1 Rsamtools_1.14.3 statmod_1.4.18
[10] stats4_3.0.2 stringr_0.6.2 XML_3.98-1.1

Does it happen with all the genes?
I tried with 4 or 5 genes and it happened with all.

Does it happen also when your plotting device is a x11, pdf?
I just run this code "plotDEXSeq(test, "ENSG00000167613")" and generated image is copied in power point.

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 27 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 26 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Usage of gene aggregation option in DEXSeq

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News