SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
DEXSeq exonic_part_number paolo.kunder Bioinformatics 1 04-12-2012 08:40 AM
glmnb.fit in DEXSeq RockChalkJayhawk Bioinformatics 5 12-06-2011 08:29 AM
DEXSeq error xguo Bioinformatics 4 12-06-2011 06:31 AM
DEXSeq and Biobase MerFer Bioinformatics 1 10-14-2011 04:50 AM

Reply
 
Thread Tools
Old 04-29-2012, 11:06 AM   #1
r_sitaram
Junior Member
 
Location: Helsinki

Join Date: Apr 2012
Posts: 9
Default DEXSeq output not convincing

Hello,
I'm running DEXSeq for finding out the differential exon usage between 3 Wild type and 3 transgenic rat data. I obtained the exon count using the scripts dexseq_count.py and the gff file using dexseq_prepare_annotation.py. After following all the steps given in your vignette,When I run the DEUresultTable command, I get a lot of NA's in the pvalue column and just 1's in the padjust column. Also after executing the estimateDispersions function, the column dispersion_CR_est is showing NULL. I have a feeling that I'm making a mistake in the exoncount data object but I'm not sure. Could someone throw some light on this?

I could probably highlight the steps(in bold) I have done.

annotationfile = file.path("/triton/ics/scratch/rnaseq/exoncount/Rattus_norvegicus.RGSC3.4.66_DEXSEQ.gff")

annotationfile
[1] "/triton/ics/scratch/rnaseq/exoncount/Rattus_norvegicus.RGSC3.4.66_DEXSEQ.gff"

samples=data.frame(condition=c(rep("wild",3),rep("trans",3)),replicate=c(1,1,1,1,1,1),stringsAsFactors = TRUE,check.names = FALSE)
row.names(samples) <- c("01_5b_WT.txt","02_6b_WT.txt","03_15b_WT.txt","04_4b_TG.txt","05_7b_TG.txt","06_16b_TG.txt")


samples
condition replicate
01_5b_WT.txt wild 1
02_6b_WT.txt wild 1
03_15b_WT.txt wild 1
04_4b_TG.txt trans 1
05_7b_TG.txt trans 1
06_16b_TG.txt trans 1

countfiles = file.path("/triton/ics/scratch/rnaseq/exoncount",paste(row.names(samples))

countfiles
[1] "/triton/ics/scratch/rnaseq/exoncount/01_5b_WT.txt"
[2] "/triton/ics/scratch/rnaseq/exoncount/02_6b_WT.txt"
[3] "/triton/ics/scratch/rnaseq/exoncount/03_15b_WT.txt"
[4] "/triton/ics/scratch/rnaseq/exoncount/04_4b_TG.txt"
[5] "/triton/ics/scratch/rnaseq/exoncount/05_7b_TG.txt"
[6] "/triton/ics/scratch/rnaseq/exoncount/06_16b_TG.txt"

ecs = read.HTSeqCounts(countfiles,design=samples,flattenedfile=annotationfile)

ExonCountSet (storageMode: environment)
assayData: 236327 features, 6 samples
element names: counts
protocolData: none
phenoData
sampleNames: 01_5b_WT.txt 02_6b_WT.txt ... 06_16b_TG.txt (6 total)
varLabels: sizeFactor condition replicate countfiles
varMetadata: labelDescription
featureData
featureNames: ENSRNOG00000000001:001 ENSRNOG00000000001:002 ...
ENSRNOG00000045521:001 (236327 total)
fvarLabels: geneID exonID ... transcripts (13 total)
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:

sampleNames(ecs) = row.names(samples)

ecs <- estimateSizeFactors(ecs)


sizeFactors(ecs)
01_5b_WT.txt 02_6b_WT.txt 03_15b_WT.txt 04_4b_TG.txt 05_7b_TG.txt
1.6794610 0.5407073 0.8215439 1.7972423 0.8516481
06_16b_TG.txt
0.9878676

ecs <- estimateDispersions(ecs)


featureData(ecs)$dispersion_CR_est
NULL

ecs <- fitDispersionFunction(ecs)

ecs <- testForDEU(ecs)

res <- DEUresultTable(ecs)
r_sitaram is offline   Reply With Quote
Old 04-29-2012, 04:38 PM   #2
ngsAnalytics
Junior Member
 
Location: Eastern US

Join Date: Apr 2012
Posts: 3
Default

Hi,

I would start by checking that the ExonContSet (ecs) was built correctly.

A couple of things to try

design(ecs) - This should match what you intended.

sampleNames(ecs) - the row names of the design output should match this

table(geneIDs(ecs)) will show you how many genes are in the set.

you can also try head(modelFrameforGene(ecs, "Pick some gene" )) of course replace the Pick some gene with the name of a gene from your genome.
This will show if something for counts was read in.

Also normally the values for replicate would be 1,2, and 3 for each sample type. I don't think having them all set to one will cause a problem, but I have never tried it.



Craig
ngsAnalytics is offline   Reply With Quote
Old 04-29-2012, 09:00 PM   #3
r_sitaram
Junior Member
 
Location: Helsinki

Join Date: Apr 2012
Posts: 9
Default

Thank you Craig. I will look into it immediately and get back here.
r_sitaram is offline   Reply With Quote
Old 04-30-2012, 10:06 AM   #4
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 978
Default

First of all: The command "featureData(ecs)$dispersion_CR_est" did not work because you are reading an outdated manual. The slot "dispersion_CR_est" is now called "dispBeforeSharing".

Use the command "openVignette()" to get documentation that fits your installed package versions.

In the current manual, you will see instructions to plot the estimated dispersions against the mean count value. Have a look, maybe your data is simply to noisy to get any results. (If you unsure how to interpret the plot, just post it here.)

Also have a look at the count values plot for few genes (with plotDEXSeq(ecs, "my_favourite_geneID", norCounts = TRUE)). Does it look like you should detect something?
Simon Anders is offline   Reply With Quote
Old 05-03-2012, 02:29 AM   #5
r_sitaram
Junior Member
 
Location: Helsinki

Join Date: Apr 2012
Posts: 9
Default

Hello Craig,
I checked my dataset with the commands that you mentioned. Everything looks fine. The counts were also read. I just did one change. I got information that my data had 3 replicates. So I re-ran the process by replacing the 1's with 1,2 and 3 for wild and 1,2 and 3 for trans.
r_sitaram is offline   Reply With Quote
Old 05-03-2012, 02:52 AM   #6
r_sitaram
Junior Member
 
Location: Helsinki

Join Date: Apr 2012
Posts: 9
Default

Hello Simon,
Thanks for that. I downloaded the new vignette and proceeded as stated. Here is the plot between mean count value and dispersion.

Could you interpret the plot ?

I selected a particular gene for testing purpose. That gene coincidentally had data so I could visualise using the plotDEXSeq function. I'm trying to verify the result with the exon count to be sure that the output is valid.
r_sitaram is offline   Reply With Quote
Old 05-03-2012, 02:56 AM   #7
r_sitaram
Junior Member
 
Location: Helsinki

Join Date: Apr 2012
Posts: 9
Default

It looks like the image is not rendering. Here is it.
Attached Images
File Type: jpg Plot.jpg (96.5 KB, 75 views)
r_sitaram is offline   Reply With Quote
Old 05-18-2012, 03:10 AM   #8
r_sitaram
Junior Member
 
Location: Helsinki

Join Date: Apr 2012
Posts: 9
Default

Hello Simon. I'm not sure if you received my other posts here. But I proceeded with my analysis and I have arrived at some results for which I need your help in the analysis. I sorted the dexseq output according to p value and visualised the exons.

Gene ID: ENSRNOG00000017466
Exon : E016
p Value : 0.000166557183275073

For the above mentioned gene, the fitted expression did not help me conclude anything so I went for the fitted splicing. I see that a lot of exons like E016,E017 and so on show similar results but none of their p values are even close to E016. In such a case, how can I make the required conclusion? Am I missing something in the analysis ? Is there another way to look at it?
Attached Files
File Type: pdf RplotsforENSRNOG00000017466.pdf (15.2 KB, 79 views)
r_sitaram is offline   Reply With Quote
Old 05-21-2012, 01:38 AM   #9
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 978
Default

Note how one of your three wild-type samples looks completely different than the other two. It seems that the whole left half is overrepresented. Hence, DEXSeq considers these exons as too noisy to detect anything. I'm a bit surprised that E016 got a low p value, as it does not look that much better.

Is this the gene with the strongest effect that you got? Then, it were justified if you got no significant hits.

Check a few more genes to see if it is always one sample that behaves strangely, and if so, if it is always the same. (You can instruct the plotting function to use different colours for the replicates.) If it is one sample that is behaving oddly, you may want to see what happens if you repeat the analysis without it.
Simon Anders is offline   Reply With Quote
Old 05-21-2012, 05:12 AM   #10
r_sitaram
Junior Member
 
Location: Helsinki

Join Date: Apr 2012
Posts: 9
Default

Hello Simon,
This is the third best gene available. I checked the top 5 and I see this behaviour very distinct only in 2 of them. Iím attaching the pdf of that for your reference. The remaining do not highlight a lot. I should also mention that all the adjusted p values were 1. I also read another post of yours connected with DESeq where you had mentioned about a correlation between high variance and the output. I have posted a graph showing the connection between mean value and dispersion in reply #7. Does the graph hint anything? I'm not able to interpret that one very well.
Attached Files
File Type: pdf Rplotcol.pdf (29.0 KB, 54 views)
r_sitaram is offline   Reply With Quote
Old 05-21-2012, 05:15 AM   #11
r_sitaram
Junior Member
 
Location: Helsinki

Join Date: Apr 2012
Posts: 9
Default

I forgot to mention the p values for those 5 genes. They are as follows.
ENSRNOG00000005330 E035 0.000108786
ENSRNOG00000009864 E005 0.000152295
ENSRNOG00000017466 E016 0.000166557
ENSRNOG00000005434 E046 0.000262485
ENSRNOG00000032391 E004 0.00050834
r_sitaram is offline   Reply With Quote
Reply

Tags
dexseq, exoncountset

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 05:01 AM.


Powered by vBulletin® Version 3.8.6
Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.