SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
htseq-count paolo.kunder Bioinformatics 10 10-22-2014 05:45 AM
Problem with HTSeq-count anikng RNA Sequencing 3 08-16-2013 09:33 PM
Which ID should be used for HTSeq-count? syintel87 Bioinformatics 11 02-07-2013 01:16 AM
multiBamCov or htseq-count to count read per feature ? NicoBxl Bioinformatics 1 07-03-2012 03:05 AM
htseq-count on UTR emilyjia2000 Bioinformatics 9 04-06-2012 01:13 PM

Reply
 
Thread Tools
Old 10-18-2013, 03:44 AM   #1
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default HTseq-count to DEseq2: Need a little help..

Hi!
This is what Ive done:

sampleFiles <- list.files(path="/Volumes/timemachine/HTseq_DEseq2",pattern="*.txt")
sampleCondition <- read.table("/Volumes/timemachine/HTseq_DEseq2/03_SampleCondition.txt", header=TRUE)
sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = sampleCondition)
directory <- c("/Volumes/timemachine/HTseq_DEseq2/")
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design= ~ condition)


Im getting an error:

"Error in validObject(.Object) :
invalid class "DESeqDataSet" object: all variables in design formula must be columns in colData"

Last edited by sindrle; 10-18-2013 at 04:56 AM.
sindrle is offline   Reply With Quote
Old 10-18-2013, 05:44 AM   #2
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

I have also tried this:

status <- factor(c(rep("Healthy",26), rep("Diabetic",22)))
timepoints = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2)
des <- formula(~timepoints+status)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design= ~ des)

Same error..
sindrle is offline   Reply With Quote
Old 10-18-2013, 06:45 AM   #3
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

What are the dimensions of sampleTable and have you tried "design=des" rather than "design=~des"? For your first post, what were the dimensions of condition?
dpryan is offline   Reply With Quote
Old 10-18-2013, 06:47 AM   #4
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

Heres my "sampleTable":
sampleName fileName Condition
1 D104.txt D104.txt Diabetic_pre-exercise
2 D121.txt D121.txt Diabetic_pre-exercise
3 D153.txt D153.txt Diabetic_pre-exercise
4 D155.txt D155.txt Diabetic_pre-exercise
5 D161.txt D161.txt Diabetic_pre-exercise
6 D162.txt D162.txt Diabetic_pre-exercise
7 D167.txt D167.txt Diabetic_pre-exercise
8 D173.txt D173.txt Diabetic_pre-exercise
9 D176.txt D176.txt Diabetic_pre-exercise
10 D177.txt D177.txt Diabetic_pre-exercise
11 D179.txt D179.txt Diabetic_pre-exercise
12 D204.txt D204.txt Diabetic_post-exercise
13 D221.txt D221.txt Diabetic_post-exercise
14 D253.txt D253.txt Diabetic_post-exercise
15 D255.txt D255.txt Diabetic_post-exercise
16 D261.txt D261.txt Diabetic_post-exercise
17 D262.txt D262.txt Diabetic_post-exercise
18 D267.txt D267.txt Diabetic_post-exercise
19 D273.txt D273.txt Diabetic_post-exercise
20 D276.txt D276.txt Diabetic_post-exercise
21 D277.txt D277.txt Diabetic_post-exercise
22 D279.txt D279.txt Diabetic_post-exercise
23 N101.txt N101.txt Healthy_pre-exercise
24 N108.txt N108.txt Healthy_pre-exercise
25 N113.txt N113.txt Healthy_pre-exercise
26 N170.txt N170.txt Healthy_pre-exercise
27 N171.txt N171.txt Healthy_pre-exercise
28 N172.txt N172.txt Healthy_pre-exercise
29 N175.txt N175.txt Healthy_pre-exercise
30 N181.txt N181.txt Healthy_pre-exercise
31 N182.txt N182.txt Healthy_pre-exercise
32 N183.txt N183.txt Healthy_pre-exercise
33 N186.txt N186.txt Healthy_pre-exercise
34 N187.txt N187.txt Healthy_pre-exercise
35 N188.txt N188.txt Healthy_pre-exercise
36 N201.txt N201.txt Healthy_post-exercise
37 N208.txt N208.txt Healthy_post-exercise
38 N213.txt N213.txt Healthy_post-exercise
39 N270.txt N270.txt Healthy_post-exercise
40 N271.txt N271.txt Healthy_post-exercise
41 N272.txt N272.txt Healthy_post-exercise
42 N275.txt N275.txt Healthy_post-exercise
43 N281.txt N281.txt Healthy_post-exercise
44 N282.txt N282.txt Healthy_post-exercise
45 N283.txt N283.txt Healthy_post-exercise
46 N286.txt N286.txt Healthy_post-exercise
47 N287.txt N287.txt Healthy_post-exercise
48 N288.txt N288.txt Healthy_post-exercise
sindrle is offline   Reply With Quote
Old 10-18-2013, 06:48 AM   #5
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

Design = des doen not work:

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design= des)
Error in validObject(.Object) :
invalid class "DESeqDataSet" object: all variables in design formula must be columns in colData
sindrle is offline   Reply With Quote
Old 10-18-2013, 06:51 AM   #6
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

For the first post:

sampleCondition =read.table("/Volumes/timemachine/HTseq_DEseq2/03_SampleCondition.txt", header=TRUE)
condition = sampleCondition

03_SampleCondition.txt:

Condition
Diabetic_pre-exercise
Diabetic_pre-exercise
Diabetic_pre-exercise
Diabetic_pre-exercise
Diabetic_pre-exercise
Diabetic_pre-exercise
Diabetic_pre-exercise
Diabetic_pre-exercise
Diabetic_pre-exercise
Diabetic_pre-exercise
Diabetic_pre-exercise
Diabetic_post-exercise
Diabetic_post-exercise
Diabetic_post-exercise
Diabetic_post-exercise
Diabetic_post-exercise
Diabetic_post-exercise
Diabetic_post-exercise
Diabetic_post-exercise
Diabetic_post-exercise
Diabetic_post-exercise
Diabetic_post-exercise
Healthy_pre-exercise
Healthy_pre-exercise
Healthy_pre-exercise
Healthy_pre-exercise
Healthy_pre-exercise
Healthy_pre-exercise
Healthy_pre-exercise
Healthy_pre-exercise
Healthy_pre-exercise
Healthy_pre-exercise
Healthy_pre-exercise
Healthy_pre-exercise
Healthy_pre-exercise
Healthy_post-exercise
Healthy_post-exercise
Healthy_post-exercise
Healthy_post-exercise
Healthy_post-exercise
Healthy_post-exercise
Healthy_post-exercise
Healthy_post-exercise
Healthy_post-exercise
Healthy_post-exercise
Healthy_post-exercise
Healthy_post-exercise
Healthy_post-exercise

Last edited by sindrle; 10-18-2013 at 06:57 AM.
sindrle is offline   Reply With Quote
Old 10-18-2013, 07:05 AM   #7
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Ah, I see what the problem is. colData is is set from SampleTable (it's all but the first 2 columns). However, there's just a single column there and you have two factors in your design. Try the following:

Code:
sampleFiles <- list.files(path="/Volumes/timemachine/HTseq_DEseq2",pattern="*.txt")
status <- factor(c(rep("Healthy",26), rep("Diabetic",22)))
timepoints = as.factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2))
sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, status=status, timepoints=timepoints)
directory <- c("/Volumes/timemachine/HTseq_DEseq2/")
des <- formula(~timepoints+status)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design= des)
Something along those lines should work (I always use DESeqDataSetFromMatrix).
dpryan is offline   Reply With Quote
Old 10-18-2013, 07:22 AM   #8
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

No surprise, that worked like a charm!

Thanks again, Ill try to run DEseq2 now for diff.exp.
sindrle is offline   Reply With Quote
Old 10-18-2013, 07:51 AM   #9
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

One fast question:

What does the log2FoldChange from the DESeq2 results now tell? In light of the design= des?

Is it correct to interpret it as:

"These genes are significantly changed in diabetic patients from time point one to time point two, with healthy as controls"?
sindrle is offline   Reply With Quote
Old 10-18-2013, 08:20 AM   #10
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

With a multifactor design, there's more than a single set of results. Given a DESeqDataSet named "dds" (they use this name in the vignette, so I'll use it here for the sake of consistency), you can type:
Code:
resultsNames(dds)
to get the coefficient names, which will be something like, "Intercept", "timepoints_1_vs_2", and "status_healthy_vs_diabetic", for you. You can the retrieve each results table:
Code:
statusResults <- results(dds, "status_healthy_vs_diabetic")
timepointsResults <- results(dds, "timepoints_1_vs_2")
The log2FoldChange column in statusResults would be "The log2() fold change in diabetic vs. control patients when controlling for timepoint". The equivalent in timepointsResults would be "The log2() fold change in timepoint 2 vs 1, controlling for diabetic status". BTW, since your timepoints are before/after treatment, I imagine that you're interested in a possible interaction between status and timepoint/treatment. You could specify that in your design by swapping a "*" for the "+".

One more thing to think about is if these samples were drawn from the same subjects at both timepoints. If so, you can model this as a paired design. I don't think there are examples of that in the DESeq2 vignette, but you can probably find an example in the limma user guide (the model setup steps are more or less the same).
dpryan is offline   Reply With Quote
Old 10-18-2013, 08:30 AM   #11
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

Very good! Thank you.

"One more thing to think about is if these samples were drawn from the same subjects at both timepoints. If so, you can model this as a paired design. I don't think there are examples of that in the DESeq2 vignette, but you can probably find an example in the limma user guide (the model setup steps are more or less the same)."

This is true, they are drawn from the same subjects! Ill look into that.

Thanks again!
sindrle is offline   Reply With Quote
Old 10-18-2013, 01:42 PM   #12
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

Just to check if I got this correctly:

These are the 4 results I get:

statusResults <- results(dds, "status_Healthy_vs_Diabetic");
timepointsResults <- results(dds, "timepoints_2_vs_1");
interceptResults <- results(dds, "Intercept");
status&treatmentResults <- results(dds, "timepoints2.statusHealthy")

# statusResults: "The log2() fold change in diabetic vs. control patients when controlling for timepoint". Kinda like its "no treatment", meaning what are the difference in gene expression between diabetics and healthy? Genes changed due only to treatment are not shown. Assumes that the treatment has the same effect on both groups?

# timepointsResults: "The log2() fold change in timepoint 2 vs 1, controlling for diabetic status". Meaning how the treatment effects gene expression regardless of diabetes? Assumes that the treatment has the same effect on both groups?

#interceptResults: What is this??

# status&treatmentResults: "interaction between status and treatment". Meaning how the treatment affects gene expression differently in healthy or diabetic?

I have posted a new thread on how to implement paired data between time point 1 and 2 btw.

Thanks!
sindrle is offline   Reply With Quote
Old 10-18-2013, 03:19 PM   #13
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

One change I should have mentioned earlier is
Code:
status <- factor(c(rep("Healthy",26), rep("Diabetic",22)), levels=c("Healthy", "Diabetic")
which will make coefficients in the directions you want.

Quote:
interceptResults <- results(dds, "Intercept");
Yeah, you can ignore that one. There are a number of ways to parametrize these sorts of models, one of which is without a Intercept. These results would effectively be things significantly higher than zero at the baseline condition (Diabetic at timepoint 1 in the original model, but Healthy at timepoint 1 with the "status" from above).

Quote:
Kinda like its "no treatment", meaning what are the difference in gene expression between diabetics and healthy? Genes changed due only to treatment are not shown. Assumes that the treatment has the same effect on both groups?
Yes, this is the difference due simply to being diabetic. I kind of assume that this isn't really that interesting for you from a biological standpoint, other than to show that you found similar changes to everyone else who's looked at gene expression in diabetes.

Quote:
Meaning how the treatment effects gene expression regardless of diabetes? Assumes that the treatment has the same effect on both groups?
This is the change you would expect due to treatment, regardless of the persons diabetic status. If there are changes due to a combination of treatment and diabetic status, they generally shouldn't show up here (these would be termed "interaction effects").

Quote:
Meaning how the treatment affects gene expression differently in healthy or diabetic?
Yeah. I assume that the point of the treatment is to reverse some aspect of diabetes. You could then think of diabetes increasing expression of some gene which is decreased in those taking the treatment only if they have diabetes. It's not always the case that including interaction terms make sense, but I would be interested in looking for these genes if I were you.

Quote:
I have posted a new thread on how to implement paired data between time point 1 and 2 btw.
I noticed that and the post to the bioconductor mailing list. I won't reply there in hope that one of the DESeq authors know of a better way than what I'll present here to deal with that. The simplest way is to just treat each individual as a factor. So, something like:

Code:
patients <- factor(c(rep(1:13,2), rep(14:24,2)))
des <- formula(~patients + timepoints*status)
You don't really care about the various patient-specific differences, so don't bother with the results from that.

Last edited by dpryan; 10-18-2013 at 04:08 PM. Reason: Changed the patients factor so it should be correct, previously, things were incorrectly paired.
dpryan is offline   Reply With Quote
Old 10-18-2013, 05:06 PM   #14
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

Wow!
That was a great answer, thank you yet again.

It was Simon Anders who said I should post via the mailing list, never used it before. Ill post the answer in the thread if I get it, so others can read as well.
sindrle is offline   Reply With Quote
Old 10-18-2013, 06:58 PM   #15
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

It runs fine, I changed "healthy" and "diabetic" to fit how it was imported (diabetics first). Does that influence anything?

I got this message after the run:

> dds <- DESeq(ddsHTSeq)
159 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

Is that a problem?

So my results is now fine? Paired data is handled correctly?

Last edited by sindrle; 10-18-2013 at 07:07 PM.
sindrle is offline   Reply With Quote
Old 10-19-2013, 12:23 AM   #16
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Swapping Healthy/Diabetic will just change the sign on the fold changes. With the updated version of "status", Normal/timepoint 1 is the baseline for all of the comparisons, which is how I expect you and others would want to think about the experiment. Previously it was Diabetic/timepoint 1.

Regarding the error message, just change the "maxit" option to something bigger to see if that goes away. The model is now rather more complicated, so I'm not surprised that it takes more iterations to fit. If you still have things not fitting, then see which rows they are and don't trust the results from them (the other 30,000 or so rows should be fine, however). The results should be from a paired-analysis then.
dpryan is offline   Reply With Quote
Old 10-19-2013, 02:04 AM   #17
sindrle
Senior Member
 
Location: Norway

Join Date: Aug 2013
Posts: 266
Default

Very good!
Ill start reading up on EdgeR later, the manual there had some nice sections on design.

Thank you again, have a nice weekend.
sindrle is offline   Reply With Quote
Old 11-20-2013, 06:46 AM   #18
nbahlis
Member
 
Location: Canada

Join Date: May 2013
Posts: 25
Default

Hi Ryan,

DESeq2 question
Is the script below the correct way to set up a comparison between paired samples pre- and post treatment?
thank you
> sampleFiles <- list.files(path="~/Desktop/Realigned_to_human_g1K_v37/Cuffdiff_IMIDS_Nov/HTSeq/HTseq_gene_counts" , pattern="*.counts")
> Table3 <- data.frame(
+ row.names = c( "P110", "P124", "P149", "P185", "P189", "P192", "P218", "P227", "P235", "P280", "P308", "P351", "P357", "P367", "P377", "P384", "P426", "P543", "P584", "P590", "P594", "P610" ),
+
> sampleFiles <- list.files(path="~/Desktop/Realigned_to_human_g1K_v37/Cuffdiff_IMIDS_Nov/HTSeq/HTseq_gene_counts" , pattern="*.counts")
> Table3 <- data.frame(
+ sampleName = sampleFiles, fileName = sampleFiles,
+ condition = c( "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "post", "pre", "post", "post", "post", "post", "post", "post", "post", "post", "post" ),
+ libType = c( "pair8", "pair10", "pair9", "pair1", "pair7", "pair11", "pair2", "pair3", "pair4", "pair5", "pair5", "pair6", "pair7", "pair1", "pair3", "pair4", "pair11", "pair2", "pair6", "pair10", "pair9", "pair8" ) )
Error in data.frame(sampleName = sampleFiles, fileName = sampleFiles, :
arguments imply differing number of rows: 22, 20
> Table3 <- data.frame(
+ sampleName = sampleFiles, fileName = sampleFiles,
+ condition = c( "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "post", "pre", "post", "post", "post", "post", "post", "post", "post", "post", "post", "post" ),
+ libType = c( "pair8", "pair10", "pair9", "pair1", "pair7", "pair11", "pair2", "pair3", "pair4", "pair5", "pair5", "pair6", "pair7", "pair1", "pair3", "pair4", "pair11", "pair2", "pair6", "pair10", "pair9", "pair8" ) )
> directory <- c("~/Desktop/Realigned_to_human_g1K_v37/Cuffdiff_IMIDS_Nov/HTSeq/HTseq_gene_counts/")
> design <- formula(~ libType + condition)
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable= Table3, directory= directory, design= design)
> Table3
sampleName fileName condition libType
1 P110.counts P110.counts pre pair8
2 P124.counts P124.counts pre pair10
3 P149.counts P149.counts pre pair9
4 P185.counts P185.counts pre pair1
5 P189.counts P189.counts pre pair7
6 P192.counts P192.counts pre pair11
7 P218.counts P218.counts pre pair2
8 P227.counts P227.counts pre pair3
9 P235.counts P235.counts pre pair4
10 P280.counts P280.counts pre pair5
11 P308.counts P308.counts post pair5
12 P351.counts P351.counts pre pair6
13 P357.counts P357.counts post pair7
14 P367.counts P367.counts post pair1
15 P377.counts P377.counts post pair3
16 P384.counts P384.counts post pair4
17 P426.counts P426.counts post pair11
18 P543.counts P543.counts post pair2
19 P584.counts P584.counts post pair6
20 P590.counts P590.counts post pair10
21 P594.counts P594.counts post pair9
22 P610.counts P610.counts post pair8

Many thanks
nbahlis is offline   Reply With Quote
Old 11-20-2013, 07:03 AM   #19
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

That appears to be correct. That doesn't look for any pair:treatment interaction, but that's likely not of interest (and would really suck up the degrees of freedom).
dpryan is offline   Reply With Quote
Old 11-20-2013, 07:13 AM   #20
nbahlis
Member
 
Location: Canada

Join Date: May 2013
Posts: 25
Default

thank you for the quick reply. I am attempting to identify genes involved in resistance to treatment "X". RNAseq were done on samples collected from individual patients pre-treatment and at the time of clinical progression (resistant). Therefore I think I need to account for pair:treatment interaction, no? I am not sure how to account for this interaction in my script. Greatly appreciate your advice.
nbahlis is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:36 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO