Seqanswers Leaderboard Ad

**MDonlin** · 11-01-2013, 10:15 AM

In trying to understand my issue with the use of the DESeqDataSetfromMatrix command in DESeq2 I tried to compare my data to the Pasilla data used in the vignette:

> data("pasillaGenes")
> countData <- counts(pasillaGenes)

> colData <- pData(pasillaGenes)[,c("condition","type")]
> colData
condition type
treated1fb treated single-read
treated2fb treated paired-end
treated3fb treated paired-end
untreated1fb untreated single-read
untreated2fb untreated single-read
untreated3fb untreated paired-end
untreated4fb untreated paired-end

Which imports nicely into a DEseq Data Set:
> pasilla_dds <- DESeqDataSetFromMatrix(countData, colData, formula(~condition))

However, I don't understand where does the type information come from. It doesn't seem to be in the countData matrix because when I look at countData, all I see are 7 columns of data, (GeneIDs plus the count columns).

> head(countData)
treated1fb treated2fb treated3fb untreated1fb untreated2fb untreated3fb untreated4fb
FBgn0000003 0 0 1 0 0 0 0
FBgn0000008 78 46 43 47 89 53 27
FBgn0000014 2 0 0 0 0 1 0
FBgn0000015 1 0 1 0 1 1 2
FBgn0000017 3187 1672 1859 2445 4615 2063 1711
FBgn0000018 369 150 176 288 383 135 174

> dim(countData)
[1] 14470 7

My data seems to be quite similar, but when I try using pData to set the colData variable, I get an error.

My data:
> head(bckCountTable)
ctl1 ctl2 ctl3 del1 del2 del3
CNAG_00001 0 0 0 0 0 0
CNAG_00002 29 34 27 26 13 21
CNAG_00003 38 26 38 41 38 27
CNAG_00004 63 42 58 58 62 55
CNAG_00005 57 49 49 30 39 40
CNAG_00006 433 398 571 422 353 435

> dim(bckCountTable)
[1] 6967 6

When I try to set colData using the same commands:
> colData <- pData(bckCountTable)[,c("condition","type")]
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘pData’ for signature ‘"data.frame"’

I'm still not clear how my matrix of counts differs from that generated for the Pasilla genes.

I also tried following the example in the reference guide:
> colData <- data.frame(condition=factor(c("ctl","del")))

> colData
condition
1 ctl
2 del

But when I try to set up the DEseqDataSet, I get an error
> bck_dds <- DESeqDataSetFromMatrix(bckCountTable, colData, formula(~condition))
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol

Again, any advice or ideas would be most appreciated.

Regards,
Maureen

**dpryan** · 11-02-2013, 05:35 AM

The original problem is partly that "bckCountTable" contains a column with gene names. Just remove that. Also, ExpDesign only has one column, so you can just:

Code:

bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=ExpDesign, design=~condition)

or something along those lines. The number of rows in colData need to match the number of samples that you're analysing.

Regarding "type", you won't find it in the count matrix because it just describes whether things were run as single or paired-end. It's only used latter in the vignette where they discuss multifactor designs. For simple experiments, ignore it. You can't use the design from the pasilla experiment because it had 7 samples and you have 6.

**MDonlin** · 11-04-2013, 09:06 AM

I appreciate your help, but unless I've missed something, if I remove the gene name IDs from the count data, then I won't be able to figure out what genes are differentially expressed.

The issue really seems to be with how the colData is defined.

I tried defining a samples data frame:

> samples
samples condition
1 ctl1 ctl
2 ctl2 ctl
3 ctl3 ctl
4 del1 del
5 del2 del
6 del3 del

And then using:
> bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=samples$samples, design=samples$condition)

But I still get the same error:
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object: invalid object for slot "colData" in class "SummarizedExperiment": got class "factor", should be or extend class "DataFrame"

**dpryan** · 11-04-2013, 09:10 AM

The gene names need to be the row.names, not a column of their own as they were in your example. In what you just wrote, try instead:

Code:

samples <- data.frame(row.names=c("ctl1","ctl2","ctl3","del1","del2","del3"),
    condition=as.factor(c(rep("ctl",3), rep("del",3))))
bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=samples, design=~condition)

**MDonlin** · 11-04-2013, 10:54 AM

Thank you dpryan! That worked quite well.

To summarize what worked:

> bckCountTable <- read.table("bck_counts.txt", header=TRUE, row.names=1)
> head(bckCountTable)
ctl1 ctl2 ctl3 del1 del2 del3
CNAG_00001 0 0 0 0 0 0
CNAG_00002 29 34 27 26 13 21
CNAG_00003 38 26 38 41 38 27
CNAG_00004 63 42 58 58 62 55
CNAG_00005 57 49 49 30 39 40
CNAG_00006 433 398 571 422 353 435

> samples <- data.frame(row.names=c("ctl1","ctl2","ctl3","del1","del2","del3"), condition=as.factor(c(rep("ctl",3),rep("del",3))))
> samples
condition
ctl1 ctl
ctl2 ctl
ctl3 ctl
del1 del
del2 del
del3 del

> bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=samples, design=~condition)
> bckCDS_1 <- DESeq(bckCDS)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

> bck_res <- results(bckCDS_1)

> head(bck_res)
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
CNAG_00001 0.00000 NA NA NA NA NA
CNAG_00002 25.38193 -0.50881187 0.2159023 -2.3566760 0.018439326 NA
CNAG_00003 34.46418 -0.08588360 0.2010327 -0.4272120 0.669224911 0.84870693
CNAG_00004 56.07399 -0.05626052 0.1771525 -0.3175824 0.750801714 0.89623448
CNAG_00005 44.69432 -0.52210975 0.1925740 -2.7112160 0.006703695 0.05953184
CNAG_00006 434.90795 -0.35984037 0.1175182 -3.0619982 0.002198648 0.02705483

> write.csv(bck_res,file="bck_results.csv")

I should have filtered out those genes with few or no counts, but I can do that after importing the data into Excel.

Thanks again.

Maureen

**gong chen** · 12-09-2014, 07:44 AM

I just want to say thank you Mdonlin. I am new to data analysis and your final summary helped me a lot.

**MDonlin** · 12-09-2014, 09:37 AM

Gong,

Glad the summary helped.

Maureen

**m10001** · 05-11-2015, 08:38 PM

Maureen,

Many thanks for your detailed summary. This was perfect to get me up and running.

Cheers,
Mark

**Sow** · 02-16-2016, 05:52 PM

Hi,
I'm still not able to enter the data. Here's what it shows..

RivsRU <-read.table("RIvsRUdata", header=TRUE, row.names = 1)
> head(RivsRU)
FEATURE_ID RI2 RI3 RI1 RU1 RU2 RU3
1 AAAAAAAAAAAAAAAAA 2 3 2 8 1 2
2 AAAAAAAAAAAAAAAAAA 2 2 1 10 1 5
3 AAAAAAAAAAAAAAAAAAA 0 1 0 3 2 1
4 AAAAAAAAAAAAAAAAAAAA 1 1 0 2 0 0
5 AAAAAAAAAAAAAAAAAAAAA 0 0 1 2 1 2
6 AAAAAAAAAAAAAAAAAAAAAACAAAAA 1 0 0 0 0 0
>samples <- data.frame(row.names=c("RI1","RI2","RI3","RU1","RU2","RU3"), condition=as.factor(c(rep("RI",3),rep("RU",3))))

> RIvsRU <- DESeqDataSetFromMatrix(countData = RivsRU, colData = samples, design=~conditions)

Error in validObject(.Object) :
invalid class “SummarizedExperiment0” object: 'assays' ncol differs from 'colData' nrow
In addition: Warning message:
In sort(rownames(colData)) == sort(colnames(countData)) :
longer object length is not a multiple of shorter object length

Could someone please tell me what am I doing wrong?
Thanks

**Sow** · 02-16-2016, 06:20 PM

I see that I still have the gene names as a separate column - my bad.
What would be the code to change row names to gene names?
Sorry if this is a simple thing in R, I'm still very new to R!

**Michael.Ante** · 02-17-2016, 12:32 AM

Hi Sow,

just use:

Code:

x=RivsRU[,-1] #removing the first column
rownames(x)=RivsRU[,1] # adding the first column as rownames

Afterwards you'll just use countData=x.

P.S.: be careful using similar names for different variables: RivsRU and RIvsRU will be hard for others to understand your code.

Cheers,
Michael

**Sow** · 02-17-2016, 12:41 AM

Thanks Michele,

I will keep your suggestion in mind as well.

Sow

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 27 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 26 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

DESeq2 DataSetFromMatrix question

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News