Unconfigured Ad

**dpryan** · 07-16-2014, 12:22 PM

That command won't produce the error you showed.

**KYR** · 07-16-2014, 12:59 PM

Originally posted by dpryan View Post

That command won't produce the error you showed.

Indeed, I started again from a fresh R command shell, and that error doesn't appear anymore. However it generates another one:

Code:

library( "DESeq2" )
library("Biobase")
mydata = read.table("matrix4.txt", header=TRUE)
samples <- data.frame(row.names=c("C1", "C2", "C3", "B1", "B2", "B3", "A1", "A2", "A3"), condition=as.factor(c(rep("C",3), rep("B", 3), rep("A", 3))))

dds <- DESeqDataSetFromMatrix(countData = as.matrix(mydata), colData=samples, design=~condition)

Error in validObject(.Object) : 
  invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol

So I check the number of columns and rows for each and ncol has 1 more than ncol

I'm guessing it's coming from the gene symbol column..though how can I fix this??

**dpryan** · 07-16-2014, 01:05 PM

What's the output of

Code:

dim(as.matrix(mydata))

I'm guessing that the mydata object has the gene names as a column rather than as the row names.

**Richard Finney** · 07-16-2014, 01:11 PM

Slight Tangent ... Using "htseq-count" outputs ... this works for me ....

Here's the template I hack for deseq2 ....

source("http://bioconductor.org/biocLite.R")
library(DESeq2,lib.loc="/home/finneyr/Rlib")

sampleFiles = c(
"file1.txt" ,
"file2.txt" ,
#... (fill in the names of you htseq count files here 1 to N files.
:"filen.txt"
)

#set your condtions for the files in SampleFiles
sampleCondition = c( "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "untreated" , "untreated" , "untreated" , "untreated" , "untreated" )

sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)

#might need this ... I'm not sure
libType = c ( "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" )

options(max.print=100000)
options(width=500)

#set directory to your place where you keep your files llisted in "sampleFiles" which are htseqcount output files.
directory="/data/nextgen/finneyr/novo/CNT"
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
print(ddsHTSeq)

colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("treated","untreated"))

dds<-DESeq(ddsHTSeq)
print(dds)
res<-results(dds)
print(res);
# sort by padj (:adjusted p-value") ...
res<-res[order(res$padj),]

#write results to file name "rpt5" , change this to your output file name, deseq2 explains log2foldchange and other fields.
write.csv(as.data.frame(res),file="rpt5")
q(save="no")

**dpryan** · 07-16-2014, 01:18 PM

Why do you print the results (and not even in an abbreviated form!) to screen when you're just going to write them to a file as well?

**Michael Love** · 07-16-2014, 01:23 PM

see Devon Ryan's answer above. if you show us head(as.matrix(mydata)) I'm guessing the first column might not be counts. it could be gene names, converted by read.table into factors.

**KYR** · 07-16-2014, 01:29 PM

Originally posted by dpryan View Post

What's the output of

Code:

dim(as.matrix(mydata))

I'm guessing that the mydata object has the gene names as a column rather than as the row names.

I gives me the following:

Code:

> dim(as.matrix(mydata4))
[1] 25197    10

I guess I have one more row column because of the gene symbol column which is first and should be row.names.

**KYR** · 07-16-2014, 01:32 PM

Originally posted by Michael Love View Post

see Devon Ryan's answer above. if you show us head(as.matrix(mydata)) I'm guessing the first column might not be counts. it could be gene names, converted by read.table into factors.

yes that's what's happening here, the first column gene symbol should be row.names. Though I don't know how to fix that. Any help would be greatlly appreciated..

**dpryan** · 07-16-2014, 01:33 PM

You actually have 4 additional columns, since you only described 9 samples. You probably just need to:

Code:

mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)

**KYR** · 07-16-2014, 01:39 PM

Originally posted by dpryan View Post

You actually have 4 additional columns, since you only described 9 samples. You probably just need to:

Code:

mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)

That's what I've done originally, but it gave me the following error:

Code:

> mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)
Error in read.table("matrix4.txt", header = TRUE, row.names = 1) : 
  duplicate 'row.names' are not allowed

So I decided to dump the row.names, this is weird it never happened before.

**dpryan** · 07-16-2014, 01:42 PM

You could also simply:

Code:

mydata4 <- mydata4[,-1]

However, you should investigate why you have duplicate gene names. It's likely that something went amiss when that file was made.

**KYR** · 07-16-2014, 01:50 PM

Originally posted by dpryan View Post

You could also simply:

Code:

mydata4 <- mydata4[,-1]

However, you should investigate why you have duplicate gene names. It's likely that something went amiss when that file was made.

Uhh indeed we have duplicated gene names, we have to investigate before proceeding further. Thanks for your answers

**rookie_genomics** · 03-11-2019, 11:51 AM

Hi

I just started using deseq2 for DE analysis
I have an excel sheet input with gene names followed by 16 columns with reads.
I tried to generate a matrix using this file and I keep getting an error similar to that is mentioned here
So this is what is happening

analysis2 <- as.matrix(deseq2_analysis2)
(condition <- factor(c(rep("group1", 4), rep("group2", 4), rep("group3", 4), rep("group4", 4))))
group1 group1 group1 group1 group2 group2 group2 group2 group3 group3 group3 group3 group4
group4 group4 group4
Levels: group1 group2 group3 group4
(coldata <- data.frame(row.names=colnames(analysis2), condition))
Error in data.frame(row.names = colnames(analysis2), condition) :
row names supplied are of the wrong length

What am I doing wrong? I want an output that is sorted by gene names

This is the output of the head command

head(analysis2)
gene.name Sample1_group1 Sample2_group1 Sample3_group1 Sample4_group1 Sample1_group2
[1,] "YAL068C" " 0" " 0" " 0" " 0" " 2"
[2,] "YAL067W-A" " 0" " 0" " 0" " 0" " 0"
[3,] "YAL067C" " 243" " 242" " 109" " 130" " 271"
[4,] "YAL065C" " 16" " 7" " 52" " 30" " 23"
[5,] "YAL064W-B" " 23" " 21" " 21" " 33" " 11"
[6,] "YAL064C-A" " 38" " 42" " 76" " 88" " 47"
Sample2_group2 Sample3_group2 Sample4_group2 Sample1_group3 Sample2_group3 Sample3_group3
[1,] " 0" " 2" " 0" " 0" " 1" " 6"
[2,] " 0" " 2" " 0" " 0" " 0" " 0"
[3,] " 233" " 132" " 150" " 228" " 212" " 174"
[4,] " 10" " 22" " 46" " 15" " 17" " 46"
[5,] " 28" " 56" " 19" " 19" " 22" " 40"
[6,] " 40" " 44" " 65" " 42" " 35" " 74"
Sample4_group3 Sample1_group4 Sample2_group4 Sample3_group4 Sample4_group4
[1,] " 2" " 0" " 1" " 0" " 0"
[2,] " 0" " 0" " 0" " 0" " 0"
[3,] " 176" " 96" " 73" " 132" " 77"
[4,] " 39" " 18" " 11" " 39" " 27"
[5,] " 27" " 20" " 16" " 26" " 18"
[6,] " 83" " 49" " 23" " 55" " 52"

I am new to R and DE analysis and any help will be appreciated

Topics	Statistics	Last Post
Large-Scale Protein Screen Uncovers Hidden Regulators of Alternative Polyadenylation by SEQadmin2 Started by SEQadmin2, 06-26-2026, 11:10 AM	0 responses 16 views 0 reactions	Last Post by SEQadmin2 06-26-2026, 11:10 AM
Whole-Genome Sequencing Traces Faroe Islands Ancestry to a North Atlantic Founder Population by SEQadmin2 Started by SEQadmin2, 06-17-2026, 06:09 AM	0 responses 49 views 0 reactions	Last Post by SEQadmin2 06-17-2026, 06:09 AM
Sequencing the Two-Toed Sloth Genome Reveals Jumping Genes Tied to Its Extreme Metabolism by SEQadmin2 Started by SEQadmin2, 06-09-2026, 11:58 AM	0 responses 107 views 0 reactions	Last Post by SEQadmin2 06-09-2026, 11:58 AM
A New Method Makes Hantavirus Genome Analysis Faster and More Accessible by SEQadmin2 Started by SEQadmin2, 06-05-2026, 10:09 AM	0 responses 125 views 0 reactions	Last Post by SEQadmin2 06-05-2026, 10:09 AM

Unconfigured Ad

Need help with DESeq2

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News