SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   RNA Sequencing (http://seqanswers.com/forums/forumdisplay.php?f=26)
-   -   Need help with DESeq2 (http://seqanswers.com/forums/showthread.php?t=45029)

KYR 07-16-2014 12:15 PM

Need help with DESeq2
 
I have a text file, containing read counts per gene for each treatments and control, with the following column

[Gene Symbol] [C1] [C2] [C3] [B1] [B2] [B3] [A1] [A2] [A3]
C is Control
A is Treatment 1
B is Treatment 2
-> Each of C, A an B have 3 replicates

When I do data.frame it generates an error

Code:

library( "DESeq2" )
library("Biobase")
mydata4 = read.table("matrix4.txt", header=TRUE)
head(mydata4)

samples <- data.frame(row.names=c("C1", "C2", "C3", "B1", "B2", "B3", "A1", "A2", "A3"), condition=as.factor(c(rep("C",3), rep("B", 3), rep("A", 3))))
Error in data.frame(row.names = c("C1", "C2", "C3", "B1", "B2", "B3",  :
  row names supplied are of the wrong length


How can I fix that ? I seems I need to subset my data though I don't know how to do this and deseq2 doesn't transpose my columns to rows :confused::confused:

dpryan 07-16-2014 12:22 PM

That command won't produce the error you showed.

KYR 07-16-2014 12:59 PM

Quote:

Originally Posted by dpryan (Post 145157)
That command won't produce the error you showed.

Indeed, I started again from a fresh R command shell, and that error doesn't appear anymore. However it generates another one:


Code:

library( "DESeq2" )
library("Biobase")
mydata = read.table("matrix4.txt", header=TRUE)
samples <- data.frame(row.names=c("C1", "C2", "C3", "B1", "B2", "B3", "A1", "A2", "A3"), condition=as.factor(c(rep("C",3), rep("B", 3), rep("A", 3))))

dds <- DESeqDataSetFromMatrix(countData = as.matrix(mydata), colData=samples, design=~condition)

Error in validObject(.Object) :
  invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol


So I check the number of columns and rows for each and ncol has 1 more than ncol

I'm guessing it's coming from the gene symbol column..though how can I fix this??

dpryan 07-16-2014 01:05 PM

What's the output of
Code:

dim(as.matrix(mydata))
I'm guessing that the mydata object has the gene names as a column rather than as the row names.

Richard Finney 07-16-2014 01:11 PM

Slight Tangent ... Using "htseq-count" outputs ... this works for me ....

Here's the template I hack for deseq2 ....

source("http://bioconductor.org/biocLite.R")
library(DESeq2,lib.loc="/home/finneyr/Rlib")

sampleFiles = c(
"file1.txt" ,
"file2.txt" ,
#... (fill in the names of you htseq count files here 1 to N files.
:"filen.txt"
)

#set your condtions for the files in SampleFiles
sampleCondition = c( "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "treated" , "untreated" , "untreated" , "untreated" , "untreated" , "untreated" )

sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)

#might need this ... I'm not sure
libType = c ( "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" , "paired-end" )

options(max.print=100000)
options(width=500)

#set directory to your place where you keep your files llisted in "sampleFiles" which are htseqcount output files.
directory="/data/nextgen/finneyr/novo/CNT"
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
print(ddsHTSeq)

colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("treated","untreated"))

dds<-DESeq(ddsHTSeq)
print(dds)
res<-results(dds)
print(res);
# sort by padj (:adjusted p-value") ...
res<-res[order(res$padj),]

#write results to file name "rpt5" , change this to your output file name, deseq2 explains log2foldchange and other fields.
write.csv(as.data.frame(res),file="rpt5")
q(save="no")

dpryan 07-16-2014 01:18 PM

Why do you print the results (and not even in an abbreviated form!) to screen when you're just going to write them to a file as well?

Michael Love 07-16-2014 01:23 PM

see Devon Ryan's answer above. if you show us head(as.matrix(mydata)) I'm guessing the first column might not be counts. it could be gene names, converted by read.table into factors.

KYR 07-16-2014 01:29 PM

Quote:

Originally Posted by dpryan (Post 145164)
What's the output of
Code:

dim(as.matrix(mydata))
I'm guessing that the mydata object has the gene names as a column rather than as the row names.

I gives me the following:

Code:

> dim(as.matrix(mydata4))
[1] 25197    10

I guess I have one more row column because of the gene symbol column which is first and should be row.names.

KYR 07-16-2014 01:32 PM

Quote:

Originally Posted by Michael Love (Post 145169)
see Devon Ryan's answer above. if you show us head(as.matrix(mydata)) I'm guessing the first column might not be counts. it could be gene names, converted by read.table into factors.


yes that's what's happening here, the first column gene symbol should be row.names. Though I don't know how to fix that. Any help would be greatlly appreciated.. :)

dpryan 07-16-2014 01:33 PM

You actually have 4 additional columns, since you only described 9 samples. You probably just need to:
Code:

mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)

KYR 07-16-2014 01:39 PM

Quote:

Originally Posted by dpryan (Post 145172)
You actually have 4 additional columns, since you only described 9 samples. You probably just need to:
Code:

mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)


That's what I've done originally, but it gave me the following error:

Code:

> mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)
Error in read.table("matrix4.txt", header = TRUE, row.names = 1) :
  duplicate 'row.names' are not allowed

So I decided to dump the row.names, this is weird it never happened before.

dpryan 07-16-2014 01:42 PM

You could also simply:
Code:

mydata4 <- mydata4[,-1]
However, you should investigate why you have duplicate gene names. It's likely that something went amiss when that file was made.

KYR 07-16-2014 01:50 PM

Quote:

Originally Posted by dpryan (Post 145175)
You could also simply:
Code:

mydata4 <- mydata4[,-1]
However, you should investigate why you have duplicate gene names. It's likely that something went amiss when that file was made.


Uhh indeed we have duplicated gene names, we have to investigate before proceeding further. Thanks for your answers :)

rookie_genomics 03-11-2019 11:51 AM

Hi

I just started using deseq2 for DE analysis
I have an excel sheet input with gene names followed by 16 columns with reads.
I tried to generate a matrix using this file and I keep getting an error similar to that is mentioned here
So this is what is happening

Quote:

analysis2 <- as.matrix(deseq2_analysis2)
(condition <- factor(c(rep("group1", 4), rep("group2", 4), rep("group3", 4), rep("group4", 4))))
group1 group1 group1 group1 group2 group2 group2 group2 group3 group3 group3 group3 group4
group4 group4 group4
Levels: group1 group2 group3 group4
(coldata <- data.frame(row.names=colnames(analysis2), condition))
Error in data.frame(row.names = colnames(analysis2), condition) :
row names supplied are of the wrong length

What am I doing wrong? I want an output that is sorted by gene names

This is the output of the head command

Quote:

head(analysis2)
gene.name Sample1_group1 Sample2_group1 Sample3_group1 Sample4_group1 Sample1_group2
[1,] "YAL068C" " 0" " 0" " 0" " 0" " 2"
[2,] "YAL067W-A" " 0" " 0" " 0" " 0" " 0"
[3,] "YAL067C" " 243" " 242" " 109" " 130" " 271"
[4,] "YAL065C" " 16" " 7" " 52" " 30" " 23"
[5,] "YAL064W-B" " 23" " 21" " 21" " 33" " 11"
[6,] "YAL064C-A" " 38" " 42" " 76" " 88" " 47"
Sample2_group2 Sample3_group2 Sample4_group2 Sample1_group3 Sample2_group3 Sample3_group3
[1,] " 0" " 2" " 0" " 0" " 1" " 6"
[2,] " 0" " 2" " 0" " 0" " 0" " 0"
[3,] " 233" " 132" " 150" " 228" " 212" " 174"
[4,] " 10" " 22" " 46" " 15" " 17" " 46"
[5,] " 28" " 56" " 19" " 19" " 22" " 40"
[6,] " 40" " 44" " 65" " 42" " 35" " 74"
Sample4_group3 Sample1_group4 Sample2_group4 Sample3_group4 Sample4_group4
[1,] " 2" " 0" " 1" " 0" " 0"
[2,] " 0" " 0" " 0" " 0" " 0"
[3,] " 176" " 96" " 73" " 132" " 77"
[4,] " 39" " 18" " 11" " 39" " 27"
[5,] " 27" " 20" " 16" " 26" " 18"
[6,] " 83" " 49" " 23" " 55" " 52"
I am new to R and DE analysis and any help will be appreciated


All times are GMT -8. The time now is 12:28 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.