#1
Member
Location: Toronto
Join Date: May 2012
Posts: 18
I have a text file containing read counts per gene for each treatment and control, with the following columns:
[Gene Symbol] [C1] [C2] [C3] [B1] [B2] [B3] [A1] [A2] [A3]
C is Control
A is Treatment 1
B is Treatment 2
-> Each of C, A and B has 3 replicates
When I call data.frame it generates an error:
Code:
    library("DESeq2")
    library("Biobase")
    mydata4 = read.table("matrix4.txt", header=TRUE)
    head(mydata4)
    samples <- data.frame(row.names=c("C1", "C2", "C3", "B1", "B2", "B3", "A1", "A2", "A3"),
                          condition=as.factor(c(rep("C",3), rep("B", 3), rep("A", 3))))
    Error in data.frame(row.names = c("C1", "C2", "C3", "B1", "B2", "B3", :
      row names supplied are of the wrong length
How can I fix that? It seems I need to subset my data, though I don't know how to do this, and DESeq2 doesn't transpose my columns to rows.

#2
Devon Ryan
Location: Freiburg, Germany
Join Date: Jul 2011
Posts: 3,480
That command won't produce the error you showed.

#3
Member
Location: Toronto
Join Date: May 2012
Posts: 18
Indeed, I started again from a fresh R command shell, and that error doesn't appear anymore. However, it generates another one:
Code:
    library("DESeq2")
    library("Biobase")
    mydata = read.table("matrix4.txt", header=TRUE)
    samples <- data.frame(row.names=c("C1", "C2", "C3", "B1", "B2", "B3", "A1", "A2", "A3"),
                          condition=as.factor(c(rep("C",3), rep("B", 3), rep("A", 3))))
    dds <- DESeqDataSetFromMatrix(countData = as.matrix(mydata), colData=samples, design=~condition)
    Error in validObject(.Object) :
      invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol
So I checked the number of columns and rows for each, and the count matrix has 1 more column than colData has rows. I'm guessing it's coming from the gene symbol column, though how can I fix this?
Last edited by KYR; 07-16-2014 at 02:11 PM. Reason: typo
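The error message says that colData must have exactly one row per column of the count matrix. A minimal base-R sketch of that check follows; the three-gene table is made-up data standing in for matrix4.txt, and `extra` is a hypothetical helper variable, so only the layout mirrors the post.

```r
# Toy stand-in for matrix4.txt: a gene-symbol column plus 9 count columns.
# (Hypothetical values; only the column layout follows the post.)
mydata <- data.frame(
  Gene = c("g1", "g2", "g3"),
  C1 = 1:3, C2 = 4:6, C3 = 7:9,
  B1 = 1:3, B2 = 4:6, B3 = 7:9,
  A1 = 1:3, A2 = 4:6, A3 = 7:9
)

# Sample sheet as in the post: one row per sample
samples <- data.frame(
  row.names = c("C1", "C2", "C3", "B1", "B2", "B3", "A1", "A2", "A3"),
  condition = factor(rep(c("C", "B", "A"), each = 3))
)

ncol(mydata)    # the gene column is counted as if it were a sample
nrow(samples)   # one row per real sample

# Columns of the table that have no matching row in the sample sheet
extra <- setdiff(colnames(mydata), rownames(samples))
extra
```

Any name reported in `extra` is a column that has to be removed or turned into row names before the table is passed as countData.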

#4
Devon Ryan
Location: Freiburg, Germany
Join Date: Jul 2011
Posts: 3,480
What's the output of
Code:
    dim(as.matrix(mydata))

#5
Senior Member
Location: bethesda
Join Date: Feb 2009
Posts: 700
Slight tangent ... using "htseq-count" outputs ... this works for me ...
Here's the template I hack for DESeq2:
Code:
    # biocLite.R was the Bioconductor installer at the time; current releases use BiocManager instead
    source("http://bioconductor.org/biocLite.R")
    library(DESeq2, lib.loc="/home/finneyr/Rlib")

    sampleFiles = c(
      "file1.txt",
      "file2.txt",
      # ... (fill in the names of your htseq-count files here, 1 to N files)
      "filen.txt"
    )
    # set your conditions for the files in sampleFiles (one entry per file, in the same order)
    sampleCondition = c(rep("treated", 18), rep("untreated", 5))
    sampleTable <- data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)
    # might need this ... I'm not sure
    libType = rep("paired-end", length(sampleCondition))
    options(max.print=100000)
    options(width=500)
    # set directory to the place where you keep the files listed in "sampleFiles", which are htseq-count output files
    directory = "/data/nextgen/finneyr/novo/CNT"
    ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
    print(ddsHTSeq)
    colData(ddsHTSeq)$condition <- factor(colData(ddsHTSeq)$condition, levels=c("treated","untreated"))
    dds <- DESeq(ddsHTSeq)
    print(dds)
    res <- results(dds)
    print(res)
    # sort by padj ("adjusted p-value")
    res <- res[order(res$padj),]
    # write results to a file named "rpt5"; change this to your output file name.
    # The DESeq2 documentation explains log2FoldChange and the other fields.
    write.csv(as.data.frame(res), file="rpt5")
    q(save="no")
Last edited by Richard Finney; 07-16-2014 at 02:21 PM.

#6
Devon Ryan
Location: Freiburg, Germany
Join Date: Jul 2011
Posts: 3,480
Why do you print the results (and not even in an abbreviated form!) to screen when you're just going to write them to a file as well?

#7
Senior Member
Location: Boston
Join Date: Jul 2013
Posts: 333
See Devon Ryan's answer above. If you show us head(as.matrix(mydata)), I'm guessing the first column might not be counts. It could be gene names, converted by read.table into factors.
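A quick way to test that guess is to look at the class of every column. The sketch below uses a made-up two-gene table (the gene names and counts are assumptions, not the poster's data). Note that from R 4.0 onward read.table keeps strings as character rather than factor by default; either way the column is non-numeric.

```r
# Hypothetical miniature of the count table; values are made up.
mydata <- read.table(text = "Gene C1 C2 A1
TP53 10 12 30
MYC 5 7 2", header = TRUE)

# Class of each column: the gene column is factor or character, the rest integer
col_classes <- sapply(mydata, class)
col_classes

# A single non-numeric column forces the whole matrix to character
m <- as.matrix(mydata)
is.numeric(m)
```

If `is.numeric(m)` is FALSE, the matrix still contains something other than counts and is not suitable as countData.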

#8
Member
Location: Toronto
Join Date: May 2012
Posts: 18
Quote:
Code:
    > dim(as.matrix(mydata4))
    [1] 25197    10
Last edited by KYR; 07-16-2014 at 02:40 PM. Reason: typo on column number

#9
Member
Location: Toronto
Join Date: May 2012
Posts: 18
Quote:
Yes, that's what's happening here: the first column (gene symbol) should be row.names, though I don't know how to fix that. Any help would be greatly appreciated.

#10
Devon Ryan
Location: Freiburg, Germany
Join Date: Jul 2011
Posts: 3,480
You actually have 4 additional columns, since you only described 9 samples. You probably just need to:
Code:
    mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)
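To illustrate what row.names=1 does, here is a small sketch on made-up data with unique gene symbols (the inline table and the name `counts` are assumptions for illustration, not the contents of matrix4.txt):

```r
# Hypothetical miniature of matrix4.txt with unique gene symbols.
txt <- "Gene C1 C2 A1
TP53 10 12 30
MYC 5 7 2"

# row.names = 1 moves the first column into the row names
counts <- as.matrix(read.table(text = txt, header = TRUE, row.names = 1))

dim(counts)         # genes x samples; the symbol column is no longer counted
rownames(counts)    # gene symbols
is.numeric(counts)  # the matrix now holds counts only
```

With the symbols stored as row names, the number of columns equals the number of samples, which is what the colData check expects.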

#11
Member
Location: Toronto
Join Date: May 2012
Posts: 18
Quote:
That's what I did originally, but it gave me the following error:
Code:
    > mydata4 = read.table("matrix4.txt", header=TRUE, row.names=1)
    Error in read.table("matrix4.txt", header = TRUE, row.names = 1) :
      duplicate 'row.names' are not allowed
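This error means at least one gene symbol appears on more than one line. A sketch for listing the offending symbols, assuming the file is first read without row.names and that the first column is named Gene (both the column name and the toy data are assumptions):

```r
# Hypothetical table in which the symbol ABC appears twice.
mydata4 <- read.table(text = "Gene C1 C2
ABC 1 2
XYZ 3 4
ABC 5 6", header = TRUE, stringsAsFactors = FALSE)

# Symbols that occur more than once
dup_symbols <- unique(mydata4$Gene[duplicated(mydata4$Gene)])
dup_symbols

# All rows carrying a duplicated symbol, for inspection
dup_rows <- mydata4[mydata4$Gene %in% dup_symbols, ]
dup_rows
```

Looking at `dup_rows` shows whether the repeated lines carry identical or different counts, which helps decide what to do with them.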

#12
Devon Ryan
Location: Freiburg, Germany
Join Date: Jul 2011
Posts: 3,480
You could also simply:
Code:
    mydata4 <- mydata4[,-1]
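Dropping the first column fixes the dimensions but also discards the gene labels. One possible variation, sketched on made-up data, keeps the symbols in a separate vector so results can be matched back by row position (`gene_symbols` and `counts` are hypothetical names):

```r
# Hypothetical table with a repeated symbol, as in the post above.
mydata4 <- read.table(text = "Gene C1 C2
ABC 1 2
XYZ 3 4
ABC 5 6", header = TRUE, stringsAsFactors = FALSE)

gene_symbols <- mydata4$Gene         # keep the labels, duplicates and all
counts <- as.matrix(mydata4[, -1])   # numeric matrix, sample columns only

dim(counts)
is.numeric(counts)
```

Because the rows keep their order, `gene_symbols[i]` still labels row `i` of `counts`.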

#13
Member
Location: Toronto
Join Date: May 2012
Posts: 18
Quote:
Uhh, indeed we have duplicated gene names; we have to investigate before proceeding further. Thanks for your answers.
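Once the cause of the duplicates is understood, two common base-R ways of dealing with them are sketched below on made-up data. Neither is endorsed in this thread; which one is appropriate (if either) depends on why the symbols repeat. The names `labelled` and `summed` are hypothetical.

```r
# Hypothetical table in which the symbol ABC appears twice.
mydata4 <- read.table(text = "Gene C1 C2
ABC 1 2
XYZ 3 4
ABC 5 6", header = TRUE, stringsAsFactors = FALSE)

counts <- as.matrix(mydata4[, -1])

# Option 1: keep every row and make the labels unique (ABC, XYZ, ABC.1)
labelled <- counts
rownames(labelled) <- make.unique(mydata4$Gene)

# Option 2: add up the counts of all rows that share a symbol
summed <- rowsum(counts, group = mydata4$Gene)

labelled
summed
```

Option 1 treats the repeated lines as distinct features, while option 2 treats them as parts of the same gene; both yield unique row names that DESeqDataSetFromMatrix can accept.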

#14
Junior Member
Location: USA
Join Date: Mar 2019
Posts: 4
Hi,
I just started using DESeq2 for DE analysis. I have an Excel sheet as input, with gene names followed by 16 columns of reads. I tried to generate a matrix from this file and I keep getting an error similar to the one mentioned here. This is what is happening:
Quote:
This is the output of the head command:
Quote:
Last edited by rookie_genomics; 03-11-2019 at 12:58 PM.