SEQanswers

Old 01-23-2014, 09:55 AM   #1
ccard28
Member
 
Location: Rhode Island

Join Date: Jan 2012
Posts: 20
DESeq cds error; help needed

Hello,

I am trying to do some differential expression work on my two Illumina datasets. I only have experience with TopHat => Cufflinks (mostly through Galaxy, but some command line), so my computing skills are minimal, especially in R. I am trying to run the DESeq package in R but am having trouble creating the cds needed for analysis through DESeq.

I made a raw count file using htseq-count and copied and pasted the counts into Excel with 3 columns: the genes, sample A counts, sample B counts. My initial steps in R for reading the file seem to be working, but I get the error "The countData is not integer: missing value where TRUE/FALSE needed" when making the cds.

Thank you in advance for any help in correcting my errors. My R work is as follows after loading the DESeq library:

> countTable <- read.csv( "~/Desktop/mergedcounts.csv", header=TRUE, row.names=1)
> head(countTable)
A B
20ALPHA-HSD 0 0
A1BG 0 0
A2M 0 0
A2ML1 0 21
A4GNT 0 0
AAAS 0 1
> conds <- factor( c( "highfert", "lowfert" ) )
> conds
[1] highfert lowfert
Levels: highfert lowfert
> cds <- newCountDataSet( countTable, conds )
Error in if (any(round(countData) != countData)) stop("The countData is not integer.") :
missing value where TRUE/FALSE needed
Old 01-24-2014, 01:37 PM   #2
Wolfgang Huber
Senior Member
 
Location: Heidelberg, Germany

Join Date: Aug 2009
Posts: 109

Hi ccard28

you need to make sure that your 'countTable' is a data.frame whose columns are numeric variables of storage class 'integer', and contain no NA (and no negative) values. It might be necessary to read a basic R intro to familiarize yourself with these concepts.

To trouble-shoot, you could try (not tested):

sapply(countTable, function(x) which(is.na(x)))
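
As a rough illustration of what these checks report, here is a minimal sketch on a toy table. The CSV text and the blank cell on the A2M row are made up for the example and are not taken from the real mergedcounts.csv; lapply is used instead of sapply only so that the result is always a list.

```r
# Toy stand-in for mergedcounts.csv; the blank cell on the A2M row is hypothetical.
csv_text <- "gene,A,B
A1BG,0,0
A2M,,3
A2ML1,0,21
AAAS,0,1"
countTable <- read.csv(text = csv_text, header = TRUE, row.names = 1)

# Storage class of each column; DESeq expects whole-number counts.
col_classes <- sapply(countTable, class)
print(col_classes)

# Row positions of NA values in each column (an empty result means no NAs).
na_rows <- lapply(countTable, function(x) which(is.na(x)))
print(na_rows)
```

A blank cell in a numeric column is read as NA, and the table still prints without complaint, which is why looking at head(countTable) alone may not reveal the problem.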

Best wishes
Wolfgang
__________________
Wolfgang Huber
EMBL
Old 01-29-2014, 10:45 AM   #3
ccard28
Member
 
Location: Rhode Island

Join Date: Jan 2012
Posts: 20

I feel that my data table does follow the correct parameters. From my understanding of the commands I have used thus far:

countTable <- read.csv( "~/Desktop/mergedcounts.csv", header=TRUE, row.names=1)

This creates the data.frame that is needed. With header=TRUE, the command reads the first row of my .csv file as column names; there I have A and B as headers for my 2 samples. The row.names=1 should say that column one holds my row names, which in my .csv file are gene names. All of my values are read counts that are whole, positive numbers, with many 0s as well, so this should satisfy the integer requirement.

If I am using row 1 as column names with "header=TRUE" and the first column as row names with "row.names=1", that leaves me with only positive whole numbers and 0s, which should satisfy the integer requirement. So why do I keep getting the error:
"Error in if (any(round(countData) != countData)) stop("The countData is not integer.") : missing value where TRUE/FALSE needed" ?

My data is printing fine in R, so the table seems to be importing correctly, but I still don't understand why the error keeps occurring. Could my header or row.names arguments not be separating the letters/gene names correctly? Could the conditions I set up with "conds <- factor( c( "highfert", "lowfert" ) )" be messing things up when trying to create the cds?

Creating the cds seems like it should be a simple step, especially with my data apparently printing correctly in the R console when I check the countTable. Without the cds working I can't do any actual analysis within DESeq.

I tried to read up on sapply and tried your sapply command, but it did not change anything with the error, and I am not entirely sure of the basis for using sapply in this instance.

Any other input would be very welcome.

Thank You,
ccard28
Old 01-29-2014, 11:53 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

You might just cut to the chase and:

Code:
which(is.na(countTable) | round(countTable) != countTable, arr.ind=TRUE)
To see the row and column of each cell in (the matrix conversion of) countTable that's missing or not a whole number.
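
For anyone wondering why a missing value triggers this particular message, here is a small sketch of the mechanism. The vector is a made-up stand-in for one column of countTable, and the if() below only mimics the check quoted in the error text; it is not DESeq's actual source.

```r
# Hypothetical counts with one missing value, standing in for a column of countTable.
counts <- c(0, 21, NA, 1)

# Comparing NA with anything yields NA, so any() cannot decide between TRUE and FALSE.
check <- any(round(counts) != counts)
print(check)

# if() refuses an NA condition; the captured message is the one seen in the thread.
msg <- tryCatch(
  if (check) "not integer" else "integer",
  error = function(e) conditionMessage(e)
)
print(msg)

# which() drops NA comparisons, so a plain comparison reports nothing here
# even though a value is missing.
print(which(round(counts) != counts))
```

So the "not integer" wording is a bit misleading in this case: the values are fine, but the test itself cannot be evaluated once an NA is present.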

The point of Wolfgang's sapply method was to list, for each column, the row positions that hold NA values, so you can see which cells of your table are causing the problem. It won't actually change anything, but instead prints the results to the screen. You could easily find out how many of the cells are NA with:

Code:
sum(is.na(countTable))
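
If gene names are more useful than row positions, something along these lines might help. It is only a sketch on a toy table: the gene names come from the head() output earlier in the thread, but where the NAs sit is invented for the example.

```r
# Toy table; gene names are from the thread, the NA placement is hypothetical.
countTable <- data.frame(
  A = c(0L, NA, 0L, 0L),
  B = c(0L, 3L, NA, 1L),
  row.names = c("A1BG", "A2M", "A2ML1", "AAAS")
)

# Number of missing cells in each column and in the whole table.
na_per_column <- colSums(is.na(countTable))
na_total <- sum(is.na(countTable))
print(na_per_column)
print(na_total)

# Gene names of the rows that contain at least one missing value.
bad_genes <- rownames(countTable)[!complete.cases(countTable)]
print(bad_genes)
```

The names in bad_genes can then be looked up in the original htseq-count output to see what the counts should have been.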
You'll find a basic fluency in R to be extremely useful in bioinformatics.
Old 01-31-2014, 11:46 AM   #5
ccard28
Member
 
Location: Rhode Island

Join Date: Jan 2012
Posts: 20

Thank you both very much for your input. I was able to interpret the sapply function that you both mentioned and determine the 2 rows that had missing values that were causing problems with my cds creation. Without the sapply I never would have found them amongst the thousands of rows, much appreciated.
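
For later readers, one possible cleanup once the rows are known is sketched below. The thread does not say how the two rows were actually fixed; dropping incomplete rows is just one option (restoring the correct counts from the htseq-count output is another), and the toy values here are invented.

```r
# Toy table with two incomplete rows (hypothetical), mirroring the two rows found in the thread.
countTable <- data.frame(
  A = c(0, NA, 0, 0, 5),
  B = c(0, 3, 21, NA, 1),
  row.names = c("A1BG", "A2M", "A2ML1", "A4GNT", "AAAS")
)

# Keep only rows where every sample has a count.
cleanTable <- countTable[complete.cases(countTable), ]

# Store the counts as integers, as suggested earlier in the thread.
cleanTable[] <- lapply(cleanTable, as.integer)

# The same kind of check that failed before should now give a clean FALSE.
print(any(round(cleanTable) != cleanTable))

# With DESeq installed, the original call could then be retried:
# cds <- newCountDataSet(cleanTable, factor(c("highfert", "lowfert")))
```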