SEQanswers

05-03-2013, 08:28 AM   #1
epistatic
Senior Member
 
Location: Dronning Maud Land

Join Date: Mar 2009
Posts: 129
EDGE-pro into DESeq, DESeq run error?

Hi All,

I am doing rRNA-depleted, strand-specific RNA-seq in a bacterial species and decided to try EDGE-pro (Estimated Degree of Gene Expression in Prokaryotic Genomes) and then run the resulting count table through DESeq.

I have four treatments with three replicates each. When I try to compare the three replicates of the untreated wild-type strain with the three replicates of the treated wild-type strain, DESeq runs for >10 hours and has thrown an error about running out of memory.

Using Bioconductor version 2.12 (BiocInstaller 1.10.1), R version 3.0.0.

countTable <- read.table("deseqFile_WT.txt", header=TRUE, sep="\t", row.names=1)
# one row per sample column, in the same order as the columns of countTable
expt_design <- data.frame(row.names = colnames(countTable), condition = c("WT_UT","WT_UT","WT_UT","WT_T","WT_T","WT_T"))
conditions <- expt_design$condition
library("DESeq")
cds <- newCountDataSet(countTable, conditions)
cds <- estimateSizeFactors(cds)
cds <- estimateDispersions(cds)
results <- nbinomTest(cds, "WT_UT", "WT_T")
# no output, 100% CPU and >10-12 hours later...

Any help is greatly appreciated.
05-03-2013, 11:46 AM   #2
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 993

Maybe your count table contains strangely large numbers, exceeding a few hundred thousand reads for a single gene? This could confuse DESeq into taking so long.

What is EDGE-pro, by the way? Are you sure it is a tool that simply counts how many reads map to each gene, rather than doing something more sophisticated? (DESeq expects, as input, a table of raw read counts, i.e., a table that gives for each gene and each sample the number of reads mapped to it. Strangely, many people seem to assume that one could also input data other than raw read counts, which is why I now always ask.)
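Both points above (unusually large values, and whether the table really holds raw counts) can be checked in R before DESeq is run. The following is only a minimal sketch in base R: `check_count_table` is a hypothetical helper, not a DESeq function, and the threshold of 5e5 is an assumed stand-in for "a few hundred thousand reads".

```r
# Hypothetical helper (not part of DESeq): summarise a count table before analysis.
# max_expected is an assumed threshold, not a documented DESeq limit.
check_count_table <- function(counts, max_expected = 5e5) {
  m <- as.matrix(counts)
  list(
    all_integer  = all(m == round(m)),        # raw counts should be whole numbers
    any_negative = any(m < 0),                # raw counts can never be negative
    col_max      = apply(m, 2, max),          # largest single-gene count per sample
    n_above      = colSums(m > max_expected)  # genes per sample above the threshold
  )
}

# Toy data standing in for the real table (values are made up)
toy <- data.frame(WT_UT_1 = c(10, 250, 1e8),
                  WT_T_1  = c(12, 300, 40),
                  row.names = c("geneA", "geneB", "geneC"))
res <- check_count_table(toy)
print(res)
```

On a real table, one would pass the data frame returned by `read.table` instead of `toy`.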
05-03-2013, 12:17 PM   #3
epistatic
Senior Member
 
Location: Dronning Maud Land

Join Date: Mar 2009
Posts: 129

Thank you very much for your reply and help.

http://ccb.jhu.edu/software/EDGE-pro/
T. Magoc, D. Wood, and S.L. Salzberg. EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes. Evolutionary Bioinformatics vol.9, pp.127-136, 2013.

The program uses bowtie2 and outputs a table that includes the raw read count along with the FPKM. The package recommends using DESeq to calculate differential expression and includes a script, "edgeToDeseq.perl", to format the count table.

I have a bacterial genome with only ~3800 genes and I sequenced 15M PE-50 reads per sample, expecting that the rRNA reduction would shrink the usable data set. I did remove the rRNA counts from the table, thinking they might be the cause of my problem.

I have ~10 genes with 1E+08 counts, about 150 with 1E+07, 500 with 1E+06, 250 with 1E+05 and the rest <100K.
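A breakdown like this can be produced by binning each gene's count by order of magnitude. The snippet below is an illustrative sketch in base R; `gene_max` is made-up toy data standing in for the per-gene maximum across samples, not the actual table.

```r
# Toy per-gene maximum counts (hypothetical values, not the real data)
gene_max <- c(3e8, 2.5e7, 4e6, 1.5e5, 900, 0)

# Order of magnitude of each count; zero counts have no log10, so mark them NA
magnitude <- ifelse(gene_max > 0, floor(log10(gene_max)), NA)

# Number of genes in each power-of-ten bin
magnitude_table <- table(magnitude, useNA = "ifany")
print(magnitude_table)
```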

I usually work with human/mouse/zebrafish/yeast and didn't think about counts being too high. Is there a recommended way to reduce my count table? Can I divide all counts by the same number to bring the highest count into the 100K range?
05-03-2013, 12:34 PM   #4
epistatic
Senior Member
 
Location: Dronning Maud Land

Join Date: Mar 2009
Posts: 129

I should have caught this: the output of EDGE-pro is not consistent with the number of reads per sample. SUM of counts from one sample column = 6,705,152,768!
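The inconsistency is easy to see by comparing column sums with the sequencing depth. Below is a rough sketch in base R. It assumes about 15 million read pairs per sample (the figure given in post #3), and the factor of 2 is my own allowance for the case where each mate of a pair is counted separately. The `counts` table is toy data.

```r
# Assumption: roughly 15 million read pairs were sequenced per sample (post #3)
reads_sequenced <- 15e6
column_sum <- 6705152768   # sum reported above for one sample column

# How many times larger the summed counts are than the reads sequenced
ratio <- column_sum / reads_sequenced
cat(sprintf("Counts exceed sequenced reads by a factor of about %.0f\n", ratio))

# The same check applied to a whole table; 'counts' is toy data
counts <- data.frame(s1 = c(5e6, 4e6), s2 = c(9e9, 1e3))
# Assumed tolerance: 2x, in case each mate of a pair is counted separately
suspicious <- colSums(counts) > 2 * reads_sequenced
print(suspicious)
```

A column flagged `TRUE` holds more counted reads than could have been sequenced, so it cannot be a table of raw read counts.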
09-02-2015, 03:57 AM   #5
annieg
Junior Member
 
Location: Europe

Join Date: Nov 2014
Posts: 1

Hi epistatic,
I am using EDGE-pro for a very similar experimental setup (rRNA-depleted, stranded RNA-seq data from a bacterial species) and I was wondering why you got this extreme number in the sum of counts from EDGE-pro. Is this a bug in the program, or something else? I have a number of unresolved issues with EDGE-pro (still trying to make it run without throwing errors), and I see that the developers are not very quick to answer, nor are there many people using it, judging from the forums. I would not like to invest more time if there is a known bug at the end of the pipeline that does not allow differential expression analysis, so your insights would be very helpful!
Thanks,
Ana
09-02-2015, 04:32 AM   #6
epistatic
Senior Member
 
Location: Dronning Maud Land

Join Date: Mar 2009
Posts: 129

There were a few bugs in the software that caused the high counts and also duplicated columns during output concatenation; these seemed to be resolved after I contacted the developers. I only used the program a few more times in 2013. It ran without error, and the gene expression levels made sense and were concordant with qPCR and other analysis methods.
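Duplicated columns of the kind described here can be detected in R before the table goes into DESeq. This is a small sketch in base R on toy data. The column names are made up, and it only catches columns whose values are exactly identical to an earlier column.

```r
# Toy table; the third column repeats the first, as a concatenation bug might produce
counts <- data.frame(WT_UT_1      = c(10, 20, 30),
                     WT_UT_2      = c(11, 19, 33),
                     WT_UT_1_copy = c(10, 20, 30))

# TRUE for any column whose values are identical to an earlier column
dup_cols <- duplicated(as.list(counts))
print(names(counts)[dup_cols])

# Keep only the first occurrence of each distinct column
deduped <- counts[, !dup_cols, drop = FALSE]
```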