  • EDGE-pro into DESeq, DESeq run error?

    Hi All,

    I am doing rRNA-reduced, strand-specific RNA-seq in a bacterial species and decided to try EDGE-pro (Estimated Degree of Gene Expression in Prokaryotic Genomes) and then run the resulting count table through DESeq.

    I have four treatments with three replicates each. When I compare the three replicates of the untreated wild-type strain with the three replicates of the treated wild-type strain, DESeq runs for >10 hours and eventually throws an out-of-memory error.

    Using Bioconductor version 2.12 (BiocInstaller 1.10.1), R version 3.0.0.

    library("DESeq")
    # Read the tab-separated count table; the first column holds the gene IDs
    countTable <- read.table("deseqFile_WT.txt", header=TRUE, sep="\t", row.names=1)
    # One condition label per sample column (3 untreated, 3 treated)
    expt_design <- data.frame(row.names = colnames(countTable), condition = c("WT_UT","WT_UT","WT_UT","WT_T","WT_T","WT_T"))
    conditions <- expt_design$condition
    cds <- newCountDataSet(countTable, conditions)
    cds <- estimateSizeFactors(cds)
    cds <- estimateDispersions(cds)
    results <- nbinomTest(cds, "WT_UT", "WT_T")
    # Result: no output, 100% CPU, still running >10-12 hours later...

    Any help is greatly appreciated.

  • #2
    Maybe your count table contains strangely large numbers, exceeding a few hundred thousand reads for a single gene? This could confuse DESeq into taking so long.

    What is EDGE-pro, by the way? Are you sure it is a tool that simply counts how many reads map to each gene, rather than doing something more sophisticated? (DESeq expects, as input, a table of raw read counts, i.e., a table that gives for each gene and each sample the number of reads mapped to it. Strangely, many people seem to assume that one could also input data other than raw read counts, which is why I now always ask.)
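Both points above can be checked on a count table with a few lines of base R. This is only a sketch: the toy `countTable` values are made up, and the `threshold` cutoff is an illustrative number standing in for "a few hundred thousand reads", not a DESeq parameter.

```r
# Toy stand-in for the real count table (hypothetical values).
countTable <- data.frame(
  WT_UT_1 = c(120, 4500, 250000000),
  WT_UT_2 = c(98, 5100, 310000000),
  WT_T_1  = c(300, 4700.5, 190000000),
  row.names = c("geneA", "geneB", "geneC")
)

m <- as.matrix(countTable)

# Raw read counts should be non-negative whole numbers.
all_whole <- all(m >= 0 & m == round(m))

# Genes whose largest per-sample count exceeds the illustrative cutoff.
threshold <- 3e5
big_genes <- rownames(m)[apply(m, 1, max) > threshold]

print(summary(as.vector(m)))  # overall range of the counts
print(all_whole)              # FALSE here because of the 4700.5 entry
print(big_genes)              # genes with unusually large counts
```

If `all_whole` is FALSE, the table probably holds something other than raw counts (for example FPKM values); if `big_genes` is long, the counts deserve a closer look before running DESeq.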

    • #3
      Thank you very much for your reply and help.


      T. Magoc, D. Wood, and S.L. Salzberg. EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes. Evolutionary Bioinformatics vol.9, pp.127-136, 2013.

      The program uses bowtie2 and outputs a table that includes the raw read count along with the FPKM. The package recommends using DESeq to calculate differential expression and includes a script, "edgeToDeseq.perl", to format the count table.

      I have a bacterial genome with only ~3800 genes and I did 15M PE-50 reads per sample with the expectation that the rRNA reduction would reduce the usable data set. I did remove the rRNA counts from the table thinking that might be the cause of my problem.

      I have ~10 genes with 1E+08 counts, about 150 with 1E+07, 500 with 1E+06, 250 with 1E+05 and the rest <100K.

      I usually work with human/mouse/zebrafish/yeast and didn't think about counts being too high. Is there a recommended way to reduce my count table? Can I divide by the same number to bring the highest count to the 100K count range?

      • #4
        I should have caught this: the output of EDGE-pro is not consistent with the number of reads per sample. The sum of counts from one sample column = 6,705,152,768!
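The same sanity check can be sketched in base R by comparing each column sum with the sequencing depth. The toy values below are made up so that one column reproduces the sum quoted above; `reads_sequenced` is taken from the 15M PE reads per sample mentioned earlier in the thread, and the factor of 2 is an assumption that covers the case where each mate of a pair is counted separately.

```r
# Toy count table (hypothetical values); WT_UT_1 sums to 6,705,152,768.
countTable <- data.frame(
  WT_UT_1 = c(2.0e9, 3.0e9, 1705152768),
  WT_UT_2 = c(4.0e6, 5.0e6, 3.0e6),
  row.names = c("geneA", "geneB", "geneC")
)

reads_sequenced <- 15e6              # read pairs per sample
max_expected <- 2 * reads_sequenced  # assumption: at most one count per mate

# Total counts per sample; this cannot exceed the number of reads sequenced.
lib_sizes <- colSums(countTable)
suspicious <- names(lib_sizes)[lib_sizes > max_expected]

print(format(lib_sizes, big.mark = ","))
print(suspicious)  # samples whose totals are impossible for this depth
```

Any sample listed in `suspicious` has more counts than reads, which points at the counting step rather than at DESeq.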

        • #5
          Hi epistatic,
          I am using EDGE-pro for a very similar experimental setup (rRNA-depleted, stranded RNA-seq data from a bacterial species), and I was wondering why you got this extreme number for the sum of counts from EDGE-pro. Is this a bug in the program, or something else? I have a number of unresolved issues with EDGE-pro (I am still trying to make it run without throwing errors), and I see that the developers are not very quick to answer, nor are there many people using it, judging from the forums. I would rather not invest more time if there is a known bug at the end of the pipeline that prevents differential expression analysis, so your insights would be very helpful!
          Thanks,
          Ana

          • #6
            There were a few bugs in the software that caused high counts and duplication of columns during output concatenation; these seemed to be resolved after I contacted the developers. I only used the program a few more times in 2013. It ran without error, and the gene expression levels made sense and were concordant with qPCR and other analysis methods.
