  • edgeR

    I've read and used the DEGseq R package, and edgeR seems to be a complement to DEGseq.

    However, while working through the edgeR manual, I loaded some data into a DGEList, and the library sizes were not calculated automatically from the counts (lib.size is NA).

    Does anyone know why?
    My R version is 2.12.0.

    Thank you.

  • #2
    Haha, I've found the reason: some of the data include missing values.
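For illustration, here is a minimal base-R sketch using a hypothetical toy matrix. DGEList derives lib.size from the column totals (compare the "Calculating library sizes from column totals." message later in this thread), and colSums() returns NA for any column that contains a missing value. Dropping incomplete rows, as done here, is only one option; it is usually better to find out why the values are missing in the first place.

```r
# Hypothetical toy count matrix: gene g2 has a missing value in sample B
counts <- matrix(c(10L, 0L, 5L,   # sample A
                   3L, NA, 8L),   # sample B
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("A", "B")))

# Library sizes as column totals; the NA propagates into sample B
lib_size <- colSums(counts)
print(lib_size)  # A: 15, B: NA

# Find genes (rows) that contain at least one missing value
has_na <- !complete.cases(counts)
print(rownames(counts)[has_na])  # "g2"

# One possible fix: drop incomplete rows before building the DGEList
clean <- counts[!has_na, , drop = FALSE]
print(colSums(clean))  # A: 15, B: 11
# With edgeR installed: y <- edgeR::DGEList(counts = clean)
```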



    • #3
      Well, I still have one question.
      When I want to plot the MDS, I use the following command:
      plotMDS.dge(d, xlim=c(-2,1));
      where d is a DGEList object.

      However, R always shows the following:
      Error in if (mx < tol) { : missing value where TRUE/FALSE needed
      Error during wrapup: cannot open the connection

      Has anyone else run into this?
      How can I solve the problem?


      Thanks!



      • #4
        Hmm, this has worked fine for me over the last few weeks, though I'm not an edgeR expert.

        Perhaps you still have a problem with missing values. Try taking a small, high-quality subset of your data and rerunning with that.
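A small base-R sketch of why missing values surface as this particular error: in R, if() on an NA condition fails with "missing value where TRUE/FALSE needed". The names mx and tol are borrowed from the error message; the code is a hypothetical reproduction, not edgeR's actual internals.

```r
# Hypothetical reproduction: a max() over data with a missing value is NA
x <- c(1.5, NA, 3.2)
mx <- max(abs(x))  # NA, because max() propagates missing values
tol <- 1e-6

# if() cannot branch on NA, so R raises the error seen in this thread
msg <- tryCatch(
  if (mx < tol) "tiny" else "ok",
  error = function(e) conditionMessage(e)
)
print(msg)

# Checking for missing values up front makes the real cause obvious
if (anyNA(x)) {
  message("Input contains ", sum(is.na(x)), " missing value(s); fix these first.")
}
```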



        • #5
          Hi all,

          I am having a similar problem and was wondering if anyone has come across this before.

          I get the message "Error in if (mx < tol) { : missing value where TRUE/FALSE needed" when I run estimateCommonDisp(y):


          > mutant_control= x[,c(1,5,9,6,12,14)]
          > group <- factor(c(1,1,1,2,2,2))
          > y <- DGEList(counts=mutant_control, group=group)
          Calculating library sizes from column totals.
          > y <- estimateCommonDisp(y)
          Error in if (mx < tol) { : missing value where TRUE/FALSE needed

          > head(mutant_control)
          27 31 35 32 38 40
          128up 100.85404 94.66619 87.78034 101.9768 91.39150 85.91481
          14-3-3epsilon 9061.95160 9391.45480 9106.62168 9604.3740 9952.53064 9667.63616
          14-3-3zeta 7959.80739 8169.34580 8478.59387 8434.7244 7926.26723 8587.06141
          140up 19.50291 22.34962 14.74578 15.2824 19.61044 14.21309
          18w 88.16118 113.38107 97.86222 115.4046 120.79999 125.11319
          26-29-p 288.60969 274.10267 262.37095 275.9005 283.34272 296.14799

          > tail(mutant_control)
          27 31 35 32 38 40
          zip 2317.423662 2690.28298 2746.989546 2960.364282 2897.5624980 2985.039414
          zormin 324.178816 270.25428 350.734099 337.749747 370.9414048 304.788741
          zpg 0.000000 0.00000 0.000000 0.000000 0.0000000 0.000000
          zuc 3.015593 1.21086 1.031125 5.638336 2.6222360 4.258978
          zwilch 30.068800 28.26996 25.578376 27.846503 25.5089142 27.275533
          zye 1.292230 0.00000 0.000000 1.486839 0.8172129 0.000000

          Could it be because of the 0.0000 values?

          Thanks a lot,

          Carmen



          • #6
            So I think I figured it out: it has to do with the function expecting integers rather than real numbers. If you just round your count matrix, everything runs smoothly.

            Cheers!
            Carmen



            • #7
              Yes, it runs smoothly, but it won't give you correct results. There is a reason that edgeR and DESeq want integer values: you are supposed to supply a table that gives, for each gene and each sample, the number of reads that map to that gene.

              How can 2317.423662 reads map to gene 'zip'?
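To check a count table before handing it to edgeR, a minimal base-R sketch can flag rows that contain non-integer values; zeros are ordinary counts and pass. The values are copied from the tail(mutant_control) output in post #5 (first three samples only); the is_whole() helper and its tolerance are assumptions made for this illustration.

```r
# Rows copied from the tail(mutant_control) output above (samples 27, 31, 35)
m <- rbind(zip = c(2317.423662, 2690.28298, 2746.989546),
           zpg = c(0, 0, 0),
           zye = c(1.292230, 0, 0))
colnames(m) <- c("27", "31", "35")

# Hypothetical helper: TRUE where a value is (numerically) a whole number
is_whole <- function(v, tol = 1e-8) abs(v - round(v)) < tol

# Zeros are valid counts; flag rows with any non-integer entry
bad_rows <- rownames(m)[apply(!is_whole(m), 1, any)]
print(bad_rows)  # "zip" "zye"

# Also worth checking: no missing or negative values
stopifnot(!anyNA(m), all(m >= 0))
```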

              Comment


              • #8
                Thank you, Simon. I was missing something fundamental about edgeR.



                • #9
                  integers

                  edgeR expects integers, but many programs use estimation procedures to improve transcript counts, i.e. they produce non-integers. So you need to round.



                  • #10
                    Sigh.

                    No, you should not round. If you do not have integer counts, your input is not suitable for these tools. This is why they insist that you give them integer counts.

                    Of course, you can trick them into using your unsuitable data by rounding, but then you will not get a reliable result. Please only use statistical methods off-label if you know what you are doing.



                    • #11
                      I compared HTSeq-derived counts with rounded counts from cuffdiff v2.1.1 (released last week).

                      I ran a two-group edgeR comparison, with 3 replicates in the control group and 4 in the case group.

                      DEGs at FDR 0.05:
                      HTSeq-derived counts: 475
                      Rounded counts from cuffdiff v2.1.1: 441
                      Overlap: 398

                      In addition, 439 of the 475 HTSeq DEGs have FDR <= 0.1 in the results from the rounded cuffdiff v2.1.1 counts.

                      So maybe using rounded counts is acceptable for final results, even though it does not strictly follow edgeR's assumptions?

                      Checking a few replicates (attached plot; r = 0.99 after a log(x + 1) transformation), some genes show very different counts in HTSeq. Many of them are very short miRNAs, which are missed by cufflinks (counts = 0).

                      [Attached plot: count.cuffdiff.vs.htseq.ensgene71.PEN.png]
                      Last edited by lshen; 04-18-2013, 11:56 AM. Reason: Enhancement content
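The comparison in the post above can be summarized with a few set statistics. This is a minimal base-R sketch: compare_deg_sets() is a hypothetical helper, the toy gene IDs are made up, and the last lines only recombine the numbers reported in the post (475, 441, 398 and 439).

```r
# Hypothetical helper comparing two DEG lists given as gene-ID vectors
compare_deg_sets <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  shared <- intersect(a, b)
  c(n_a = length(a), n_b = length(b), overlap = length(shared),
    only_a = length(setdiff(a, b)), only_b = length(setdiff(b, a)),
    jaccard = length(shared) / length(union(a, b)))
}

# Toy gene IDs, for illustration only
toy <- compare_deg_sets(c("g1", "g2", "g3", "g4"), c("g3", "g4", "g5"))
print(toy)

# Figures derived only from the numbers reported in the post
n_htseq <- 475; n_cuff <- 441; n_both <- 398; n_htseq_fdr10 <- 439
summary_stats <- c(
  htseq_only = n_htseq - n_both,                    # 77
  cuff_only = n_cuff - n_both,                      # 43
  jaccard = n_both / (n_htseq + n_cuff - n_both),   # 398 / 518
  recovered_fdr05 = n_both / n_htseq,               # 398 / 475
  recovered_fdr10 = n_htseq_fdr10 / n_htseq         # 439 / 475
)
print(round(summary_stats, 3))
```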



                      • #12
                        Sure, if the values you obtain by rounding the output of cufflinks happen to be close to the correct values, there is a good chance that the result won't be that different, either.

                        But why would you do that when it is no more difficult to get the correct values in the first place?

                        This willingness to amass many minor inaccuracies despite knowing better is common in bioinformatics, but it is still sloppy science.


                        And, with all due respect: if the instructions for a statistical method state very clearly and explicitly that the method requires a certain kind of data as input, advise against using the method on other data, and even give a clear reason for that, founded on statistical theory, are you really so confident in your knowledge of advanced statistics that you think you know better?
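One way to see the statistical reason is through the negative binomial mean-variance relationship. This sketch assumes the usual parameterization Var(Y) = μ + φμ² with dispersion φ, which is the form edgeR models for a raw count Y with mean μ. If the input is instead a rescaled value X = cY (for example, an expression estimate on some other scale), its variance no longer has that form:

```latex
\operatorname{Var}(X) = c^{2}\left(\mu + \phi\mu^{2}\right) = c\,\mu_X + \phi\,\mu_X^{2},
\qquad \mu_X = \operatorname{E}(X) = c\,\mu .
```

Unless c = 1, the Poisson term is off by a factor of c, so a model that assumes Var(X) = μ_X + φμ_X² misjudges the variability of the data. Rounding X to the nearest integer does not undo this.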



                        • #13
                          I provide bioinformatics analysis services, and some people I work with talk about using cufflinks counts directly. So I wanted to check this, in addition to telling them about the assumptions you have emphasized many times.

                          I have used pipelines of htseq-count and edgeR/DESeq, and we trust this combination more than FPKM-based results. But it relies on known gene annotations, whereas cufflinks can make de novo predictions. So I use cufflinks for its non-expression tests (promoter usage, splicing) and a count-based method for expression analysis.
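A count-based pipeline of this kind starts from the per-sample htseq-count tables. Below is a minimal base-R sketch that assumes htseq-count's usual two-column, tab-separated output (gene ID, count) with summary lines prefixed by "__"; the file names and values are hypothetical, and the sketch assumes every file lists the same genes in the same order.

```r
# Write two toy files in htseq-count's two-column layout (hypothetical data)
tmp <- tempdir()
files <- c(A = file.path(tmp, "sampleA.txt"), B = file.path(tmp, "sampleB.txt"))
writeLines(c("geneX\t12", "geneY\t0", "geneZ\t7", "__no_feature\t40"), files[["A"]])
writeLines(c("geneX\t9", "geneY\t3", "geneZ\t5", "__no_feature\t35"), files[["B"]])

# Read one file, drop the "__" summary rows, return a named integer vector
read_htseq <- function(path) {
  tab <- read.delim(path, header = FALSE, col.names = c("gene", "count"),
                    stringsAsFactors = FALSE)
  tab <- tab[!startsWith(tab$gene, "__"), ]
  setNames(as.integer(tab$count), tab$gene)
}

per_sample <- lapply(files, read_htseq)
genes <- names(per_sample[[1]])
# Align every sample on the same gene order and bind into an integer matrix
counts <- sapply(per_sample, function(v) v[genes])
rownames(counts) <- genes
print(counts)
# With edgeR installed: y <- edgeR::DGEList(counts = counts)
```

Because the values come straight from read counting, the matrix is integer-valued by construction, which is the input edgeR expects.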



                          • #14
                            Sorry for the harsh tone, which was directed more at post #9.

                            I am simply getting tired of being asked the same things over and over again, and far too often I meet the attitude that as soon as a program runs through without throwing an error, the result must be right, no matter what one has done beforehand.
