  • DESeq problem

    Hi,

    I have a problem computing a DE analysis with DESeq. I have a data.frame, a, with 10 samples (one group of four samples and one group of six samples).
    Here's the code I used:

    Code:
    cds <- newCountDataSet(a,conditions=c(rep("a",4),rep("b",6)))  # 4 + 6 = 10 samples
    cds <- estimateSizeFactors(cds)
    cds <- estimateVarianceFunctions(cds)
    After that, I get this error:

    Code:
    Error in estimateVarianceFunctions(cds) : 
      NAs found in size factors. Have you called already 'estimateSizeFactors'?
    Indeed, when I look at the size factors:

    Code:
    sizeFactors(cds)
     s1  s2  s3  s4  s5  s6  s7  s8  s9 s10 
     NA  NA  NA  NA  NA  NA  NA  NA  NA  NA

    So I don't understand where the problem is. Here's my data.frame:


    Code:
        s1  s2  s3  s4 s5  s6 s7 s8 s9 s10
    1   12 353 222   0  0   0  0  1  0 804
    2    3 131  26 148  2 139 39  0 53  65
    3    0  10   1   6  1  10  0  0 22  51
    4    0  35   0  12  0  12  0  0  0   0
    5    0   0   4   3  0   0  3  0 23   0
    6    0   8   3   3  0   2  4  0  1   2
    7    0   4   1   7  0   2  1  0  5   1
    8    0   4   0   7  0   6  2  0  2   0
    9    0   0   0   0  0   0  0  0 15   5
    10   0   0   0   0  0   0  0  0 17   0
    11   0   6   1   0  1   5  2  0  0   0
    12   0   1   0   3  1   2  1  0  5   2
    13   0   1   4   3  2   0  2  0  2   0
    14   0   2   3   2  0   1  2  0  2   1
    15   0   1   1   3  1   0  2  0  4   0
    16   0   0   0   0  0   0  0  0  0  12
    17   0   0   0   0  0   0  0  0 11   0
    18   0   0   0   0  0   0  0  0 11   0
    19   0   0   0   0  0   7  3  0  0   0
    20   0   0   1   0  0   0  1  0  7   0
    21   0   0   0   0  0   0  0  0  9   0
    22   0   0   2   0  1   0  0  0  5   0
    23   0   0   1   0  1   0  0  0  2   4
    24   0   0   0   3  0   0  0  0  2   3
    25   0   0   0   0  0   0  0  0  6   2
    26   0   0   0   0  0   0  0  0  4   3
    27   1   0   0   0  0   1  0  0  3   1
    28   0   0   0   0  0   0  0  0  4   2
    29   0   0   4   0  1   0  0  0  0   0
    30   0   1   0   1  0   0  0  0  3   0
    31   0   1   3   0  1   0  0  0  0   0
    32   0   0   1   2  0   0  0  0  2   0
    33   0   0   0   1  0   2  1  0  1   0
    34   0   1   0   0  1   0  0  0  2   0
    35   0   0   1   1  0   0  0  0  2   0
    36   0   0   1   3  0   0  0  0  0   0
    37   0   0   0   2  0   0  0  0  1   1
    38   0   0   0   0  0   2  1  0  1   0
    39   0   0   0   0  0   0  0  0  4   0
    40   1   0   0   0  1   0  0  0  1   0
    41   0   2   0   1  0   0  0  0  0   0
    42   0   1   1   0  0   0  0  0  1   0
    43   0   1   0   0  0   0  1  0  1   0
    44   0   1   0   1  0   0  0  0  1   0
    45   0   0   2   0  0   1  0  0  0   0
    46   0   0   2   0  0   0  0  0  1   0
    47   0   0   1   0  0   2  0  0  0   0
    48   0   0   2   0  0   0  0  0  0   1
    49   0   0   0   0  0   0  0  0  2   1
    50   0   0   0   1  0   0  1  0  1   0
    51   0   0   0   0  0   0  0  0  2   1
    52   0   0   0   0  0   0  0  0  3   0
    53   0   0   0   0  0   0  0  0  3   0
    54   0   0   0   0  0   0  0  0  3   0
    55   0   0   0   0  0   1  1  0  0   0
    56   1   0   0   0  0   0  0  0  1   0
    57   1   0   0   1  0   0  0  0  0   0
    58   0   0   0   1  0   0  0  0  1   0
    59   0   1   0   0  0   0  1  0  0   0
    60   0   1   0   1  0   0  0  0  0   0
    61   0   0   0   0  2   0  0  0  0   0
    62   0   1   1   0  0   0  0  0  0   0
    63   0   0   1   0  0   1  0  0  0   0
    64   0   2   0   0  0   0  0  0  0   0
    65   0   0   1   0  0   0  0  0  1   0
    66   0   0   1   0  0   0  1  0  0   0
    67   0   0   1   1  0   0  0  0  0   0
    68   0   0   0   0  0   0  0  0  0   2
    69   0   0   0   2  0   0  0  0  0   0
    70   0   0   0   1  0   0  0  0  1   0
    71   0   0   0   1  0   0  0  0  1   0
    72   0   0   0   1  0   0  0  0  1   0
    73   0   0   0   1  0   0  0  0  1   0
    74   0   0   0   2  0   0  0  0  0   0
    75   0   0   0   2  0   0  0  0  0   0
    76   0   0   0   0  0   0  0  0  2   0
    77   0   0   0   1  0   0  0  0  0   1
    78   0   0   0   1  0   0  0  0  1   0
    79   0   0   0   0  1   0  0  0  0   1
    80   0   0   0   0  1   0  0  0  0   1
    81   0   0   0   0  1   0  0  0  0   1
    82   0   0   0   0  0   1  0  0  1   0
    83   0   0   0   0  0   2  0  0  0   0
    84   0   0   0   0  0   1  0  0  1   0
    85   0   0   0   0  0   1  0  0  1   0
    86   0   0   0   0  0   1  0  0  1   0
    87   0   0   0   0  0   0  1  0  1   0
    88   0   0   0   0  0   0  2  0  0   0
    89   0   0   0   0  0   0  2  0  0   0
    90   0   0   0   0  0   0  0  0  2   0
    91   0   0   0   0  0   0  0  0  2   0
    92   0   0   0   0  0   0  0  0  2   0
    93   0   0   0   0  0   0  0  0  2   0
    94   0   0   0   0  0   0  0  0  2   0
    95   0   0   0   0  0   0  0  0  2   0
    96   0   0   0   0  0   0  0  0  2   0
    97   0   0   0   0  0   0  0  0  2   0
    98   0   0   0   0  0   0  0  0  2   0
    99   0   0   0   0  0   0  0  0  2   0
    100  0   0   0   0  0   0  0  0  2   0
    Thanks,

    N.


    P.S.: Simon, help me, pleeeease!
    Last edited by NicoBxl; 10-25-2011, 05:18 AM.

  • #2
    Hi, NicoBxl,

    The error may be due to an old version of R. Are you using the latest R version? In my experience, the R 2.14.0 pre-release works with DESeq, but R 2.13 and R 2.12 do not. I think you should try the latest R for DESeq.


    • #3
      It's strange: it works with another dataset (same type of data).

      Here's the dataset that worked:

      Code:
      > x
             1      2     3      4    5     6     7  8    9     10
      1   4193 127719 47448 169717    7 86887 46580 24   47 117654
      2    819  19487  6102  37403    1 16896  9752  2    8  14871
      3    303   7705  2103  20996    1  4953  3180  1    2   5313
      4    393   8273  4296  11208    0  4949  3521  2    5   5463
      5      0      0     0      0    0     0     0  0    0  29437
      6    355   4380  1342  10324    0  4463  3954  2    0   3791
      7    329   4197  2416   5705    0  2536  2863  0    4   3714
      8    238   3047  2942   3312    0  1618  2371  1    2   2973
      9     99   1502   706   4412    1  1116  1105  1    0   1240
      10     0      0     0   8470    0     0   929  0    0      0
      11    76   1912  1194   2411    0  1121   769  0    1   1549
      12    75   1433  1136   2389    0  1250   716  0    1   1819
      13     0      0     0   8673    0     0     0  0    0      0
      14    63   1112   299   3241    0   664   517  0    0    657
      15    46    435   143   2233    0   362   444  1    0    334
      16    71    584   180   1446    0   676   521  0    0    479
      17    45    702   336   1044    0   484   442  0    0    510
      18    15    438   308    975    0   578   302  1    0    896
      19    47    628   370    841    0   483   421  0    2    460
      20    17    496   242   1000    0   378   252  0    0    530
      21    17    323   172   1027    0   480   272  0    0    380
      22    21    365   312    865    0   416   233  0    1    425
      23     0      0     0      0    0     0  2569  0    0      0
      24    22    391   159   1027    0   361   330  0    0    217
      25    17    424   280    940    0   392   165  0    1    260
      26     0      0     0      1 1261     0     0  0 1205      0
      27    18    381   168    818    0   363   264  0    0    445
      28    35    247   126   1129    0   233   230  0    3    324
      29    36    244   186    914    0   281   273  0    0    320
      30    16    263   153   1128    0   263    82  0    0    326
      31     0      0     0      0  971     0     0  0 1203      0
      32    12    356   139    972    0   318   188  0    0    162
      33    15    288   273    529    0   253   164  0    0    502
      34    29    431   271    473    0   231   266  0    0    239
      35     0      0     0      0    0     0     0  0    0   1863
      36    27    332   362    342    0   143   198  0    0    305
      37    13    284   130    639    0   242   170  0    0    199
      38    24    220   151    626    0   168   239  0    0    236
      39    19    311   212    398    0   191   211  0    0    211
      40     8    333   109    546    0   237   177  0    0    112
      41    19    152    34    696    0   279   181  0    0    105
      42     8    312   146    297    0   153    88  0    0    430
      43    34    157    41    577    0   254   207  0    0    133
      44     7    166    57    728    0   231    71  0    0    124
      45     0      0     0      0  641     0     0  0  721      0
      46    13    161    58    729    0   113    89  0    0    159
      47     0      0     0      0  550     0     0  0  757      0
      48    11    212    66    521    0   209    96  1    0    136
      49     6    156    46    717    0   102    82  0    0     98
      50    11    136    33    607    0   160    99  0    0     98
      51    10    134    57    453    0   157   140  0    0    173
      52     0      0     0      0  485     0     0  0  594      0
      53    13    202   184    219    0   140    63  0    0    153
      54     8    151    68    375    0   136    75  0    0    147
      55     3    146    45    396    0    97   117  0    0    141
      56    64     89    58    280    0   163    50  0    0    240
      57     3    126   107    291    0   124    67  0    0    166
      58    13    185    94    246    0    90   108  0    0    129
      59     1    103    48    373    0   145    30  0    0    150
      60    11    189    65    219    0   133   100  0    0     93
      61     6    123   124    182    0   115    69  0    0    179
      62     7    145    99    209    0    88    73  0    0    128
      63     5    115   119    256    0    90    40  0    0    115
      64     0     85    56    384    0    86    27  0    0     96
      65     2    117    50    163    0   112    71  0    0    202
      66     9    127   117    178    0    69    72  0    0    137
      67     5    123    92    200    0    93    42  0    0    117
      68     5    145   165    115    0    71    60  0    0    105
      69     3    110    49    237    0   111    61  0    0     58
      70     6     60    34    270    0   115    75  0    0     66
      71     0      0     0      0  612     0     0  0    0      0
      72     4     86    31    238    0   102    42  0    0     68
      73     0      0     0      0  549     0     0  0    0      0
      74     0      0     0      0  253     0     0  0  282      0
      75     0      0     0      0  232     0     0  0  295      0
      76     6     34    16    288    0    45    83  0    0     43
      77     6     59    30    289    0    54    36  0    0     38
      78     4     61    28    202    0    88    54  0    0     63
      79     0      0    95      0    0   279   120  0    0      0
      80     0      0    34    346    0   109     0  0    0      0
      81     2     41    14    300    0    46    37  0    0     48
      82     0      0     0      0    0     0     0  0  484      0
      83     0      3     3    466    0     1     0  0    0      2
      84     0      0    46      0    0   145     0  0    0    283
      85     2     65    38    133    0    79    33  0    0    113
      86     0      0     0      0    0     0     0  0    0    457
      87     5     58    29    184    0    51    22  0    2    100
      88     4     44    26    230    0    43    39  0    0     55
      89     2    124    58    120    0    40    44  0    0     49
      90     1     35    41     28    0    37    17  0    0    275
      91     0      0     0      0  116     0     0  0  312      0
      92     5     62    38    172    0    36    71  0    0     36
      93     6     93    77    108    0    40    27  0    0     57
      94     0      0     0      0  183     0     0  0  211      0
      95     3     67    48    121    0    46    42  0    0     63
      96     2     44    21    197    0    32    28  0    0     65
      97     3     56    42     90    0    92    24  0    0     75
      98     8     40    30    152    0    40    44  0    0     58
      99     0      0     0      0    0   371     0  0    0      0
      100    0      0    39    236    0    95     0  0    0      0
      After running:

      Code:
      cds <- newCountDataSet(x,conditions=c("a","a","a",rep("b",7)))
      cds <- estimateSizeFactors(cds)
      sizeFactors(cds)
      
       1            2            3            4            5            6            7            8            9           10 
       0.907312704 23.673935358  7.413062737 45.439328999  0.001330788 16.518310563  9.522291743  0.002994431  0.008935291 18.066151419
      So I don't understand where the problem is...


      • #4
        Maybe there are too many zeroes in your first data set. One thing to try is to call
        Code:
        cds <- estimateSizeFactors( cds, locfunc=genefilter::shorth )
        This might help. What kind of data is this anyway? It looks unusual.
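A minimal base-R sketch of the median-of-ratios idea behind size factor estimation may help explain the NAs. It is an illustration, not DESeq's own code, and size_factors_sketch is a hypothetical name. Each gene's counts are divided by that gene's geometric mean across samples, and a sample's size factor is a location estimate (median by default, shorth in the suggestion above) of those ratios. Genes with a zero in any sample have a geometric mean of zero and are skipped, so if every gene has at least one zero, nothing is left and every size factor comes out NA. In the first data set, sample s8 is zero in every row except row 1, and row 1 has zeros in other samples, so no row qualifies.

```r
# Hypothetical helper sketching the median-of-ratios idea; not DESeq's code.
size_factors_sketch <- function(counts, locfunc = median) {
  counts <- as.matrix(counts)
  # Log geometric mean of each gene across samples; any zero count gives -Inf
  log_geo_means <- rowMeans(log(counts))
  usable <- is.finite(log_geo_means)
  apply(counts, 2, function(cnts) {
    # Log ratio of each count to its gene's geometric mean, restricted to
    # genes without zeros; locfunc of an empty vector yields NA
    exp(locfunc((log(cnts) - log_geo_means)[usable]))
  })
}

# Toy data in which every gene has a zero somewhere (like the first data set)
all_rows_with_zero <- data.frame(s1 = c(12, 3, 0), s2 = c(0, 131, 10), s3 = c(222, 0, 1))
size_factors_sketch(all_rows_with_zero)   # NA for every sample

# Toy data with some zero-free genes (like the second data set)
some_zero_free <- data.frame(s1 = c(10, 20, 0), s2 = c(20, 40, 5), s3 = c(40, 80, 0))
size_factors_sketch(some_zero_free)       # approximately s1 = 0.5, s2 = 1, s3 = 2
```

In this sketch, changing locfunc only changes how the usable ratios are summarised; it cannot help when no gene is usable at all.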


        • #5
          It's isomiRs from the same miRNA (so small RNA sequencing).


          • #6
            Originally posted by NicoBxl
            it's isomiRs from a same miRNA (so small rna sequencing)
            I guess you get many miRNAs from each sample. You should keep the isomiRs of all of them together in one big count table, because if you only have 20 or so rows (most of them zero), that is not enough for size factor estimation.
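To illustrate the point about pooling, here is a small sketch with two toy isomiR tables; mir_a, mir_b and the helper usable_rows are made-up names and the numbers are invented. usable_rows counts the rows with no zero in any sample, which are the only rows that can contribute to a median-of-ratios size factor estimate. One miRNA's isomiRs alone may leave no usable row, while the pooled table has at least one; with real data, pooling all miRNAs should give many more.

```r
# Hypothetical helper: number of rows with no zero in any sample
usable_rows <- function(counts) sum(rowSums(counts == 0) == 0)

# Toy isomiR tables for two hypothetical miRNAs (same samples, same order)
mir_a <- matrix(c(12, 0, 5,
                   0, 4, 1),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("mirA_iso1", "mirA_iso2"), c("s1", "s2", "s3")))
mir_b <- matrix(c(250, 180, 310,
                   40,   0,  22),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("mirB_iso1", "mirB_iso2"), c("s1", "s2", "s3")))

stopifnot(identical(colnames(mir_a), colnames(mir_b)))  # columns must match
pooled <- rbind(mir_a, mir_b)                            # one big count table

usable_rows(mir_a)   # 0: every isomiR of this miRNA has a zero somewhere
usable_rows(pooled)  # 1: mirB_iso1 has counts in all samples
```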


            • #7
              I had the same problem (with miRNA) and fixed it. The problem occurs when one or more samples were degraded, so that their counts were exceptionally low (5x lower on average, with more zeroes). Removing those outlier samples from the matrix fixed the problem.
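A sketch of the outlier removal described above, assuming library size (total count per sample) is used to spot degraded samples. The helper name flag_low_samples and the 1/5 cut-off, chosen only to echo the "5x lower" figure, are assumptions rather than a DESeq rule; it is worth inspecting flagged samples before dropping them.

```r
# Hypothetical helper: flag samples whose library size (column total) is
# below `ratio` times the median library size. The 1/5 default is an assumption.
flag_low_samples <- function(counts, ratio = 1/5) {
  lib_sizes <- colSums(counts)
  names(lib_sizes)[lib_sizes < ratio * median(lib_sizes)]
}

# Toy counts: sample s3 behaves like a degraded sample
counts <- matrix(c(100, 120, 2, 90,
                    50,  60, 0, 55,
                    30,  25, 1, 40),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("g", 1:3), paste0("s", 1:4)))

low <- flag_low_samples(counts)                            # "s3"
kept <- counts[, !colnames(counts) %in% low, drop = FALSE]  # s1, s2, s4

# The conditions vector must be subset the same way (one entry per column)
conditions <- c("a", "a", "b", "b")
conditions_kept <- conditions[!colnames(counts) %in% low]   # "a" "a" "b"
```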


              • #8
                Originally posted by Simon Anders
                Maybe there are too many zeroes in your first data set. One thing to try is to call
                Code:
                cds <- estimateSizeFactors( cds, locfunc=genefilter::shorth )
                This might help. What kind of data is this anyway? It looks unusual.

                I am wondering how it affects the analysis if the dataset contains a lot of genes (12,000 out of 30,000) with zero counts in two of the 3 biological replicates. There are also a lot of genes with fewer than 10 counts across the 3 replicates.
                Is cds <- estimateSizeFactors( cds, locfunc=genefilter::shorth ) the right way to deal with this kind of dataset, or what else should I do?
                PS: the GTF file is from the UCSC Table Browser (track: UCSC genes, table: knowngenes).

                Another question: I got too many significant DE genes (20,000+ out of 30,000). Is this related to the many zeros in my dataset? If so, how can I solve it? It would help if you could explain a little how many zeros in a dataset could affect the DE results.

                Thanks in advance!
                Last edited by zhuya5607; 08-09-2012, 07:51 AM. Reason: Clarify question


                • #9
                  Something seems to be seriously wrong here. Can you post an excerpt from your count table and highlight a few of these significant genes with many zeroes? Also, which version of DESeq do you use?


                  • #10
                    Originally posted by Simon Anders
                    Something seems to be seriously wrong here. Can you plot an excerpt from your count table and highlight a few of these genes with many zeroes that are significant. Also, which version of DESeq do you use?
                    I am using DESeq 1.8.3 and R 2.15.1.
                    Code:
                    > head(H0H24countdata)
                    ID    H0.R1 H0.R2 H0.R3 H24.R1 H24.R2 H24.R3
                    1         0     0     0      0      0      0
                    10       10    11     4     47     45     57
                    100       9    10     2     14      3     14
                    1000      0     0     0      0      0      0
                    10000     7     9     2     71     51     97
                    10001    24    15    14     79     87    114
                    > str(H0H24countdata)
                    'data.frame': 31227 obs. of 6 variables:
                     $ H0.R1 : int 0 10 9 0 7 24 52 117 24 354 ...
                     $ H0.R2 : int 0 11 10 0 9 15 45 108 12 347 ...
                     $ H0.R3 : int 0 4 2 0 2 14 21 75 9 222 ...
                     $ H24.R1: int 0 47 14 0 71 79 92 182 43 910 ...
                     $ H24.R2: int 0 45 3 0 51 87 100 147 40 753 ...
                     $ H24.R3: int 0 57 14 0 97 114 134 198 49 1112 ...

                    I used Excel to sort my data and found about 11,000 genes with all-zero counts; for the genes in rows 11,000 to 15,000 of the sorted table, the total count over all 6 replicates is only 7.
                    Sequencing depth is about 14M for H0 and about 24M for H24.
                    See the dispersion plot (for H0) and the log2 fold change plot in the attachment.
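The Excel sorting described above can also be done directly in R. This sketch uses the six rows printed by head() above; the helper summarise_low_counts and the threshold of 10 total reads are assumptions for illustration, not part of DESeq.

```r
# Hypothetical helper: count genes with no reads at all and genes with only
# a few reads summed over all replicates (the threshold is an assumption).
summarise_low_counts <- function(counts, low_total = 10) {
  totals <- rowSums(counts)
  c(all_zero        = sum(totals == 0),
    low_but_nonzero = sum(totals > 0 & totals < low_total),
    total_genes     = nrow(counts))
}

# The six rows printed by head(H0H24countdata)
example_rows <- data.frame(
  H0.R1  = c(0, 10,  9, 0,  7,  24),
  H0.R2  = c(0, 11, 10, 0,  9,  15),
  H0.R3  = c(0,  4,  2, 0,  2,  14),
  H24.R1 = c(0, 47, 14, 0, 71,  79),
  H24.R2 = c(0, 45,  3, 0, 51,  87),
  H24.R3 = c(0, 57, 14, 0, 97, 114),
  row.names = c("1", "10", "100", "1000", "10000", "10001")
)

summarise_low_counts(example_rows)  # all_zero = 2, low_but_nonzero = 0, total_genes = 6

# Genes ordered by total count, lowest first (like sorting in Excel)
sort(rowSums(example_rows))
```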

                    Example results (the log2FoldChange values that Excel displayed as #NAME? are actually -Inf):
                    Code:
                    id     baseMean     baseMeanA    baseMeanB    foldChange   log2FoldChange  pval         padj
                    18975  1.827030823  3.654061647  0            0            -Inf            0.000598682  0.001533098
                    25670  1.702614384  3.405228769  0            0            -Inf            0.000906012  0.002264972
                    23806  1.991799621  3.983599242  0            0            -Inf            0.001457534  0.003528563
                    7183   1.666356283  3.332712566  0            0            -Inf            0.002216659  0.005233984
                    11262  1.985726162  3.786047522  0.185404802  0.048970543  -4.351942006    0.002252883  0.005313618
                    25714  1.383241829  2.766483658  0            0            -Inf            0.002466217  0.005778347
                    22910  1.992151895  0            3.984303791  Inf          Inf             0.005458716  0.012116629
                    11812  1.977653923  0            3.955307845  Inf          Inf             0.005480007  0.012159659
                    27308  1.977653923  0            3.955307845  Inf          Inf             0.005480007  0.012159659
                    23814  1.960904512  0            3.921809024  Inf          Inf             0.005504884  0.0122092
                    25351  1.960904512  0            3.921809024  Inf          Inf             0.005504884  0.0122092
                    6698   1.946406539  0            3.892813079  Inf          Inf             0.005526666  0.01225325
                    2873   1.883911773  0            3.767823545  Inf          Inf             0.005623317  0.012447359
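The foldChange column in this table equals baseMeanB / baseMeanA (for id 11262, 0.185404802 / 3.786047522 is about 0.04897), so a mean of zero in one condition gives a fold change of 0 or Inf and a log2 fold change of -Inf or Inf. A quick check in R with values copied from the table:

```r
# Values copied from ids 18975, 11262 and 22910 in the table above
base_mean_a <- c(3.654061647, 3.786047522, 0)
base_mean_b <- c(0,           0.185404802, 3.984303791)

# foldChange = baseMeanB / baseMeanA; dividing a positive number by 0 gives Inf
fold_change <- base_mean_b / base_mean_a
fold_change        # 0, about 0.04897, Inf

# log2(0) is -Inf and log2(Inf) is Inf, matching the table
log2(fold_change)  # -Inf, about -4.352, Inf
```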
                    Last edited by zhuya5607; 08-10-2012, 12:55 AM. Reason: Clarify


                    • #11
                      1. Again: Which DESeq version are you using?

                      2. Tell us more about your experiment and its design. Your dispersion plot looks too good to be true, almost as if these were technical replicates. I need to know about the biology behind this to tell you what is going on.
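For version questions like the first one, a small sketch that reports the R version and the installed DESeq version. report_versions is a hypothetical helper; sessionInfo() gives a fuller picture, including the platform and all loaded packages.

```r
# Hypothetical helper: report R's version and, if installed, a package version
report_versions <- function(pkg = "DESeq") {
  # system.file() returns "" when the package is not installed
  installed <- nzchar(system.file(package = pkg))
  pkg_version <- if (installed) as.character(utils::packageVersion(pkg)) else NA_character_
  c(R = R.version.string, package = pkg_version)
}

report_versions()   # R version string, plus the DESeq version or NA
# sessionInfo()     # fuller report: R version, platform and loaded packages
```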


                      • #12
                        Originally posted by Simon Anders
                        1. Again: Which DESeq version are you using?

                        2. Tell us more about your experiment and its design. Your dispersion plot looks to good to be true, almost if these were technical replicates. I need to know about the biology behind this to tell you what is going on.
                        I am using DESeq 1.8.3
                        I am comparing a cell line under two different conditions, untreated and treated, each with 3 biological replicates. H0 stands for untreated, and H24 stands for 24h after treatment.


                        • #13
                          This was a very short description of what you are doing. What kind of cell line, what kind of treatment, and what does "biological replicate" mean to you, i.e., which steps have you replicated?


                          • #14
                            Originally posted by Simon Anders
                            This was a very short description of what you are doing. What kind of cell line, what kind of treatment, and what does "biological replicate" mean to you, i.e., which steps have you replicated?
                            Later, I asked the person who did the experiment; here is his reply:
                            24h after seeding, A431 cell cultures in triplicate were treated with Gefitinib (2.5µM) or left untreated. Cells were harvested by trypsinization at 2h (treated and untreated), 6h and 24h after treatment, and washed in PBS.

                            For now, I am only comparing the untreated cells with the cells harvested 24h after treatment.


                            • #15
                              Not sure what happened here, but the notification email I just got says that you posted a different text, namely "It is A431 cell line, treated with EGFR inhibitor, Gefitinib. We replicated the mRNA extraction from the cells."

                              You "replicated the mRNA extraction"? So you made two cell cultures, a treated one and a control one, and then extracted three mRNA samples from each culture? That is not proper biological replication!
