  • #31
    DESeq

    I have a table in Excel containing the gene names, the conditions, and the number of reads for each gene in each condition. How can I use this data with the DESeq package? How do I get the table into R?

    Comment


    • #32
      Originally posted by cascoamarillo View Post

      So it has been removed from the new version, or what does it mean?

      Thanks
      Yes, it has been replaced

      Comment


      • #33
        Originally posted by roseadele View Post
        I have a table in Excel containing the gene names, the conditions, and the number of reads for each gene in each condition. How can I use this data with the DESeq package? How do I get the table into R?
        Googling "read data from Excel R" gives me 136 million answers, and the first ten looked clear and simple. The rest of the information is found in the DESeq manual (search for "Analysing RNA-Seq data with the "DESeq" package"), which is nicely written with clear examples.
        You might need an R tutorial if you are not familiar with R; you could start here: http://cran.r-project.org/doc/manuals/R-intro.html.
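
One common route, sketched here under assumptions: save the Excel sheet as a CSV file and read it with base R's read.csv. The gene names, column names and file name below are made up for illustration, and an inline string stands in for the exported file so the snippet runs on its own.

```r
# Hypothetical CSV export of the Excel sheet:
# one row per gene, one column per sample.
csv_text <- "gene,ctrl,treated
geneA,13,174
geneB,19,227
geneC,59,593"

# With a real file this would be read.csv("counts.csv", row.names = 1);
# the file name is an assumption.
countsTable <- read.csv(text = csv_text, row.names = 1)

# One condition label per sample column, in the same order as the columns.
conds <- factor(c("ctrl", "treated"))

str(countsTable)
# countsTable and conds can then be handed to
# newCountDataSet(countsTable, conds) as described in the DESeq vignette.
```

The first column becomes the row names, so the resulting table contains only integer counts.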

        Comment


        • #34
          DESeq w/o replicates - padj

          Hi, I am new to RNA-Seq.

          We are looking at data w/o replicates (bad I know, but $$ prohibited).
          Can someone explain how to interpret padj values of 1? I believe this is a measure of FDR (type I error)?

          In the data below, we appear to have 4 genes that are significantly DE?
          I know that w/o replicates we are underestimating the true DE discovery.

          Charles

          deseq_id gene_counts(nano) gene_counts(ctrl) baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj
          9600 174 13 83.74641874 16.32121019 151.1716273 9.262280527 3.211367453 0.000890382 0.740886757
          10604 227 19 110.5361169 23.85407643 197.2181574 8.267692025 3.047484649 0.001206005 0.771936026
          8063 593 59 294.6365205 74.07318469 515.1998562 6.955281569 2.79810892 0.001591703 0.88218547
          9821 245 23 120.8662944 28.87598725 212.8566016 7.371405167 2.881939658 0.001793433 0.88218547
          680 61 4 29.00943031 5.021910827 52.9969498 10.55314434 3.399601013 0.002231307 1
          8550 402 44 202.2498031 55.24101909 349.2585872 6.322450109 2.660483748 0.002796612 1

          Comment


          • #35
            No, you have nothing.

            An FDR of 0.1 (i.e., 10%), for example, means that your gene list contains at most an estimated 10% of false positives. To get such a list, you take all genes with padj<.1.

            Thus, padj=1 means that you cannot include the gene even if you are willing to accept 99% false positives.
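
To make the cutoff concrete, here is a minimal sketch with invented p-values. It uses base R's p.adjust with the Benjamini-Hochberg method, which as far as I know is also how DESeq obtains padj; treat that as an assumption and check the vignette.

```r
# Toy p-values, one per gene (invented for illustration).
pval <- c(0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
          0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.0000)

# Benjamini-Hochberg adjustment, which controls the false discovery rate.
padj <- p.adjust(pval, method = "BH")

# Gene list at 10% FDR: every gene with padj < 0.1.
hits <- which(padj < 0.1)

print(round(padj, 3))
cat("genes at 10% FDR:", length(hits), "\n")
```

An adjusted value is never smaller than the raw p-value, which is why a raw p-value around 0.001 can still end up with a padj close to 1 when thousands of genes are tested.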

            What I never understand is why people claim that lack of money precluded them from doing replicates. First, you now have wasted all the money you paid for the sequencing run, because without replicates it is highly unlikely to ever get useful results.

            Second, while it may have been expensive to obtain replicate samples it is not expensive to sequence additional samples. After all, having twice as many samples does not mean that you need to use twice as many lanes. You simply use multiplexing to sequence each sample to only half the depth and still get more statistical power than with fewer samples at more depth. The only extra expense is the additional library prep kits, not the sequencing itself.

            Comment


            • #36
              No replicates

              Duly noted.

              c

              Comment


              • #37
                Originally posted by Simon Anders View Post
                The purpose of the 'blind' method was never to offer a proper analysis method for experiments without replication, because it is simply not possible (not just "dangerous") to get conclusions. The whole point of replicates is to allow you to draw the line for significance, i.e., to know how much fold change you need to see to consider an effect real. Without replicates, you can guess, of course, but it has to be a wild guess, unless you are happy with the extremely over-careful guess that e.g. the "blind" method gives you.
                Is the "guess work" similar to what Cuffdiff does when replicates are not provided? There seems to be some mathematical modeling in Cuffdiff that I don't completely understand. Is the 'blind' method explained anywhere in terms that a non-statistician can understand?

                Comment


                • #38
                  If no replicates are provided, there is no way to know the real biological variability, and hence there are at least two options:

                  (i) You can ignore the issue by (implicitly) postulating the biological variance to be zero. Unfortunately, this is the option most commonly chosen in the literature, despite the fact that it is clearly untenable and will lead to nearly all strongly expressed genes being called differentially expressed if you have sequenced deeply. Cuffdiff, in the versions described in the papers, also suffered from this flaw, but I don't know what the current version does. A way to find out might be to check whether you get more or fewer hits if you apply the tool of your choice first to a dataset with replicates and then to only two samples from this dataset, one from each treatment group. If you get more significant hits with less data, this would hint at biological variation not being properly accounted for.

                  (ii) If you think that only very few genes are differentially expressed, you can pretend that your two samples are replicates with respect to the majority of genes, and use this to assess variability. You might strongly overestimate variance that way and dramatically lose power. In other words: you only consider those genes as differentially expressed that differ so much more between the two samples than nearly all other genes that they "stick out" very prominently. This is what DESeq's "blind" approach attempts. Obviously, you typically only get very few hits this way, and even these could be just fluke findings. See the vignette and the paper for details.

                  Wu et al. (BMC Bioinformatics 2010, 11:564) tried to find a middle ground here but I have not heard about any practical experiences with their approach. Anybody here tried that?
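
The flaw in option (i) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not what DESeq or Cuffdiff actually compute: counts are drawn from a negative binomial with an assumed dispersion of 0.1, no gene is truly differentially expressed, and the test treats the counts as pure Poisson noise. The function name and all parameter values are made up.

```r
set.seed(1)

# One sample per condition, NO true differential expression.
# size = 10 corresponds to a dispersion of 0.1 (assumed biological variation).
simulate_false_calls <- function(mu, n_genes = 200, size = 10, alpha = 0.05) {
  k1 <- rnbinom(n_genes, mu = mu, size = size)
  k2 <- rnbinom(n_genes, mu = mu, size = size)
  # Poisson-only test (biological variance postulated to be zero):
  # conditional on the total k1 + k2, k1 is Binomial(total, 0.5).
  pvals <- mapply(function(a, b) {
    if (a + b == 0) return(1)
    binom.test(a, a + b, p = 0.5)$p.value
  }, k1, k2)
  mean(pvals < alpha)  # fraction of genes falsely called DE
}

frac_weak   <- simulate_false_calls(mu = 10)
frac_strong <- simulate_false_calls(mu = 10000)
cat("false calls, weakly expressed genes:  ", frac_weak, "\n")
cat("false calls, strongly expressed genes:", frac_strong, "\n")
```

With these settings, the strongly expressed genes are called "significant" far more often than the weakly expressed ones, even though nothing differs between the conditions.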

                  Comment


                  • #39
                    Problem with DESeq and estimateVarianceFunctions

                    Originally posted by Simon Anders View Post
                    Start R, load DESeq, and type "?estimateVarianceFunctions". If you don't see anything there about 'method', you have an old DESeq version.

                    Simon
                    Hey Simon,

                    I am trying to use DESeq. "?estimateVarianceFunctions" gives me:
                    ...
                    "Usage:

                    estimateVarianceFunctions(cds, method = c( "normal", "blind", "pooled" ),
                    pool = NULL, locfit_extra_args = list(), lp_extra_args = list(),
                    modelFrame = NULL )"

                    but when I use it, I get an error message:
                    "cds <- estimateVarianceFunctions(cds,method="blind")
                    Error: attempt to apply non-function"

                    I don't understand why. Before this line, I run
                    cds <- newCountDataSet(countsTable,conds)

                    cds <- estimateSizeFactors(cds)

                    and that works, but estimateVarianceFunctions does not.

                    Can you help me, please?

                    Thanks

                    Comment


                    • #40
                      You must have managed to override the definition of estimateVarianceFunctions further up in your session. Independent of that, please update to a current version of R and Bioconductor.
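
For readers unsure what overriding a definition looks like, here is a minimal sketch using a base R function as a stand-in; the accidental redefinition is invented for illustration. The same checks would apply to estimateVarianceFunctions in a session where DESeq is loaded.

```r
x <- c(1, 5, 9)

# An accidental redefinition in the workspace hides the packaged function.
median <- function(x) "masked"
masked_result <- median(x)

# The package::function form bypasses whatever is in the workspace.
# (For DESeq this would be DESeq::estimateVarianceFunctions.)
true_result <- stats::median(x)

# Removing the stray definition makes the packaged one visible again.
rm(median)
restored_result <- median(x)

print(list(masked = masked_result, true = true_result, restored = restored_result))
```

Starting a fresh R session without restoring a saved workspace has the same effect as the rm() call.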

                      Comment


                      • #41
                        I am trying to find differentially expressed genes between, say, tumor and relapse samples, and I have 3 samples each from tumor and relapse patients. Can I treat the 3 tumor patients as replicates, and likewise the 3 relapse samples, to get the differentially expressed genes between tumor and relapse cases?

                        Would such grouping cause any odd results due to inaccurate variance estimation arising from 1) biological noise or 2) between-sample variation?

                        Comment


                        • #42
                          Sure, it is correct to group in this manner. Of course, you will not get any results due to the high variance within each group, but I guess you know that there is no chance of finding differences between tumour types with so few samples.
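
The grouping described above amounts to giving each sample a condition label. A minimal sketch, with hypothetical sample names:

```r
# Hypothetical sample names; each patient counts as one biological replicate.
samples <- c("tumor_1", "tumor_2", "tumor_3",
             "relapse_1", "relapse_2", "relapse_3")

# One label per sample, in the same order as the columns of the count table.
conds <- factor(c(rep("tumor", 3), rep("relapse", 3)),
                levels = c("tumor", "relapse"))
names(conds) <- samples

print(table(conds))
```

The factor is then passed along with the count table, as in newCountDataSet(countsTable, conds).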

                          Comment


                          • #43
                            Hi, I am using DESeq with no replicates.

                            > conds <- factor( c( "A-Mock", "A-Infect", "B-Mock", "B-infect" ) )

                            I need to compare differential expression between A-mock and A-infect, and similarly between B-mock and B-infect. It doesn't seem to work. I am using

                            > res <- nbinomTest( cds, "A-mock", "A-infect", )
                            > res <- nbinomTest( cds, "B-mock", "B-infect", )

                            but in the end I am getting only one p-value. How can I solve this problem? Please help.

                            Comment


                            • #44
                              Those are two different tests, but you are overwriting the first result object (res) with the second.

                              res <- nbinomTest( cds, "A-mock", "A-infect" )
                              res <- nbinomTest( cds, "B-mock", "B-infect" ) # replaces the above result with the new one

                              Call one resA and the other resB, for example. In the end you might want to merge the two for comparison or just write out both as separate tables.

                              # condition names must match the levels of conds exactly (R is case-sensitive)
                              resA <- nbinomTest( cds, "A-mock", "A-infect" )
                              resB <- nbinomTest( cds, "B-mock", "B-infect" )
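
Merging or writing out the two tables could look roughly like this. The data frames are toy stand-ins for the nbinomTest results with invented values; the column names (id, log2FoldChange, padj) and the output file names are assumptions.

```r
# Toy stand-ins for the two nbinomTest() results; values are invented.
resA <- data.frame(id = c("g1", "g2", "g3"),
                   log2FoldChange = c(3.2, -0.4, 1.1),
                   padj = c(0.74, 1, 0.88))
resB <- data.frame(id = c("g1", "g2", "g3"),
                   log2FoldChange = c(0.2, 2.9, -1.5),
                   padj = c(1, 0.77, 1))

# Join on the gene id; the suffixes keep the two comparisons apart.
both <- merge(resA, resB, by = "id", suffixes = c(".A", ".B"))
print(both)

# Alternatively, write each table out separately (file names are placeholders).
write.csv(resA, file.path(tempdir(), "A_mock_vs_infect.csv"), row.names = FALSE)
write.csv(resB, file.path(tempdir(), "B_mock_vs_infect.csv"), row.names = FALSE)
```

The merged table has one row per gene, with the fold change and padj of both comparisons side by side.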

                              Comment
