  • Originally posted by dpryan
    Well, you can't derive any information about the reliability of the tools from this; you'd need known-DE genes and then see how well the tools find them. For the most part, the images are telling you about the similarity of the methods, except for cuffdiff, which has more discordant results than expected (though perhaps it's the correct one; there's only one way to find out). I wouldn't recommend putting any more time into the comparisons; you won't get anything more informative out of them without validating the findings.

    Regarding post #132, yes, your understanding is correct.

    Regarding post #133, note that the baseMean for genes with NA in all of the fields is 0. That should tell you why everything is NA. Genes with a p-value but no adjusted p-value were most likely filtered to increase power.
    Hi D, thank you! I won't put any more effort into the pipeline comparison.

    Regarding post #133: for the outlier list (I think it is res[which(idx=="TRUE"),]), we found that some genes have a p-value but no adjusted p-value, and you said I could filter them to increase power. In which list should I filter them?
    If it is the DE gene list, that's fine, because I only keep the genes with an adjusted p-value < 0.05.
    Or is it the "outlier list", which I would search for DE genes excluded by DESeq2 but called by edgeR, to see whether edgeR is reliable or not?
    Or do I need to filter them before doing the DE analysis?

    Another question: in the second list, i.e. the genes that are not outliers, all of them have a baseMean of 0. Is this normal?

    Thank you!
    Last edited by super0925; 06-02-2014, 07:26 AM.
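Devon's point that reliability can only be judged against genes whose status is already known could be sketched in Python as below. This is a minimal illustration, not anything from the thread: it assumes a hypothetical set of validated DE genes and a set of genes validated as unchanged (e.g. by qPCR or a spike-in design), and all gene names are made up. Calls on genes with unknown status are ignored, because they can't be scored.

```python
from dataclasses import dataclass


@dataclass
class CallSummary:
    true_positives: int
    false_positives: int
    sensitivity: float          # fraction of known-DE genes the tool recovered
    false_positive_rate: float  # fraction of known-unchanged genes the tool called DE


def evaluate_calls(called: set[str], known_de: set[str], known_null: set[str]) -> CallSummary:
    """Score one tool's DE calls against genes whose status is known.

    Genes that are in neither validation set are ignored.
    """
    tp = len(called & known_de)
    fp = len(called & known_null)
    return CallSummary(
        true_positives=tp,
        false_positives=fp,
        sensitivity=tp / len(known_de) if known_de else float("nan"),
        false_positive_rate=fp / len(known_null) if known_null else float("nan"),
    )


if __name__ == "__main__":
    # Hypothetical validation sets and calls; replace with real validation data.
    known_de = {"geneA", "geneB", "geneC", "geneD"}
    known_null = {"geneE", "geneF", "geneG", "geneH", "geneI"}
    calls = {
        "DESeq2": {"geneA", "geneB", "geneC"},
        "edgeR": {"geneA", "geneB", "geneC", "geneE"},
        "cuffdiff": {"geneA", "geneF", "geneG"},
    }
    for tool, called in calls.items():
        s = evaluate_calls(called, known_de, known_null)
        print(f"{tool}: sensitivity={s.sensitivity:.2f}, FPR={s.false_positive_rate:.2f}")
```

Without such a truth set, a Venn diagram of the calls only shows how similar the methods are, which is the point made above.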



    • Have a read through section 1.4.2 (I think) of the DESeq2 vignette.

      Originally posted by super0925
      For the outlier list, we found some genes have a p-value but no adjusted p-value; you said I could filter them. I want to ask which list to filter?
      You misunderstood: those genes were already filtered for power, which is why there's no adjusted p-value but there is a raw p-value. You're just comparing the list of DE genes anyway, so that's fine.

      Originally posted by super0925
      Or is this the "outlier list", which I would search for DE genes excluded by DESeq2 but called by edgeR, to see whether edgeR is reliable or not?

      If both the adjusted AND raw p-values are NA, then there was at least one likely outlier sample for that gene, so it was filtered for that reason. If edgeR and the others call those DE, then you should look more closely at the data to determine whether DESeq2 is doing things correctly.

      Originally posted by super0925
      Another question: in the second list, i.e. the genes that are not outliers, all of them have a baseMean of 0. Is this normal?

      As I mentioned, the baseMean of 0 should tell you something. Look at the raw counts for those; they'll be ignored by all of the tools.



      • Originally posted by dpryan
        Have a read through section 1.4.2 (I think) of the DESeq2 vignette.

        You misunderstood: those genes were already filtered for power, which is why there's no adjusted p-value but there is a raw p-value. You're just comparing the list of DE genes anyway, so that's fine.

        Originally posted by super0925
        Or is this the "outlier list", which I would search for DE genes excluded by DESeq2 but called by edgeR, to see whether edgeR is reliable or not?

        If both the adjusted AND raw p-values are NA, then there was at least one likely outlier sample for that gene, so it was filtered for that reason. If edgeR and the others call those DE, then you should look more closely at the data to determine whether DESeq2 is doing things correctly.

        Originally posted by super0925
        Another question: in the second list, i.e. the genes that are not outliers, all of them have a baseMean of 0. Is this normal?

        As I mentioned, the baseMean of 0 should tell you something. Look at the raw counts for those; they'll be ignored by all of the tools.
        Sorry Devon, I'm confused. Which list do I need to compare with edgeR/Cuffdiff? That is, how many genes in that list are also in the "special DE gene list" of genes predicted only by edgeR/Cuffdiff?
        The percentage may represent the reliability of that method, as you mentioned.

        Which list is it, the one in section 4.3 of the DESeq2 vignette or the one in post #133?
        res[which(idx=="TRUE"),] or res[which(idx=="FALSE"),]



        • I'll just quote from the vignette, which should be clear enough:

          Note that some values in the results table can be set to NA, for either one of the following reasons:
          1. If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.
          2. If a row contains a sample with an extreme count outlier then the p value and adjusted p value are set to NA. These outlier counts are detected by Cook's distance. Customization of this outlier filtering and description of functionality for replacement of outlier counts and refitting is described in Section 3.5.
          3. If a row is filtered by automatic independent filtering, based on low mean normalized count, then only the adjusted p value is set to NA. Description and customization of independent filtering is described in Section 3.8.

          1. These wouldn't be significant with any of the tests.
          2. If edgeR/etc. find these to be DE, then be cautious believing that.
          3. These are filtered to increase power.
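The three NA cases from the vignette can be told apart mechanically. A minimal Python sketch, assuming the DESeq2 results table was exported from R (e.g. with write.csv) and loaded as one dict per gene using DESeq2's column names baseMean, pvalue and padj; NA may arrive as None or NaN depending on how the file was read. The category labels are made up for illustration, and other DESeq2 versions may set NA in additional situations.

```python
import math
from typing import Optional


def is_na(x: Optional[float]) -> bool:
    """Treat both None and NaN as R's NA (which one you get depends on the reader)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def na_reason(row: dict) -> str:
    """Classify one DESeq2 results row using the three NA rules from the vignette."""
    if row["baseMean"] == 0:
        # Rule 1: every sample has a zero count, so nothing was tested.
        return "all_zero"
    if is_na(row["pvalue"]) and is_na(row["padj"]):
        # Rule 2: a count outlier was flagged by Cook's distance.
        return "cooks_outlier"
    if is_na(row["padj"]):
        # Rule 3: removed by independent filtering (low mean normalized count).
        return "independent_filter"
    return "tested"


if __name__ == "__main__":
    # Made-up rows mimicking the three NA patterns plus a normally tested gene.
    rows = {
        "g1": {"baseMean": 0.0, "pvalue": None, "padj": None},
        "g2": {"baseMean": 250.3, "pvalue": None, "padj": None},
        "g3": {"baseMean": 1.2, "pvalue": 0.4, "padj": None},
        "g4": {"baseMean": 880.0, "pvalue": 1e-6, "padj": 3e-4},
    }
    for gene, row in rows.items():
        print(gene, na_reason(row))
```

The order of the checks matters: an all-zero row also has NA p-values, so it has to be caught before the outlier rule.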



          • Originally posted by dpryan
            I'll just quote from the vignette, which should be clear enough:


            1. These wouldn't be significant with any of the tests.
            2. If edgeR/etc. find these to be DE, then be cautious believing that.
            3. These are filtered to increase power.
            Hi Devon,
            Thank you for your explanation.
            (1) My understanding is that I don't need to consider genes whose p-value or adjusted p-value is set to 'NA', since all of them are filtered by the package. Am I right?
            (2) But I'm still confused about which 'DE list' I need to compare with edgeR/etc., regarding "If edgeR/etc. find these to be DE, then be cautious believing that."
            Is it the first list in #133, res[which(idx=="TRUE"),]?
            Or all the genes with the p-value or adjusted p-value set to NA?
            Thanks a lot!
            Last edited by super0925; 06-03-2014, 02:47 AM.



            • Originally posted by super0925
              Hi Devon,
              Thank you for your explanation.
              (1) My understanding is that I don't need to consider genes whose p-value or adjusted p-value is set to 'NA', since all of them are filtered by the package. Am I right?
              No, if only the raw and adjusted p-values are NA, then these would fall into #2 of the section I quoted from the vignette.

              (2) But I'm still confused about which 'DE list' I need to compare with edgeR/etc.
              See above.

              res[which(idx=="TRUE"),]
              These are just genes for which there's a count in at least one sample.
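For the comparison Devon describes, the list to check is the genes with a non-zero baseMean whose raw and adjusted p-values are both NA (case #2 of the vignette quote), intersected with the genes that only edgeR/cuffdiff call DE. A Python sketch under stated assumptions: DESeq2 results loaded as one dict per gene with the columns baseMean and pvalue, NA as None or NaN, and each tool's DE calls already reduced to a set of gene IDs. The function name is hypothetical.

```python
import math
from typing import Optional


def is_na(x: Optional[float]) -> bool:
    """None or NaN both stand in for R's NA."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def flagged_among_other_only(
    deseq2_rows: dict[str, dict],
    deseq2_de: set[str],
    other_de: set[str],
) -> tuple[set[str], set[str]]:
    """Return (genes called DE only by the other tool, those DESeq2 flagged as outliers).

    "Flagged" means baseMean > 0 with an NA raw p-value, i.e. vignette case #2.
    """
    other_only = other_de - deseq2_de
    flagged = {
        gene
        for gene in other_only
        if gene in deseq2_rows
        and deseq2_rows[gene]["baseMean"] > 0
        and is_na(deseq2_rows[gene]["pvalue"])
    }
    return other_only, flagged
```

If a large share of the edgeR-only genes land in the flagged set, those are the calls Devon suggests being cautious about.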



              • Originally posted by dpryan
                There's no way to judge accuracy from a Venn diagram. Which version of cufflinks did you use? Lately it tends to be more conservative than the others, so that seems off. What often happens is that the differences (e.g., DESeq2 vs. edgeR) are toward the margins of significance, where you get an adjusted p-value of 0.08 in DESeq2 and 0.11 in edgeR (or vice versa), which isn't surprising. One thing to check is whether DESeq2 flagged a number of the edgeR/cuffdiff-only genes as having outlier samples. This is a really nice feature and can help avoid false-positive findings.
                Hi D,
                Just a quick question about Cuffdiff.
                We selected the significant DE genes in Cuffdiff with an FDR (q-value) < 0.05. But if I still think that is too liberal, could we use a more conservative threshold? As you know, a p- or q-value of 0.05 is a well-known threshold.
                Could we add the log2 fold change as another threshold as well? Which level do you prefer?
                Cheers



                • Sure, you can use whatever thresholds you want. An FDR of 0.1 is the typical threshold, but of course that still means you expect ~10% of the genes you call DE to be false positives. If you wanted to use 0.01 or something else, there's nothing innately wrong with that. Using a fold-change threshold is occasionally done. It's certainly the case that a 5% change is unlikely to be biologically meaningful for most genes, whereas a 50% change likely is, so you'll occasionally see 1.5x or 2x thresholds used.
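Combining an FDR cutoff with a fold-change cutoff is a simple filter. A Python sketch, assuming each gene maps to a pair (adjusted p-value or q-value, log2 fold change) taken from whichever tool's output you're using; the cutoffs are parameters, and a 1.5x threshold corresponds to log2(1.5) ≈ 0.585 on the log2 scale.

```python
import math
from typing import Optional


def call_de(
    results: dict[str, tuple[Optional[float], Optional[float]]],
    fdr: float = 0.1,
    min_fold: float = 1.0,
) -> set[str]:
    """Return genes with adjusted p/q-value < fdr and |log2 FC| >= log2(min_fold).

    min_fold is on the linear scale (1.5 means a 1.5x change up or down);
    the default of 1.0 applies no fold-change filter.
    """
    lfc_cut = math.log2(min_fold)
    called = set()
    for gene, (padj, lfc) in results.items():
        if padj is None or lfc is None:
            continue  # untested or filtered genes can't be called
        if padj < fdr and abs(lfc) >= lfc_cut:
            called.add(gene)
    return called
```

A fold-change cutoff applied after the fact like this doesn't change the p-values; it only drops small effects from the list.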



                  • Thank you D.
                    I got it.
                    Another question: suppose a gene has 200 read counts mapped. How many of those 200 reads overlap introns, and how do I find that out? Do I need to remove these 'intron-overlapping' reads before doing the DE analysis?
                    Cheers
                    Last edited by super0925; 06-18-2014, 09:23 AM.



                    • I assume that the reads would only partly overlap an intron, since otherwise they wouldn't normally get counted. I wouldn't recommend removing them. While one could argue that they represent unprocessed RNAs, which you aren't interested in, they may also just represent the difficulty of mapping near splice boundaries and, either way, would be presumed to be present at similar levels across samples.



                      • Originally posted by dpryan
                        I assume that the reads would only partly overlap an intron, since otherwise they wouldn't normally get counted. I wouldn't recommend removing them. While one could argue that they represent unprocessed RNAs, which you aren't interested in, they may also just represent the difficulty of mapping near splice boundaries and, either way, would be presumed to be present at similar levels across samples.
                        Thank you D.
                        1.
                        I won't remove the reads that partly overlap introns, but if I want to know the proportion of these reads, how could I do that?

                        2.
                        Reads that fall entirely within an intron are equivalent to unmapped reads, right?
                        Last edited by super0925; 06-19-2014, 12:59 AM.



                        • 1. At least with htseq-count, -m intersection-strict won't count a read that overhangs a feature (i.e., overlaps an exon but continues into an intron). So you could use that.
                          2. They'll be mapped, but not counted.
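The difference between the modes comes down to how the per-base overlaps of a read are combined. The Python function below is a toy model of the rules as described in the HTSeq documentation (union of the per-base gene sets vs. their intersection), not HTSeq's own code; it takes hypothetical per-base gene sets instead of real alignments.

```python
from functools import reduce


def assign_read(position_sets: list[set[str]], mode: str = "union") -> str:
    """Toy model of htseq-count's overlap modes for a single read.

    position_sets has one entry per aligned base: the set of genes whose
    exons cover that base (empty for intronic or intergenic bases).
    """
    if not position_sets:
        raise ValueError("a read must cover at least one base")
    if mode == "union":
        hits = set().union(*position_sets)
    elif mode == "intersection-strict":
        # An intronic base contributes an empty set, so the intersection is empty.
        hits = reduce(set.intersection, position_sets)
    elif mode == "intersection-nonempty":
        nonempty = [s for s in position_sets if s]
        hits = reduce(set.intersection, nonempty) if nonempty else set()
    else:
        raise ValueError(f"unknown mode: {mode}")
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"
    return next(iter(hits))


if __name__ == "__main__":
    # A read with 3 exonic bases in geneA that then runs 2 bases into an intron.
    read = [{"geneA"}] * 3 + [set()] * 2
    for mode in ("union", "intersection-strict", "intersection-nonempty"):
        print(mode, assign_read(read, mode))
```

Under union the overhanging read still counts for geneA, while intersection-strict sends it to __no_feature, which is why the strict mode drops intron-overhanging reads.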



                          • Originally posted by dpryan
                            1. At least with htseq-count, -m intersection-strict won't count a read that overhangs a feature (i.e., overlaps an exon but continues into an intron). So you could use that.
                            2. They'll be mapped, but not counted.
                            Thank you D,
                            This is very useful. My supervisor asked me for statistics on how many reads partly or totally overlap introns in each gene,
                            something like:
                            gene1   counts: 200   intron-overlapping counts: 50
                            ...

                            I will try what you suggest to see the result.
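To get a per-gene table like the one above, one option building on Devon's suggestion is to run htseq-count twice on the same alignments, once with the default -m union and once with -m intersection-strict, and take the difference per gene. The Python sketch below assumes both output files have the usual two-column gene<TAB>count layout with the __no_feature-style summary rows; the function names are hypothetical. The difference also includes reads overhanging the ends of a gene, and overlapping genes can shift reads between the two modes, so treat the result as an approximation. Reads lying entirely inside an intron are not counted in either run.

```python
from typing import Iterable


def read_htseq_counts(lines: Iterable[str]) -> dict[str, int]:
    """Parse htseq-count output (gene_id<TAB>count), skipping the __ summary rows."""
    counts: dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("__"):
            continue
        gene, count = line.split("\t")
        counts[gene] = int(count)
    return counts


def overhang_table(
    union_counts: dict[str, int], strict_counts: dict[str, int]
) -> dict[str, tuple[int, int, float]]:
    """Per gene: (union count, reads not counted under intersection-strict, fraction).

    For overlapping genes the difference can even be negative; inspect those by hand.
    """
    table = {}
    for gene, total in union_counts.items():
        lost = total - strict_counts.get(gene, 0)
        table[gene] = (total, lost, lost / total if total else 0.0)
    return table
```

For example, overhang_table(read_htseq_counts(open("counts_union.txt")), read_htseq_counts(open("counts_strict.txt"))), where the file names are placeholders for the two htseq-count outputs.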


                            Another question: if I want to know how many reads are not mRNA (e.g. ribosomal RNA), do you have any suggestions for how to do that? Thank you!



                            • In general, you'll need an annotation file with rRNA, tRNA, etc. in there. Then you can count according to that. I believe that both htseq-count and featureCounts should allow that. For htseq-count, you'd probably need to change the -t option; I don't recall what the equivalent is in featureCounts.
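Once reads are counted per gene against an annotation that includes rRNA, tRNA and similar genes, a per-biotype summary answers the "how many reads aren't mRNA" question. A Python sketch, assuming per-gene counts are already loaded into a dict (for example from htseq-count output) and that the GTF attributes carry a biotype; gene_biotype is the Ensembl attribute name and is an assumption for other annotations. Reads from rRNA loci missing from the annotation can't show up here.

```python
import re
from collections import defaultdict
from typing import Iterable

# Matches key "value" pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_biotypes(gtf_lines: Iterable[str], biotype_key: str = "gene_biotype") -> dict[str, str]:
    """Map gene_id -> biotype, taking the first GTF line seen for each gene.

    biotype_key is an assumption: Ensembl GTFs use gene_biotype, GENCODE uses
    gene_type, and other annotations may not carry a biotype at all.
    """
    biotypes: dict[str, str] = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene = attrs.get("gene_id")
        if gene is not None:
            biotypes.setdefault(gene, attrs.get(biotype_key, "unknown"))
    return biotypes


def counts_by_biotype(gene_counts: dict[str, int], biotypes: dict[str, str]) -> dict[str, int]:
    """Sum per-gene counts into per-biotype totals (genes missing from the GTF -> 'unknown')."""
    totals = defaultdict(int)
    for gene, n in gene_counts.items():
        totals[biotypes.get(gene, "unknown")] += n
    return dict(totals)
```

Dividing the rRNA total by the sum over all biotypes then gives the fraction of assigned reads coming from rRNA.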



                              • Originally posted by dpryan
                                In general, you'll need an annotation file with rRNA, tRNA, etc. in there. Then you can count according to that. I believe that both htseq-count and featureCounts should allow that. For htseq-count, you'd probably need to change the -t option; I don't recall what the equivalent is in featureCounts.
                                Hi, another quick question: suppose my library contains 300 human genes plus luciferase mRNA. Could I check the expression level of the luciferase? Thank you!
