**super0925** · 06-02-2014, 07:14 AM

Originally posted by dpryan

Well, you can't derive any information about the reliability of the tools from this; you'd need to have known-DE genes and then see how well the tools find them. For the most part, the images are telling you about the similarity in methods, except for cuffdiff, which has more discordant results than expected (though perhaps it's the correct one; there's only one way to find out). I wouldn't recommend putting any more time into the comparisons; you won't get anything more informative out without performing validations on the findings.

Regarding post #132, yes, your understanding is correct.

Regarding post #133, note that the baseMean for genes with NA in all of the fields is 0. That should tell you why everything is NA. For genes with a p-value but no adjusted p-value, they were most likely filtered to increase power.
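dpryan's point above is that overlap between tools says nothing about accuracy; that needs a set of genes whose DE status is already known (for example, independently validated). As a minimal sketch, assuming such a truth set exists, each tool's calls could be scored as below. `evaluate_calls` is a hypothetical helper, not part of any package, and the gene names and sets are invented for illustration.

```python
def evaluate_calls(called, true_de, all_genes):
    """Score one tool's DE calls against a hypothetical validated truth set."""
    called, true_de, all_genes = set(called), set(true_de), set(all_genes)
    tp = len(called & true_de)              # validated DE genes the tool found
    fp = len(called - true_de)              # calls not supported by validation
    fn = len(true_de - called)              # validated DE genes the tool missed
    tn = len(all_genes - called - true_de)  # correctly left uncalled
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    observed_fdr = fp / (tp + fp) if tp + fp else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "sensitivity": sensitivity, "observed_fdr": observed_fdr}


# Invented example: 10 genes, 4 of which were validated as DE.
genes = [f"g{i}" for i in range(10)]
truth = {"g1", "g2", "g3", "g4"}
calls = {
    "DESeq2": {"g1", "g2", "g3"},
    "edgeR": {"g1", "g2", "g5", "g6"},
}
for tool, called in calls.items():
    print(tool, evaluate_calls(called, truth, genes))
```

Without a truth set, the same Venn-diagram overlaps can only describe how similar the methods are, which is the point dpryan makes.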

Hi D, thank you! I won't put any more effort into the pipeline comparison.

Regarding post #133: in the outlier list (I think it is `res[which(idx=="TRUE"),]`), some genes have a p-value but no adjusted p-value, and you said I could filter them to increase power. In which list should I filter them?
If it is the DE gene list, that's fine, because I only keep the genes with adjusted p-value < 0.05.
Or is it the "outlier list", which I would search for DE genes that DESeq2 excluded but edgeR called, to see whether edgeR is reliable?
Or do I need to filter them before doing the DE analysis?

Another question: in the second list, i.e. the genes that are not outliers, all of the baseMean values are 0. Is this normal?

Thank you!

**dpryan** · 06-02-2014, 07:25 AM

Have a read through section 1.4.2 (I think) of the DESeq2 vignette.

Originally posted by super0925

For the outlier list, we found some genes that have a p-value but no adjusted p-value; you said I could filter them. Which list should I filter?

You misunderstood, those genes were already filtered for power, which is why there's no adjusted p-value but there is a raw p-value. You're just comparing the list of DE genes anyway, so that's fine.
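Why filtering raises power can be seen from the Benjamini-Hochberg adjustment: each p-value is scaled by the number of tests, so dropping low-count genes that could never reach significance shrinks that number and lowers the adjusted p-values of the remaining genes. The sketch below illustrates the idea only. It uses a fixed, hypothetical `min_mean` cutoff, whereas DESeq2 chooses its filtering threshold automatically (see the independent filtering section of its vignette). Filtered genes keep their raw p-value but get `None` as the adjusted p-value, mirroring the NA in DESeq2's table; the data are invented.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_then_adjust(base_means, pvalues, min_mean):
    """BH-adjust only genes whose mean count passes a (hypothetical) fixed cutoff."""
    keep = [i for i, bm in enumerate(base_means) if bm >= min_mean]
    adjusted = bh_adjust([pvalues[i] for i in keep])
    padj = [None] * len(pvalues)  # None plays the role of DESeq2's NA
    for i, a in zip(keep, adjusted):
        padj[i] = a
    return padj


# Invented data: two low-count genes with uninformative p-values.
base_means = [500, 300, 2, 1, 800]
pvalues = [0.01, 0.02, 0.6, 0.9, 0.03]
print(bh_adjust(pvalues))                           # all five genes tested
print(filter_then_adjust(base_means, pvalues, 10))  # low-count genes filtered
```

With all five genes tested, the three informative genes end up with adjusted p-values of about 0.05; after filtering the two low-count genes, the same three drop to about 0.03.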

Or is it the "outlier list", which I would search for DE genes that DESeq2 excluded but edgeR called, to see whether edgeR is reliable?

If both the adjusted AND raw p-value are NA, then there was at least one likely outlier sample for that gene, so it was filtered for that reason. If edgeR and the others call those DE then you should look closer at the data to determine if DESeq2 is doing things correctly or not.
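The cross-check described here can be done by intersecting the two result sets. A minimal sketch, assuming the DESeq2 results were exported from R with NA mapped to `None`, and that `edger_de` is the set of gene IDs edgeR called DE; `edger_only_outliers` is a hypothetical helper and the rows are invented.

```python
def edger_only_outliers(deseq2_rows, edger_de, alpha=0.05):
    """Return (edgeR-only DE genes, the subset DESeq2 flagged as having an outlier)."""
    deseq2_de = {
        gene for gene, row in deseq2_rows.items()
        if row["padj"] is not None and row["padj"] < alpha
    }
    edger_only = sorted(set(edger_de) - deseq2_de)
    flagged = [
        gene for gene in edger_only
        if gene in deseq2_rows
        and deseq2_rows[gene]["baseMean"] > 0    # not the all-zero-counts case
        and deseq2_rows[gene]["pvalue"] is None  # raw p-value NA: outlier flag
    ]
    return edger_only, flagged


# Invented example: NA values from R are represented as None.
deseq2_rows = {
    "geneA": {"baseMean": 850.0, "pvalue": 1e-6, "padj": 1e-4},
    "geneB": {"baseMean": 120.0, "pvalue": None, "padj": None},
    "geneC": {"baseMean": 40.0, "pvalue": 0.01, "padj": 0.09},
}
edger_de = {"geneA", "geneB", "geneC"}
only, flagged = edger_only_outliers(deseq2_rows, edger_de)
print(only)     # ['geneB', 'geneC']
print(flagged)  # ['geneB']
```

Genes in `flagged` are the ones whose raw counts deserve a closer look before trusting the edgeR call.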

Another question: in the second list, i.e. the genes that are not outliers, all of the baseMean values are 0. Is this normal?

As I mentioned, the baseMean of 0 should tell you something. Look at the raw counts for those, they'll be ignored by all of the tools.

**super0925** · 06-02-2014, 07:43 AM

Sorry Devon, I'm still confused. Which list do I need to compare with edgeR/Cuffdiff? That is, how many genes in that list are also in the "special DE gene list" of genes predicted only by edgeR/Cuffdiff?
The percentage might indicate how reliable that method is, as you mentioned.

Which list, in section 4.3 of the DESeq2 vignette or in post #133?
`res[which(idx=="TRUE"),]` or `res[which(idx=="FALSE"),]`

**dpryan** · 06-02-2014, 12:48 PM

I'll just quote from the vignette, which should be clear enough:

Note that some values in the results table can be set to NA, for either one of the following reasons:

1. If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.
2. If a row contains a sample with an extreme count outlier then the p value and adjusted p value are set to NA. These outlier counts are detected by Cook's distance. Customization of this outlier filtering and description of functionality for replacement of outlier counts and refitting is described in Section 3.5.
3. If a row is filtered by automatic independent filtering, based on low mean normalized count, then only the adjusted p value is set to NA. Description and customization of independent filtering is described in Section 3.8.

(1) These wouldn't be significant with any of the tests.

(2) If edgeR/etc. find these to be DE, then be cautious believing that.

(3) These are filtered to increase power.
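The three vignette cases can be told apart mechanically from the results table. A minimal sketch, assuming the table was exported from R with something like `write.csv(as.data.frame(res), ...)`, so that the first column holds gene IDs and missing values appear as the string `NA`. The helper names are hypothetical and the sample rows are invented.

```python
import csv
import io
from collections import Counter


def na_reason(base_mean, pvalue, padj):
    """Map one results row to the vignette's NA cases (None stands for R's NA)."""
    if base_mean == 0:
        return "all_zero_counts"        # case 1: fold change, p value, padj all NA
    if pvalue is None:
        return "count_outlier"          # case 2: p value and padj NA (Cook's distance)
    if padj is None:
        return "independent_filtering"  # case 3: only padj NA
    return "tested"


def summarize_results(csv_text):
    reader = csv.DictReader(io.StringIO(csv_text))
    id_col = reader.fieldnames[0]  # write.csv puts row names in an unnamed first column

    def num(text):
        return None if text == "NA" else float(text)

    reasons = {}
    for rec in reader:
        reasons[rec[id_col]] = na_reason(
            num(rec["baseMean"]), num(rec["pvalue"]), num(rec["padj"]))
    return reasons, Counter(reasons.values())


example = '''"","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj"
"g1",0,NA,NA,NA,NA,NA
"g2",153.2,2.1,0.4,5.25,1.5e-07,3.1e-05
"g3",88.7,1.8,0.5,3.6,NA,NA
"g4",1.3,0.9,1.1,0.82,0.41,NA
'''
reasons, tally = summarize_results(example)
print(reasons)
print(tally)
```

Only the `count_outlier` genes correspond to case (2), the ones worth checking if another tool calls them DE.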

**super0925** · 06-03-2014, 02:35 AM

Hi Devon,
Thank you for your explanation.
(1) My understanding is that I don't need to consider genes whose p-value or adjusted p-value is set to NA, since all of them are filtered by the package. Am I right?
(2) But I'm still confused about which 'DE list' I need to compare with edgeR/etc. I mean regarding "If edgeR/etc. find these to be DE, then be cautious believing that."
Is it the first list in #133, `res[which(idx=="TRUE"),]`?
Or all the genes with a p-value or adjusted p-value set to NA?
Thanks a lot!

**dpryan** · 06-03-2014, 03:42 AM

Originally posted by super0925

Hi Devon
Thank you for your explanation.
(1) My understanding is that I don't need to consider genes whose p-value or adjusted p-value is set to NA, since all of them are filtered by the package. Am I right?

No, if only the raw and adjusted p-values are NA, then these would fall into #2 of the section I quoted from the vignette.

(2) But I'm still confused about which 'DE list' I need to compare with edgeR/etc.

See above.

`res[which(idx=="TRUE"),]`

These are just genes for which there's a count in at least one sample.

**super0925** · 06-17-2014, 01:37 AM

Originally posted by dpryan

There's no way to judge accuracy from a Venn diagram. Which version of cufflinks did you use? Lately it tends to be more conservative than the others, so that seems off. What often happens is that the differences (e.g., DESeq2 vs. edgeR) are toward the margins of significance, where you get an adjusted p-value of 0.08 in DESeq2 and 0.11 in edgeR (or vice versa), which isn't surprising. One thing to check is if DESeq2 flagged a number of the edgeR/cuffdiff only genes as having outlier samples. This is a really nice feature and can help avoid false-positive findings.

Hi D,
Just a quick question about Cuffdiff.
As we know, we selected the significant DE genes in Cuffdiff by FDR (q-value) < 0.05. But if I think that is still too liberal, could we use a more conservative threshold? As you know, a p- or q-value of 0.05 is a well-known threshold.
Could we add a log2 fold-change threshold as well? Which level do you prefer?
Cheers

**dpryan** · 06-17-2014, 01:43 AM

Sure, you can use whatever thresholds you want. An FDR of 0.1 is the typical threshold, but of course that means you should still expect roughly 10% of the genes you call DE to be false positives. If you wanted to use 0.01 or something else, there's nothing innately wrong with that. Fold-change thresholds are also used occasionally: a 5% change is unlikely to be biologically meaningful for most genes, whereas a 50% change likely is, so you'll occasionally see 1.5x or 2x thresholds used.
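Combining an FDR cutoff with a fold-change cutoff amounts to a simple filter on the results table. The sketch below applies both post hoc to invented rows, using DESeq2's column names (`padj`, `log2FoldChange`); for Cuffdiff's `gene_exp.diff` the analogous columns would be `q_value` and `log2(fold_change)`. `call_de` is a hypothetical helper. Note that a post-hoc fold-change cut is not the same as testing against a fold-change threshold; DESeq2's `results()` has an `lfcThreshold` argument and edgeR has `glmTreat` for that.

```python
import math


def call_de(rows, fdr=0.05, min_fold=1.5):
    """Genes passing both an FDR cutoff and a post-hoc absolute fold-change cutoff."""
    min_lfc = math.log2(min_fold)  # e.g. 1.5x -> ~0.585 on the log2 scale
    return sorted(
        gene for gene, row in rows.items()
        if row["padj"] is not None and row["padj"] < fdr
        and row["log2FoldChange"] is not None
        and abs(row["log2FoldChange"]) >= min_lfc
    )


# Invented rows (None = NA).
rows = {
    "geneA": {"padj": 0.001, "log2FoldChange": 0.07},  # ~5% change, very significant
    "geneB": {"padj": 0.02, "log2FoldChange": -1.2},
    "geneC": {"padj": 0.08, "log2FoldChange": 2.0},
}
print(call_de(rows))                         # ['geneB']
print(call_de(rows, fdr=0.1))                # ['geneB', 'geneC']
print(call_de(rows, fdr=0.1, min_fold=1.0))  # ['geneA', 'geneB', 'geneC']
```

Setting `min_fold=1.0` disables the fold-change filter, since log2(1) is 0.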

**super0925** · 06-18-2014, 08:51 AM

Thank you D.
I got it.
Another question: suppose a gene has 200 read counts. How can I tell how many of those 200 reads overlap introns? Do I need to remove these intron-overlapping reads before doing the DE analysis?
Cheers

**dpryan** · 06-18-2014, 01:37 PM

I assume that the reads would only partly overlap an intron, since otherwise they wouldn't normally get counted. I wouldn't recommend removing them. While one could argue that they represent unprocessed RNAs, which you aren't interested in, they may also just represent the difficulty of mapping near splice boundaries and, in any case, would be presumed to be present at similar levels across samples in either case.

**super0925** · 06-19-2014, 12:50 AM

Thank you D.

1. I will not remove the reads that partly overlap introns, but if I want to know the proportion of these reads, how could I do that?

2. Are the reads that fully overlap introns the same as unmapped reads? Am I right?

**dpryan** · 06-19-2014, 04:20 AM

1. At least with htseq-count, `-m intersection-strict` won't count a read that overhangs a feature (i.e., overlaps an exon but continues into an intron). So you could use that.
2. They'll be mapped, but not counted.
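Following the first suggestion, one way to build the per-gene table asked for below ("gene1 counts 200, overlapping introns 50") is to run htseq-count twice on the same BAM, once with the default `-m union` and once with `-m intersection-strict`, and subtract. The sketch assumes two such outputs (gene ID and count, tab-separated, ending in htseq-count's `__`-prefixed summary rows); the helper names and numbers are invented. Treat the difference as an approximation: it also includes reads that run past the annotated gene ends, and for genes with overlapping annotations it can even be negative, because the two modes resolve ambiguous reads differently.

```python
def read_htseq_counts(text):
    """Parse htseq-count output (gene<TAB>count), skipping the __ summary rows."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        gene, count = line.split("\t")
        if gene.startswith("__"):  # e.g. __no_feature, __ambiguous
            continue
        counts[gene] = int(count)
    return counts


def overhang_table(union_text, strict_text):
    """Per gene: union count, reads lost under intersection-strict, and that fraction."""
    union = read_htseq_counts(union_text)
    strict = read_htseq_counts(strict_text)
    table = []
    for gene in sorted(union):
        total = union[gene]
        overhang = total - strict.get(gene, 0)
        fraction = overhang / total if total else 0.0
        table.append((gene, total, overhang, fraction))
    return table


# Invented outputs of the two htseq-count runs on the same BAM.
union_out = "gene1\t200\ngene2\t0\n__no_feature\t37\n"
strict_out = "gene1\t150\ngene2\t0\n__no_feature\t90\n"
for gene, total, overhang, fraction in overhang_table(union_out, strict_out):
    print(f"{gene}\tcounts {total}\toverhanging {overhang}\t({fraction:.0%})")
```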

**super0925** · 06-19-2014, 04:57 AM

Thank you D,
This is very useful. My supervisor asked me for statistics on how many of the read counts in each gene partly or totally overlap introns, something like:
gene1 counts 200 counts overlap introns 50
...

I will try what you suggest and see the result.

Another question: if I want to know how many reads are not mRNA (e.g. ribosomal RNA), do you have any suggestions for doing that? Thank you!

**dpryan** · 06-19-2014, 05:05 AM

In general, you'll need an annotation file with rRNA, tRNA, etc. in there. Then you can count according to that. I believe that both htseq-count and featureCounts should allow that. For htseq-count, you'd probably need to change the -t option, I don't recall what the equivalent is in featureCounts.
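Once the counts include non-mRNA genes, the share per RNA class can be summed by gene biotype. A minimal sketch, assuming an Ensembl-style GTF whose attribute column carries `gene_id` and `gene_biotype` (GENCODE files call the latter `gene_type`, hence the `key` parameter) and a dict of per-gene counts such as one parsed from htseq-count output. The helpers are hypothetical and the two-gene annotation is invented. This gives fractions of *counted* reads, not of all sequenced reads.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column.
ATTR = re.compile(r'(\S+) "([^"]*)"')


def gene_biotypes(gtf_lines, key="gene_biotype"):
    """gene_id -> biotype, read from the 9th (attribute) column of a GTF."""
    biotypes = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR.findall(fields[8]))
        if "gene_id" in attrs:
            biotypes[attrs["gene_id"]] = attrs.get(key, "unknown")
    return biotypes


def fraction_by_biotype(counts, biotypes):
    """Share of counted reads per biotype; genes missing from the GTF go to 'unknown'."""
    totals = defaultdict(int)
    for gene, n in counts.items():
        totals[biotypes.get(gene, "unknown")] += n
    grand = sum(totals.values())
    return {bt: n / grand for bt, n in totals.items()} if grand else {}


# Invented two-gene annotation and counts.
gtf = [
    "#!genome-build example",
    '1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";',
    '1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "G2"; gene_biotype "rRNA";',
]
counts = {"G1": 900, "G2": 100}
print(fraction_by_biotype(counts, gene_biotypes(gtf)))
```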

**super0925** · 07-02-2014, 12:36 PM

Hi, another quick question: suppose my library contains 300 human genes plus luciferase mRNA. Could I check the expression level of the luciferase? Thank you!
