
**Simon Anders** · 05-02-2013, 02:22 PM

Usually, this is the effect of many genes with small count values. Maybe you have a lot of genes with, say, 3 reads in total over all replicates from group A, and 1 read in all replicates in group B, and this ratio always gives exactly the same p value. Plotting p values against total read counts (i.e., against the row sums of the count matrix) is often helpful to understand such histograms.

So, no, this peak is not that unusual and will not explain why you see no significance in your results.
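A minimal sketch of why small counts pile up on a handful of p-values. It uses a two-sided exact binomial test on the two group totals, assuming equal library sizes and no overdispersion. That is a simplification for illustration only, not edgeR's negative binomial exact test, and the function names are made up for this example.

```python
from collections import Counter
from math import comb


def exact_binomial_p(count_a, count_b):
    """Two-sided exact binomial p-value for a split of n = count_a + count_b reads,
    under the null that each read is equally likely to come from either group
    (assumption: equal library sizes, no overdispersion)."""
    n = count_a + count_b
    probs = [comb(n, k) / 2 ** n for k in range(n + 1)]
    observed = probs[count_a]
    # Sum the probability of every split that is at most as likely as the observed one.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


def p_value_table(max_total):
    """Tally how many (count_a, count_b) splits share each p-value, for totals 1..max_total."""
    tally = Counter()
    for total in range(1, max_total + 1):
        for count_a in range(total + 1):
            tally[round(exact_binomial_p(count_a, total - count_a), 6)] += 1
    return tally


if __name__ == "__main__":
    # 3 reads in group A versus 1 read in group B always gives the same value.
    print(exact_binomial_p(3, 1))  # 0.625
    # With totals of at most 6 reads, only a few distinct p-values are possible.
    for p, n_splits in sorted(p_value_table(6).items()):
        print(p, n_splits)
```

Every gene with the same small split lands on exactly the same p-value, so a data set with many low-count genes shows spikes in the histogram at those few values.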

**syintel87** · 05-02-2013, 02:34 PM

Thank you for the prompt reply.

**syintel87** · 05-02-2013, 02:41 PM

Thank you for the prompt reply.

I did a pairwise exact test using the command exactTest( data , pair=c("T1", "T2") , dispersion = "tagwise" ). Then I plotted the p-value distribution for this test. The attachment shows part of a table with the p-value in the first column and the read counts in the other columns.

Would you please have a look at the attached file?

Attached Files

pval_tbl.jpg (66.5 KB, 40 views)

**syintel87** · 05-02-2013, 03:14 PM

Originally posted by Simon Anders

Usually, this is the effect of many genes with small count values. Maybe you have a lot of genes with, say, 3 reads in total over all replicates from group A, and 1 read in all replicates in group B, and this ratio always gives exactly the same p value. Plotting p values against total read counts (i.e., against the row sums of the count matrix) is often helpful to understand such histograms.

So, no, this peak is not that unusual and will not explain why you see no significance in your results.

I did what you suggested and summed the counts over the replicates within each group. However, genes with the same ratio of one sum to the other do not seem to have the same p-value. Could you please advise me on what is wrong with the attached table that I generated? Also, do you see any particular pattern among the genes with p-values in the range (0.70, 0.71)?

Thank you in advance.

**syintel87** · 05-02-2013, 04:03 PM

Originally posted by Simon Anders

Usually, this is the effect of many genes with small count values. Maybe you have a lot of genes with, say, 3 reads in total over all replicates from group A, and 1 read in all replicates in group B, and this ratio always gives exactly the same p value. Plotting p values against total read counts (i.e., against the row sums of the count matrix) is often helpful to understand such histograms.

So, no, this peak is not that unusual and will not explain why you see no significance in your results.

I've heard that, ideally, a p-value distribution should look like the attached file.
Even though I can see why and how some genes have the same p-value, an extreme peak means that my p-value distribution deviates from the ideal one.
How, then, can I explain this sudden peak with some biological insight (e.g., correlation between genes) rather than with just a mathematical formula or calculation?

Attached Files

ideal_pval.jpg (12.3 KB, 31 views)

**syintel87** · 05-02-2013, 07:07 PM

Originally posted by Simon Anders

Usually, this is the effect of many genes with small count values. Maybe you have a lot of genes with, say, 3 reads in total over all replicates from group A, and 1 read in all replicates in group B, and this ratio always gives exactly the same p value. Plotting p values against total read counts (i.e., against the row sums of the count matrix) is often helpful to understand such histograms.

So, no, this peak is not that unusual and will not explain why you see no significance in your results.

I drew two plots:
1) p-value on the x-axis and counts on the y-axis.
2) p-value on the x-axis and logCPM on the y-axis.

I wonder
1) why points that are very close to 0 on the y-axis span the range from 0 to 0.8 on the x-axis. That is, genes whose ratio is close to 0 are spread across many p-values. This implies that genes with clearly different read counts have diverse p-values. But I would expect genes whose ratio of one sum to the other is close to 0 to have low p-values.
2) why this plot shows a positive relationship in the lower part and a negative relationship in the upper part. This implies that even a gene with a ratio close to 1 could have a very low p-value.

I am very curious about this plot.
I would really appreciate any tips on interpreting it.
Thank you in advance.

**Simon Anders** · 05-02-2013, 11:52 PM

I don't think making many plots of p values will help you. What's up with your library sizes? It seems that most genes in T2 have only a very few counts.

**syintel87** · 05-03-2013, 04:50 AM

Originally posted by Simon Anders

I don't think making many plots of p values will help you. What's up with your library sizes? It seems that most genes in T2 have only a very few counts.

1. I made multiple plots to get higher resolution. The pictures should be assembled to make one complete figure. There is a positive relationship in the range (0, 0.001) on the y-axis and a negative relationship in the range (0.001, 1.000) on the y-axis.

2. The table below shows the library sizes.

| group | lib.size | norm.factors |
|-------|----------|--------------|
| T1 | 22705534 | 10.53656319 |
| T1 | 24463594 | 8.27944152 |
| T2 | 11440163 | 0.01852953 |
| T2 | 178857 | 1.23101359 |
| T3 | 2232541 | 0.28527335 |
| T3 | 90552 | 4.29918424 |
| T3 | 855331 | 0.40975614 |

What insight can I gain from the plot of p-value against count and from the p-value distribution?

Thank you, as always.

Attached Files

pval_count4.jpg (34.0 KB, 28 views)

**Simon Anders** · 05-03-2013, 04:57 AM

You have hardly any useful reads for T2! It seems to be about two orders of magnitude less than in T1.
I doubt that you have enough data on T2 to perform any inference.
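A rough sketch of how the library sizes from the table posted above can be compared. The numbers are copied from that table. The helper names are made up for this example, and the "effective library size" is taken as lib.size times norm.factors, which is how edgeR combines the two columns.

```python
from math import log10, prod

# (group, lib.size, norm.factors), copied from the table in the thread.
SAMPLES = [
    ("T1", 22705534, 10.53656319),
    ("T1", 24463594, 8.27944152),
    ("T2", 11440163, 0.01852953),
    ("T2", 178857, 1.23101359),
    ("T3", 2232541, 0.28527335),
    ("T3", 90552, 4.29918424),
    ("T3", 855331, 0.40975614),
]


def effective_library_size(lib_size, norm_factor):
    """Library size rescaled by its normalization factor."""
    return lib_size * norm_factor


def orders_of_magnitude(larger, smaller):
    """How many powers of ten separate two library sizes."""
    return log10(larger / smaller)


if __name__ == "__main__":
    for group, lib_size, norm_factor in SAMPLES:
        print(group, lib_size, round(effective_library_size(lib_size, norm_factor)))
    # Raw gap between the first T1 sample and the smaller T2 sample.
    print(round(orders_of_magnitude(22705534, 178857), 2))
    # The normalization factors multiply to roughly 1.
    print(round(prod(nf for _, _, nf in SAMPLES), 3))
```

Normalization factors this far from 1 are themselves a warning sign, because they suggest the samples differ in composition far more than a scaling correction is meant to absorb.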

**syintel87** · 05-03-2013, 05:03 AM

Originally posted by Simon Anders

You have hardly any useful reads for T2! It seems to be about two orders of magnitude less than in T1.
I doubt that you have enough data on T2 to perform any inference.

The number of reads is so small because these data come from an infecting worm. The T1 data were extracted at the worm's egg stage, while the T2 data were extracted from the infected host at the next time point. This might have caused the small library size.

So you mean I cannot draw any significant conclusions from this data set?

**Simon Anders** · 05-03-2013, 05:07 AM

Why does this mean that you get fewer reads? The number of reads you get from a sequencing lane is typically independent of the sample.

Or do you mean that in T2, the vast majority of your reads map to the host, and only a percent or so map to the worm genome? You know, this is the kind of detail you should mention when you start such a thread.

Also, why are the two T2 samples so different? (Once high library size but low normalization factor, once the other way round.)

**syintel87** · 05-03-2013, 05:14 AM

Thank you so much for the advice.

**syintel87** · 05-03-2013, 05:15 AM

Originally posted by Simon Anders

Why does this mean that you get fewer reads? The number of reads you get from a sequencing lane is typically independent of the sample.

Or do you mean that in T2, the vast majority of your reads map to the host, and only a percent or so map to the worm genome? You know, this is the kind of detail you should mention when you start such a thread.

Also, why are the two T2 samples so different? (Once high library size but low normalization factor, once the other way round.)

1. Yes, I meant that in T2, the vast majority of my reads map to the host, and only a percent or so map to the worm genome. I am so sorry for not having mentioned that.

2. Actually, I have seven time points: egg, juvenile, t1, t2, t3, t4, and t5.
Since there were no replicates, I assigned
- (egg and juvenile) to T1,
- (t1 and t2) to T2,
- (t3, t4, t5) to T3.
I guess this might have caused the very different library sizes.
Is my approach wrong?

**Simon Anders** · 05-03-2013, 05:20 AM

If for most of your time points you have only a few tens of thousands of usable reads, because most of your reads were used up by host mRNA, you may have too little usable data, and the thing to do might be to improve on the wet-lab side by finding a better way to separate worm from host tissue.

If you want to go on working with your data to see whether you can see at least a few things, maybe start by making scatter plots of raw counts (each sample against each other sample).

Your pseudo-replication scheme is not that good, either: The general idea of inference in a two-group comparison is to find genes that show stronger differences between groups than within groups. So, you are now looking for genes that change more strongly between T1 and T2 than between either egg and juvenile or between t1 and t2. Why should the changes from juvenile to t2 be stronger than from egg to juvenile?
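The between-group versus within-group idea can be sketched for a single gene as follows. The counts are hypothetical, they are assumed to be already scaled to comparable library sizes, and the function names and the pseudocount of 0.5 are choices made for this illustration only.

```python
from math import log2


def log2_ratio(x, y, pseudo=0.5):
    """log2 fold change between two counts, with a small pseudocount to avoid log(0)."""
    return log2((x + pseudo) / (y + pseudo))


def between_vs_within(group_a, group_b):
    """Return (between, within) absolute log2 fold changes for one gene.

    group_a and group_b each hold the two pseudo-replicate counts of one group,
    e.g. (egg, juvenile) and (t1, t2).
    """
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    between = abs(log2_ratio(mean_b, mean_a))
    within = max(
        abs(log2_ratio(group_a[1], group_a[0])),
        abs(log2_ratio(group_b[1], group_b[0])),
    )
    return between, within


if __name__ == "__main__":
    # Hypothetical gene that jumps once, between juvenile and t1.
    print(between_vs_within([10, 10], [80, 80]))
    # Hypothetical gene that doubles at every time point.
    print(between_vs_within([10, 20], [40, 80]))
```

A gene that changes steadily over time shows within-group changes that are not negligible compared with the between-group change, which is the kind of gene such a grouping of time points penalizes.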


[EdgeR Analysis] P-value Distribution
