Hello everyone!
We are cooperating with an institute that performed Illumina sequencing (HiSeq3000) on our RNA samples. They normalized and annotated the data using CLC Genomics Workbench 9. In the end, we received an Excel table containing, for each gene, the gene name, expression count, p-value, FDR, Bonferroni-corrected p-value (Bon), fold change and RPKM value.
I wrote an R script to make a volcano plot (log2(FC) on the x-axis, -log10(p) on the y-axis).
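The transformation behind the plot is roughly the following. This is a minimal Python sketch, not the R script itself; the function name `volcano_point` is made up for illustration, and it assumes the fold change is a positive ratio and the p-value is strictly greater than 0.

```python
import math

def volcano_point(fold_change, p_value):
    """Map one gene to its (x, y) position on a volcano plot.

    Assumptions (not from the CLC export itself): fold_change is a
    positive ratio and p_value lies in (0, 1].
    """
    x = math.log2(fold_change)   # x-axis: log2 fold change
    y = -math.log10(p_value)     # y-axis: -log10 of the p-value
    return x, y

# A gene with p = 1 always lands at height 0, i.e. on the x-axis.
x_up, y_up = volcano_point(4.0, 0.01)
x_flat, y_flat = volcano_point(1.5, 1.0)
```

With p = 0 the call to `math.log10` raises a `ValueError`, which is the same problem described in issue (2).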
The issues:
(1) It turns out that roughly 66% of our genes have a p-value of exactly 1. I excluded these genes because they all end up on the x-axis (-log10(1) = 0). Is it okay to pre-filter data for a volcano plot, or do people usually plot the whole data set?
(2) Another roughly 180 genes have a p-value of exactly 0. As I cannot calculate the logarithm of 0, I first wanted to replace the zeros with the smallest non-zero p-value in my dataset (i.e. the second-smallest value overall). However, as there are so many genes with p=0, it is hard to assign them a small p-value, even randomly, without creating a suspicious pattern of dots in my plot. How do people plot genes with p=0?
(3) We figured that maybe the p=0 and p=1 values are rounded values that appear when they ask the software to create an Excel file. Could that be possible?
Our collaborator claims that none of the values are rounded. Yet, when they ask their automated software (CLC Genomics Workbench) to create a volcano plot, it looks normal, without any horizontal lines.
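To make issue (2) concrete, here is one possible way of handling the zeros, sketched in Python. It is only an illustration under stated assumptions, not what CLC Genomics Workbench does: zeros are capped at the smallest non-zero p-value in the data (or at a user-supplied floor) and flagged, so that they could be drawn with a different marker instead of pretending they are exact values. The function name `neg_log10_p` and the example values are hypothetical.

```python
import math
import sys

def neg_log10_p(p_values, floor=None):
    """Return (-log10(p) values, capped flags) for a list of p-values.

    Assumption: p-values reported as 0 are capped at `floor`. If no floor
    is given, the smallest non-zero p-value in the data is used, falling
    back to the smallest positive normal float if every value is 0.
    """
    if floor is None:
        nonzero = [p for p in p_values if p > 0]
        floor = min(nonzero) if nonzero else sys.float_info.min
    heights, capped = [], []
    for p in p_values:
        is_zero = p <= 0
        heights.append(-math.log10(floor if is_zero else p))
        capped.append(is_zero)  # True marks a point that was capped
    return heights, capped

# Made-up p-values, including two exact zeros and one exact 1.
p = [0.0, 1e-12, 0.05, 1.0, 0.0]
heights, capped = neg_log10_p(p)
```

All capped genes share the same height, so they still form a horizontal line at the top of the plot; the flags only make it possible to mark them as "at or beyond this value".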
Any input is greatly appreciated!!
Best wishes
DCseq