Hi all,
I have an RNA-Seq dataset (FPKM values) obtained from the ENCODE Caltech datasets for K562 cells.
I have 6 different lists of genes, and I want to show that the expression values of my panel of genes are not random, so I decided to use R and perform bootstrapping.
Bootstrap, as far as I know, requires normally distributed data. My RNA-Seq data are not normally distributed; they are strongly skewed on the left.
I used a Box-Cox transformation
Code:
box.cox.powers(RNA_Seq[,2])
to normalize them.
I obtained a negative value (-0.03701075, to be precise) and raised my FPKM values to that power. I then used
Code:
scale()
to obtain the z-scores and see whether the z-scores of my genes are larger than +1.96 or smaller than -1.96.
That's because I am interested in seeing whether my panel of genes is more highly expressed, or more lowly expressed, than you would expect by chance.
However, and here's the problem: the genes with the HIGHEST FPKM ended up with the LOWEST z-scores, and vice versa, because I raised the values to a negative power. I understand why this happens mathematically, but I don't know how to handle the bootstrap now, because the data show the exact opposite of what I would reasonably expect!
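To make the flip concrete, here is a minimal sketch in base R. The FPKM values are made up for illustration and are not from the ENCODE data. It compares the raw power `x^lambda` with the full Box-Cox form `(x^lambda - 1) / lambda`: when lambda is negative, dividing by lambda restores the original ordering, and because lambda is so close to 0 the result is nearly the same as a plain `log(x)`.

```r
# Hypothetical FPKM values, for illustration only (not the real dataset)
fpkm <- c(0.5, 2, 10, 50, 400)
lambda <- -0.03701075  # the power estimated above

# Raw power: decreasing in x when lambda < 0, so the ranking is reversed
raw_power <- fpkm^lambda

# Full Box-Cox form: dividing by the negative lambda restores the ordering
# (Box-Cox needs strictly positive values, so zero FPKMs must be handled first)
box_cox <- (fpkm^lambda - 1) / lambda

# With lambda this close to 0, Box-Cox is almost identical to log(x)
log_fpkm <- log(fpkm)

print(rank(raw_power))         # ranks run opposite to fpkm
print(rank(box_cox))           # ranks follow fpkm
print(cor(box_cox, log_fpkm))  # close to 1

# z-scores on the order-preserving scale: high FPKM gives a high z-score
z <- as.vector(scale(box_cox))
print(z)
```

On this scale the most highly expressed gene gets the largest z-score, so the +/-1.96 comparison points in the expected direction.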
So, how would you handle a bootstrap on RNA-Seq data to test my genes of interest?
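For reference, here is a rough sketch of one resampling test that may fit this question. Strictly speaking it is a Monte Carlo test against random gene sets of the same size, not a classical bootstrap, and it makes no normality assumption. Everything in it is hypothetical: the data are simulated, and the names `expr`, `panel`, and `log_expr` are placeholders for the real table and one of the gene lists. The choice of `log2(FPKM + 1)` and of the median as the summary statistic are assumptions too.

```r
set.seed(1)  # for reproducibility of the simulated example

# Simulated stand-in for the real table: 5000 hypothetical genes, skewed FPKMs
expr <- data.frame(
  gene = paste0("gene", 1:5000),
  fpkm = rlnorm(5000, meanlog = 1, sdlog = 2),
  stringsAsFactors = FALSE
)

# Hypothetical panel: 50 genes drawn from the top quarter of expression
panel <- sample(expr$gene[expr$fpkm > quantile(expr$fpkm, 0.75)], 50)

# log2(FPKM + 1) is monotone, so highly expressed genes stay high
log_expr <- setNames(log2(expr$fpkm + 1), expr$gene)

# Observed statistic: median expression of the panel
observed <- median(log_expr[panel])

# Null distribution: medians of random gene sets of the same size
n_resamples <- 10000
null_medians <- replicate(
  n_resamples,
  median(sample(log_expr, length(panel)))
)

# Two-sided empirical p-value (the +1 keeps it from being exactly zero)
centre <- median(null_medians)
p_value <- (sum(abs(null_medians - centre) >= abs(observed - centre)) + 1) /
  (n_resamples + 1)

print(observed)
print(p_value)
```

The same loop could be run once per gene list. Because the comparison is made against the empirical null distribution, the skew of the FPKM values does not need to be removed first.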
thanks!!