I have spectral count data (a SomaScan proteomics readout).
What I have is a protein name / UniProt identifier and a count (e.g., 188.9 or 106219.5) for each protein.
I wanted to use PLGEM in R / Bioconductor to analyze the data for differential expression between groups, but it says that 'normalized' data is required. Does anyone have a recommendation for what type of normalization to do?
I can't do something like dNSAF since I only have counts for called proteins, not counts for peptides.
Thanks.
Update: I've seen two schools of thought on normalization, horizontal and vertical.
One would be to normalize across samples (e.g., divide each protein count by the total protein count for that sample, so that values are scaled to the total protein load).
The other would be to normalize across each protein (e.g., to mean 0 and SD 1, so that highly expressed proteins don't have an outsized effect on the significance levels).
You could even do the first and then the second. Is there any good reason to do either or both?
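
To make the two options concrete, here is a minimal sketch of what I mean, written in plain Python rather than R just to show the arithmetic. The toy table, the protein names, and the function names (`normalize_by_sample_total`, `zscore_by_protein`) are all made up for illustration; this is not what PLGEM does internally, and I'm not claiming either step is the right input for it.

```python
from statistics import mean, stdev

# Hypothetical toy data: each key is a protein, each list position is a sample.
counts = {
    "protA": [10.0, 20.0, 60.0],
    "protB": [90.0, 180.0, 140.0],
}

def normalize_by_sample_total(table):
    """Divide every value by the total count of its sample (column)."""
    n_samples = len(next(iter(table.values())))
    totals = [sum(row[j] for row in table.values()) for j in range(n_samples)]
    return {
        protein: [value / totals[j] for j, value in enumerate(row)]
        for protein, row in table.items()
    }

def zscore_by_protein(table):
    """Centre each protein (row) to mean 0 and scale it to sample SD 1."""
    scaled = {}
    for protein, row in table.items():
        m, s = mean(row), stdev(row)
        # A protein with no variation across samples is mapped to all zeros.
        scaled[protein] = [(value - m) / s if s > 0 else 0.0 for value in row]
    return scaled

# "The first then the second": per-sample totals, then per-protein z-scores.
by_sample = normalize_by_sample_total(counts)
both = zscore_by_protein(by_sample)

for protein in counts:
    print(protein, by_sample[protein], both[protein])
```

After the first step every sample sums to 1, so differences in total load are removed; after the second step every protein has mean 0 and SD 1, so information about absolute abundance is gone as well.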