The end goal is to cluster some time series data.
Using multiple different methods, I've identified a number of differentially expressed genes over a timecourse.
I've got two possible sets of genes I can work with. ~2,000 out of ~70,000 genes have been identified as differentially expressed by at least 3 out of the 5 methods I'm using. Alternatively, ~200 have been identified by all 5 methods.
The ~2,000 gene set is too big to cluster - BHC can deal with time series data (I can use it on the smaller dataset), but on a data set that size it locks up the machine overnight and then crashes.
How does one go about reducing the dimensionality of the set? As in, I'd like to collapse groups of genes with similar expression profiles within that 2,000 to be represented by a single metagene. Or are there better methods of clustering time series data?
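To make concrete what I mean by a metagene, here is a minimal sketch in plain Python on made-up toy data. The function names, the gene names, and the 0.9 correlation threshold are all hypothetical and only for illustration; it is a naive greedy grouping, not BHC and not something I'm actually running on the real data.

```python
from statistics import mean, pstdev


def zscore(profile):
    """Scale one gene's time course to mean 0, sd 1 so only its shape matters."""
    m, s = mean(profile), pstdev(profile)
    if s == 0:
        return [0.0] * len(profile)
    return [(x - m) / s for x in profile]


def correlation(za, zb):
    """Pearson correlation of two profiles that are already z-scored."""
    return mean([a * b for a, b in zip(za, zb)])


def collapse_to_metagenes(profiles, threshold=0.9):
    """Greedy grouping: a gene joins the first group whose seed profile it
    correlates with at >= threshold, otherwise it seeds a new group.
    Returns a list of (member_gene_names, metagene_profile) pairs, where the
    metagene profile is the mean of the members' z-scored profiles."""
    groups = []  # each entry: (seed z-scored profile, list of member names)
    for gene, values in profiles.items():
        z = zscore(values)
        for seed, members in groups:
            if correlation(seed, z) >= threshold:
                members.append(gene)
                break
        else:
            groups.append((z, [gene]))

    metagenes = []
    for _, members in groups:
        scaled = [zscore(profiles[g]) for g in members]
        centroid = [mean(col) for col in zip(*scaled)]
        metagenes.append((members, centroid))
    return metagenes


# Toy data (assumed): four genes measured at five time points.
profiles = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [10, 20, 30, 40, 50],
    "geneC": [5, 4, 3, 2, 1],
    "geneD": [2, 4, 6, 8, 11],
}

metagenes = collapse_to_metagenes(profiles)
for members, centroid in metagenes:
    print(members, [round(v, 2) for v in centroid])
```

The idea would then be to cluster the much smaller set of metagene profiles instead of the individual genes. The result of this greedy version depends on the order of the genes, which is part of why I'm asking whether there is a better established method.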
The Bioconductor package Farm appears to do this, but only for data presented in AffyBatch form, which I don't have - mine's all RNA-Seq data in an ExpressionSet.
Cheers
Ben.