Hi all,
I have read several threads on this forum about filtering data in DESeq. However, the filtering I need is not primarily about data quality or about reducing the number of hypotheses to control the FDR.
My situation is that I want to run the nbinomTest() function from the DESeq package on a subset of the data, chosen using supplementary information about the genes: only those genes that are located on the sex chromosomes. I will also compare the results when using different conditions ('cds_object@phenoData@data$condition').
I already know which genes are located on the sex chromosomes, so that isn't the problem. The problem arises when I want to extract those genes from a CountDataSet object (cds object) in R.
After the size factors and dispersions have been estimated, this is not trivial, since the gene names are now stored inside the cds object and are no longer a column of the count table.
The workaround is to use a different, already filtered, input data set (count table), but that leads to different size factors. I don't want to call the estimateSizeFactors() function on the filtered data set, since it contains only genes from the sex chromosomes: we expect a higher overall expression in the homogametic sex, depending on the level of dosage compensation, and that should not affect the size factors. The full data set is therefore needed only to obtain the correct size factors. Then, instead of filtering that data set, I use another, already filtered, data set (which really is just a subset of the first one) and assign the previously obtained size factors to the new cds object. This feels quite awkward and is also inefficient, especially if you run into a situation where several different kinds of filtering are to be done on the same data set, all based on supplementary information.
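To make the reasoning behind the workaround concrete, here is a minimal base-R sketch on a plain count matrix. It does not use DESeq or a CountDataSet. The toy counts, the gene names and the helper median_ratio_size_factors() are made up for illustration; the helper follows the median-of-ratios idea that, as far as I understand, estimateSizeFactors() is based on.

```r
# Toy count table: rows are genes, columns are samples (all values are made up).
# Autosomal genes differ only by sequencing depth (sampleB = 2 x sampleA);
# the sex-linked genes additionally have doubled expression in sampleB.
autosomal_a <- c(10, 20, 30, 40)
sex_a <- c(5, 15)
counts_full <- cbind(
  sampleA = c(autosomal_a, sex_a),
  sampleB = c(2 * autosomal_a, 4 * sex_a)
)
rownames(counts_full) <- c("auto1", "auto2", "auto3", "auto4", "sex1", "sex2")

# Hypothetical helper: median-of-ratios size factors for a count matrix.
median_ratio_size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(cnts) {
    exp(median((log(cnts) - log_geo_means)[is.finite(log_geo_means)]))
  })
}

# Size factors estimated once, on the full data set.
size_factors <- median_ratio_size_factors(counts_full)

# Keep only the genes named in a character vector; size_factors is untouched.
sex_genes <- c("sex1", "sex2")
keep <- rownames(counts_full) %in% sex_genes
counts_sex <- counts_full[keep, , drop = FALSE]

# Normalise the subset with the size factors from the full data set.
normalized_sex <- sweep(counts_sex, 2, size_factors, "/")

# For comparison: size factors re-estimated on the sex-linked genes only.
size_factors_sex_only <- median_ratio_size_factors(counts_sex)
```

In this toy example the full-data size factors differ by a factor of 2 between the samples, so the twofold expression difference of the sex-linked genes survives normalisation. The size factors re-estimated on the subset differ by a factor of 4 and would absorb that difference, which is exactly what I want to avoid.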
I feel there must be a better way of doing this. I'm new to both R and DESeq, so it might be something simple that I'm just missing. For example, let's say that I have an R vector where each element corresponds to the name of a gene I want to keep. Is there a way to look up these gene names in the CountDataSet object, so that I get the gene names with the corresponding per-sample counts, saved in a new cds object with the size factors untouched?
Thanks in advance
Markus