  • DESeq: filtering genes by supplementary information

    Hi all,

    I have read, in several threads on this forum, about filtering data in DESeq. However, the filtering I'm looking for is not primarily about data quality or about reducing the number of hypotheses tested to limit the FDR.

    My situation is that I will run the nbinomTest() function from the DESeq package on a subset of the data, chosen by supplementary information about the genes: namely, only those genes located on the sex chromosomes. I will also compare the results when using different conditions ('cds_object@phenoData@data$condition').

    I already know which genes are located on the sex chromosomes, so that isn't the problem. The problem arises when I want to select those genes in a CountDataSet object (cds object) in R.
    After the size factors and dispersions have been estimated, this is not trivial, since the gene names are now inside the cds object and no longer a column of the count table.

    The workaround is to use a different, already filtered, count table as input. But that leads to different size factors, and I don't want to call estimateSizeFactors() on the filtered data set, since it contains only genes from the sex chromosomes. We expect a higher overall expression in the homogametic sex, depending on the level of dosage compensation, and that should not affect the size factors. So the full dataset is needed only to acquire the correct size factors. Then, instead of filtering this dataset, I use another, already filtered, dataset (really just a subset of the first) and assign the previously acquired size factors to the new cds object. This feels quite awkward and is also inefficient, especially if you run into a situation where several different kinds of filtering are to be done on the same dataset, all based on supplementary information.
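
To illustrate why the size factors should come from the full dataset, here is a minimal base-R sketch. It assumes the median-of-ratios approach that DESeq uses for size factors; the count matrix, the gene names, and the helper `size_factors()` are all made up for illustration and are not part of DESeq.

```r
# Toy count matrix: 6 genes x 4 samples (made-up numbers, illustration only).
counts <- matrix(
  c(100, 110, 200, 210,
     50,  55, 100, 105,
     30,  33,  60,  63,
     20,  22,  40,  42,
     10,  11,  40,  44,   # hypothetical sex-linked gene
      8,   9,  32,  36),  # hypothetical sex-linked gene
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), c("m1", "m2", "f1", "f2"))
)
sex_linked <- c("gene5", "gene6")  # hypothetical gene list

# Hypothetical helper: median-of-ratios size factors, one value per sample.
size_factors <- function(m) {
  geo_means <- exp(rowMeans(log(m)))  # per-gene geometric mean across samples
  apply(m, 2, function(col) median((col / geo_means)[geo_means > 0]))
}

sf_full <- size_factors(counts)                               # all genes
sf_sub  <- size_factors(counts[sex_linked, , drop = FALSE])   # sex-linked genes only

# Size factors are per sample, so selecting rows (genes) does not change
# their length; the full-data factors can be applied to the subset directly.
sub_counts     <- counts[sex_linked, , drop = FALSE]
sub_normalized <- sweep(sub_counts, 2, sf_full, "/")
```

In this toy example the factors estimated from the sex-linked genes alone differ from those estimated on all genes, which is the effect described above.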

    I feel there must be a better way of doing this. I'm new to both R and DESeq, so it might be something simple that I'm just missing. For example, let's say that I have an R vector where each element corresponds to the name of a gene I want to keep. Is there a way to look up these gene names in the CountDataSet object, so that I get the gene names with the corresponding per-sample counts, with the size factors untouched, saved in a new cds object?

    Thanks in advance
    Markus

  • #2
    Have you tried just subsetting it as normal (i.e., "cds_object[IDX,]" where IDX is an index of genes of interest)? The CountDataSet extends an eSet, and that's how you'd subset that.



    • #3
      Originally posted by dpryan:
      Have you tried just subsetting it as normal (i.e., "cds_object[IDX,]" where IDX is an index of genes of interest)? The CountDataSet extends an eSet, and that's how you'd subset that.
      Hi, and thank you for your quick reply.

      Using standard indexing did not help, since the actual count table is just a part of the cds_object, not the cds_object itself. However, I did find a way to solve it: the count data can be found in cds_object@assayData$counts. So from there, indexing and filtering shouldn't be a problem.



      • #4
        Right, but if an object offers a method for subsetting then that will typically apply to all of its components. So generally:

        Code:
        # IDX is an index (logical, integer, or feature names) of the genes of interest
        cds_sub <- cds_object[IDX, ]
        table(counts(cds_sub) == counts(cds_object)[IDX, ])
        (or something like that) will yield all TRUE.
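
A fuller version of the same idea might look like the sketch below. It assumes the legacy DESeq package (not DESeq2) is installed and uses its bundled makeExampleCountDataSet() in place of real data. The vector `genes_of_interest` is a hypothetical stand-in for a list of sex-chromosome gene names.

```r
library(DESeq)  # assumption: the legacy DESeq package is installed

# Example data shipped with DESeq, standing in for a real count table.
cds <- makeExampleCountDataSet()

# Estimate size factors and dispersions once, on the full dataset.
cds <- estimateSizeFactors(cds)
cds <- estimateDispersions(cds)

# Hypothetical gene list; in practice this would be the sex-chromosome genes.
genes_of_interest <- featureNames(cds)[1:200]

# Subset by gene name. Size factors are per sample, so they are carried over.
idx     <- featureNames(cds) %in% genes_of_interest
cds_sub <- cds[idx, ]

# Sanity checks: same size factors, same counts for the selected genes.
identical(sizeFactors(cds_sub), sizeFactors(cds))
all(counts(cds_sub) == counts(cds)[idx, ])

# Test only the selected genes, comparing the first two condition levels.
conds <- levels(conditions(cds))
res   <- nbinomTest(cds_sub, conds[1], conds[2])
```

Because the subset is taken from the object itself, there is no need to build a second count table or to reassign size factors by hand.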
