  • DESeq2 question

    Hi all,

    As a newbie in seq analysis, I have a question. I have imported my data into the DESeq2 package. My data do not involve any comparisons (all samples come from the same cell population). What I'm trying to find out is which genes characterise my cell population (i.e. have the highest expression). Do I have to look at the baseMean value? If so, what threshold should I use? For example, what would be a baseline baseMean value?

    I hope I am clear and not causing any confusion.

    Thanks!!

  • #2
    Hi,
    If you don't have a comparison (i.e. differential expression, the DE part of DESeq2), I am not sure why you would go that route.

    You also have other things to think about, e.g. gene length and other factors that can inflate the counts of gene 1 compared to gene 2 even when the two genes have the same cellular abundance.
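To make the gene-length point concrete, here is a minimal Python sketch of one common length-aware normalization, TPM (transcripts per million). TPM is a swapped-in illustration rather than something this thread prescribes, and the function name, gene names, counts, and lengths are all hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM (transcripts per million).

    counts: dict gene -> raw read count
    lengths_bp: dict gene -> gene/transcript length in base pairs (assumed known)
    """
    # Reads per kilobase: divide each count by its length in kb
    rpk = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    total = sum(rpk.values())
    # Rescale so that all TPM values sum to one million
    return {g: v / total * 1e6 for g, v in rpk.items()}


# Hypothetical example: gene_B has twice the reads of gene_A, but is also twice as long
counts = {"gene_A": 500, "gene_B": 1000}
lengths = {"gene_A": 1000, "gene_B": 2000}
print(counts_to_tpm(counts, lengths))  # {'gene_A': 500000.0, 'gene_B': 500000.0}
```

Gene B collects twice the reads only because it is twice as long, so both genes end up with the same TPM. DESeq2's default normalization corrects for library size, not gene length, so this kind of correction is a separate step.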

    Once you have decided how to account for gene-specific factors in a gene-by-gene normalization, simply sort after normalization and take the top N rows. If you aren't going to do this, you could do it all with grep | sort | head at the command line and skip R. Something like:

    grep "gene_specific_prefix" HTSeq_count_file.out | sort -k 2,2nr | head -n N >results_file.txt

    So, for human, the gene-specific prefix would be something like ENS. Or, since you are in R already, simply sort, take the top genes, and get rid of the couple of lines of read statistics at the bottom of the count table. Instead of specifying a count threshold (on baseMean, or on a mean I would compute myself via apply), I might choose the genes that make up the top 10% (or 20%, or ...) of the total, e.g. with cumsum in R.
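One reading of "the genes that make up the top 10% of the total" is: rank genes by (normalized) count and keep adding them until their running sum reaches 10% of all counts. A minimal Python sketch of that idea follows, with itertools.accumulate standing in for R's cumsum; the function name and example counts are hypothetical.

```python
from itertools import accumulate


def top_fraction_genes(counts, fraction=0.10):
    """Return the highest-count genes whose cumulative share of the
    total first reaches `fraction` (e.g. 0.10 for 10%).

    counts: dict gene -> (ideally normalized) count
    """
    # Rank genes from highest to lowest count
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(counts.values())
    selected = []
    # Running sum over the ranked counts, like cumsum() in R
    for (gene, _), running in zip(ranked, accumulate(c for _, c in ranked)):
        selected.append(gene)
        if running >= fraction * total:
            break
    return selected


# Hypothetical counts summing to 100
example = {"g1": 50, "g2": 30, "g3": 10, "g4": 5, "g5": 5}
print(top_fraction_genes(example, 0.5))   # ['g1']
print(top_fraction_genes(example, 0.75))  # ['g1', 'g2']
```

Compared with a fixed count cutoff, this picks a gene set relative to the total signal in the sample, so the cutoff does not have to be re-chosen for each library.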



    • #3
      I should have said: before you sort in R, make sure you have removed the couple of summary rows at the bottom.
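For comparison, the grep | sort | head one-liner together with the advice to drop the bottom rows could be sketched in Python as below. It assumes htseq-count style input (tab-separated gene ID and count, with summary rows such as __no_feature and __ambiguous at the end); the function name and the example IDs are hypothetical.

```python
def top_genes_from_htseq(lines, n, prefix=None):
    """Return the n highest (gene, count) pairs from htseq-count style lines.

    lines: iterable of "gene_id<TAB>count" strings (e.g. an open file)
    prefix: optional ID prefix to keep, like grep "ENS" in the one-liner
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene, count = line.split("\t")
        # htseq-count appends summary counters whose names start with "__"
        if gene.startswith("__"):
            continue
        if prefix is not None and not gene.startswith(prefix):
            continue
        rows.append((gene, int(count)))
    # Equivalent of sort -k 2,2nr: numeric, descending on the count column
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:n]


# Hypothetical input; with a real file: top_genes_from_htseq(open(path), n=30, prefix="ENS")
example = [
    "ENSG00000000001\t120\n",
    "ENSG00000000002\t900\n",
    "ENSG00000000003\t45\n",
    "__no_feature\t5000\n",
    "__ambiguous\t300\n",
]
print(top_genes_from_htseq(example, n=2))
# [('ENSG00000000002', 900), ('ENSG00000000001', 120)]
```

Without the "__" filter, __no_feature would sit at the top of this list, which is exactly why the bottom rows need to go before sorting.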



      • #4
        Thanks a lot for your reply. I got the first point, and you are absolutely right that I don't need to load it into R. One reason that comes to mind is to merge the count matrices that come from the samples (e.g. from the same timepoint). Specifically, I have 8 samples from the same developmental timepoint. Of course, I could merge them using a simple Python function.
        For the second part (about the prefix), you lost me a bit (newbie).
        My question is: when I sort by baseMean (e.g. >15000), I get 30 specific genes with high values. Is this not a way to get a first glance at the data? Even if I open the count matrices, I get the same genes.
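The "simple Python function" for merging the per-sample count files could look roughly like this minimal sketch. It assumes each input is htseq-count output and does an outer join on gene IDs, filling genes missing from a sample with 0; the function name and sample data are hypothetical. (DESeq2 can also build its dataset directly from a folder of HTSeq files with DESeqDataSetFromHTSeqCount.)

```python
def merge_count_tables(samples):
    """Merge per-sample htseq-count outputs into one gene x sample matrix.

    samples: dict sample_name -> iterable of "gene_id<TAB>count" lines
    Returns (gene_ids, sample_names, matrix), where matrix[i][j] is the
    count of gene_ids[i] in sample_names[j]; missing genes get 0.
    """
    per_sample = {}
    for name, lines in samples.items():
        counts = {}
        for line in lines:
            line = line.strip()
            if not line:
                continue
            gene, count = line.split("\t")
            if gene.startswith("__"):  # skip htseq-count summary rows
                continue
            counts[gene] = int(count)
        per_sample[name] = counts
    sample_names = list(per_sample)
    # Union of all gene IDs seen in any sample, in a stable sorted order
    gene_ids = sorted(set().union(*per_sample.values()))
    matrix = [[per_sample[s].get(g, 0) for s in sample_names] for g in gene_ids]
    return gene_ids, sample_names, matrix


# Hypothetical two-sample example; real use would pass open files for all 8 samples
genes, names, mat = merge_count_tables({
    "s1": ["geneA\t5\n", "geneB\t0\n", "__no_feature\t7\n"],
    "s2": ["geneA\t3\n", "geneC\t9\n"],
})
print(genes, names, mat)
# ['geneA', 'geneB', 'geneC'] ['s1', 's2'] [[5, 3], [0, 0], [0, 9]]
```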



        • #5
          Ahh, now you've added new info: you have replicates AND a structure to your experiment.

          Yes, that would be OK.

          If you had only a single sample AND the counts were from HTSeq, you could extract just the genes by using grep on a string specific to your gene IDs. For human genes with Ensembl IDs, all names start with ENS.

          Since you have replicates AND an experiment, this would not be the best way to go. Your way is fine, although you should still consider gene length etc. if you are going to make specific statements about abundance. Even then, many unknown factors make such statements difficult when comparing across genes, as opposed to comparing a single gene across experimental units.



          • #6
            Perfect! Thanks a lot for your help. It is true that I didn't include many details. Actually, we sorted cells (belonging to the same population) from 3 different developmental stages. What we need to do is characterize these populations with multiple markers, so I assume I don't need to make any comparisons. And you are right about the internal controls: we have housekeeping genes and also experimental RNA controls against which we can normalize the gene values. The whole confusion was: what would I reply if somebody asked me what baseMean means, and in what units? Otherwise, I think I'm starting to understand the analysis.
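On the baseMean question: in DESeq2, baseMean is the mean of a gene's normalized counts across all samples, where each sample's raw counts are divided by that sample's size factor (estimated by default with the median-of-ratios method). Its unit is therefore normalized read counts: corrected for sequencing depth, but not for gene length. DESeq2's estimateSizeFactors also accepts a controlGenes argument, so the size factors can be estimated from housekeeping genes or spike-in controls only. The Python sketch below re-implements the idea for illustration under those assumptions; it is not DESeq2's code, and the function names and data are hypothetical.

```python
import math
from statistics import median


def size_factors(matrix, control_rows=None):
    """Median-of-ratios size factors, one per sample (column).

    matrix: list of genes (rows), each a list of raw counts, one per sample
    control_rows: optional row indices (e.g. housekeeping genes or spike-ins)
                  to estimate the factors from instead of all genes
    """
    rows = matrix if control_rows is None else [matrix[i] for i in control_rows]
    # Only genes with a nonzero count in every sample have a finite geometric mean
    usable = [r for r in rows if all(c > 0 for c in r)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log of each gene's geometric mean across samples (the pseudo-reference)
    log_geo_means = [sum(math.log(c) for c in r) / len(r) for r in usable]
    factors = []
    for j in range(len(matrix[0])):
        # Log ratio of each gene's count in sample j to its pseudo-reference
        log_ratios = [math.log(r[j]) - lg for r, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def base_mean(matrix, factors):
    """Per-gene mean of size-factor-normalized counts (DESeq2-style baseMean)."""
    return [sum(c / s for c, s in zip(row, factors)) / len(row) for row in matrix]


# Hypothetical data: sample 2 was sequenced twice as deeply as sample 1
counts = [[10, 20], [100, 200], [50, 100]]
sf = size_factors(counts)
print(sf)                     # roughly [0.707, 1.414]
print(base_mean(counts, sf))  # roughly [14.14, 141.42, 70.71]
```

Because the normalization only removes depth differences, a baseMean of, say, 15000 means "about 15000 reads in an average-depth library", which says nothing about abundance relative to a gene of different length.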
