**bioBob** · 09-03-2015, 04:13 AM

Hi,
If you don't have a comparison (the differential expression, i.e. the DE part of DESeq2), I am not sure why you would go that route.

You have some other things to think about, e.g. gene length and other factors that might inflate the counts of gene 1 relative to gene 2 even when the two genes have the same cellular abundance.
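To make the gene length point concrete, here is a minimal Python sketch, assuming you have a gene length in base pairs for each gene (for example, summed exon length from your annotation). The gene names, counts and lengths are made up for illustration, and dividing by length in kilobases is only one simple way to account for length, not necessarily the one you should use.

```python
def counts_per_kb(counts, lengths_bp):
    """Divide each gene's raw read count by its length in kilobases."""
    return {gene: count / (lengths_bp[gene] / 1000.0)
            for gene, count in counts.items()}


# Hypothetical numbers: GENE_B has 4x the raw counts of GENE_A,
# but it is also 4x longer.
counts = {"GENE_A": 500, "GENE_B": 2000}
lengths_bp = {"GENE_A": 1000, "GENE_B": 4000}

rates = counts_per_kb(counts, lengths_bp)
print(rates)
```

Sorting on the raw counts would rank GENE_B well above GENE_A, while the per-kilobase values are identical.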

Once you determine how you are going to account for gene-specific factors to effect a gene-by-gene normalization, simply sort after normalization and take the top N rows. If you aren't going to do this, you could do it all via grep|sort|head at the command line and skip R. Something like

grep "gene_specific_prefix" HTSeq_count_file.out | sort -k 2,2nr | head -n N >results_file.txt

So, for human, the gene-specific prefix would be something like ENS. Or, since you are in R already, simply sort and take the top genes, after getting rid of the couple of lines of read stats at the bottom of the count file. Instead of specifying a count threshold (on baseMean, or on a mean I would compute myself via apply), I might choose the genes that make up the top 10% (or 20 or ...) of the total; see cumsum in R.
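The cumulative-sum idea can be sketched as follows. This is written in Python rather than R, and it is one possible reading of "the genes that make up the top 10% of the total": rank genes by count and keep adding them until their running sum reaches the chosen fraction of all counts. The function name and the toy counts are hypothetical, and the input is assumed to already exclude the read-stat rows.

```python
def genes_covering_fraction(counts, fraction=0.10):
    """Return the highest-count genes whose counts together reach
    `fraction` of the total count (genes are added in rank order)."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(count for _, count in ranked)
    if total == 0:
        return []
    target = fraction * total
    selected = []
    running = 0  # cumulative sum of the counts selected so far
    for gene, count in ranked:
        if running >= target:
            break
        selected.append(gene)
        running += count
    return selected


# Toy counts summing to 100, so fractions are easy to check by eye.
toy = {"A": 50, "B": 30, "C": 15, "D": 5}
print(genes_covering_fraction(toy, 0.10))
print(genes_covering_fraction(toy, 0.60))
```

Unlike a fixed count threshold, the cutoff here scales with the library, so the same fraction can be applied to samples of different sequencing depth.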

**bioBob** · 09-03-2015, 04:25 AM

I should have stated, before you sort in R, make sure you have removed the bottom couple of rows.

**Akis** · 09-03-2015, 04:39 AM

Thanks a lot for your reply. I got the first point, and you are absolutely right that I don't need to load it into R. One reason that comes to mind is to merge the count matrices that come from the samples (e.g. the same timepoint). Specifically, I have 8 samples from the same developmental timepoint. Of course I could merge them using a simple Python function.
For the second part (talking about the prefix), I lost you a bit (newbie).
My question is: when I sort by baseMean (e.g. >15000) I get 30 specific genes with high values. Is this not a way to get a first glance at the data? Even if I open the count matrices I get the same genes.
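For reference, the "simple Python function" for merging per-sample counts could look something like the sketch below. It assumes each file is tab-separated with two columns (gene ID, count) and that the read-stat rows at the bottom have names starting with `__`, as in recent htseq-count output; older output may label them differently, so check your own files. The function names and the in-memory example data are hypothetical.

```python
def parse_htseq(text):
    """Parse two-column, tab-separated count text into {gene: count},
    skipping summary rows whose name starts with '__' (assumption)."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        gene, value = line.split("\t")
        if gene.startswith("__"):
            continue
        counts[gene] = int(value)
    return counts


def merge_samples(per_sample):
    """Combine {sample: {gene: count}} into (sample_names, {gene: [counts]}).
    Genes missing from a sample get a count of 0."""
    names = sorted(per_sample)
    genes = sorted(set().union(*(per_sample[name] for name in names)))
    merged = {gene: [per_sample[name].get(gene, 0) for name in names]
              for gene in genes}
    return names, merged


# Made-up content standing in for two count files.
s1 = "ENSG1\t10\nENSG2\t0\n__no_feature\t99\n"
s2 = "ENSG1\t30\nENSG2\t4\n__no_feature\t50\n"
names, merged = merge_samples({"s1": parse_htseq(s1), "s2": parse_htseq(s2)})
print(names, merged)
```

With real files you would pass `open(path).read()` to `parse_htseq` for each of the 8 samples.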

**bioBob** · 09-03-2015, 04:58 AM

Ahh, now you added new info, you have replicates AND a structure to your experiment.

Yes, that would be ok.

If you had only a single sample AND the counts were from HTSeq, you could get only the genes using grep on a string specific to your genes. For human genes using Ensembl IDs, all names start with ENS.

Since you have replicates AND an experiment, this would not be the best way to go. Your way is fine, although you should still consider gene length etc. if you are going to make specific statements about abundance. Even then, a lot of unknown factors make such statements difficult when comparing across genes rather than within a single gene across experimental units.

**Akis** · 09-03-2015, 05:07 AM

Perfect! Thanks a lot for your help....

It is true that I didn't include a lot of details. Actually, we sorted cells (belonging to the same population) coming from 3 different developmental stages. What we need to do is characterize these populations with multiple markers. So I assume that I don't have to make any comparison. And you are right about the internal controls: we have housekeeping genes and also experimental RNA controls against which we can normalize the gene values. The whole confusion was: what could I reply if somebody asked me what baseMean means? What units? Otherwise I think I'm starting to understand the analysis.
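On the baseMean question: as far as I understand the DESeq2 documentation, baseMean is the mean of the normalized counts across all samples, where each sample's raw counts are divided by a per-sample size factor that corrects for sequencing depth. So the unit is "normalized read counts"; it is not adjusted for gene length. The sketch below illustrates that calculation in plain Python using median-of-ratios size factors. It is an illustration of the idea on made-up numbers, not DESeq2's actual code.

```python
import math
from statistics import median


def size_factors(matrix):
    """Median-of-ratios size factors.
    matrix: {gene: [raw count per sample]}.
    Genes with a zero in any sample are skipped (zero geometric mean)."""
    n_samples = len(next(iter(matrix.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for counts in matrix.values():
        if any(c == 0 for c in counts):
            continue
        log_geo_mean = sum(math.log(c) for c in counts) / n_samples
        for j, c in enumerate(counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def base_mean(matrix):
    """Mean over samples of (raw count / size factor), per gene."""
    sf = size_factors(matrix)
    return {gene: sum(c / s for c, s in zip(counts, sf)) / len(sf)
            for gene, counts in matrix.items()}


# Toy matrix: the second sample was sequenced exactly twice as deeply.
toy_matrix = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10]}
sf = size_factors(toy_matrix)
bm = base_mean(toy_matrix)
print(sf, bm)
```

In the toy matrix the two size factors differ by a factor of two, so after normalization both samples give the same value for each gene, and baseMean is that common value.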

Deseq2 question