Dear all,
I have a general question about analyzing my time-course RNA-seq data. I work with a worm that can regenerate missing body parts, and I would like to identify the genes whose expression changes significantly during regeneration, together with their expression profiles.
I have 11 time points: 0hr (the control), then 1hr, 2hr, 4hr, 6hr, ..., up to 48hr post injury. My planned strategy is to use DESeq2 for pairwise comparisons against the control ("0hr vs 1hr", "0hr vs 2hr", "0hr vs 4hr", ..., "0hr vs 48hr") and then filter out genes that show no change relative to 0hr at any time point.
My questions are:
1. Should I extract each pair of conditions from the raw read count data beforehand and run DESeq2 (the DESeq() function) on each pair separately? Or can I run DESeq2 once on the full raw count matrix containing all time points and then extract the comparisons I want (0hr vs 1hr, 0hr vs 2hr, ...)? What confuses me is that DESeq() estimates the size factors, estimates the gene-wise dispersions, fits the model, and runs the test. I therefore wonder whether a pre-extracted pair of conditions and the full set of time points would give different estimates (and perhaps different padj values, because the number of samples differs).
2. If I also want to analyze the fold changes of the differentially expressed genes relative to time 0 after the DESeq2 test (e.g. for clustering), should I use the log2 fold changes reported in the DESeq2 results (from results()) or values from the variance-stabilizing transformation?
My goal with these data is as follows. First, I want to keep the genes that are differentially expressed (e.g. padj < 0.05) at one or more time points, i.e. filter out those showing no change relative to the control. Second, I want to apply a fold-change threshold (e.g. FC > 1.5 or FC < 0.5) to further select the strongly changing gene profiles. Third, I want to use this padj- and FC-filtered subset of profiles for clustering, GO-term enrichment tests, and other analyses.
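To make the filtering steps above concrete, here is a minimal sketch of the logic in plain Python. It is only an illustration, not DESeq2 code. The gene names, the numbers, and the helper names (is_significant, passes_fc, select_genes) are all hypothetical. It assumes that the fold change is obtained from the log2FoldChange column as 2^log2FC, that a missing padj (NA in DESeq2) counts as not significant, and that the FC threshold only has to be met at one or more time points, not necessarily the same one where padj < 0.05.

```python
# Hypothetical toy results: gene -> {time point -> (log2FoldChange, padj)}.
# All values are made up for illustration; padj may be None (NA in DESeq2).
results = {
    "geneA": {"1hr": (0.2, 0.80), "2hr": (1.4, 0.001)},
    "geneB": {"1hr": (0.1, 0.60), "2hr": (-0.2, None)},
    "geneC": {"1hr": (0.3, 0.01), "2hr": (0.4, 0.03)},
    "geneD": {"1hr": (-1.5, 0.002), "2hr": (-0.1, 0.90)},
}

PADJ_CUTOFF = 0.05
FC_UP, FC_DOWN = 1.5, 0.5  # thresholds on the linear fold-change scale


def is_significant(padj):
    """True if padj is present and below the cutoff (None is treated as NA)."""
    return padj is not None and padj < PADJ_CUTOFF


def passes_fc(log2fc):
    """Convert log2 fold change to linear scale and apply the FC thresholds."""
    fc = 2 ** log2fc
    return fc > FC_UP or fc < FC_DOWN


def select_genes(results):
    """Step 1: keep genes significant at any time point.
    Step 2: of those, keep genes whose FC passes the threshold at any time point."""
    selected = []
    for gene, by_time in results.items():
        if not any(is_significant(padj) for _, padj in by_time.values()):
            continue
        if any(passes_fc(lfc) for lfc, _ in by_time.values()):
            selected.append(gene)
    return selected


selected = select_genes(results)
print(selected)
```

In this toy example geneB is dropped at step 1 (no significant time point) and geneC is dropped at step 2 (significant, but its fold change stays between 0.5 and 1.5). Note that the two thresholds are not symmetric on the log2 scale: 1.5 is about +0.58, while 0.5 is -1.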
I am new to this kind of analysis, so any advice would be helpful.
Thank you.