Hi,
I am currently analyzing an RNA-Seq experiment with two conditions. Each condition has several biological replicates, and each replicate was sequenced multiple times (on different lanes and even on different flowcells). For each sub-replicate I have a separate FASTQ file.
Control 1-1 Treatment 1-1
Control 1-2 Treatment 1-2
Control 1-3 Treatment 1-3
Control 2-1 Treatment 2-1
Control 2-2 Treatment 2-2
Control 2-3 Treatment 2-3
...
I have now performed two analyses:
1) FASTQ -> STAR -> htseq-count -> DESeq2 (each sub-replicate processed separately)
2) FASTQ -> merge the FASTQ files of each replicate (1-1 + 1-2 + 1-3) -> STAR -> htseq-count -> DESeq2
I checked the count files: for each replicate, the individual counts of the sub-replicates sum to the counts in the merged file.
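A check of this kind can be sketched roughly as follows. This is a minimal Python sketch, not the exact script used; it assumes the standard two-column, tab-separated htseq-count output (gene ID, count), in which the summary rows start with `__`. The function names are hypothetical and chosen only for illustration.

```python
def parse_htseq_counts(lines):
    """Parse htseq-count output lines into a {gene_id: count} dict.

    Assumes two tab-separated columns per line. Summary rows such as
    __no_feature or __ambiguous are skipped.
    """
    counts = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        gene_id, count = line.split("\t")
        if gene_id.startswith("__"):
            continue
        counts[gene_id] = int(count)
    return counts


def sum_counts(tables):
    """Sum per-gene counts over several sub-replicate tables."""
    total = {}
    for table in tables:
        for gene_id, count in table.items():
            total[gene_id] = total.get(gene_id, 0) + count
    return total


def find_mismatches(sub_tables, merged_table):
    """Return {gene_id: (summed, merged)} for genes whose counts differ."""
    summed = sum_counts(sub_tables)
    mismatches = {}
    for gene_id in sorted(set(summed) | set(merged_table)):
        pair = (summed.get(gene_id, 0), merged_table.get(gene_id, 0))
        if pair[0] != pair[1]:
            mismatches[gene_id] = pair
    return mismatches
```

With real files, each table would come from passing an open file handle to `parse_htseq_counts`. An empty result from `find_mismatches` means the sub-replicate counts add up exactly to the merged counts.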
However, when I now run DESeq2 on the two count data sets, I get different results (most notably in the adjusted p-values).
Not merged:
Code:
                    baseMean log2FoldChange      lfcSE      stat        pvalue         padj
                   <numeric>      <numeric>  <numeric> <numeric>     <numeric>    <numeric>
ENSG00000143369.10 479.26267      6.1618396 0.28753411  21.42994 7.026598e-102 2.636520e-97
ENSG00000135250.12 381.34094      0.7005228 0.03782914  18.51807  1.476331e-76 2.769745e-72
ENSG00000163898.5   62.74118      5.4375752 0.30318159  17.93504  6.281376e-72 7.856327e-68
ENSG00000154269.10  65.19322      3.9433545 0.23587129  16.71825  9.651700e-63 9.053777e-59
ENSG00000108821.9  306.78844      3.0304242 0.18264854  16.59156  8.020919e-62 6.019218e-58
Merged:
Code:
                     baseMean log2FoldChange     lfcSE      stat       pvalue         padj
                    <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
ENSG00000077327.11   63.88176     -3.1465321 0.5082833 -6.190509 5.997039e-10 1.553113e-05
ENSG00000222022.1    13.70911      3.2297638 0.5455883  5.919782 3.223688e-09 4.174354e-05
ENSG00000135250.12 3787.92187      0.7295915 0.1324829  5.507063 3.648689e-08 3.149791e-04
ENSG00000108821.9  2925.22227      2.4831871 0.4592653  5.406869 6.413598e-08 4.152484e-04
ENSG00000110427.10   74.67557      2.7264235 0.5144769  5.299409 1.161784e-07 6.017577e-04
Could you please tell me if that is expected?
Which approach should I use?
Thanks for your help,
Stephan