DEG analysis with paired sample without replication

younko



05-02-2014, 01:06 AM

I have a question about analyzing my RNA-seq data.

My data look exactly like this (12 samples in total):

disease1 patient1 pre-hormone
disease1 patient1 post-hormone
disease1 patient2 pre-hormone
disease1 patient2 post-hormone
disease1 patient3 pre-hormone
disease1 patient3 post-hormone
disease2 patient4 pre-hormone
disease2 patient4 post-hormone
disease3 patient5 pre-hormone
disease3 patient5 post-hormone
disease3 patient6 pre-hormone
disease3 patient6 post-hormone

I would like to find the genes affected by hormone treatment (i.e. genes differentially expressed between the pre- and post-hormone samples).
Because the design involves multiple diseases, multiple patients nested within each disease, and the hormone treatment, I could not work out how to build the design matrix for a GLM analysis.
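Because every patient belongs to exactly one disease, the patient blocks already absorb any disease-level difference, so a single paired design over all 12 samples is possible. The base-R sketch below rebuilds the sample sheet from the list above (the level names `pre`/`post` are my own shorthand) and shows that adding `disease` to `~patient + hormone` adds columns but no rank. Keep in mind that this model estimates one hormone effect shared by all diseases, which is a different question from per-disease effects.

```r
# Sample sheet rebuilt from the list above (level names are illustrative)
samples <- data.frame(
  disease = factor(rep(c("disease1", "disease2", "disease3"), times = c(6, 2, 4))),
  patient = factor(rep(paste0("patient", 1:6), each = 2)),
  hormone = factor(rep(c("pre", "post"), times = 6), levels = c("pre", "post"))
)

# Paired design: one block per patient plus a common hormone effect
design_paired <- model.matrix(~ patient + hormone, data = samples)

# Adding disease on top: disease2 equals the patient4 column and
# disease3 equals patient5 + patient6, so no new information is added
design_with_disease <- model.matrix(~ disease + patient + hormone, data = samples)

rank_paired <- qr(design_paired)$rank
rank_with_disease <- qr(design_with_disease)$rank
cat("paired:       columns", ncol(design_paired), "rank", rank_paired, "\n")
cat("with disease: columns", ncol(design_with_disease), "rank", rank_with_disease, "\n")
```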

The easiest approach might be to split the data by disease (disease1, disease2, disease3).

So I made three separate data sets, data1 (disease1), data2 (disease2) and data3 (disease3), and tested each one with patient and hormone as factors. Does this make sense?
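Splitting by disease is workable, but the subsets differ a lot in how much replication is left for dispersion estimation. A small base-R sketch, assuming each subset is fitted with `~patient + hormone` (one coefficient per patient block plus one hormone coefficient), counts the residual degrees of freedom as samples minus coefficients:

```r
# Sample sheet rebuilt from the list above (level names are illustrative)
samples <- data.frame(
  disease = factor(rep(c("disease1", "disease2", "disease3"), times = c(6, 2, 4))),
  patient = factor(rep(paste0("patient", 1:6), each = 2)),
  hormone = factor(rep(c("pre", "post"), times = 6), levels = c("pre", "post"))
)

# Residual df of ~patient + hormone within each disease subset:
# samples minus (one coefficient per patient + one hormone coefficient)
resid_df <- sapply(split(samples, samples$disease), function(s) {
  n_patients <- length(unique(s$patient))
  nrow(s) - (n_patients + 1)
})
print(resid_df)  # expected: disease1 = 2, disease2 = 0, disease3 = 1
```

Under that assumption disease2 leaves zero residual degrees of freedom, so its dispersion cannot be estimated from its own two libraries; a model over all samples lets the other patients inform the dispersion.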

data1

disease1 patient1 pre-hormone
disease1 patient1 post-hormone
disease1 patient2 pre-hormone
disease1 patient2 post-hormone
disease1 patient3 pre-hormone
disease1 patient3 post-hormone

data2

disease2 patient4 pre-hormone
disease2 patient4 post-hormone

data3

disease3 patient5 pre-hormone
disease3 patient5 post-hormone
disease3 patient6 pre-hormone
disease3 patient6 post-hormone

So, to test for genes differentially expressed after hormone treatment in disease2 with DESeq:

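# data2 has a single patient (no replicates), so use method="blind" with
# sharingMode="fit-only"; store the result back in data2 so later calls use it
data2 <- estimateDispersions(data2, method="blind", sharingMode="fit-only")
plotDispEsts(data2)

plotPCA(varianceStabilizingTransformation(data2), intgroup=c("patient", "hormone"))

# note: 'patient' has only one observed level (patient4) in data2, so it cannot be
# a model term there; this full/reduced pair as written only applies to data1 and data3
dh_fit1 = fitNbinomGLMs(data2, count ~ patient + hormone)
dh_fit0 = fitNbinomGLMs(data2, count ~ patient)
dh_pvalsGLM = nbinomGLMTest( dh_fit1, dh_fit0 )
dh_padjGLM = p.adjust( dh_pvalsGLM, method="BH" )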
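As far as I understand, `fitNbinomGLMs` plus `nbinomGLMTest` amount to a likelihood-ratio test between two nested negative binomial GLMs with the dispersion held fixed, followed here by Benjamini-Hochberg adjustment. The sketch below illustrates that idea for three made-up genes in a three-patient subset such as data1. It uses `MASS::negative.binomial()` with an assumed fixed theta of 10 (dispersion 0.1) and no size-factor offset; it is an illustration of the test, not DESeq's implementation.

```r
library(MASS)  # for negative.binomial(); ships with standard R installations

# Hypothetical counts for three genes in a 3-patient subset (pre/post per patient)
patient <- factor(rep(c("patient1", "patient2", "patient3"), each = 2))
hormone <- factor(rep(c("pre", "post"), times = 3), levels = c("pre", "post"))
counts <- rbind(
  geneA = c(100, 210, 80, 170, 120, 260),  # roughly doubles after hormone
  geneB = c(50, 55, 40, 38, 60, 62),       # little change
  geneC = c(300, 150, 280, 140, 320, 170)  # roughly halves
)
theta <- 10  # assumed fixed NB size parameter (dispersion = 1 / theta)

lrt_pvalue <- function(y) {
  fam <- negative.binomial(theta)
  fit1 <- glm(y ~ patient + hormone, family = fam)  # full model
  fit0 <- glm(y ~ patient, family = fam)            # reduced model (no hormone)
  # With theta fixed, the drop in deviance is the likelihood-ratio statistic
  stat <- deviance(fit0) - deviance(fit1)
  pchisq(stat, df = df.residual(fit0) - df.residual(fit1), lower.tail = FALSE)
}

pvals <- apply(counts, 1, lrt_pvalue)
padj <- p.adjust(pvals, method = "BH")
print(cbind(pvals, padj))
```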

===============================

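The same analysis with edgeR:
# patient and hormone must be factors describing the columns of the count matrix
dg_edesign <- model.matrix(~patient+hormone)

# DGEList() expects a plain count matrix, e.g. counts(data2) for the DESeq CountDataSet
dg_e <- DGEList(counts=counts(data2))
dg_e <- calcNormFactors(dg_e)
# these estimators need residual degrees of freedom (more samples than design columns)
dg_e <- estimateGLMCommonDisp(dg_e, dg_edesign)
dg_e <- estimateGLMTrendedDisp(dg_e, dg_edesign)
dg_e <- estimateGLMTagwiseDisp(dg_e, dg_edesign)
dg_efit <- glmFit(dg_e, dg_edesign)
# hormone is the last design column; coef=3 is only correct for a two-patient subset
dg_elrt <- glmLRT(dg_efit, coef=ncol(dg_edesign))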
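A hard-coded `coef=3` depends on how many patients are in the subset, because `model.matrix()` places one column per additional patient before the hormone column. A quick base-R check on the sample layouts of data1 and data3 (patient labels and the `pre`/`post` level names are illustrative):

```r
# Column layout of ~patient + hormone for a subset with n_patients patients
design_for <- function(n_patients) {
  patient <- factor(rep(paste0("patient", seq_len(n_patients)), each = 2))
  hormone <- factor(rep(c("pre", "post"), times = n_patients), levels = c("pre", "post"))
  model.matrix(~ patient + hormone)
}

d1 <- design_for(3)  # layout of data1 (three patients)
d3 <- design_for(2)  # layout of data3 (two patients)
print(colnames(d1))
print(colnames(d3))

# Position of the hormone coefficient in each design
print(which(colnames(d1) == "hormonepost"))
print(which(colnames(d3) == "hormonepost"))
```

Because `glmLRT()` tests the last design column by default, keeping `hormone` last in the formula and passing `coef=ncol(design)` (or leaving `coef` at its default) picks the hormone effect in every subset.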

==================================

Is this code right? I tried to do the same thing in DESeq and edgeR, so I expect the two scripts to perform the same test, even though the results could differ.

=================================================
In addition, I would like to analyze all samples together instead of separating the data by disease. How can I do that?

design2 <- model.matrix(~disease + disease:patient + disease:hormone)

would be one way, or

design2 <- model.matrix(~0 + disease + disease:patient + disease:hormone)

Does it make sense?
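A sketch of the combined design, similar to the approach the edgeR User's Guide describes for patients nested within groups: patients are renumbered within each disease so that `disease:patient` does not duplicate the disease columns, and the all-zero columns that appear because disease2 has one patient and disease3 has two are dropped. With the original global patient IDs, `disease:patient` produces many all-zero or aliased columns. The helper column `patient_in_disease` is a hypothetical name of my own.

```r
# Sample sheet rebuilt from the list above (level names are illustrative)
samples <- data.frame(
  disease = factor(rep(c("disease1", "disease2", "disease3"), times = c(6, 2, 4))),
  patient = factor(rep(paste0("patient", 1:6), each = 2)),
  hormone = factor(rep(c("pre", "post"), times = 6), levels = c("pre", "post"))
)

# Renumber patients 1, 2, ... within each disease (hypothetical helper column)
samples$patient_in_disease <- factor(ave(
  as.integer(samples$patient), samples$disease,
  FUN = function(x) as.integer(factor(x))
))

design2 <- model.matrix(~ disease + disease:patient_in_disease + disease:hormone,
                        data = samples)

# disease2 has one patient and disease3 two, so some interaction columns are all zero
design2 <- design2[, colSums(design2 != 0) > 0]

print(colnames(design2))
cat("columns:", ncol(design2), "rank:", qr(design2)$rank,
    "residual df:", nrow(design2) - ncol(design2), "\n")
```

Each `diseaseX:hormonepost` column then carries the hormone effect within one disease, so testing, say, the `diseasedisease2:hormonepost` coefficient asks the per-disease question while the dispersion is estimated from all 12 samples. The `~0 + disease + ...` variant only reparametrizes the disease baselines; its hormone columns are identical, so the hormone coefficients should be the same.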

Could somebody please comment?

Thanks in advance

Last edited by younko; 05-06-2014, 05:08 PM. Reason: adding more comments
Tags: deg, paired sample
