SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
Understanding DESeq2 dispersion estimates ysnapus Bioinformatics 4 03-10-2014 05:00 PM
DESeq2 multi-factor designs id0 Bioinformatics 3 01-10-2014 07:25 AM
DESeq2: Multi-factor designs sindrle Bioinformatics 10 10-21-2013 08:47 AM
DEXSeq for multi-factor design alittleboy Bioinformatics 5 06-27-2013 10:43 AM

Reply
 
Thread Tools
Old 04-04-2014, 08:51 AM   #1
kajot
Member
 
Location: Germany

Join Date: Dec 2013
Posts: 16
Default DESeq2 - understanding multi-factor DE analysis

Hello again dear SeqAnswers,

I have another question regarding RNA-seq data processing, this time with DESeq. I had count tables from my experiment generated successfully by DEXSeq counting script. Since I wanted to use DESeq2 this time, without having to re-count all the samples, I used geneCountTable function on my ExonCountSet in R to "compress" all exon bins into genes.

1) I hope this is correct approach to obtaining gene count tables for DESeq ?

Now the main question, my study is about gender differences in response to some knock-out. So I have 4 sample groups:

Wild-type male
Wild-type female
Knock-out male
Knock-out female

I generated all the necessary tables following the DESeq2 vignette:

sex strain
WTM_1.counts male WT
WTM_2.counts male WT
WTM_3.counts male WT
WTF_1.counts female WT
WTF_2.counts female WT
WTF_3.counts female WT
KOM_1.counts male KO
KOM_2.counts male KO
KOM_3.counts male KO
KOF_1.counts female KO
KOF_2.counts female KO
KOF_3.counts female KO


2) What is then a proper approach to see differences between gene expression of KO samples that would take into consideration the natural variance between genders in wild-type (like sex-specific genes) ?

I know I can manipulate the GLM with the formula, but to be honest I am unsure if I understand the entire meaning behind it.

Currently, just for fun, I used formula (design = ~ sex + strain). Now what I got back are 3 tables with fold-changes - Intercept, male vs female, KO vs WT.

3) What is intercept exactly ? Are values in tables male vs. female and KO vs. WT simply pair-wise comparisons, like ones I would get by using formulas: design = ~ sex and design = ~ strain separately ?

I know this might take a longer answer so I will patiently wait for somebody's time. I will be extremely thankful for any insight, I am a beginner in RNA-seq , hence those questions might seem trivial for some.

All the best!
kajot is offline   Reply With Quote
Old 04-04-2014, 09:16 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Summing the counts from DEXSeq may lead to spurious results since you'll be double counting reads overlapping multiple exons. If you found any differential exon usage with DEXSeq, then you're even more likely to get spurious results here. You can get counts pretty quickly with featureCounts(), so give that a try.

If you're interested in "gender differences in response to some knock-out" then I would think you'd be most interested in the sex:strain interaction (so "design = ~ sex*strain").

The intercept is the base level of the model, which appears to be WT females (presumably you releveled() the strain factor in your design, since normally they're set alphabetically, meaning that KO females would be the base level), unless I have that backward. The "male vs female" result would be interpreted as "differences due to gender after accounting for differences due to strain". The "KO vs. WT" results would have a similar interpretation.
dpryan is offline   Reply With Quote
Old 04-07-2014, 12:56 AM   #3
kajot
Member
 
Location: Germany

Join Date: Dec 2013
Posts: 16
Default

That's very concise and informative. Thank you for the answer !


However, one more thing troubles me, I have an Ensembl GRCm38 .gtf file that contains in 3rd column values such as exon, CDS, start_codon, stop_codon, but no "gene". Since I want to use the .gtf file for HTseq counting of reads based on their overall mapping to genes, I thought I should use the -t gene option. Otherwise the default -t exon would count the multi-exon reads twice into same gene, like the DEXseq counting script, right ? Or maybe the -t exon option is fine for preparing count tables for DESeq ?

Last edited by kajot; 04-07-2014 at 01:18 AM.
kajot is offline   Reply With Quote
Old 04-07-2014, 05:12 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

htseq-count won't double count reads mapping to multiple exons, since it's designed with gene-level analyses in mind. The scripts that come with DEXSeq are different in this regard, since there it makes sense to double count in these instances (the DESeq/DEXSeq authors put enough thought into things that you won't normally shoot yourself in the foot doing this sort of thing). If you used the default "-t exon -i gene_id" then what you're telling the script to do is to look at reads mapping within exons and then count them according to how they overlap genes denoted by their gene_id. This is normally the appropriate way to go about things.
dpryan is offline   Reply With Quote
Old 04-07-2014, 05:41 AM   #5
kajot
Member
 
Location: Germany

Join Date: Dec 2013
Posts: 16
Default

I understand, I wasn't sure since in my .gtf file there are entries like this:

1 protein_coding exon 4830268 4830315 . + . exon_id "ENSMUSE00001118471"; exon_number "4"; gene_biotype "protein_coding"; gene_id "ENSMUSG00000025903"; gene_name "Lypla1"; p_id "P31648"; transcript_id "ENSMUST00000150971"; transcript_name "Lypla1-003"; tss_id "TSS49998";
1 protein_coding exon 4830268 4830315 . + . exon_id "ENSMUSE00001118471"; exon_number "4"; gene_biotype "protein_coding"; gene_id "ENSMUSG00000025903"; gene_name "Lypla1"; p_id "P31294"; transcript_id "ENSMUST00000119612"; transcript_name "Lypla1-009"; tss_id "TSS45053";


As you can see, there is a double entry for two transcripts from the same gene, but the exon position and gene IDs are the same, therefore I was a bit concerned that the script might count a read mapped to such exon twice for the same ENSMUSG00000025903.
kajot is offline   Reply With Quote
Old 04-07-2014, 05:42 AM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Ah, yeah, that won't cause any double counting or other problems
dpryan is offline   Reply With Quote
Old 04-07-2014, 05:45 AM   #7
kajot
Member
 
Location: Germany

Join Date: Dec 2013
Posts: 16
Default

Ok, all is clear, I am deeply grateful for your help
kajot is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 03:04 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2022, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO