SEQanswers

07-30-2014, 03:22 PM   #1
Kolamite
Junior Member
 
Location: Baltimore

Join Date: Nov 2012
Posts: 7
Combining RNA-seq datasets

I searched a lot of threads here and elsewhere without finding anything exactly like this.

I have two RNA-seq datasets from different dates and different platforms. Split between them are 4 groups (normal and 3 stages of cancer). This is the distribution:

Set 1
1x normal
2x stage 1
3x stage 2
7x stage 3

Set 2
2x normal
2x stage 1
2x stage 2
2x stage 3

We want to combine the datasets and make comparisons between the groups for differential expression. So far, I've tried:

- Combined all FPKM values into one table.
- Ran ComBat on the table, specifying dataset as the batch and stage as a covariate.
- Skipped voom(), since the values are not raw counts, and log2-transformed the ComBat output for limma.
- Ran lmFit, contrasts.fit, and eBayes from limma on the transformed output.

My questions are:

1. Should I be using FPKM values or the raw counts for this, given the two datasets and need for batch removal?

2. What is the best way to run limma on the ComBat output without conversion through voom()?

3. Are there any other glaring problems with this approach?

Thanks!
01-27-2015, 08:25 AM   #2
habbas
Junior Member
 
Location: texas

Join Date: Nov 2014
Posts: 8

Hi -
Were you able to find a solution to this problem?
01-27-2015, 08:14 PM   #3
Gordon Smyth
Member
 
Location: Melbourne, Australia

Join Date: Apr 2011
Posts: 91

This is actually a very common type of RNA-seq analysis where we combine two datasets. You can run voom and limma on the raw counts, as you would for any analysis. When you form the design matrix, include a term for the batch effect like this:

design <- model.matrix(~Stage+Set)

Here Set is the batch factor taking values "Set1" and "Set2" and Stage is the experimental factor taking values "Normal", "Stage1", "Stage2" and "Stage3".
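
To see what this design matrix looks like for the sample layout given in the first post, here is a minimal base-R sketch. The data frame name `targets` and the ordering of the samples are assumptions made for illustration; only the factor names and their levels come from the thread.

```r
# Sample layout from the first post: 13 samples in Set1, 8 in Set2.
# `targets` is a hypothetical name; the sample order is arbitrary.
stages <- c("Normal", "Stage1", "Stage2", "Stage3")
targets <- data.frame(
  Set   = factor(rep(c("Set1", "Set2"), times = c(13, 8))),
  Stage = factor(c(rep(stages, times = c(1, 2, 3, 7)),  # Set 1
                   rep(stages, each = 2)))              # Set 2
)

# Additive model: stage effects plus one shift for the second batch.
design <- model.matrix(~ Stage + Set, data = targets)

# One column per non-reference stage, plus the batch column "SetSet2".
print(colnames(design))

# Every stage appears in both sets, so batch and stage are not confounded.
print(table(targets$Stage, targets$Set))
```

With "Normal" and "Set1" as the reference levels, each Stage coefficient is a log-fold-change against normal tissue after adjusting for the dataset of origin.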

This is a very standard type of analysis. There is no need for any external batch correction such as ComBat.
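
The analysis described above might be sketched as follows. This is only an illustration under stated assumptions: the count matrix is simulated so that the code runs, the object names are hypothetical, and the filtering and normalization steps (filterByExpr, calcNormFactors from edgeR) are common practice rather than something prescribed in this thread.

```r
library(edgeR)  # DGEList, filterByExpr, calcNormFactors
library(limma)  # voom, lmFit, contrasts.fit, eBayes, topTable

set.seed(1)
stages <- c("Normal", "Stage1", "Stage2", "Stage3")
Set    <- factor(rep(c("Set1", "Set2"), times = c(13, 8)))
Stage  <- factor(c(rep(stages, times = c(1, 2, 3, 7)), rep(stages, each = 2)))

# Placeholder raw counts (1000 genes x 21 samples), simulated for illustration.
mu     <- 2^runif(1000, 4, 10)
counts <- matrix(rnbinom(1000 * 21, mu = mu, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:21)))

# Batch enters the model as an additive term next to the factor of interest.
design <- model.matrix(~ Stage + Set)

dge  <- DGEList(counts = counts)
keep <- filterByExpr(dge, design)
dge  <- calcNormFactors(dge[keep, , keep.lib.sizes = FALSE])

v   <- voom(dge, design)   # raw counts -> logCPM with precision weights
fit <- lmFit(v, design)

# Stage3 vs Normal is a single coefficient because Normal is the reference.
top <- topTable(eBayes(fit), coef = "StageStage3", number = 5)

# Stage3 vs Stage1 needs a contrast between two coefficients.
cont <- matrix(c(0, -1, 0, 1, 0), ncol = 1,
               dimnames = list(colnames(design), "Stage3vsStage1"))
fit2 <- eBayes(contrasts.fit(fit, cont))
top2 <- topTable(fit2, coef = "Stage3vsStage1", number = 5)
```

Because Set is in the design, every stage comparison is adjusted for the dataset of origin, which is why no separate batch-correction step appears before the model fit.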
01-28-2015, 09:13 AM   #4
habbas
Junior Member
 
Location: texas

Join Date: Nov 2014
Posts: 8

Hi Gordon, thanks for your reply.
I am a beginner at this. I know how to use DESeq2 to analyze RNA-seq data from tables generated by the summarizeOverlaps function. How would your suggestion be implemented in that workflow?
Hussein
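
For reference, the same idea (a batch term in the design formula) could be carried over to DESeq2 roughly as sketched below. This is an assumption about how the suggestion translates, not something stated in the thread; the counts are simulated and the object names are hypothetical.

```r
library(DESeq2)

set.seed(1)
stages  <- c("Normal", "Stage1", "Stage2", "Stage3")
colData <- data.frame(
  Set   = factor(rep(c("Set1", "Set2"), times = c(13, 8))),
  Stage = factor(c(rep(stages, times = c(1, 2, 3, 7)), rep(stages, each = 2))),
  row.names = paste0("s", 1:21)
)

# Placeholder raw counts, simulated only so the example runs.
mu     <- 2^runif(1000, 4, 10)
counts <- matrix(rnbinom(1000 * 21, mu = mu, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), rownames(colData)))

# Batch (Set) goes into the design formula alongside the factor of interest.
# For summarizeOverlaps output `se` (a SummarizedExperiment whose colData
# holds Set and Stage), DESeqDataSet(se, design = ~ Set + Stage) would be
# the corresponding constructor.
dds <- DESeqDataSetFromMatrix(countData = counts, colData = colData,
                              design = ~ Set + Stage)
dds <- DESeq(dds)

# Stage3 vs Normal, adjusted for the dataset of origin.
res <- results(dds, contrast = c("Stage", "Stage3", "Normal"))
```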
01-28-2015, 09:35 AM   #5
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181

Quote (originally posted by Gordon Smyth):
This is actually a very common type of RNA-seq analysis where we combine two datasets. You can run voom and limma on the raw counts, as you would for any analysis. When you form the design matrix, include a term for the batch effect like this:

design <- model.matrix(~Stage+Set)

Here Set is the batch factor taking values "Set1" and "Set2" and Stage is the experimental factor taking values "Normal", "Stage1", "Stage2" and "Stage3".

This is a very standard type of analysis. There is no need for any external batch correction such as ComBat.

I have been using limma for a while now, but I am still a bit unsure about the syntax of the model.matrix command when it comes to batch effects, random effects, paired designs, etc.

If I understood correctly, I don't need to use any special command like removeBatchEffect()? I see this command come up in some Bioconductor threads when I search, but I no longer see it in the limma manual. Is it now deprecated?

Thank you for taking the time to answer my question.
01-28-2015, 12:59 PM   #6
Gordon Smyth
Member
 
Location: Melbourne, Australia

Join Date: Apr 2011
Posts: 91

Quote (originally posted by NGSfan):
If I understood correctly, then I don't need to use any special command like removeBatchEffect()? I see this command come up on some Bioconductor threads when I google. But I no longer see it in the limma manual. Is this now deprecated?

Type ?removeBatchEffect to read the documentation page. It explains that the function is used to make unsupervised plots rather than for differential expression analyses.

This is the way that removeBatchEffect has always been treated. It has not been removed from any documentation.
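
As an illustration of that plotting use, a minimal sketch might look like the following. The counts are simulated and the object names are hypothetical; passing the Stage design matrix to removeBatchEffect so that the stage differences are preserved follows the function's documented `design` argument.

```r
library(edgeR)  # DGEList, calcNormFactors, cpm
library(limma)  # removeBatchEffect, plotMDS

set.seed(1)
stages <- c("Normal", "Stage1", "Stage2", "Stage3")
Set    <- factor(rep(c("Set1", "Set2"), times = c(13, 8)))
Stage  <- factor(c(rep(stages, times = c(1, 2, 3, 7)), rep(stages, each = 2)))

# Placeholder raw counts, simulated only so the example runs.
mu     <- 2^runif(1000, 4, 10)
counts <- matrix(rnbinom(1000 * 21, mu = mu, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:21)))

dge    <- calcNormFactors(DGEList(counts = counts))
logCPM <- cpm(dge, log = TRUE, prior.count = 3)

# Remove the Set effect for visualization only, protecting the Stage effect.
logCPM_nobatch <- removeBatchEffect(logCPM, batch = Set,
                                    design = model.matrix(~ Stage))

# Unsupervised plot on the adjusted values; the differential expression
# analysis itself still uses the uncorrected data with Set in the design.
plotMDS(logCPM_nobatch, labels = as.character(Stage))
```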

Last edited by Gordon Smyth; 01-29-2015 at 10:06 PM. Reason: minor grammar improvement

Tags
combat, limma, rna-seq