SEQanswers

Old 03-20-2013, 09:45 AM   #1
SEQnovice
Junior Member
 
Location: Toronto

Join Date: Nov 2012
Posts: 6
Determining Replicates for DESeq?

Hello, I am relatively new to RNA-Seq data analysis so I apologize in advance if this is a novice question.
I have been reading several forums on the subject and I think I have the general idea, but I would be happy to get advice from the experts.

I have RNA-Seq data for multiple samples of a specific cancer. From gene expression analysis (on Illumina platforms) that the lab ran before I arrived, we have learned that this cancer can be divided into 3 subgroups (1, 2, 3). I would like to find the list of differentially expressed genes between the RNA-Seq samples belonging to subgroups 2 and 3, using DESeq.

I am somewhat confused here as to what I should consider as my 'biological replicates', or whether I should not consider replicates at all for my analysis.

For each of the subgroups, I only have one sample per patient. So the scenario looks like this:

Subgroup 2: Sample A, Sample B, Sample C, Sample D

Subgroup 3: Sample X, Sample Y, Sample Z, Sample F, Sample W

In this case, should I consider that Samples A, B, C, D are all biological replicates of Subgroup 2, and Samples X, Y, Z, F, W are biological replicates of Subgroup 3? Each sample in a subgroup comes from a different patient, and there is no control sample for any patient (so there is no pairing between my samples).
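As an illustration of that layout (a sketch only, not part of the original post), the sample sheet below encodes each patient sample as one replicate of its subgroup. The names `samples`, `condition`, `subgroup2` and `subgroup3` are made up for this example; only the sample letters come from the question.

```r
# Hypothetical sample sheet: one row per patient sample, labelled by subgroup.
# The labels "subgroup2"/"subgroup3" are illustrative names, not DESeq keywords.
samples <- data.frame(
  sample    = c("A", "B", "C", "D", "X", "Y", "Z", "F", "W"),
  condition = factor(c(rep("subgroup2", 4), rep("subgroup3", 5))),
  stringsAsFactors = FALSE
)

# Count how many samples (replicates) fall under each condition.
replicates_per_condition <- table(samples$condition)
print(replicates_per_condition)
```

With a table like this, the `condition` column is the grouping factor you would hand to the analysis, so every sample counts as an unpaired replicate of its subgroup.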

If I am to consider this scenario, any advice on the DESeq parameters? Right now I am just running the defaults as they appear in the vignette.

The alternative is to consider that I don't have any replicates and simply compare the two groups. The DESeq call for estimating dispersions would then look like this:

cds = estimateDispersions( cds, method="blind", sharingMode="fit-only", fitType="local" )

I have tried both scenarios. In the case where I don't consider any replicates at all, I have ended up with a much larger number of differentially expressed genes at p<0.1 (1380 as opposed to 92).

Any advice would be appreciated! Thank you in advance! Deena
Old 03-20-2013, 09:57 AM   #2
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Quote:
Originally Posted by SEQnovice
In this case, should I consider that Samples A, B, C, D are all biological replicates of Subgroup 2, and Samples X, Y, Z, F, W as biological replicates of Subgroup 3?
Short answer: Yes.

Longer answer: I should probably write a more extensive answer on this, as you are not the first one to ask this question, discussing why the term "biological replicate" is actually quite an abuse of terminology, one that has caused quite some confusion. I'll get to that.
Old 03-20-2013, 10:01 AM   #3
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

I should add: For a comparison of cancer types, three or four samples are usually far too few, even more so if you don't have matched healthy tissue samples from the same patients, so I wouldn't be too optimistic about your results.

Also, are you sure that you got _more_ hits with "blind" than with the standard workflow? It should be the other way round.
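A rough way to see why "blind" would normally be the more conservative choice: ignoring the labels treats a true group difference as extra within-condition variability. The sketch below uses made-up numbers and plain sample variances, which is a simplification and not DESeq's actual dispersion estimator.

```r
# Made-up normalized counts for one gene that truly differs between the groups.
group2 <- c(100, 110, 90, 105)        # four subgroup 2 samples
group3 <- c(200, 210, 190, 205, 195)  # five subgroup 3 samples

# "Blind": ignore the labels and treat all nine samples as replicates
# of a single condition.
blind_var <- var(c(group2, group3))

# Pooled within-group variance: deviations are measured around each
# group's own mean, so the group difference does not inflate it.
pooled_var <- (sum((group2 - mean(group2))^2) + sum((group3 - mean(group3))^2)) /
  (length(group2) + length(group3) - 2)

cat("blind variance: ", blind_var, "\n")
cat("pooled variance:", pooled_var, "\n")
```

For a gene like this the blind variance comes out much larger than the pooled one, which makes a test based on it less likely to call the gene significant.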
Old 03-20-2013, 10:18 AM   #4
SEQnovice
Junior Member
 
Location: Toronto

Join Date: Nov 2012
Posts: 6

Hi Simon,
Thanks for the very speedy answer! The cancer subgroups are slightly larger (9 vs 6); I was just posing a generic question, but you are right that they are still quite few in number either way.

I guess the main confusion is that in this case, the pooled samples are not truly biological replicates. I would originally have considered biological replicates to be multiple cancer samples per patient within each subgroup, i.e. Sample A1, A2, etc. I look forward to reading your explanation on this.

And yes, I did get more hits with blind than with the standard workflow, which was why I started questioning the issue of replicates. I did have a look at the variance between the gene counts for the samples of each of my subgroups; there doesn't seem to be a high degree of variation in these samples, with the exception of 8-9 genes that are outliers per subgroup.
Might this explain why the number of differentially expressed genes is so low?

I will run it again just to be sure and let you know.
Thanks,
Deena
Old 03-20-2013, 10:27 AM   #5
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

sharingMode="fit-only" is extremely sensitive to outliers, which are turned into false positives. This is why we recommend avoiding it (except in blind mode, where it is unavoidable). So all the extra hits are probably false positives.

This whole business with the sharing mode is a bit of a hack, and replacing it with something better founded was one of the main motivations for developing DESeq2.
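To make the outlier point concrete, here is a minimal base-R sketch of what the three sharing modes do with a gene's own dispersion estimate and the fitted trend value. `share_dispersion` is a hypothetical helper written for this illustration, not a DESeq function, and the numbers are made up.

```r
# Made-up dispersion values for five genes. Gene 4 is an outlier: its own
# estimate lies far above the fitted mean-dispersion trend.
per_gene_disp <- c(0.02, 0.05, 0.04, 0.90, 0.03)
fitted_disp   <- c(0.04, 0.04, 0.04, 0.04, 0.04)

# Hypothetical helper mimicking the three sharing modes:
#   "maximum"       - the larger of the per-gene estimate and the fitted value
#   "fit-only"      - the fitted value only
#   "gene-est-only" - the per-gene estimate only
share_dispersion <- function(per_gene, fitted,
                             mode = c("maximum", "fit-only", "gene-est-only")) {
  mode <- match.arg(mode)
  switch(mode,
    "maximum"       = pmax(per_gene, fitted),
    "fit-only"      = fitted,
    "gene-est-only" = per_gene
  )
}

comparison <- data.frame(
  gene     = seq_along(per_gene_disp),
  maximum  = share_dispersion(per_gene_disp, fitted_disp, "maximum"),
  fit_only = share_dispersion(per_gene_disp, fitted_disp, "fit-only")
)
print(comparison)
```

Under "fit-only", gene 4 is tested with the small fitted dispersion even though its counts are highly variable, so its scatter can look like a real difference; under "maximum" its own large estimate is kept.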
Old 03-20-2013, 10:38 AM   #6
SEQnovice
Junior Member
 
Location: Toronto

Join Date: Nov 2012
Posts: 6

Thanks, I will look into DESeq2.

But just for the sake of completing this exercise, am I right to assume that the following dispersion estimation is correct?

cds = estimateDispersions( cds, method="blind", sharingMode="maximum", fitType="parametric" )

Why wouldn't you use "pooled" or "per-condition" here for the method? I am just thinking about how to deal with the outliers.

Thanks,
Deena

Tags
deseq, differential expression, replicates, rna-seq

All times are GMT -8.