SEQanswers

Old 02-25-2012, 07:02 AM   #1
hanshart
Member
 
Location: Germany

Join Date: Nov 2011
Posts: 27
DESeq questions

Hello,

I have 16 samples from 16 different human tissues (say "A","B",...,"P"), so no biological replicates (75bp single end reads). I want to study the expression levels of a specific group of genes for a specific tissue (let's say this tissue is "B"). A few questions on this scenario:

1. Is it valid to treat tissue B as 1 condition and the other 15 tissues as replicates for the "non-B" condition?

2. Although I'm only interested in a specific group of genes, is it recommended to supply DESeq with a count table of all genes instead of only the few genes I'm interested in? (in order to give DESeq more information about the samples)

3. If 1. is ok: Is the following configuration a good choice for the dispersion estimation?
Code:
cds <- estimateDispersions(cds, method="blind", sharingMode="fit-only", fitType="local")
4. What would be the explanatory power of the analysis? (My hope is that 16 different samples give enough information to obtain meaningful results, despite the absence of replicates?!)

5. Would it be a great improvement to use technical replicates? (either 50bp paired end or 100bp stranded)

Thank you.

edit: added point 5

Last edited by hanshart; 02-25-2012 at 09:52 AM.
Old 03-04-2012, 11:39 PM   #2
hanshart
Member
 
Location: Germany

Join Date: Nov 2011
Posts: 27
Default

Can anyone help me with this, please?
Old 03-06-2012, 10:46 AM   #3
Jean
Member
 
Location: Canada

Join Date: Nov 2008
Posts: 37
Default

Just to get it started I can try to answer some of your points:

Quote:
Hello,

I have 16 samples from 16 different human tissues (say "A","B",...,"P"), so no biological replicates (75bp single end reads). I want to study the expression levels of a specific group of genes for a specific tissue (let's say this tissue is "B"). A few questions on this scenario:

1. Is it valid to treat tissue B as 1 condition and the other 15 tissues as replicates for the "non-B" condition?
If your question is "what genes are uniquely/differentially expressed in tissue B", then I'd say yes, this is valid. However, many RNA-seq DE tools have hiccups when you feed them highly diverse "biological replicates", and I think this should be a concern if you choose this type of analysis.

Quote:
2. Although I'm only interested in a specific group of genes, is it recommended to supply DESeq with a count table of all genes instead of only the few genes I'm interested in? (in order to give DESeq more information about the samples)
You will bias your samples by picking and choosing this way. The number of reads per gene is PROPORTIONAL to everything else that has been sequenced in the sample, so you should include all of this information.
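To make the point about proportionality concrete, here is a minimal sketch of median-of-ratios normalization, the idea behind DESeq's size factors. It is written in plain Python for illustration, not with DESeq itself; the function name `size_factors` and the toy count table are assumptions made up for this example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of gene rows, each row holding one count per sample.
    """
    n_samples = len(counts[0])
    # Keep only genes with nonzero counts everywhere, so the log geometric mean is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to that gene's geometric mean.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Hypothetical toy table (2 samples): sample 2 is sequenced twice as deeply,
# and the last gene is additionally 10x higher in sample 2.
all_genes = [[100, 200], [50, 100], [30, 60], [80, 160], [10, 200]]
gene_of_interest_only = [all_genes[4]]

sf_all = size_factors(all_genes)
sf_subset = size_factors(gene_of_interest_only)
print([round(f, 3) for f in sf_all])     # [0.707, 1.414]
print([round(f, 3) for f in sf_subset])  # [0.224, 4.472]
```

With the full table, the size factors recover the 2x depth difference and the last gene still looks about 10x higher after normalization. With only the gene of interest, the size factors absorb the whole difference and the normalized counts come out equal, so the effect disappears.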

Quote:
3. If 1. is ok: Is the following configuration a good choice for the dispersion estimation?
Code:
cds <- estimateDispersions(cds, method="blind", sharingMode="fit-only", fitType="local")
This is specific to DESeq so someone else might be able to answer.

Quote:
4. What would be the explanatory power of the analysis? (My hope is that 16 different samples give enough information to obtain meaningful results, despite the absence of replicates?!)
More samples do not make up for the lack of biological replicates. You have very little power for your question (what is unique to tissue B), except for the big, obvious differences that you can hardly miss. Biological replicates are important.

Quote:
5. Would it be a great improvement to use technical replicates? (either 50bp paired end or 100bp stranded)
It's been shown several times in the literature that technical replicates in RNA-seq are very tight and almost unnecessary for an experiment. This also depends on how deeply you've sequenced (if you don't have enough coverage, your variation is much higher). Tech reps are nice when you can afford them, but biological reps would be far more useful if you had to choose one.
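A rough way to see both statements (tight technical replicates, and more variation at low coverage) is to compare coefficients of variation. The sketch below assumes the usual modeling choice that technical noise is roughly Poisson, and that biological variation adds a negative-binomial term, variance = mean + dispersion * mean^2. The function names and the dispersion value 0.1 are illustrative assumptions, not numbers from this thread.

```python
import math

def technical_cv(mean_count):
    # Poisson noise: variance equals the mean, so CV = 1 / sqrt(mean).
    return 1.0 / math.sqrt(mean_count)

def total_cv(mean_count, dispersion):
    # Negative binomial: variance = mean + dispersion * mean^2,
    # so CV^2 = 1 / mean + dispersion.
    return math.sqrt(1.0 / mean_count + dispersion)

# Hypothetical dispersion of 0.1 for biological replicates.
for mu in (10, 100, 1000, 10000):
    print(mu, round(technical_cv(mu), 3), round(total_cv(mu, 0.1), 3))
```

The technical CV keeps shrinking as the count grows, while the total CV levels off near sqrt(dispersion). Deeper sequencing or more technical replicates cannot remove that biological part, which is why biological replicates are the better investment.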
Old 03-06-2012, 11:28 AM   #4
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

What do you mean by "expression levels of a specific group of genes for a specific tissue"? Do you want to know if they are differentially expressed in comparison to other tissues? Or maybe their expression level within that specific tissue in comparison to other genes within the same tissue?

For example, does it matter that gene A is expressed in tissue B and not tissue C, but at lower levels than tissue D? Or do you just want to know that the gene A is expressed more strongly than gene B in tissue B? Or do you just want to know that the gene is expressed at some level of significance?

Also, for #2 use all the genes and look at your genes of interest at the end.
Old 03-06-2012, 12:16 PM   #5
ETHANol
Senior Member
 
Location: Western Australia

Join Date: Feb 2010
Posts: 308
Default

You can sequence these samples. You can analyze them. And you might even find something interesting. But you'll never be able to publish it (or at least you shouldn't be able to). And you'll always be wondering if it is reproducible and what you would find if you had the replicates you need.

So go ahead and sequence them, but your main priority should be figuring out how you are going to get the replicates you need. If that's not possible, you should reconsider and put your time into something potentially more productive.
__________________
--------------
Ethan
Old 03-06-2012, 01:54 PM   #6
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

You definitely need to explain more about your design before anyone can answer your questions well. Are these samples from the same individual, or is each tissue from a different one? Do you expect that your genes of interest will typically have similar expression in all but one tissue? What do you hope to do with your result?
Old 03-07-2012, 03:44 AM   #7
hanshart
Member
 
Location: Germany

Join Date: Nov 2011
Posts: 27
Default

Thank you all for your answers, and sorry for the ambiguous explanations. I will try to explain it in more detail now.

I have 16 different samples from 16 different human tissues (e.g. liver, brain, lung,...). Each sample comprises the total mRNA and is from a different individual. Say 's1',...,'s16' are the names of the samples.
The initial question was whether a specific gene (say 'g8') is differentially expressed in a particular sample (say 's3'); that was the guess.

'g8' belongs to the family of AARS genes. Out of interest, I studied the normalized read counts not only for 'g8' but also for all other AARS genes over all samples. This gave me 16 normalized counts for each AARS gene (including 'g8'). Analysing the normalized read counts for each gene with a boxplot shows that some of the genes have an outlier, corresponding to some sample. The gene 'g8' also has one outlier, and this outlier corresponds exactly to the sample 's3'. So for gene 'g8', the normalized read count of sample 's3' does indeed seem to be _different_ from the normalized read counts of the other samples. I thought a boxplot is not the best way to be really sure whether this outlier is indeed _different_ from the other read counts or possibly just due to chance. So I want to apply differential expression analysis. Therefore I provided DESeq with the read counts of all AARS genes (just so as not to have only one gene for each sample, which would give no information about the distribution of the counts).
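For reference, the boxplot check described above can be sketched as the usual 1.5 x IQR whisker rule. The counts below are made-up placeholders, not the real data, and the quartile convention (Python's "inclusive" method) may differ slightly from the hinges R's boxplot uses, so treat this as an approximation of what the plot shows.

```python
from statistics import quantiles

def boxplot_outliers(values, k=1.5):
    """Return (index, value) pairs lying outside the k * IQR whiskers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical normalized counts for one gene across 16 samples;
# index 2 plays the role of sample 's3'.
g8 = [210, 195, 640, 220, 205, 230, 190, 215,
      200, 225, 208, 198, 212, 221, 203, 217]

print(boxplot_outliers(g8))  # [(2, 640)]
```

Such a flag only says that the value is far from the other 15 values. It does not say whether the cause is the tissue, the individual, or chance.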


1.) I treated 's3' as one condition with no replicates and all other samples ('s1','s2','s4',...,'s16') as replicates of a second condition. -> Is this valid?
(In fact, the variance over all the other 15 tissues should be at least as high as the variance for 15 biological replicates of the same tissue (?!). So observing that gene 'g8' is differentially expressed between 's3' and a second condition comprising the 15 other tissues should have at least the same statistical power as if I had compared 's3' against one particular tissue replicated 15 times!?)

IF 1.) is valid, then I have a few technical questions about the implementation ...

2.) Is it enough to provide DESeq with the counts of only the 37 AARS genes? (If _not_ -> how many genes should I take into account? Consider that I cannot use _all_ genes, as many of them would fluctuate too much between the 15 samples, with the result that no gene would be called differentially expressed because of the much too high variation in the "replicated" condition.)

3.) In DESeq: Is the following configuration a good choice for the dispersion estimation? (making as few assumptions as possible, I hope)
Code:
cds <- estimateDispersions(cds, method="blind", sharingMode="fit-only", fitType="local")
4.) deleted

5.) Would it be a great improvement to use technical replicates? (either 50bp paired end or 100bp stranded)

I hope everything is clear now.
Many thanks again!
Old 03-07-2012, 03:56 AM   #8
hanshart
Member
 
Location: Germany

Join Date: Nov 2011
Posts: 27
Default

@ Jean
1.) I also observed the hiccups you mentioned in Cufflinks. DESeq didn't show any problems with the configuration I stated above.

4.) As you said, I can only observe the "big obvious differences" with this method. But as I'm only interested in making sure that the observation is significant, it should be OK to do so!?

5.) Thanks for making this clear to me.

@ ETHANol
Thanks for this hint. But I simply do not have any biological replicates. I have just the data I have, and for my diploma thesis I am, besides other tasks, supposed to analyse whether this gene is differentially expressed. I don't know whether my observation is even enough for a statement in the thesis.

Thanks to all again
Old 03-07-2012, 04:22 AM   #9
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

In short, there is not much statistics you can do. If a gene sticks out in tissue X derived from subject Y, it may be because this gene is special for this tissue or because this gene is special for this person. You are willing to always attribute the difference to the tissue rather than to the person.

In other words, you are bargaining on the assumption that differences in gene expression are much stronger between different tissues of the same subject than between samples from the same tissue taken from different subjects. This is a risky assumption that is certainly untenable for quite a few genes, and due to your lack of replicates there is no way whatsoever to check it.
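The confounding can be shown with a toy additive model, sketched here under the assumption that each sample's (log) expression is baseline + tissue effect + subject effect. The model, the function name and all numbers are hypothetical and only illustrate the argument.

```python
def observed(baseline, tissue_effects, subject_effects):
    # Sample i is tissue i taken from subject i, so each tissue effect is
    # always paired with exactly one subject effect (no replication).
    return [baseline + t + s for t, s in zip(tissue_effects, subject_effects)]

n = 16

# Explanation 1: the gene is special for tissue 3.
tissue_a = [0.0] * n
tissue_a[2] = 3.0
subject_a = [0.0] * n

# Explanation 2: the gene is special for the person who donated tissue 3.
tissue_b = [0.0] * n
subject_b = [0.0] * n
subject_b[2] = 3.0

data_a = observed(5.0, tissue_a, subject_a)
data_b = observed(5.0, tissue_b, subject_b)
print(data_a == data_b)  # True
```

Both explanations produce exactly the same 16 values, so no test applied to these data can tell them apart. Only the same tissue from several subjects (or several tissues from the same subject) would separate the two effects.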

Last edited by Simon Anders; 03-07-2012 at 04:22 AM. Reason: corrected
Old 03-07-2012, 04:29 AM   #10
hanshart
Member
 
Location: Germany

Join Date: Nov 2011
Posts: 27
Default

Quote:
Originally Posted by Simon Anders
In other words, you are bargaining on the assumption that differences in gene expression are much stronger between different tissues of the same subject than between samples from the same tissue taken from different subjects. This is a risky assumption that is certainly untenable for quite a few genes, and due to your lack of replicates there is no way whatsoever to check it.
Thank you, Simon, for this precise answer. I had never looked at it this way (don't ask me why). I totally agree with you and will ask my advisor how to deal with this problem.
Old 03-07-2012, 11:42 AM   #11
Jean
Member
 
Location: Canada

Join Date: Nov 2008
Posts: 37
Default

Quote:
@ Jean
1.) I also observed the hiccups you mentioned in cufflinks. DESeq didn't show any problems with the configuration I stated above.
By hiccups I meant statistical hiccups. When the variation in your replicates is very high, you will violate some of the assumptions of the differential expression test. Whatever method/analysis you apply, you'll want to really look at your data (does it fit the assumptions and models used, do the called and uncalled genes make biological sense?).
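One way to see the problem is a crude method-of-moments dispersion estimate, assuming the negative-binomial relation variance = mean + dispersion * mean^2. This is a simplified stand-in and not DESeq's actual estimator; the function name and the counts are made up for illustration.

```python
from statistics import mean, variance

def moment_dispersion(counts):
    """Crude dispersion estimate: (variance - mean) / mean^2, floored at 0."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    return max((v - m) / m ** 2, 0.0)

# Hypothetical normalized counts for one gene.
same_tissue = [80, 125, 102, 90, 115, 70, 128, 98]      # true biological replicates
mixed_tissues = [20, 400, 95, 10, 250, 60, 800, 150]    # different tissues pooled as "replicates"

print(round(moment_dispersion(same_tissue), 3))
print(round(moment_dispersion(mixed_tissues), 3))
```

Pooling different tissues inflates the estimated dispersion by a large factor compared with true replicates. With a dispersion that high, only very large differences can reach significance, and the fitted model describes tissue heterogeneity rather than replicate noise.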

Everyone else had good advice. Unfortunately you don't have an ideal dataset for the questions you want to ask.