SEQanswers

Old 11-16-2010, 04:00 PM   #1
vpp605
Junior Member
 
Location: Saskatchewan

Join Date: Feb 2009
Posts: 6
Biological replicates for RNA-seq

Hi all,

I'm not sure if this has already been discussed elsewhere, but after looking around I didn't find anything directly answering my question, so if it has already been discussed, sorry for the repeat and please point me in the right direction!

I'm going to be doing RNA-seq for DE analysis of a bacterium growing in two different environments. I'm trying to determine the number of biological replicates that would be required to provide statistically meaningful results. I saw that technical replicates aren't exactly necessary, and with cost of course being an issue, we were hoping to run 2 biological replicates of each environment, but we don't want to find out afterwards that we should have included more. We will be multiplexing our samples, and are using Illumina technology.

I'm a microbiologist, with little background in stats, so any input or thoughts on this would be greatly appreciated!

Thanks in advance.

- Vanessa
Old 11-16-2010, 04:41 PM   #2
mrawlins
Member
 
Location: Retirement - Not working with bioinformatics anymore.

Join Date: Apr 2010
Posts: 63

This will depend on what statistics you use to determine statistical significance.

If you are using something like a t-test, you really want as many replicates as possible. You can't do these tests without at least 3 replicates. Depth isn't particularly useful here, but more replicates are. It's an oversimplified approach to the statistics, IMO.
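For anyone who wants to see what a per-gene t-test actually computes, here is a minimal sketch. It uses Welch's variant (picked here only for illustration) and assumes the inputs are already-normalized expression values for a single gene; the function name and the numbers are made up.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test.

    Each group is a list of normalized expression values for one gene,
    one value per biological replicate. Assumes at least one group has
    non-zero variance.
    """
    # Squared standard error of each group mean
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df

# Hypothetical values: three replicates per environment
t, df = welch_t([10, 12, 14], [20, 22, 24])
print(f"t = {t:.3f}, df = {df:.1f}")
```

The degrees of freedom grow with the number of replicates, not with sequencing depth, which is why extra replicates help this kind of test more than extra reads. Turning `t` and `df` into a p-value needs a t-distribution CDF, which is not in the standard library.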

If you use something like Fisher's Exact Test (hypergeometric or Poisson distribution) then two biological replicates should be reasonable. You can actually mix multiple biological replicates in the same barcoded sample and it will likely give the same answer (since the reads from replicates are just added together). In this case the read depth is more useful than additional replicates. This method is what we use, but a number of people (probably more knowledgeable than me) have raised concerns with it.
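As a rough sketch of what this kind of test does for one gene, the snippet below runs a two-sided Fisher's exact test on a 2x2 table of "reads on the gene" versus "all other reads" in the two conditions. The function names, the read counts, and the library size of 1,000,000 reads are all assumptions for illustration.

```python
from math import exp, lgamma

def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, c: reads on the gene in condition A and condition B
    b, d: all other reads in condition A and condition B
    """
    gene_total = a + c   # column margin: reads on this gene
    n_a = a + b          # row margin: all reads in condition A
    total = a + b + c + d

    def log_pmf(k):
        # Hypergeometric probability that k of the gene's reads fall in A
        return (log_choose(gene_total, k)
                + log_choose(total - gene_total, n_a - k)
                - log_choose(total, n_a))

    k_min = max(0, n_a + gene_total - total)
    k_max = min(gene_total, n_a)
    log_obs = log_pmf(a)
    # Sum every table that is no more probable than the observed one
    p = sum(exp(log_pmf(k)) for k in range(k_min, k_max + 1)
            if log_pmf(k) <= log_obs + 1e-7)
    return min(1.0, p)

# Hypothetical gene counts: two replicates per condition, summed before testing
depth = 1_000_000
reps_a, reps_b = [120, 80], [210, 190]
a, c = sum(reps_a), sum(reps_b)
p = fisher_exact_two_sided(a, 2 * depth - a, c, 2 * depth - c)
print(f"summed counts {a} vs {c}: p = {p:.3g}")
```

Note that the function only ever sees the summed counts, so summing replicates in software and mixing them under one barcode feed it the same kind of input. How far apart the individual replicates were (120 vs 80 here) never enters the calculation.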

If you use something like DESeq I don't have an answer for you, because I don't know anything about their statistical models. My guess is that two biological replicates would be fine for this type of analysis, though you may need 3.

I think this is a useful question, though, and I am interested to see what others think.
Old 11-17-2010, 12:21 AM   #3
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 264

Like mrawlins, we've mixed multiple biological replicates in the same barcoded sample, but we're working on small RNA-seq.

It works great.
Old 11-24-2010, 01:31 PM   #4
ecofriendly
Junior Member
 
Location: University of Wisconsin, Madison

Join Date: Nov 2010
Posts: 9

mrawlins, can you suggest an article that explains the difference between these different statistical tests and why they require different numbers of biological replicates of RNA-seq data to be as powerful?

I've been reading the Cufflinks paper and trying to understand the statistical model they use to analyze RNA-seq data, as written in the Supplementary Methods (Trapnell et al., 2010, in Nature Biotechnology). Can someone explain it to me in simple terms with as little math as possible?
Old 11-24-2010, 07:26 PM   #5
Jeremy
Senior Member
 
Location: Pathum Thani, Thailand

Join Date: Nov 2009
Posts: 190

By "mix multiple biological replicates in the same barcoded sample", do you mean that each sample has its own barcode and the samples are then pooled together and sequenced? Or do you mean that multiple samples are pooled together, given the same barcode and then sequenced?

If you mix samples and give them the same barcode, how do you calculate variance?
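To make the question concrete, here is a small sketch with made-up counts for one gene, assuming equal sequencing depth per library so that raw counts are comparable. The helper name is hypothetical.

```python
from statistics import StatisticsError, variance

def within_group_variance(counts):
    """Sample variance across libraries, or None when there is only one."""
    try:
        return variance(counts)
    except StatisticsError:
        # statistics.variance needs at least two data points
        return None

# Three replicates of one condition, each with its own barcode
barcoded = [95, 130, 110]
# The same three samples mixed under a single barcode: one combined count
pooled = [sum(barcoded)]

print(within_group_variance(barcoded))  # a number: replicate spread is measurable
print(within_group_variance(pooled))    # None: a single library has no spread
```

With one barcode per sample the spread between replicates can be estimated; with a shared barcode there is only one number per condition, so there is nothing to estimate it from.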
Old 11-24-2010, 07:36 PM   #6
golharam
Member
 
Location: Philadelphia, PA

Join Date: Dec 2009
Posts: 55

We always recommend at least 3 biological replicates. If you do two, how do you know one isn't bad? If you do three, and one is bad, you can at least eliminate it and continue.

I think 4 would be ideal especially from a statistical standpoint, but that's not always possible because of cost. However, talk with your sequencing core facility to determine if barcoding multiple samples is an option. If it is, you may be able to sequence more samples for the same cost.

Take a look at this paper as well:

Statistical Design and Analysis of RNA Sequencing Data
Genetics, Vol. 185, No. 2. (1 June 2010), pp. 405-416.
Old 11-25-2010, 03:26 AM   #7
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

I've explained this in a number of posts before, so I'll just repeat the core points.

- If you use Fisher's exact test or something similar, you don't need any replicates because it cannot accommodate them. The results, though, will be wrong, especially for strongly expressed genes.

This is because Fisher's exact test tests whether two samples differ in the concentration of a given transcript. This is, however, not the question you want to ask. What you want to know is whether the difference between two samples with different treatment is stronger than what you expect to see between two samples that are replicates, because otherwise, you cannot attribute the difference to the treatment.

This criticism also applies to cuffdiff, at least to the version described in the paper. (There is a new version of cuffdiff that allows for biological replicates but there is no documentation on its method yet, and hence it is unclear whether it now asks the relevant question.)

- If you have many replicates, use a t test.

- With only two or three replicates, you need to pool across genes, i.e., assume that similar genes have similar variance. Our DESeq package assumes that genes with similar expression strength have similar variance, and so pools information from these in order to get a reasonable estimate of biological variability, which is then used for the test.
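
The pooling idea in the last point can be sketched roughly as follows. This is a toy illustration only, not DESeq's actual procedure: it simply averages the sample variances of the `k` genes whose mean count is closest to the gene of interest. The gene names, the counts, and the `pooled_variance` helper are all made up.

```python
from statistics import mean, variance

def pooled_variance(counts_by_gene, gene, k=3):
    """Average the sample variances of the k genes closest in mean to `gene`.

    counts_by_gene maps a gene name to its normalized counts, one per
    replicate (at least two). The gene itself is always among the k.
    """
    stats = {g: (mean(c), variance(c)) for g, c in counts_by_gene.items()}
    target_mean = stats[gene][0]
    # Rank genes by how close their mean expression is to the target gene
    nearest = sorted(stats, key=lambda g: abs(stats[g][0] - target_mean))[:k]
    return mean(stats[g][1] for g in nearest)

# Hypothetical normalized counts, two replicates per gene
counts = {
    "geneA": [100, 104],
    "geneB": [98, 130],
    "geneC": [110, 90],
    "geneD": [1000, 1400],
    "geneE": [5, 9],
}

own = variance(counts["geneA"])
pooled = pooled_variance(counts, "geneA", k=3)
print(f"geneA own variance: {own}, pooled over similar genes: {pooled}")
```

With only two replicates, a gene's own variance estimate can be far too small just by chance (as for geneA here). Borrowing from genes of similar expression strength gives a steadier number, at the price of assuming those genes really do vary alike.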

Simon
Old 11-29-2010, 06:04 AM   #8
vpp605
Junior Member
 
Location: Saskatchewan

Join Date: Feb 2009
Posts: 6

Thank you everyone for the replies!!

golharam - thank you for pointing me to that paper!

Simon - sorry for making you repeat everything over again -- if there is a "better" post for me to look at, please point me in that direction!
Old 11-29-2010, 09:03 AM   #9
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Quote:
Originally Posted by vpp605
Simon - sorry for making you repeat everything over again -- if there is a "better" post for me to look at, please point me in that direction!
No problem. We had a couple of discussions on the subject of replicates, but they are spread over several threads, so they may be hard to find.

Here are a few of them:

http://seqanswers.com/forums/showthread.php?t=4349
http://seqanswers.com/forums/showthread.php?t=5180
http://seqanswers.com/forums/showthread.php?t=5248
http://seqanswers.com/forums/showthread.php?t=7108

Note that some of the software packages mentioned have gained new functionality quite recently, i.e., some arguments in these threads about their limitations are out of date.

Simon
Old 12-07-2010, 07:47 AM   #10
adumitri
Member
 
Location: Cambridge, MA

Join Date: Jan 2010
Posts: 27
Cuffdiff - differential expression analysis between groups of samples

Quote:
This criticism also applies to cuffdiff, at least to the version described in the paper. (There is a new version of cuffdiff that allows for biological replicates but there is no documentation on its method yet, and hence it is unclear whether it now asks the relevant question.)
Hello,

Simon mentioned the existence of a new version of Cuffdiff that allows for biological replicates. Does anyone know anything else about this new version? Will it be released soon or is it already available somewhere?

Given the currently available Cuffdiff version (v0.9.3), is there any viable workaround to analyze groups of samples (e.g. control samples compared with treated samples)?

Thank you,
Alexandra
Old 03-03-2011, 02:23 PM   #11
jminich444
Junior Member
 
Location: San Diego, CA U.S.A

Join Date: Feb 2011
Posts: 3

http://www.genetics.org/cgi/content/abstract/185/2/405
Old 08-28-2014, 02:59 AM   #12
eastasiasnow
Junior Member
 
Location: China

Join Date: Jan 2014
Posts: 8

Quote:
Originally Posted by Jeremy
By "mix multiple biological replicates in the same barcoded sample", do you mean that each sample has its own barcode and the samples are then pooled together and sequenced? Or do you mean that multiple samples are pooled together, given the same barcode and then sequenced?

If you mix samples and give them the same barcode, how do you calculate variance?
Hi Jeremy, did you get an answer to your question?
Old 08-28-2014, 08:28 PM   #13
Jeremy
Senior Member
 
Location: Pathum Thani, Thailand

Join Date: Nov 2009
Posts: 190

Quote:
Originally Posted by eastasiasnow
Hi Jeremy, did you get an answer to your question?
Based on my quote marks, I think I was asking the OP what they meant and then pointing out (via rhetorical question) that you can't get within-group variance using a pooled approach. But it was so long ago that I can't remember, and the phrase I quoted no longer seems to be there.
Old 08-28-2014, 08:48 PM   #14
eastasiasnow
Junior Member
 
Location: China

Join Date: Jan 2014
Posts: 8

Quote:
Originally Posted by Jeremy
Based on my quote marks, I think I was asking the OP what they meant and then pointing out (via rhetorical question) that you can't get within-group variance using a pooled approach. But it was so long ago that I can't remember, and the phrase I quoted no longer seems to be there.
Yeah, pooling biological replicate samples will lose the within-group variance. But could I use this design to do the following analysis? Would people accept this design if I used it in my paper? If so, what kind of tools can do this?

Thank you very much.
Old 08-28-2014, 08:53 PM   #15
Jeremy
Senior Member
 
Location: Pathum Thani, Thailand

Join Date: Nov 2009
Posts: 190

Quote:
Originally Posted by eastasiasnow
Yeah, pooling biological replicate samples will lose the within-group variance. But could I use this design to do the following analysis? Would people accept this design if I used it in my paper? If so, what kind of tools can do this?

Thank you very much.
For differential expression analysis, I wouldn't. That design would have a lot of trouble getting published. For almost the same price you can sequence biological replicates that have been individually tagged and get results that are far more biologically relevant.
Old 08-29-2014, 05:30 AM   #16
eastasiasnow
Junior Member
 
Location: China

Join Date: Jan 2014
Posts: 8

Quote:
Originally Posted by Jeremy
For differential expression analysis, I wouldn't. That design would have a lot of trouble getting published. For almost the same price you can sequence biological replicates that have been individually tagged and get results that are far more biologically relevant.
Thank you, Jeremy, now I am sure what to do.