**sjm** · 12-17-2009, 09:22 AM

Well, maybe everyone isn't using biological replicates, but I certainly am in experiments involving RNASeq of nontransgenic and transgenic mouse tissues...!
I could send you RPKM expression values (calculated via Tophat/Cufflinks) for n=4 replicates, 2 treatment groups, for a subset of detected transcripts or for everything we found. Post back if you're interested and we can figure out a way to send data (I don't have a good ftp system here, so probably e-mail and compressed files will be the go).

**sjm** · 12-17-2009, 09:23 AM

By the way, you may also be interested to know that we've multiplexed 4 samples per Illumina GAII lane (i.e. barcoding system), but haven't tried examining single samples per lane.

**anar** · 12-17-2009, 08:00 PM

Hi sjm,

Wow, that would be super. If you wouldn't mind sharing the data, I would appreciate it very much!

And even better that you've multiplexed 4 samples per lane, as that removes any lane effects.

I think I would like to obtain RPKM values for all genes, if you are open to that. I would like to plot pooled RPKM vs pooled standard deviation for all genes, to see how variability changes for lowly expressed genes compared with highly expressed genes.
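As a rough illustration of the quantities behind such a plot, here is a minimal Python sketch that computes, for one gene, the overall mean RPKM and a pooled standard deviation across treatment groups. Reading "pooled" as the usual pooled-variance estimator (within-group squared deviations summed over groups, divided by the combined degrees of freedom) is an assumption, and the function name, group labels, and example values are hypothetical.

```python
import math
from statistics import mean


def pooled_mean_sd(values, groups):
    """Return (overall mean, pooled SD) of one gene's RPKM values.

    values: RPKM values, one per sample.
    groups: group label for each sample, in the same order as values.
    """
    by_group = {}
    for value, label in zip(values, groups):
        by_group.setdefault(label, []).append(float(value))

    # Sum squared deviations from each group's own mean, so that a real
    # difference between groups does not inflate the variability estimate.
    sum_sq = 0.0
    dof = 0
    for vals in by_group.values():
        centre = mean(vals)
        sum_sq += sum((v - centre) ** 2 for v in vals)
        dof += len(vals) - 1

    overall_mean = mean([float(v) for v in values])
    return overall_mean, math.sqrt(sum_sq / dof)


# Hypothetical example: two groups with two replicates each.
labels = ["wt", "wt", "tg", "tg"]
gene_mean, gene_sd = pooled_mean_sd([1.0, 3.0, 10.0, 14.0], labels)
```

Applying this to every gene and plotting the pooled SD (or SD divided by mean) against the mean on log axes would show how variability changes with expression level.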

Look forward to hearing from you. Thanks!

**sjm** · 12-18-2009, 02:12 PM

Great - let's work on getting you some data to play with. These are data that I am working up for publication, so if it doesn't mess up your calculations and you're OK with my data being 'anonymous', I would prefer to not send real gene names/symbols with the RPKMs. That way it won't be obvious which species, transgenes or tissues were used for this experiment. (A little paranoid, I know, but my PI would be horrified if these data were to 'leak' in an understandable form, albeit by some really remote chance...) You'll still be able to monitor variability on lowly vs highly-expressed genes.

Write back to me at s.m.a.t.k.o.v.i.AT.d=o=m=DOT=w=u=s=t=l=DOT=e=d=u and we can go from there.

**lifeng.tian** · 02-12-2010, 12:13 PM

Technical variation with RPKM calculated via TopHat/Cufflinks

Hi, sjm,

When I compare my technical replicates on an M-A plot, TopHat/Cufflinks yields quite large variation. I've attached the M-A plot.

Do you have technical replicates in your experiment? Do you see relatively large variation on the M-A plot with TopHat? I ask because with our own RPKM scripts we see very small variation. I would appreciate your comments/experience on this.
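For readers unfamiliar with the plot, an M-A plot shows the log ratio of two samples (M) against their average log expression (A). A minimal Python sketch of that calculation follows; the pseudocount used to avoid taking the log of zero is an assumption, not something described in this thread, and the input values are hypothetical.

```python
import math


def ma_values(x, y, pseudocount=1.0):
    """Return lists (M, A) for paired expression values x and y.

    M = log2(x) - log2(y)          (log fold change)
    A = (log2(x) + log2(y)) / 2    (average log expression)
    The pseudocount is added to both samples before taking logs.
    """
    m_vals, a_vals = [], []
    for xi, yi in zip(x, y):
        log_x = math.log2(xi + pseudocount)
        log_y = math.log2(yi + pseudocount)
        m_vals.append(log_x - log_y)
        a_vals.append(0.5 * (log_x + log_y))
    return m_vals, a_vals


# Hypothetical values for three genes in two technical replicates.
m_vals, a_vals = ma_values([3.0, 7.0, 0.0], [1.0, 7.0, 0.0])
```

For technical replicates, the spread of M around zero is expected to shrink as A grows.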

Thanks!

Lifeng

Originally posted by sjm:

Well, maybe everyone isn't using biological replicates, but I certainly am in experiments involving RNASeq of nontransgenic and transgenic mouse tissues...!
I could send you RPKM expression values (calculated via Tophat/Cufflinks) for n=4 replicates, 2 treatment groups, for a subset of detected transcripts or for everything we found. Post back if you're interested and we can figure out a way to send data (I don't have a good ftp system here, so probably e-mail and compressed files will be the go).

Attached Files

MA_tophat.jpg (11.7 KB, 72 views)

**blackgore** · 03-16-2010, 03:08 AM

I also have replicates for some RNA-Seq data that I'd like to group together, for the purposes of a differential expression test. However, in the Cufflinks manual I've only been able to find information on running "Lane vs Lane" type comparisons rather than "Group vs Group".

Can you please describe how to use TopHat and Cufflinks when replicates are involved?

**Simon Anders** · 03-16-2010, 04:04 AM

Hi

Typically, the noise between technical replicates is barely above the shot noise level (i.e., the noise predicted by the Poisson distribution), while the noise between biological replicates is much larger. This is what Nagalakshmi et al. already showed in their 2008 Science paper. Mortazavi et al. (Nature Methods, 2008) also observed only shot noise between technical replicates, so I suppose it is safe to assume that any noise significantly exceeding shot noise points to a problem in library preparation.

However, you won't be able to see this from a Cufflinks-derived MA plot such as the one Lifeng Tian has shown, because (I assume) the A axis is FPKM-scaled. To compare with the shot noise level, you should look at raw counts.

Our "DESeq" package allows you to estimate variance from raw counts and compare it with shot noise levels: http://www-huber.embl.de/users/anders/DESeq/
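To make the idea of comparing against shot noise concrete, here is a minimal Python sketch (not the DESeq method itself). Under a Poisson model the variance of a gene's raw count equals its mean, so a variance-to-mean ratio near 1 is consistent with pure shot noise and a ratio well above 1 indicates extra variability. The sketch assumes the replicates were sequenced to roughly equal depth, so no library-size normalization is applied; the function name and the counts are hypothetical.

```python
from statistics import mean, variance


def variance_to_mean_ratio(counts):
    """Return sample variance / mean for one gene's raw counts.

    counts: raw read counts for the gene, one per replicate.
    Returns NaN when the mean is zero (the ratio is undefined).
    """
    values = [float(c) for c in counts]
    mu = mean(values)
    if mu == 0:
        return float("nan")
    return variance(values) / mu  # variance() uses the n-1 denominator


# Hypothetical counts across four replicates.
tight = variance_to_mean_ratio([10, 12, 8, 10])        # below 1
noisy = variance_to_mean_ratio([100, 160, 40, 100])    # far above 1
```

With only a handful of replicates the per-gene ratio is itself very noisy, so it is more informative to look at the trend across many genes than at any single value.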

For more on the maths behind this, see our paper, which I've now made available as a preprint: http://precedings.nature.com/documents/4282/version/1

Cheers
Simon

**Simon Anders** · 03-16-2010, 04:20 AM

Another point concerning replicates. As they are expensive, I recommend you keep the following points in mind:

Given that technical replicates vary at shot noise level, making two technical replicates is the same as sequencing a single sample twice as deep. Additional biological replicates, in contrast, give you not only more counts but also inform you about the variability between samples.

You need at least one pair of biological replicates to get any idea of how strongly your data vary from one sample to the next. Otherwise, you have no way of knowing whether the observed difference between your samples of different conditions is due to the change in experimental condition, or whether a difference of the same magnitude would also have been observed between two different samples under the same condition. This is the very reason why one needs replicates at all, and why it is flawed to just assume the variance to be as predicted by the Poisson distribution rather than to estimate it from biological replicates. (DEGSeq, for example, has this flaw.)

If you now compare biological replicates, you may or may not find that the variance is above shot noise level. (See e.g. Figure 8 in our preprint that I referred to above, which illustrates this for the Nagalakshmi data.) If the biological variance is above shot noise level, sequencing deeper won't help, as it only reduces shot noise while you are limited by biological variance. On the other hand, if the variance between biological replicates does not exceed the shot noise level significantly, you are limited by shot noise, i.e., further biological replicates will not help any more than sequencing the existing samples deeper (i.e., filling more lanes).
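The trade-off between depth and replicates can be sketched numerically. The Python example below assumes, for illustration only, that a gene's count has variance mu + alpha * mu^2 (shot noise plus a biological term, as in a negative binomial model), so that its squared coefficient of variation is 1/mu + alpha. Sequencing deeper by a factor d scales mu to d * mu, while averaging n independent biological replicates divides the whole expression by n. The function name, parameter names, and the values of mu and alpha are hypothetical.

```python
def cv2_of_group_mean(mu, alpha, n_replicates, depth_factor=1.0):
    """Squared coefficient of variation of a group's mean expression.

    mu:            expected count for the gene at the current depth
    alpha:         assumed biological dispersion (0 means pure shot noise)
    n_replicates:  number of independent biological replicates averaged
    depth_factor:  multiplier on sequencing depth per sample
    """
    shot_noise = 1.0 / (depth_factor * mu)  # shrinks with deeper sequencing
    biological = alpha                      # unaffected by depth
    return (shot_noise + biological) / n_replicates


# Gene with biological variance above shot noise (alpha = 0.1).
base = cv2_of_group_mean(100, 0.1, n_replicates=2)
deeper = cv2_of_group_mean(100, 0.1, n_replicates=2, depth_factor=2.0)
more_reps = cv2_of_group_mean(100, 0.1, n_replicates=4)

# Gene limited by shot noise only (alpha = 0).
shot_deeper = cv2_of_group_mean(100, 0.0, n_replicates=2, depth_factor=2.0)
shot_more_reps = cv2_of_group_mean(100, 0.0, n_replicates=4)
```

Under these assumptions, doubling the depth barely changes the result when alpha dominates (about 0.055 to 0.0525), whereas doubling the replicates halves it. With alpha = 0 the two strategies give the same value, which matches the argument above.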

Hence, the comparison with shot noise is vital to answer the question how many replicates are needed.

A question orthogonal to this is whether you have enough replicates to average away the effects of covariates for which you cannot control. (See this thread for a discussion of this issue.)

Cheers
Simon

**blackgore** · 05-13-2010, 07:52 AM

I have a situation where I initially have two main groups (four replicate organisms in each), so that is pretty straightforward. However I would also like to do some within-group comparisons too - different tissue types, males vs females, etc.

Even with a minimum of two replicates for each comparison... that's a lot of sequencing to do!

**sjm** · 06-08-2010, 04:01 PM

analysis of biological replicates (groups) via Tophat/Cufflinks

Hi,

Sorry that I haven't posted for a while. blackgore, I did not use Tophat/Cufflinks for the analysis of replicates. Having produced a list of genes/transcripts and RPKM values for each sample, I imported these into MS Access (openoffice.org Base works too) and did a crosstab query to get a spreadsheet of RPKMs with genes in rows and samples in columns.
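For anyone without a database tool to hand, the same crosstab step can be sketched in plain Python. This is an illustrative equivalent, not the query actually used here: it assumes the per-sample results have already been read into (gene, sample, RPKM) records, and the function name, gene IDs, and sample names are hypothetical.

```python
def crosstab(records):
    """Pivot (gene, sample, rpkm) records into a gene-by-sample table.

    Returns (header, rows): header is ["gene", sample1, sample2, ...] and
    each row is [gene, rpkm_in_sample1, rpkm_in_sample2, ...].
    A gene missing from a sample gets None in that column.
    """
    samples = sorted({sample for _, sample, _ in records})
    table = {}
    for gene, sample, rpkm in records:
        table.setdefault(gene, {})[sample] = rpkm

    header = ["gene"] + samples
    rows = [
        [gene] + [table[gene].get(sample) for sample in samples]
        for gene in sorted(table)
    ]
    return header, rows


# Hypothetical long-format input, one record per gene per sample.
records = [
    ("g1", "s1", 1.5),
    ("g1", "s2", 2.0),
    ("g2", "s1", 0.0),
]
header, rows = crosstab(records)
```

If pandas is available, `DataFrame.pivot(index=..., columns=..., values=...)` performs the same reshaping on a long-format table.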

From there, calling differences between groups is up to you and your favorite stats package.

Does that help?

Originally posted by blackgore:

I also have replicates for some RNA-Seq data that I'd like to group together, for the purposes of a differential expression test. However, in the Cufflinks manual I've only been able to find information on running "Lane vs Lane" type comparisons rather than "Group vs Group".

Can you please describe how to use TopHat and Cufflinks when replicates are involved?

**A.Presson** · 07-13-2012, 03:49 PM

Hi Simon,
I'm wondering why you haven't created an R function for calculating power/sample size for RNA-Seq experiments based on your negative binomial model? Seems like it would be quite popular...


Power calculations for expt design