SEQanswers

08-05-2014, 04:52 PM   #1
mihuzx
Member
 
Location: Beijing China

Join Date: Apr 2013
Posts: 20
RNA-seq bio-replication with low correlation

Hi!
I sequenced two biological replicates for one condition on the HiSeq 2000 platform.
However, the Pearson correlation (R) of gene expression (raw counts) between the two biological replicates is only 0.93, i.e. R^2 is only about 0.87.
Can I use these two samples for differential expression analysis?
Any suggestions on how to call DE genes with this data?
Recommended reading would also be very helpful.

Thanks all.
08-05-2014, 06:03 PM   #2
mikep
Member
 
Location: Singapore

Join Date: Feb 2011
Posts: 45

Raw counts are heavily skewed rather than normally distributed, so use Spearman, not Pearson. Also discard any genes with 0 counts. Actually, I'd probably discard genes with < 10 counts.

Secondly, is this human data? In other words, are your biological replicates sampled from different individuals with a heterogeneous genetic background?
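
A minimal Python sketch of the filtering and rank-correlation advice above. The counts are made up, and the choice to keep a gene only when both replicates have at least 10 reads is an assumption (the cutoff could equally be applied per gene total). Spearman is computed as the Pearson correlation of average ranks.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def filter_low_counts(rep1, rep2, min_count=10):
    """Keep genes with at least min_count reads in BOTH replicates (assumption)."""
    kept = [(a, b) for a, b in zip(rep1, rep2)
            if a >= min_count and b >= min_count]
    return [a for a, _ in kept], [b for _, b in kept]

# Made-up raw counts for eight genes, for illustration only.
rep1 = [0, 3, 12, 25, 40, 150, 900, 12000]
rep2 = [1, 0, 15, 20, 55, 130, 1100, 9000]

f1, f2 = filter_low_counts(rep1, rep2)
print("genes kept:", len(f1))
print("Pearson   :", round(pearson(f1, f2), 3))
print("Spearman  :", round(spearman(f1, f2), 3))
```

Because Spearman only looks at ranks, a single very highly expressed gene cannot dominate it the way it can dominate Pearson on raw counts.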
08-05-2014, 10:47 PM   #3
mihuzx
Member
 
Location: Beijing China

Join Date: Apr 2013
Posts: 20

Thank you for your advice.
I removed all genes with < 10 counts and calculated the Spearman correlation, but it is still only about 0.93.
I also calculated the correlation using a cutoff of 1 RPKM, and it didn't change.
Now I wonder whether I can use this data to call DE genes, and how much the low correlation will affect the result.
If I do use the data, how can I minimize its impact?
08-05-2014, 11:53 PM   #4
velt
Member
 
Location: Paris

Join Date: Jun 2013
Posts: 10

The Pearson and Spearman correlation coefficients are not well suited to RNA-seq count data. Indeed, we want to know whether expression values are the same between two samples (linearity => Pearson coefficient), not just whether they follow an increasing or decreasing trend (Spearman coefficient). However, Pearson's r is generally ambiguous and highly dependent on sequencing depth and on the range of expression levels inherent to the sample (the difference between the lowest and highest bin counts).

I think it is difficult, from these coefficients, to determine if the samples are good replicates or not.

I advise you to read this publication and to use the SERE coefficient, which is well suited to the comparison of RNA-seq samples:

http://www.ncbi.nlm.nih.gov/pubmed/23033915

A score of 1 indicates faithful replication. The higher the score, the more the samples differ. I use this coefficient to explore my data.
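
To make the score concrete, here is a rough Python sketch. It assumes SERE is the square root of a Pearson chi-square statistic (observed counts versus the counts expected if each gene's reads were split in proportion to sequencing depth) divided by its degrees of freedom; please check the linked paper for the authoritative definition. The function names and the simulated data are illustrative only.

```python
import math
import random

def sere(counts):
    """counts: list of per-gene rows, each holding one count per sample."""
    rows = [r for r in counts if sum(r) > 0]  # drop genes with no reads at all
    n_samples = len(rows[0])
    lane_totals = [sum(r[j] for r in rows) for j in range(n_samples)]
    grand_total = sum(lane_totals)
    chi2 = 0.0
    for r in rows:
        gene_total = sum(r)
        for j in range(n_samples):
            # Expected count if reads split in proportion to sequencing depth.
            expected = gene_total * lane_totals[j] / grand_total
            chi2 += (r[j] - expected) ** 2 / expected
    dof = len(rows) * (n_samples - 1)
    return math.sqrt(chi2 / dof)

def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the modest means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(0)
means = [rng.uniform(5, 200) for _ in range(2000)]

# Pure Poisson sampling noise around a shared mean (idealized replication).
technical = [[poisson(m, rng), poisson(m, rng)] for m in means]

# Extra per-sample variation in each gene's mean on top of Poisson noise.
noisy = [[poisson(m * rng.uniform(0.7, 1.3), rng),
          poisson(m * rng.uniform(0.7, 1.3), rng)] for m in means]

print("identical columns:", sere([[10, 10], [50, 50]]))
print("Poisson replicates:", round(sere(technical), 2))
print("overdispersed pair:", round(sere(noisy), 2))
```

Under these assumptions, identical columns score 0, pure Poisson noise scores close to 1, and extra variation between the samples pushes the score above 1.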

Last edited by velt; 08-06-2014 at 12:08 AM.
08-06-2014, 12:59 AM   #5
mikep
Member
 
Location: Singapore

Join Date: Feb 2011
Posts: 45

You didn't mention your sample source. If the replicates come from different people, then 0.93 might be as good as it gets. I get around 0.95 on my data.

Another option (for future use) is to use a spike-in such as ERCC; you can then correlate counts independently of biological variability.

As for DE, my advice is to try it and see.

Finally, Velt, nice call. Assimilating SERE into our pipeline in 3...2...1...
08-06-2014, 03:17 AM   #6
mihuzx
Member
 
Location: Beijing China

Join Date: Apr 2013
Posts: 20

Hi velt,
Thank you very much.
I have tried it on my data. The SERE score is 5.8,
and for another pair of replicates it is about 3.3.
Is this too high? Any suggestions?
By the way, I think this standard is really strict.
08-06-2014, 04:01 AM   #7
mbblack
Senior Member
 
Location: Research Triangle Park, NC

Join Date: Aug 2009
Posts: 245

Well, your single greatest source of variation when it comes to differential expression is biological variation amongst individuals in your population. So if these were two different individuals, then your observed correlations might not be far off, at least when looking only at raw read counts.

Also, did you have equal or near equal read depth for each sample? If you had large differences in read depth across the two samples, then raw counts will also vary a great deal because of that.

Honestly, I would not worry about such differences in raw counts between biological replicates. That sort of variability is the very reason you use biological replication, so you can compute a robust mean population response. Individuals will inherently vary, often a great deal, in raw expression estimates.

How do your normalized read counts compare for these two samples? That is by far a more meaningful comparison than raw counts. Also, basing a comparison on an N of just 2 can be very misleading, as you have no idea how those two biological samples fall out in terms of the range of variation in expression for your population.
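
As a small illustration of the read-depth point, the Python sketch below scales raw counts to counts per million (CPM). CPM is just one simple normalization chosen here for illustration; the counts are made up, with the second replicate being the same library sequenced twice as deeply.

```python
def counts_per_million(counts):
    """Scale raw counts so that each sample sums to one million."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]

# Made-up counts: rep2 has exactly twice the depth of rep1.
rep1 = [10, 30, 60, 900]
rep2 = [20, 60, 120, 1800]

# Raw counts differ for every gene purely because of sequencing depth.
raw_diff = [b - a for a, b in zip(rep1, rep2)]

# After scaling to a common library size the two samples agree.
cpm1 = counts_per_million(rep1)
cpm2 = counts_per_million(rep2)
cpm_diff = [round(b - a, 6) for a, b in zip(cpm1, cpm2)]

print("raw differences:", raw_diff)
print("CPM differences:", cpm_diff)
```

A pure rescaling like this leaves Pearson and Spearman coefficients unchanged, so the normalization matters most when counts are compared directly, gene by gene.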
__________________
Michael Black, Ph.D.
ScitoVation LLC. RTP, N.C.

Last edited by mbblack; 08-06-2014 at 04:09 AM.
09-21-2015, 11:11 AM   #8
bkellman16
Junior Member
 
Location: La Jolla

Join Date: Apr 2015
Posts: 1
SERE over log transform

My understanding is that log(Poisson) [log(counts) in this case] will approximate a normal distribution, thereby achieving linearity. Is there a benefit to using SERE over the Pearson correlation of log-transformed counts?
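
For reference, a minimal Python sketch of what "Pearson correlation of log-transformed counts" could look like. The pseudocount of 1 (to avoid log(0)) and the example counts are assumptions made for illustration; the sketch does not settle the SERE question, it only shows how the log transform changes the coefficient.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def log2_plus_one(counts):
    """log2(count + 1); the +1 pseudocount is an assumption to handle zeros."""
    return [math.log2(c + 1) for c in counts]

# Made-up counts: the low and mid-range genes disagree noticeably,
# while one very highly expressed gene agrees well.
rep1 = [12, 25, 40, 150, 900, 12000]
rep2 = [30, 11, 95, 60, 1500, 11000]

r_raw = pearson(rep1, rep2)
r_log = pearson(log2_plus_one(rep1), log2_plus_one(rep2))

print("Pearson on raw counts :", round(r_raw, 3))
print("Pearson on log2(x + 1):", round(r_log, 3))
```

On raw counts the coefficient is driven almost entirely by the most highly expressed gene; after the log transform the disagreement among the other genes carries more weight.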