SEQanswers

Old 08-19-2010, 06:54 PM   #1
kentnf
Member
 
Location: Ithaca

Join Date: Jan 2009
Posts: 26
The correlation of RNA-seq data

I generated a correlation table for all biological replicates.
Most of them correlate well (around 0.9);
2-3 sets are around 0.8;
6-7 sets are below 0.8.

Is this data acceptable? How can I improve it? Can I use all of the replicates for differential expression analysis?
Thank you!
Old 08-20-2010, 08:17 AM   #2
mrawlins
Member
 
Location: Retirement - Not working with bioinformatics anymore.

Join Date: Apr 2010
Posts: 63

The nature of sampling and of the techniques involved makes it possible for some biological (or even technical) replicates to seem dissimilar by any given metric (p-values, correlations, etc.).

I would use any or all of them for DE analysis. If you account for the variance between replicates in your DE analysis you will reduce the number of false positives. That gets into the question of how you intend to determine differential expression. Some methods will allow you to use that replicate variance, while others won't. Either way, it shouldn't hurt you to include them all.
Old 08-22-2010, 05:42 PM   #3
frankyue50
Member
 
Location: CA

Join Date: Nov 2008
Posts: 34

For me, most replicates have a correlation > 0.95. Anything below 0.8 is too low. You can try quantile (qq) normalization to see whether it improves the correlation.
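To make the normalization suggestion concrete, here is a minimal base-R sketch of quantile normalization. It assumes that "qq normalization" refers to standard quantile normalization, that the data form a genes x samples matrix with no missing values, and it uses made-up numbers. The function name `quantile_normalize` is hypothetical, not from any package.

```r
# Quantile-normalize the columns of a numeric matrix (genes x samples).
# Assumption: no NA values; "qq normalization" = standard quantile normalization.
quantile_normalize <- function(x) {
  x <- as.matrix(x)
  # Rank of each value within its own sample (ties broken by order of appearance)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest values across samples
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value at its rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Made-up expression values for two replicates
expr <- cbind(rep1 = c(5, 2, 3, 4), rep2 = c(4, 1, 4, 2))
norm <- quantile_normalize(expr)
print(norm)
```

After this step every sample has the same distribution of values, so any remaining differences between replicates come from the ordering of the genes and not from scale or shape.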

Quote:
Originally Posted by kentnf View Post
I generated a correlation table for all biological replicates.
Most of them correlate well (around 0.9);
2-3 sets are around 0.8;
6-7 sets are below 0.8.

Is this data acceptable? How can I improve it? Can I use all of the replicates for differential expression analysis?
Thank you!
Old 07-16-2012, 08:46 AM   #4
vyellapa
Member
 
Location: phoenix

Join Date: Oct 2011
Posts: 59

Could any of you post the method/code you use to compute the correlation values?
I did a simple cor(FPKM_1, FPKM_2) on the FPKM values from Cufflinks for two biological replicates and got a value of 0.2.

It would be really useful to know how you guys are doing it.
Old 07-16-2012, 11:56 AM   #5
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

A simple cor() (in R, I assume) can get very skewed by large outlier values such as those that can appear in Cufflinks output. You can try a Spearman correlation cor(FPKM_1, FPKM_2, method="spearman"), or take logs of the FPKM values, or do an arcsine transformation (http://bridgecrest.blogspot.fi/2011/...technical.html)
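As a rough illustration of why a single extreme value matters, the sketch below compares the three options on simulated data. The data are made up (log-normal values with one artificial outlier) and are not from this thread. The pseudocount of 1 before taking logs is an assumption, added only to avoid log(0).

```r
# Simulated FPKM-like values for two replicates (made-up data)
set.seed(1)
base   <- rlnorm(1000, meanlog = 1, sdlog = 1.5)  # shared underlying expression
FPKM_1 <- base * rlnorm(1000, sdlog = 0.2)        # replicate 1 with noise
FPKM_2 <- base * rlnorm(1000, sdlog = 0.2)        # replicate 2 with noise
FPKM_1[1] <- 6e5                                  # one huge outlier in replicate 1 only

# Pearson on raw values: sensitive to the outlier
pearson_raw <- cor(FPKM_1, FPKM_2)
# Spearman: rank-based, so the outlier counts as just one rank
spearman <- cor(FPKM_1, FPKM_2, method = "spearman")
# Pearson on log-transformed values (pseudocount of 1 is an assumption)
pearson_log <- cor(log2(FPKM_1 + 1), log2(FPKM_2 + 1))

print(c(pearson_raw = pearson_raw, spearman = spearman, pearson_log = pearson_log))
```

In a setup like this, the raw Pearson value should collapse because of the single outlier, while the Spearman and log-scale values stay high. Real Cufflinks output may behave differently, so a scatter plot on the log scale is a useful companion check.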
Old 07-17-2012, 09:19 AM   #6
vyellapa
Member
 
Location: phoenix

Join Date: Oct 2011
Posts: 59

Quote:
Originally Posted by kopi-o View Post
A simple cor() (in R, I assume) can get very skewed by large outlier values such as those that can appear in Cufflinks output. You can try a Spearman correlation cor(FPKM_1, FPKM_2, method="spearman"), or take logs of the FPKM values, or do an arcsine transformation (http://bridgecrest.blogspot.fi/2011/...technical.html)
I tried the Spearman method, which increased my correlation value to ~0.4. Before running the correlation test I removed the outliers, i.e. all values printed in exponential notation such as 8.1667e-05, 6.162e+05, etc. I also tried the correlation of logs (after adding 1 to eliminate zeros), i.e. cor(log10(FPKM_1), log10(FPKM_2)), but it did not give a better correlation value. I have not tried the arcsine method yet, but it seems interesting.

A scatter plot and correlation values using count data seemed good though.
http://seqanswers.com/forums/showthr...ed=1#post79112
Old 07-17-2012, 10:08 AM   #7
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

If you are only interested in gene-level FPKMs, you could always divide the counts by the gene length (in kilobases) and by the number of mapped reads (in millions), which should give well-correlated FPKMs (since you are only scaling the values linearly). If you are interested in isoform-level FPKMs you obviously need something like Cufflinks, but you could try alternatives such as RSEM or MISO to check how they perform.
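The gene-level calculation can be sketched in a few lines of R. The gene names, counts, and lengths below are hypothetical. Using the sum of the counted reads as the library size is an assumption; you may prefer the total number of mapped reads reported by your aligner.

```r
# Hypothetical inputs: raw fragment counts per gene and gene lengths in bp
counts      <- c(geneA = 500,  geneB = 1200, geneC = 0)
gene_length <- c(geneA = 2000, geneB = 4000, geneC = 1500)

# Assumption: library size is taken as the sum of the counted fragments
mapped_reads <- sum(counts)

# FPKM = count / (gene length in kb * mapped reads in millions)
fpkm <- counts / ((gene_length / 1e3) * (mapped_reads / 1e6))
print(fpkm)
```

With count vectors for two replicates, the same formula applied to each gives values that can be passed to cor() directly.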