SEQanswers

Old 07-27-2012, 02:54 AM   #1
immatos
Junior Member
 
Location: Lisbon

Join Date: Jul 2012
Posts: 1
Log ratios and/or ratios of the logs

Hello all.
I have a very basic question about the analysis of my RNA-seq data... can someone help me out?

Usually we work with the log ratio of RPKMs, log2(X1 RPKM / X2 RPKM).
I would like to know whether, instead of using log ratios, I can apply the log to all my RPKM data sets and then use these log2(RPKM) values for the rest of the analysis. So basically, instead of taking log ratios, take the ratio between the log2(RPKM) values. Example: log2(X1 RPKM) = 5 and log2(X2 RPKM) = 5, so ratio = 1, meaning equal expression. Is this completely wrong, so that I have to use the log ratios, or can I do it this way as well?


Thanks
Old 07-27-2012, 03:53 AM   #2
ffinkernagel
Senior Member
 
Location: Marburg, Germany

Join Date: Oct 2009
Posts: 110

log(a / b) = log(a) - log(b),

so you can do what you're proposing, but you have to subtract the two logs, not divide them.
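
A quick numeric check of the identity, as a minimal sketch. The RPKM values are made up for illustration, and only Python's standard `math` module is used.

```python
import math

# Hypothetical RPKM values for one gene in two samples (illustration only).
x1_rpkm = 32.0
x2_rpkm = 8.0

log_ratio = math.log2(x1_rpkm / x2_rpkm)                 # log2(a / b)
diff_of_logs = math.log2(x1_rpkm) - math.log2(x2_rpkm)   # log2(a) - log2(b)
ratio_of_logs = math.log2(x1_rpkm) / math.log2(x2_rpkm)  # not a fold change

print(log_ratio, diff_of_logs, ratio_of_logs)
```

The first two numbers agree; the third does not. With the subtraction, equal expression gives 0 rather than 1, and the ratio of logs is undefined whenever log2 of the denominator is 0 (an RPKM of exactly 1).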
Old 01-30-2013, 12:52 PM   #3
aprice67
Member
 
Location: New York

Join Date: Nov 2012
Posts: 49

Quote:
Originally Posted by ffinkernagel View Post
log(a / b) = log(a) - log(b),

so you can do what you're proposing, but you have to subtract the two logs, not divide them.

I don't understand how that changes the problem. log(0) is undefined. In that case, should I just use 0? If so, how will a result like log(25) - 0 differ from a result like log(25) - log(x) with x != 0?
Old 01-30-2013, 01:17 PM   #4
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

This may be a dumb question and I needs me some splainin', but ...

I know folks always did log transforms on old-school array data (because it better reflected the known inputs in spike-in data).

But ... why do people do log transforms on RPKMs from RNA-seq data? Doesn't 2X reads on a gene really mean 2X expression?
Old 01-31-2013, 09:28 AM   #5
aprice67
Member
 
Location: New York

Join Date: Nov 2012
Posts: 49

Quote:
Originally Posted by Richard Finney View Post
This may be a dumb question and I needs me some splainin', but ...

I know folks always did log transforms on old-school array data (because it better reflected the known inputs in spike-in data).

But ... why do people do log transforms on RPKMs from RNA-seq data? Doesn't 2X reads on a gene really mean 2X expression?

I'm not doing a typical experiment. I'm using RNA-Seq to predict transcriptome secondary structure in bacteria by the PARS method outlined in the 2010 Nature paper, which can be found here: http://genie.weizmann.ac.il/pubs/PARS10/index.html

I run two protocols: one with a digestion that cuts at single-stranded positions and one with a digestion that cuts at double-stranded positions.

Saying whether a position is in a secondary structure requires counts of how many reads start at each position, so it isn't measured in RPKM or any other measure usually used for differential expression. I have now cleaned and aligned the reads and produced files containing the number of reads starting at each position.

What I measure is log(protocol1 / protocol2) of the counts at each position, to determine whether there is secondary structure at that point. However, when a position comes out as log(25 single-strand / 0 double-strand), I can't compute a meaningful score for it.

I still don't have a workable solution to this. I've been through the literature and found some statistical methods for DE on a per-gene basis that I may be able to apply on a per-position basis if I tweak the statistics a bit, but I'd rather go with something that has been peer reviewed and tested, if ya know what I mean.
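
The zero-count problem described in this post can be reproduced in a few lines. This is only a sketch: the function name `position_scores`, the example counts, and the optional `pseudocount` parameter are assumptions made for illustration. Adding a small constant to both counts is a common workaround for zeros, but it is not something proposed in this thread or confirmed here as part of the PARS method, and the constant chosen changes the scores.

```python
import math

def position_scores(single_counts, double_counts, pseudocount=0.0):
    """Per-position log2(single / double); None where the ratio is undefined."""
    scores = []
    for s, d in zip(single_counts, double_counts):
        s_adj = s + pseudocount
        d_adj = d + pseudocount
        if s_adj <= 0 or d_adj <= 0:
            scores.append(None)  # log of 0 or division by 0: no score
        else:
            scores.append(math.log2(s_adj / d_adj))
    return scores

# Hypothetical read-start counts at four positions (illustration only).
single = [25, 8, 0, 4]
double = [0, 2, 5, 4]

raw = position_scores(single, double)                        # zeros left as-is
smoothed = position_scores(single, double, pseudocount=1.0)  # assumed workaround
print(raw)
print(smoothed)
```

Without the pseudocount, the first and third positions get no score at all. With it, every position gets one, but the 25 / 0 position is effectively scored as 26 / 1, which is an arbitrary choice rather than a measured value.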
Old 01-31-2013, 11:25 AM   #6
aprice67
Member
 
Location: New York

Join Date: Nov 2012
Posts: 49

What I plan to try is to get the distributions of read counts for both protocols and use them to work out whether a 0 is significant or not.
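
One way to read this plan, as a rough sketch under a strong assumption: if read starts at a position followed a Poisson distribution with some expected count, the chance of seeing 0 reads would be exp(-expected). The Poisson model, the function name `prob_zero`, and the example means are illustrative assumptions, not the poster's actual procedure, and real read-start counts are often more dispersed than a Poisson model allows.

```python
import math

def prob_zero(expected_count):
    """P(count == 0) under a Poisson model with the given mean (an assumption)."""
    if expected_count < 0:
        raise ValueError("expected_count must be non-negative")
    return math.exp(-expected_count)

# Hypothetical expected counts, e.g. taken from a protocol's count distribution.
for mean in (0.5, 3.0, 10.0):
    print(mean, prob_zero(mean))
```

When the expected count is low, a 0 is unremarkable; when it is high, a 0 is unlikely to arise by chance and may be worth treating as informative.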