SEQanswers

Go Back   SEQanswers > Introductions



Similar Threads
Thread Thread Starter Forum Replies Last Post
Peak annotation using HOMER Erinyes Bioinformatics 0 01-15-2016 02:37 AM
Homer makeTagDirectory error with hg19 micron Bioinformatics 0 11-06-2015 07:58 AM
Annotating dog coordinates using HOMER ajsn6c Bioinformatics 2 10-20-2015 05:12 PM
HOMER annotation & GO gcorre Bioinformatics 0 12-08-2014 07:47 AM
Indexing in Homer kushald Bioinformatics 0 03-20-2013 10:45 PM

Reply
 
Thread Tools
Old 02-10-2016, 10:47 AM   #1
blumroy
Junior Member
 
Location: NY

Join Date: Aug 2013
Posts: 1
Default Homer / getDiffExpression.pl / edgeR

To complement our cuffdiff analysis we thought to take a parallel route using Homer:

To do so we basically followed this protocol:

makeTagDirectory pairend.sam -format sam -fragLength 250 -removeSpikes 10000 5 -genome /data//projects/mm9/ref.fasta -checkGC -tbp 1 -sspe -flip



Then, we run analyzeRepeats.pl to make read summaries across genes:
analyzeRepeats.pl rna mm9 -noadj -count genes -strand + -d MIa9_1 MIa9_2 > Scramble_vs_Target_analyzeRepeats.txt

We run this step as recommended by the Homer tutorial with the unnormalized data so the edgeR analysis, which is the following step, would be able to use an unnormalized data.


Next step we run:
getDiffExpression.pl Scramble_vs_Target_analyzeRepeats.txt MIa9_1 MIa9_2 -repeats -edgeR > Scramble_vs_Target_analyzeRepeats_edgeR.txt


After testing the execution of this command and checking closely the perl script for getDiffExpression we run into the following understanding:
The getDiffExpression step takes unnormalized data and transmits it to edgeR so edgeR can calculate log2FDR, p-value, and FDR. off the original unnormalized data. edgeR creates a model for the data and uses pseudo values in cases of zero values. edgeR finally outputs the log2FDR, p-value, and FDR as calculated after transferring the original data into a modeled space.

Now, the output of getDiffExpression.pl is composed of two parts:
The three columns at the very right end, which contain the edgeR's output (Log2FC; p-value, FDR).
AND the output also contains - the rest of the table (all the left side-end columns) which forms a matrix. The matrix is of normalized values. BUT these are not normalized values created by edgeR, but instead - normalized values that were created as part of getDiffExpression.pl script execution.
This data is only presented in the final output table next to the edgeR output (three right columns) and is aimed only for the user to see, but this normalized values were NOT actually been used as part of the edgeR analysis (unlike what I originally thought...).

My issue is that we are currently not sure in what manner the data which presented in this table was actually normalized.

Please see this link for relevant indications:
http://homer.salk.edu/homer/ngs/analyzeRNA.html
Look for explanatory indication such as the following which exactly states what I've described above:
"This command will basically add 3 columns to the end of your original file per pairwise comparison. Most important are the log2 fold change and the p-value and FDR values. The raw read counts will be normalized in this file based on the total reads in each column so that the values are more easily to compare when looking through the results."

However, our attempts to re-run the previous step again - (run analyzeRepeats.pl) - using this time the normalization options in order to obtain the normalized values that appear in the output from getDiffExpression.pl did not prove to be successful...
We tried:
analyzeRepeats.pl rna mm9 -noadj -count genes -strand + -d MIa9_1 MIa9_2 -normMatrix 1e6

And also,
analyzeRepeats.pl rna mm9 -noadj -count genes -strand + -d MIa9_1 MIa9_2 -norm 1e7

but the normalized values that we obtained were quite different from the ones that were obtained after running the getDiffExpression.pl script...

We think that it would become important to know how the data that appears in the output table from getDiffExpression.pl was actually normalized. Was it per all reads? was it per all reads in each individual library? ...

If any of the users have some more information and ideas regarding this matter we'll be glad to hear!
Alternatively, we could ignore the normalized values content of the output table, and refer only to the results of edgeR that show the statistics.

It would be interesting to hear how other users decided to normalize their rna-seq data using this approach. Which selections were finally chosen.

Thanks a lot!

Roy
blumroy is offline   Reply With Quote
Reply

Tags
analyzerepeats.pl, edger, getdiffexpression.pl, homer, rna-seq

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 08:45 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO