SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
Different FPKM values of cufflinks and cuffdiff mrfox Bioinformatics 5 10-17-2012 02:10 PM
cufflinks FPKM >>> Cuffdiff FPKM peromhc Bioinformatics 6 10-17-2012 02:07 PM
Cufflinks and cuffdiff FPKM values combiochem Bioinformatics 12 10-14-2012 12:37 AM
FPKM calculated by Cuffdiff 1.3.0 VS 2.0.0 huijia Bioinformatics 0 06-14-2012 01:41 PM
Different FPKM values of cufflinks and cuffdiff in latest version mrfox Bioinformatics 1 11-23-2010 06:23 AM

Reply
 
Thread Tools
Old 12-27-2012, 08:18 AM   #1
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default Solution found: For Cuffdiff/links 2.0.2 Make cufflinks FPKM match Cuffdiff FPKM

This is in reference to cuffdiff v2.0.2 (3522)

Hi, I have been banging my head on why cuffdiff and cufflinks have different FPKM calculations. I finally stumbled upon the settings to make cufflinks match the FPKM values created with default settings of cuffdiff:

cuffdiff (defaults) mygenes.gtf mysample1.bam mysample1.bam
cufflinks --compatible-hits-norm --max-frag-multihits 1 -G mygenes.gtf mysample1.bam

I think there is a mistake in the cuffdiff options list, where --max-frag-multihits is listed as unlimited in the default state, but it is probably actually set to 1.

also, I add a pseudocount RPKM of 0.01 to both, otherwise cufflinks makes very small RPKMs, unlike cuffdiff, where it sets it to zero.



At least now it is possible for me to reproduce the FPKMs that cuffdiff makes on default settings. I hope the authors fix this.

Note the plot is on log2 scale for the RPKMs, but the match is exact when on the linear scale as well (with exception of a few - those genes with extremely small FPKMs ~ e-200 as calculated by cufflinks).

btw, I carefully compared all the options and saw that the options for --compatible-hits-norm and --total-hits-norm are flipped between the two programs:

CUFFLINKS: --compatible-hits-norm count hits compatible with reference RNAs only [ default: FALSE ]
CUFFDIFF: --compatible-hits-norm count hits compatible with reference RNAs only [ default: TRUE ]

CUFFLINKS: --total-hits-norm count all hits for normalization [ default: TRUE ]
CUFFDIFF: --total-hits-norm count all hits for normalization [ default: FALSE ]

Interestingly, there is another normalization only offered by Cuffdiff and set to true:

CUFFDIFF only: --geometric-norm use geometric mean normalization [ default: TRUE ]


Both Cufflinks and Cuffdiff say the default for max-frag-multihits is unlimited, but I have found setting it to 1 in Cufflinks brings the two into agreement:

CUFFLINKS --max-frag-multihits Maximum number of alignments allowed per fragment [ default: unlim ]
CUFFDIFF --max-frag-multihits Maximum number of alignments allowed per fragment [ default: unlim ]

Last edited by NGSfan; 12-27-2012 at 10:18 AM.
NGSfan is offline   Reply With Quote
Old 12-28-2012, 01:30 AM   #2
syfo
Just a member
 
Location: Southern EU

Join Date: Nov 2012
Posts: 103
Default

Thanks for sharing
syfo is offline   Reply With Quote
Old 12-28-2012, 12:39 PM   #3
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

Alternatively, instead of pseudocounts to deal with some of the very small RPKMs generated by Cufflinks, you can also replace any RPKM < 0.0001 with 0 and this will give you the same values as reported by Cuffdiff.

I don't know why on some genes, the RPKMs get so small (1e-300 for example) in Cufflinks, whereas in Cuffdiff they are 0 RPKM.

Has anybody compared Cufflinks to eXpress? I notice that there are subsets of genes where cufflinks reports 1000 RPKM while eXpress reports 0 and vice versa... many look to be small RNAs.. just wondering what others are seeing out there. This happens with or without effective length correction.

I will try RSEM next.

I am a bit concerned though with aligning only to transcripts, wouldn't this result in some reads being forcibly aligned to a transcript reference? A lot of transcription goes on that is not annotated (as we know from the ENCODE project) and this could result in skewing RPKMs for some genes. Perhaps the genomic reference approach is better? With accurate aligners like STAR the argument for aligning solely to the transcriptome seems weaker.

Last edited by NGSfan; 12-28-2012 at 02:27 PM.
NGSfan is offline   Reply With Quote
Old 04-16-2013, 07:22 AM   #4
fuad193
Member
 
Location: N/A

Join Date: Feb 2012
Posts: 17
Default

Very Useful I will try it now. Just wondering why you make multi-hit only 1, this seems odd because the aligner will stop for looking of other possibility in mapping when it encounter the first hit.
fuad193 is offline   Reply With Quote
Old 04-16-2013, 08:10 AM   #5
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

Quote:
Originally Posted by fuad193 View Post
Very Useful I will try it now. Just wondering why you make multi-hit only 1, this seems odd because the aligner will stop for looking of other possibility in mapping when it encounter the first hit.

If my interpretation or understanding of the --max-frag-multihits 1 option is correct, this tells cuffdiff to limit itself to using reads that align uniquely to one locus, and to ignore reads that map to more than 1 locus. In the case of tophat/STAR, reads that are uniquely mapped get a score of 255, so it should be counting those only.

I have not played around with changing this value. I would assume that unique reads are better to use for differential gene expression.
NGSfan is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 09:59 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO