SEQanswers

Old 09-19-2013, 06:19 AM   #1
id0
Senior Member
 
Location: USA

Join Date: Sep 2012
Posts: 130
Cufflinks/Cuffdiff different FPKM values for multiple genes at one location

I am doing my first RNA-seq analysis for Drosophila melanogaster (I normally deal with human or mouse data). It turns out that many fly genes have exactly the same coordinates as other genes. In other words, the same location has multiple genes assigned to it.

If multiple genes have exactly the same coordinates, I would expect them to have the same FPKM values. However, running Cuffdiff with such a GTF does not yield the same values for all of these genes.

Is there a way to force Cuffdiff to assign the same values to all overlapping genes? I could not find any arguments that would do that. Is there a proper way of dealing with such situations? Do I need to optimize the GTF file? I use the one from iGenomes, which is endorsed by Cufflinks, so it seems like it should be fine.
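
To see how many genes in an annotation actually share a location, one option is to group genes by their overall span. The following is only a minimal Python sketch, assuming a standard 9-column, tab-separated GTF with a gene_id attribute; the function names and the example records are hypothetical and not taken from the iGenomes file. The strand is part of the grouping key here, which is an assumption that is easy to drop.

```python
import re
from collections import defaultdict

# Matches the gene_id attribute in the 9th GTF column.
GENE_ID = re.compile(r'gene_id "([^"]+)"')

def gene_spans(gtf_lines):
    """Return gene_id -> (chrom, start, end, strand), using the min/max of all its features."""
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = GENE_ID.search(fields[8])
        if match is None:
            continue
        gene = match.group(1)
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        if gene in spans:
            _, old_start, old_end, _ = spans[gene]
            start, end = min(start, old_start), max(end, old_end)
        spans[gene] = (chrom, start, end, strand)
    return spans

def shared_loci(spans):
    """Return only the loci that are occupied by more than one gene."""
    groups = defaultdict(list)
    for gene, locus in spans.items():
        groups[locus].append(gene)
    return {locus: sorted(genes) for locus, genes in groups.items() if len(genes) > 1}

def record(start, end, gene):
    """Build one hypothetical exon line for the example below."""
    attrs = 'gene_id "%s"; transcript_id "%s-RA";' % (gene, gene)
    return "\t".join(["chr2L", "example", "exon", str(start), str(end), ".", "+", ".", attrs])

# Hypothetical annotation: geneA (two exons) and geneB (one exon) span the same region.
lines = [record(100, 300, "geneA"), record(400, 500, "geneA"),
         record(100, 500, "geneB"), record(800, 900, "geneC")]
shared = shared_loci(gene_spans(lines))
print(shared)
```

Note that in this example geneA and geneB share a span but not an exon structure, which is one way genes at the same location can still differ at the transcript level.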
Old 11-06-2013, 11:34 PM   #2
davidblaney
Member
 
Location: Oxford, UK

Join Date: Nov 2011
Posts: 17

Hi,

The FPKM value for a gene is the sum of the FPKM values found for the transcripts of that gene... so I guess the differing values could come from differences in the transcripts and in the reads/fragments covering them.

To test this, I would look at the transcript-level FPKM values for each of the genes that share a locus.
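
The summing described above can be sketched in a few lines of Python. This is an illustration only, assuming transcript-level values are already available as (gene_id, transcript_id, FPKM) tuples; the function name and all values are hypothetical and do not come from real Cuffdiff output.

```python
from collections import defaultdict

def gene_fpkm_from_transcripts(transcripts):
    """Sum transcript-level FPKM per gene.

    `transcripts` is an iterable of (gene_id, transcript_id, fpkm) tuples.
    """
    totals = defaultdict(float)
    for gene_id, _transcript_id, fpkm in transcripts:
        totals[gene_id] += fpkm
    return dict(totals)

# Hypothetical values: two genes at one locus, each with its own transcripts.
example = [
    ("geneA", "geneA-RA", 12.0),
    ("geneA", "geneA-RB", 3.5),
    ("geneB", "geneB-RA", 0.5),
]
gene_totals = gene_fpkm_from_transcripts(example)
print(gene_totals)
```

Two genes with the same coordinates only end up with the same gene-level value if their transcript-level values happen to add up to the same total.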
Old 11-07-2013, 06:06 AM   #3
id0
Senior Member
 
Location: USA

Join Date: Sep 2012
Posts: 130

The FPKM values for multiple transcripts with identical coordinates are very different. Additionally, they can even be called significantly different between two samples, and in different directions.
Old 11-07-2013, 10:40 AM   #4
feralBiologist
Member
 
Location: UK

Join Date: Jun 2011
Posts: 61

I stopped using Cufflinks/Cuffdiff three months ago because the latest version was producing implausible results. I would recommend using TopHat2 + HTSeq-count + edgeR (or DESeq). I based my workflow on this nice tutorial: http://www-huber.embl.de/pub/pdf/nprot.2013.099.pdf
Old 11-07-2013, 11:47 PM   #5
rboettcher
Member
 
Location: Berlin

Join Date: Oct 2010
Posts: 71

I agree with feralBiologist, but would switch to featureCounts instead of HTSeq-count for performance reasons (it can be run multithreaded and does not require a re-sorted SAM file).
Old 11-08-2013, 12:41 AM   #6
feralBiologist
Member
 
Location: UK

Join Date: Jun 2011
Posts: 61

Thanks for this suggestion. Have you checked whether HTSeq-count and featureCounts produce the same results? HTSeq-count supports several counting modes; the main issue is how to count reads that hit overlapping genes. I chose HTSeq-count because it is written by the author of DESeq and, in the above paper, has also been "approved" by the authors of edgeR. If featureCounts produces the same results, then the switch would be painless, but if there are differences, then you need to look into the details.

EDIT: I realised that featureCounts is written by the authors of edgeR, so it should be straightforward to substitute it for HTSeq-count. Thanks again to rboettcher.

Last edited by feralBiologist; 11-08-2013 at 12:56 AM.
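
To illustrate why overlapping genes matter for read counting, here is a minimal Python sketch of a simplified "union"-style rule: a read that overlaps exactly one gene is counted for it, and a read that overlaps several genes is set aside as ambiguous. This is not the implementation of HTSeq-count or featureCounts. It assumes whole-gene intervals on a single chromosome and strand, and the function name and coordinates are hypothetical.

```python
def assign_read(read_start, read_end, genes):
    """Assign one read to a gene under a simplified union-style rule.

    `genes` maps gene_id -> (start, end), 1-based inclusive coordinates.
    Returns a gene_id, "ambiguous", or "no_feature".
    """
    hits = sorted(
        gene for gene, (start, end) in genes.items()
        if start <= read_end and read_start <= end
    )
    if not hits:
        return "no_feature"
    if len(hits) == 1:
        return hits[0]
    return "ambiguous"

# Hypothetical annotation: geneA and geneB have identical coordinates.
genes = {"geneA": (100, 500), "geneB": (100, 500), "geneC": (800, 900)}

print(assign_read(150, 200, genes))
print(assign_read(820, 860, genes))
print(assign_read(600, 650, genes))
```

Under this simplified rule, every read that falls in a locus shared by two genes with identical coordinates is ambiguous, so it is worth checking how a given counting tool and mode treat such reads before comparing results.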
Old 11-08-2013, 01:09 AM   #7
rboettcher
Member
 
Location: Berlin

Join Date: Oct 2010
Posts: 71

I suggest having a look at their manuscript on arXiv, where the authors made such a comparison. In my experience, both tools produce similar results; see http://arxiv.org/abs/1305.3347

EDIT: another nice feature is that featureCounts outputs gene length, so computing RPKM is straightforward.
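
As a rough illustration of that last point, RPKM can be computed from a read count, a gene length, and a library size. The Python sketch below uses hypothetical counts and lengths rather than real featureCounts output, and it takes the sum of assigned reads as the library size, which is an assumption; the total number of mapped reads is another common choice.

```python
def rpkm(count, gene_length_bp, total_reads):
    """RPKM = count * 1e9 / (gene length in bp * total reads)."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total reads must be positive")
    return count * 1e9 / (gene_length_bp * total_reads)

# Hypothetical per-gene values: gene_id -> (length in bp, read count).
counts = {"geneA": (2000, 500), "geneB": (1000, 500)}

# Assumption: library size is the sum of reads assigned to genes.
total = sum(count for _length, count in counts.values())

values = {gene: rpkm(count, length, total) for gene, (length, count) in counts.items()}
print(values)
```

With equal counts, the shorter gene gets the higher RPKM, which is the length normalisation at work.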
Old 11-11-2013, 02:29 AM   #8
davidblaney
Member
 
Location: Oxford, UK

Join Date: Nov 2011
Posts: 17

I am going to try these new methods out, thanks.
All times are GMT -8.

