Old 06-30-2011, 01:32 PM   #1
Location: long island

Join Date: May 2011
Posts: 22
Can I use FPKM to represent gene expression?

Dear All,

I am a newbie to RNA-seq data analysis. I am currently in charge of analyzing some human NGS samples (single-end) in a disease vs. control comparison. I have 10 BAM files (biological replicates) from tophat, each about 4 GB in size.

I followed the tophat-cufflinks-cuffcompare-cuffdiff pipeline (using the hg19 reference) to find the genes that are differentially expressed between the experimental and control conditions.

I have no problem getting assembled results from cufflinks for each sample, but I am stuck at the final cuffdiff step. The problem seems to be insufficient memory, as I keep getting bad_alloc errors in the shell.

So I wonder whether I can just use the FPKM values from each sample's cufflinks genes.fpkm_tracking file as the gene expression values, and then use traditional statistical methods (e.g. multiple t-tests, SAM analysis) to identify differentially expressed genes between the two groups?
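For concreteness, here is a minimal sketch of the approach being asked about: read the FPKM column from each sample's genes.fpkm_tracking file and compute a per-gene Welch t statistic on log2(FPKM + 1). It assumes the file is tab-delimited with `tracking_id` and `FPKM` columns (check your own header), and the function names and the demo table are made up for illustration. It is not a recommendation; a plain t-test like this ignores the uncertainty in the FPKM estimates.

```python
import csv
import io
import math
from statistics import mean, variance


def read_fpkm(handle):
    """Return {tracking_id: FPKM} from an open genes.fpkm_tracking-style table.

    Assumes tab-delimited text with 'tracking_id' and 'FPKM' header columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["tracking_id"]: float(row["FPKM"]) for row in reader}


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    if va + vb == 0:
        # No variation in either group: the statistic is undefined.
        return float("nan"), float("nan")
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def compare_groups(control, disease):
    """Per-gene Welch t on log2(FPKM + 1).

    control, disease: lists of {gene: FPKM} dicts, one dict per replicate.
    Only genes present in every sample are compared.
    """
    shared = set.intersection(*(set(s) for s in control + disease))
    results = {}
    for gene in sorted(shared):
        a = [math.log2(s[gene] + 1) for s in control]
        b = [math.log2(s[gene] + 1) for s in disease]
        results[gene] = welch_t(a, b)
    return results


# Tiny in-memory stand-in for one sample's file (made-up values).
demo_table = "tracking_id\tFPKM\nGENE_A\t10.0\nGENE_B\t0.0\n"
demo_sample = read_fpkm(io.StringIO(demo_table))
```

With real data you would call `read_fpkm(open(path))` once per sample, pass the two lists of dicts to `compare_groups`, and then still need to turn the statistics into p-values and correct for multiple testing.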

Thanks in advance
slowsmile
Old 07-01-2011, 07:35 AM   #2
David Eccles (gringer)
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

You'd probably want to read here:

And here:

My quick-glance summary from that second FAQ is the following:
Current count-based differential expression tools are poorly suited to differential expression analysis in genomes with alternatively spliced genes. The main reason for this is that when a gene has multiple isoforms, a change in the total number of reads or fragments from that gene doesn't always correspond to a change in expression for that gene. Conversely, a gene's expression may change, but the total number of fragments generated by its isoforms may be very similar.

In order to detect changes accurately, it's necessary to estimate how many fragments came from each individual splice variant in each sample. Current count-based tools don't do this (to our knowledge - please send us email if you know of one!). Even if they did, fragments that come from parts of genes that are shared by more than one splice variant can't generally be assigned to a single isoform, so the fragment counts for each isoform are only estimates, and there is some uncertainty in the counts. Isoforms that are very similar will have a great deal of uncertainty surrounding their fragment counts. This uncertainty needs to be accounted for when testing for differential expression.

So while you could use Cufflinks to estimate isoform-level counts, you'd be throwing away Cufflinks' uncertainty, and thus have more confidence in the differences you see than you really should. This will probably lead to many false positives in your analysis. Furthermore, we do not normalize simply by the length to calculate FPKM but an effective length, as explained in our publications. Calculating counts from FPKM by multiplying by the length will give incorrect results. We strongly encourage you to consider using Cuffdiff to find differentially expressed genes and transcripts.
In other words, if you're using cufflinks, it is recommended that you also use cuffdiff. Note that tophat and cufflinks seem to be under fairly heavy development at the moment. If you're not using the latest versions (cufflinks 1.0.3, tophat 1.3.1), upgrading may pick up bug fixes that solve the memory issues.
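A toy calculation may make the isoform argument in the quoted FAQ easier to see. The sketch below uses the usual FPKM definition (fragments per kilobase of effective length per million mapped fragments) and treats the gene-level FPKM as the sum of its isoform FPKMs. The gene, isoform lengths, fragment counts, and library size are all made up for illustration.

```python
def fpkm(fragments, effective_length, total_fragments):
    """Fragments per kilobase of (effective) length per million mapped fragments."""
    return fragments * 1e9 / (effective_length * total_fragments)


# Hypothetical gene with two isoforms; effective lengths in bases (made up).
EFF_LEN = {"short": 1000, "long": 4000}
TOTAL = 20_000_000  # mapped fragments per sample (made up)


def gene_summary(isoform_fragments):
    """Return (raw fragment count, gene-level FPKM) for one sample."""
    count = sum(isoform_fragments.values())
    gene_fpkm = sum(
        fpkm(n, EFF_LEN[iso], TOTAL) for iso, n in isoform_fragments.items()
    )
    return count, gene_fpkm


# Same 400 fragments in both conditions, but from different isoforms.
cond_a = gene_summary({"short": 400, "long": 0})
cond_b = gene_summary({"short": 0, "long": 400})

# Back-calculating a count with an annotated length (assumed 1200 here)
# instead of the effective length does not recover the 400 fragments.
ANNOTATED_LEN_SHORT = 1200
back_calculated = cond_a[1] * ANNOTATED_LEN_SHORT * TOTAL / 1e9

print(cond_a, cond_b, back_calculated)
```

Both conditions have the same raw count for the gene, yet the FPKM differs fourfold because the fragments come from isoforms of different length. A count-based test would see no change here, which is the situation the FAQ warns about.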
Old 07-01-2011, 07:53 AM   #3
Senior Member
Location: Research Triangle Park, NC

Join Date: Aug 2009
Posts: 245

Recently I was running cuffdiff with 6 SOLiD BioScope 1.3 mapped BAM files (3 control and 3 treatment, about 40.2 GB in total, with the smallest file about 5 GB and the largest about 12 GB) and was getting bad_alloc failures too.

I took a look at our cluster's swap setup and then made a temporary swap area big enough to let cuffdiff run. The machine I was using has 24 GB of RAM but had only a small swap (not sure why, it shipped from Penguin that way), so I made an empty 24 GB file and added it as swap. cuffdiff ran just fine after that. It used all the RAM, of course, and about 13-14 GB of the swap, so I was overly generous, but it worked.

So, you may be able to run cuffdiff by just creating a nice massive temporary swap file for the run.
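As a rough sketch of the swap-file approach described above, the helper below only builds and prints the standard Linux commands (`dd`, `chmod`, `mkswap`, `swapon`, and `swapoff` for cleanup); it does not run anything. The path and the function names are hypothetical, the commands need root, and the right size depends on your data.

```python
import shlex


def swap_commands(path, size_gb):
    """Return (setup, teardown) command lists for a temporary Linux swap file.

    path is a hypothetical location on a disk with enough free space;
    size_gb is the swap size in whole GiB.
    """
    if size_gb <= 0:
        raise ValueError("size_gb must be a positive integer")
    setup = [
        # Write size_gb GiB of zeros in 1 MiB blocks.
        ["dd", "if=/dev/zero", f"of={path}", "bs=1M", f"count={size_gb * 1024}"],
        # Swap files should not be readable by other users.
        ["chmod", "600", path],
        ["mkswap", path],
        ["swapon", path],
    ]
    teardown = [["swapoff", path], ["rm", path]]
    return setup, teardown


def render(commands):
    """Format command lists as shell lines for review before running as root."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


setup, teardown = swap_commands("/scratch/cuffdiff.swap", 24)
print(render(setup))
print(render(teardown))
```

Printing the commands first makes it easy to check the path and size before running them by hand. Run the teardown lines once cuffdiff has finished so the temporary swap does not linger.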
Michael Black, Ph.D.
ScitoVation LLC. RTP, N.C.

Last edited by mbblack; 07-01-2011 at 08:16 AM.

cuffdiff, cufflinks, fpkm
