SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   cufflinks FPKM >>> Cuffdiff FPKM (http://seqanswers.com/forums/showthread.php?t=16962)

peromhc 01-18-2012 08:24 PM

cufflinks FPKM >>> Cuffdiff FPKM
 
I cannot understand why the FPKM estimated in cufflinks is SO much larger than that in cuffdiff:

Cufflinks
Code:

cufflinks -p8 -m320 -u -o /media/hd/working/tuco/17Jan12socialcuff -L social \
--upper-quartile-norm --max-mle-iterations 20000 \
/media/hd/working/tuco/b2.social/social.bam

cat transcripts.gtf | grep 'comp14388_c0_seq1'

comp14388_c0_seq1; FPKM "1630419.4581286784";

I merged the .gtf files from each cufflinks run with cuffcompare, and fed the result to cuffdiff.
I have 5 biological reps for each group.

Cuffdiff
Code:

mkdir /media/hd/working/tuco/17Jan.cuffdiff
cd /media/hd/working/tuco/17Jan.cuffdiff

cuffdiff -p8 -L social,solitary -N -u \
--max-mle-iterations 10000 /media/hd/working/tuco/17Jan12cuffcompare/*gtf \
/media/hd/working/tuco/b2.bams/406A.bam,\
/media/hd/working/tuco/b2.bams/4262.bam,\
/media/hd/working/tuco/b2.bams/2354.bam,\
/media/hd/working/tuco/b2.bams/4241.bam,\
/media/hd/working/tuco/b2.bams/401C.bam \
/media/hd/working/tuco/b2.bams/6236.bam,\
/media/hd/working/tuco/b2.bams/2226.bam,\
/media/hd/working/tuco/b2.bams/5B5C.bam,\
/media/hd/working/tuco/b2.bams/255D.bam,\
/media/hd/working/tuco/b2.bams/4572.bam

cat gene_exp.diff | grep 'comp14388_c0_seq1'

comp14388_c0_seq1:0-1977        social        solitary        10.5437        8.08172


ok... 1630419.4581286784 >>> 10.5437 Why??

peromhc 01-19-2012 07:57 AM

I should note that 'social.bam' is just the product of samtools merge for all the individuals in the social treatment. Those BAM files are listed individually in Cuffdiff to indicate that they are biological replicates.

So, in essence, the FPKM from social.bam from cufflinks should be the average value from all the individuals in that group.

polyatail 01-25-2012 10:09 AM

Just at first glance, in your cufflinks run you specify two different parameters that will affect the FPKM calculation.
Code:

--upper-quartile-norm --max-mle-iterations 20000

I would try changing --max-mle-iterations to match cuffdiff, disabling quartile normalization, and running the biological replicates through cufflinks separately to see whether the difference persists. Then I would try cufflinks with the merged BAMs. Internally the same code does the quantification in both cufflinks and cuffdiff.

Also, I noticed you're looking in transcripts.gtf for cufflinks and gene_exp.diff for cuffdiff. It would be better to look in isoforms.fpkm_tracking for both cufflinks and cuffdiff, as gene_exp.diff lists quantification at the locus level while transcripts.gtf is at the isoform level.
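
To make that like-for-like comparison concrete, here is a minimal sketch that pulls the FPKM columns for one transcript out of a tracking file. It assumes a tab-separated file with a header row, the tracking ID in the first column, and FPKM columns named either "FPKM" or "<label>_FPKM"; the function name and the paths in the usage comments are hypothetical. Note that the transcript ID may differ between the two files if the GTF was renamed during merging.

```shell
# Print every FPKM column for one tracking ID from a *.fpkm_tracking file.
# Assumption: tab-separated, header row, tracking ID in column 1,
# FPKM columns named "FPKM" or "<label>_FPKM".
fpkm_for_id() {
    awk -F'\t' -v id="$1" '
        # Header: remember which columns end in "FPKM".
        NR == 1 {
            for (i = 1; i <= NF; i++)
                if ($i ~ /FPKM$/) { cols[++n] = i; name[i] = $i }
            next
        }
        # Matching row: print "column_name<TAB>value" for each FPKM column.
        $1 == id {
            for (j = 1; j <= n; j++)
                printf "%s\t%s\n", name[cols[j]], $(cols[j])
        }
    ' "$2"
}

# Hypothetical usage; replace the paths with your own output directories.
# fpkm_for_id comp14388_c0_seq1 cufflinks_out/isoforms.fpkm_tracking
# fpkm_for_id comp14388_c0_seq1 cuffdiff_out/isoforms.fpkm_tracking
```

Looking columns up by header name rather than by position avoids hard-coding a layout that may differ between the cufflinks and cuffdiff outputs.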

peromhc 01-25-2012 07:21 PM

Also, I just realized that log10(1630419.4581286784) is about 6.2, which is at least on the same scale as 10.5. I wonder if the difference is as simple as a log transform.
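
A quick way to check that idea, sketched with awk (the helper names log10 and ratio are made up for this example). It only does the arithmetic on the two values quoted in this thread; it says nothing about what either program does internally.

```shell
# log10 of a number, to two decimal places.
log10() {
    awk -v x="$1" 'BEGIN { printf "%.2f\n", log(x) / log(10) }'
}

# Fold difference between two numbers, rounded to an integer.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a / b }'
}

log10 1630419.4581286784           # log10 of the cufflinks FPKM
ratio 1630419.4581286784 10.5437   # cufflinks FPKM divided by cuffdiff FPKM
```

The log10 comes out near 6.2 rather than 10.5, so a plain log10 transform does not by itself turn one value into the other; the two values differ by a factor of roughly 150,000.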

sudders 04-18-2012 04:17 AM

Did you ever find a solution to this? We have run into the same problem.

Our pipeline is thus:
We map reads with tophat for each sample
Run cufflinks on each sample to generate a transcriptome assembly

the command looks something like:
Code:

cufflinks --label tax-Pre-R5 \
          --num-threads 4 \
          --library-type fr-secondstrand \
          --frag-bias-correct /ifs/mirror/genomes/bowtie/hg19.fa \
          --multi-read-correct \
          --upper-quartile-norm \
          /ifs/projects/proj004/rnaseq4/tax-Pre-R5.accepted.bam

Run Cuffmerge and Cuffcompare to generate merged gene sets.

We also run cuff diff to test for differences.

Our cuffdiff commands look like:

Code:

cuffdiff --output-dir abinitio.cuffdiff.dir \
         --library-type fr-secondstrand \
         --upper-quartile-norm \
         --frag-bias-correct /ifs/mirror/genomes/bowtie/hg19.fa \
         --multi-read-correct \
         --verbose \
         --num-threads 16 \
         --labels Prostate-Pre-agg,Prostate-Post-agg,tax-Pre-agg,tax-Post-agg \
         --FDR 0.050000 \
         abinitio.gtf \
         Prostate-Pre-R7.accepted.bam,Prostate-Pre-R1.accepted.bam,Prostate-Pre-R4.accepted.bam,Prostate-Pre-R2.accepted.bam,Prostate-Pre-R8.accepted.bam,Prostate-Pre-R5.accepted.bam,Prostate-Pre-R3.accepted.bam,Prostate-Pre-R6.accepted.bam \
         Prostate-Post-R7.accepted.bam,Prostate-Post-R8.accepted.bam,Prostate-Post-R6.accepted.bam,Prostate-Post-R3.accepted.bam,Prostate-Post-R5.accepted.bam,Prostate-Post-R2.accepted.bam,Prostate-Post-R4.accepted.bam,Prostate-Post-R1.accepted.bam \
         tax-Pre-R1.accepted.bam,tax-Pre-R3.accepted.bam,tax-Pre-R2.accepted.bam,tax-Pre-R6.accepted.bam,tax-Pre-R4.accepted.bam,tax-Pre-R5.accepted.bam \
         tax-Post-R6.accepted.bam,tax-Post-R1.accepted.bam,tax-Post-R4.accepted.bam,tax-Post-R5.accepted.bam,tax-Post-R2.accepted.bam,tax-Post-R3.accepted.bam

If we compare the FPKMs coming out of cuffcompare and cuffdiff, they are not even within two or three orders of magnitude of each other: the cuffcompare FPKMs are in the millions or tens of millions, while the cuffdiff outputs are in the more sensible range of 0 to several hundred.

We're using cufflinks 1.3.1.

mmanrique 08-01-2012 07:49 AM

Hi,

We had the same problem and tried the new Cufflinks version 2.0.2. It seems the values from Cufflinks and Cuffdiff are now the same (I still have to check this more carefully).

These are the commands I used:

Code:

cufflinks -o ./Sample001_cufflinks_out_No_N_2.0.2 -u -g ../genes.gtf -p 2 --total-hits-norm ../Sample_001_accepted_hits.bam
Code:

cuffdiff -o ./COMPARISON1_SAMPLE1_SAMPLE1BIS_cuffdiff_out/ -L SAMPLE1,SAMPLE1BIS -p 2 -u -v --emit-count-tables --total-hits-norm ../Sample001_cufflinks_out/transcripts.gtf ../Sample_001_accepted_hits.sam ../Sample_001_bis_accepted_hits.sam
I know it's weird to use cuffdiff to compare one sample to itself but I had no other choice...

HTH

Marina

EDIT: Though the FPKM values from Cufflinks and Cuffdiff are now more similar, I still get unreasonably high FPKM values, especially for very short genes (around 37 nt, regulatory RNAs I guess). Searching for an explanation, I found this thread: http://seqanswers.com/forums/showthread.php?t=20702. It's worth reading; Cole Trapnell gives a good explanation of why small genes can get extremely high FPKM values.
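
As a rough illustration of the length effect, here is a sketch using the textbook FPKM definition (fragments per kilobase of transcript per million mapped fragments). The function name and all the numbers are made up for the example, and Cufflinks itself uses an effective length and a statistical model, so real values will differ.

```shell
# Textbook definition:
#   FPKM = 1e9 * fragments / (transcript_length_nt * total_mapped_fragments)
# Arguments: fragment count, transcript length in nt, total mapped fragments.
fpkm() {
    awk -v c="$1" -v len="$2" -v n="$3" \
        'BEGIN { printf "%.2f\n", 1e9 * c / (len * n) }'
}

# Illustrative numbers only: 50 fragments out of 20 million mapped.
fpkm 50 2000 20000000   # 2 kb transcript
fpkm 50 37 20000000     # 37 nt transcript
```

With the same fragment count, the 37 nt transcript scores about 54 times higher than the 2 kb one (2000 / 37), simply because the length sits in the denominator.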

IBseq 10-17-2012 01:07 PM

Hi all, I had the same problem and was told to run cuffdiff WITHOUT the "-N" option (upper-quartile normalization).

hope it helps....
ib


All times are GMT -8.
