  • peromhc
    Senior Member
    • Sep 2009
    • 108

    cufflinks FPKM >>> Cuffdiff FPKM

    I cannot understand why the FPKM estimated in cufflinks is SO much larger than that in cuffdiff:

    Cufflinks
    Code:
    cufflinks -p8 -m320 -u -o /media/hd/working/tuco/17Jan12socialcuff -L social \
    --upper-quartile-norm --max-mle-iterations 20000 \
    /media/hd/working/tuco/b2.social/social.bam
    
    cat transcripts.gtf | grep 'comp14388_c0_seq1'
    
    comp14388_c0_seq1; FPKM "1630419.4581286784";
    I merged the .gtf files from each cufflinks run and fed the merged file to cuffdiff.
    I have 5 biological replicates for each group.
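    As an aside, grep 'comp14388_c0_seq1' is a substring match, so it would also pick up lines for IDs such as comp14388_c0_seq10 or comp14388_c0_seq11. Below is a minimal sketch that matches the transcript_id attribute exactly and prints the FPKM of the "transcript" record only. It assumes the usual tab-separated cufflinks transcripts.gtf layout, with attributes like transcript_id "..."; FPKM "..."; in column 9. The function name fpkm_from_gtf is hypothetical, not part of cufflinks.

    ```shell
    # Hypothetical helper (not part of cufflinks):
    #   fpkm_from_gtf TRANSCRIPT_ID transcripts.gtf
    # Prints the FPKM attribute of the "transcript" line whose transcript_id
    # equals TRANSCRIPT_ID exactly, so ..._seq10 does not match ..._seq1.
    fpkm_from_gtf() {
      awk -F'\t' -v id="$1" '
        $3 == "transcript" {
          tid = ""; fpkm = ""
          n = split($9, attrs, ";")          # attributes are ";"-separated
          for (i = 1; i <= n; i++) {
            a = attrs[i]
            sub(/^ +/, "", a)                # drop the space after each ";"
            split(a, kv, " ")                # kv[1] = key, kv[2] = "value"
            val = kv[2]
            gsub(/"/, "", val)               # strip the surrounding quotes
            if (kv[1] == "transcript_id") tid = val
            if (kv[1] == "FPKM") fpkm = val
          }
          if (tid == id) print fpkm
        }' "$2"
    }
    ```

    Usage: fpkm_from_gtf comp14388_c0_seq1 transcripts.gtf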

    Cuffdiff
    Code:
    mkdir /media/hd/working/tuco/17Jan.cuffdiff
    cd /media/hd/working/tuco/17Jan.cuffdiff
    
    cuffdiff -p8 -L social,solitary -N -u \
    --max-mle-iterations 10000 /media/hd/working/tuco/17Jan12cuffcompare/*gtf \
    /media/hd/working/tuco/b2.bams/406A.bam,\
    /media/hd/working/tuco/b2.bams/4262.bam,\
    /media/hd/working/tuco/b2.bams/2354.bam,\
    /media/hd/working/tuco/b2.bams/4241.bam,\
    /media/hd/working/tuco/b2.bams/401C.bam \
    /media/hd/working/tuco/b2.bams/6236.bam,\
    /media/hd/working/tuco/b2.bams/2226.bam,\
    /media/hd/working/tuco/b2.bams/5B5C.bam,\
    /media/hd/working/tuco/b2.bams/255D.bam,\
    /media/hd/working/tuco/b2.bams/4572.bam
    
    cat gene_exp.diff | grep 'comp14388_c0_seq1'
    
    comp14388_c0_seq1:0-1977	social	solitary	10.5437	8.08172

    ok... 1630419.4581286784 >>> 10.5437 Why??
  • peromhc
    Senior Member
    • Sep 2009
    • 108

    #2
    I should note that 'social.bam' is just the output of samtools merge over all the individuals in the social treatment. Those BAM files are listed individually in the cuffdiff command to indicate that they are biological replicates.

    So, in essence, the cufflinks FPKM from social.bam should be the average value across all the individuals in that group.
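    Whether the merged value is a plain average depends on library sizes. Under the textbook FPKM definition, 10^9 x fragments / (transcript length x total mapped fragments), pooling replicates sums both the fragment counts and the totals, so the merged FPKM is the mean of the per-replicate FPKMs weighted by library size, not their simple mean. The sketch below uses only that simplified formula; cufflinks also uses an effective length and maximum-likelihood estimation, and cuffdiff applies its own normalization across replicates. The function name fpkm and all numbers are made up for illustration.

    ```shell
    # Hypothetical helper: textbook FPKM, no effective-length correction.
    #   fpkm FRAGMENTS TRANSCRIPT_LENGTH TOTAL_MAPPED_FRAGMENTS
    fpkm() {
      awk -v c="$1" -v len="$2" -v n="$3" \
        'BEGIN { printf "%.4f\n", 1000000000 * c / (len * n) }'
    }

    # Two made-up replicates: same 2000-nt transcript, same 100 fragments,
    # but different library sizes.
    fpkm 100 2000 10000000   # replicate A -> 5.0000
    fpkm 100 2000 30000000   # replicate B -> 1.6667
    # The simple mean of A and B is 3.3333, but merging the BAMs pools counts:
    fpkm 200 2000 40000000   # merged -> 2.5000 (library-size-weighted mean)
    ```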


    • polyatail
      Member
      • Dec 2010
      • 25

      #3
      Just at first glance, in your cufflinks run you specify two different parameters that will affect the FPKM calculation.
      Code:
      --upper-quartile-norm --max-mle-iterations 20000
      I would try changing --max-mle-iterations to match cuffdiff, disabling upper-quartile normalization, and running the biological replicates through cufflinks separately to see whether the difference holds up. Then I would try cufflinks with the merged BAM. Internally, the same code does the quantification in both cufflinks and cuffdiff.
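      A minimal sketch of the first suggestion, assuming a POSIX shell: run cufflinks once per replicate BAM with --max-mle-iterations set to the cuffdiff value (10000) and without --upper-quartile-norm, keeping -p 8, -m 320 and -u from the original command. The function name run_replicates and the DRY_RUN switch are hypothetical; by default the function only prints the commands so they can be checked before anything is launched.

      ```shell
      # Options taken from the original cufflinks command, minus
      # --upper-quartile-norm, with the cuffdiff MLE iteration limit.
      CUFF_OPTS="-p 8 -m 320 -u --max-mle-iterations 10000"

      # Hypothetical helper: run_replicates OUTPUT_ROOT BAM...
      # One cufflinks run per BAM, output in OUTPUT_ROOT/<sample>.
      # DRY_RUN=1 (the default) only echoes the commands.
      run_replicates() {
        outroot=$1
        shift
        for bam in "$@"; do
          sample=$(basename "$bam" .bam)
          # $CUFF_OPTS is intentionally unquoted so it splits into separate flags
          if [ "${DRY_RUN:-1}" = 1 ]; then
            echo cufflinks $CUFF_OPTS -o "$outroot/$sample" -L "$sample" "$bam"
          else
            cufflinks $CUFF_OPTS -o "$outroot/$sample" -L "$sample" "$bam"
          fi
        done
      }
      ```

      For example, DRY_RUN=0 run_replicates /media/hd/working/tuco/per_rep /media/hd/working/tuco/b2.bams/406A.bam /media/hd/working/tuco/b2.bams/4262.bam would actually launch the runs (the per_rep directory name is made up).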

      Also, I noticed you're looking in transcripts.gtf for cufflinks and gene_exp.diff for cuffdiff. It would be better to look in isoforms.fpkm_tracking for both cufflinks and cuffdiff, as gene_exp.diff lists quantification at the locus level while transcripts.gtf is at the isoform level.
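      A sketch for pulling the FPKM columns for one ID out of an isoforms.fpkm_tracking file by header name rather than by column position, so the same call works on both the cufflinks and the cuffdiff file. It assumes the tab-separated header contains tracking_id and either FPKM (cufflinks) or one <label>_FPKM column per condition (cuffdiff, e.g. social_FPKM); check the header of your version. The function name fpkm_columns is hypothetical.

      ```shell
      # Hypothetical helper: fpkm_columns TRACKING_ID FILE.fpkm_tracking
      # Prints "column_name<TAB>value" for every column named FPKM or *_FPKM,
      # for the row whose tracking_id equals TRACKING_ID exactly.
      fpkm_columns() {
        awk -F'\t' -v id="$1" '
          NR == 1 {
            for (i = 1; i <= NF; i++) {
              if ($i == "tracking_id") idcol = i
              if ($i == "FPKM" || $i ~ /_FPKM$/) { k++; col[k] = i; name[k] = $i }
            }
            next
          }
          idcol && $idcol == id {
            for (j = 1; j <= k; j++) print name[j] "\t" $(col[j])
          }' "$2"
      }
      ```

      Usage: fpkm_columns comp14388_c0_seq1 isoforms.fpkm_tracking, run once in the cufflinks output directory and once in the cuffdiff output directory.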


      • peromhc
        Senior Member
        • Sep 2009
        • 108

        #4
        Also, I just realized that log10(1630419.4581286784) is about 6, which is fairly close to 10. I wonder if the explanation is this simple.
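        The guess is easy to check numerically. A small sketch using awk, whose log() is the natural logarithm (hence the division by log(10) or log(2)); compare_scales is a hypothetical name. With the two values from the first post it gives about 6.21 for log10 and about 20.64 for log2 of the cufflinks FPKM, neither of which is 10.5437, and a ratio of roughly 155,000 between the two values, which suggests a plain log transform does not account for the gap on its own.

        ```shell
        # Hypothetical helper: compare_scales CUFFLINKS_FPKM CUFFDIFF_VALUE
        # Prints log10 and log2 of the cufflinks value and the raw ratio.
        compare_scales() {
          awk -v cl="$1" -v cd="$2" 'BEGIN {
            printf "log10(cufflinks) = %.2f\n", log(cl) / log(10)
            printf "log2(cufflinks) = %.2f\n", log(cl) / log(2)
            printf "cufflinks/cuffdiff = %.0f\n", cl / cd
          }'
        }

        # Values from the first post (transcripts.gtf vs gene_exp.diff).
        compare_scales 1630419.4581286784 10.5437
        ```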


        • sudders
          Member
          • Dec 2011
          • 32

          #5
          Did you ever find a solution to this? We have run into the same problem.

          Our pipeline is as follows:
          We map reads with TopHat for each sample.
          We run cufflinks on each sample to generate a transcriptome assembly.

          The command looks something like this:
          Code:
           cufflinks --label tax-Pre-R5 \
                     --num-threads 4 \
                     --library-type fr-secondstrand \
                     --frag-bias-correct /ifs/mirror/genomes/bowtie/hg19.fa \
                     --multi-read-correct \
                     --upper-quartile-norm \
                     /ifs/projects/proj004/rnaseq4/tax-Pre-R5.accepted.bam
          We then run Cuffmerge and Cuffcompare to generate merged gene sets.

          We also run cuffdiff to test for differences.

          Our cuffdiff commands look like:

          Code:
           cuffdiff --output-dir abinitio.cuffdiff.dir \
                    --library-type fr-secondstrand \
                    --upper-quartile-norm \
                    --frag-bias-correct /ifs/mirror/genomes/bowtie/hg19.fa \
                    --multi-read-correct \
                    --verbose \
                    --num-threads 16 \
                    --labels Prostate-Pre-agg,Prostate-Post-agg,tax-Pre-agg,tax-Post-agg \
                    --FDR 0.050000 \
                    abinitio.gtf \
                    Prostate-Pre-R7.accepted.bam,Prostate-Pre-R1.accepted.bam,Prostate-Pre-R4.accepted.bam,Prostate-Pre-R2.accepted.bam,Prostate-Pre-R8.accepted.bam,Prostate-Pre-R5.accepted.bam,Prostate-Pre-R3.accepted.bam,Prostate-Pre-R6.accepted.bam \
                    Prostate-Post-R7.accepted.bam,Prostate-Post-R8.accepted.bam,Prostate-Post-R6.accepted.bam,Prostate-Post-R3.accepted.bam,Prostate-Post-R5.accepted.bam,Prostate-Post-R2.accepted.bam,Prostate-Post-R4.accepted.bam,Prostate-Post-R1.accepted.bam \
                    tax-Pre-R1.accepted.bam,tax-Pre-R3.accepted.bam,tax-Pre-R2.accepted.bam,tax-Pre-R6.accepted.bam,tax-Pre-R4.accepted.bam,tax-Pre-R5.accepted.bam \
                    tax-Post-R6.accepted.bam,tax-Post-R1.accepted.bam,tax-Post-R4.accepted.bam,tax-Post-R5.accepted.bam,tax-Post-R2.accepted.bam,tax-Post-R3.accepted.bam
          If we compare the FPKMs coming out of cuffcompare and cuffdiff, they are not even within two or three orders of magnitude of each other: the cuffcompare FPKMs are in the millions or tens of millions, while the cuffdiff values are in a more sensible range of 0 to several hundred.

          We're using cufflinks 1.3.1.
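          One way to narrow this down is to look at the per-transcript ratio between the two sets of values: if the ratio is roughly constant across transcripts, the difference behaves like a global scaling (normalization) factor; if it varies widely, the quantification itself differs. A sketch, assuming each set has first been reduced to a two-column tab-separated file of ID and FPKM; the function name fpkm_ratios and that file layout are hypothetical.

          ```shell
          # Hypothetical diagnostic: fpkm_ratios A.tsv B.tsv
          # Each file holds "id<TAB>FPKM" lines. For IDs present in both files
          # (and with a non-zero value in B), prints "id<TAB>A/B", then the
          # minimum and maximum ratio observed.
          fpkm_ratios() {
            awk -F'\t' '
              NR == FNR { a[$1] = $2; next }        # first file: FPKM by id
              ($1 in a) && $2 > 0 {
                r = a[$1] / $2
                printf "%s\t%.4g\n", $1, r
                if (n == 0 || r < lo) lo = r
                if (n == 0 || r > hi) hi = r
                n++
              }
              END { if (n) printf "min\t%.4g\nmax\t%.4g\n", lo, hi }' "$1" "$2"
          }
          ```

          A min and max that are close together point at normalization; a wide spread points elsewhere.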


          • mmanrique
            Member
            • Dec 2009
            • 12

            #6
            Hi,

            We had the same problem, tried the new Cufflinks version 2.0.2, and it seems the values from Cufflinks and Cuffdiff are now the same (I still have to check this more carefully).

            These are the commands I used:

            Code:
            cufflinks -o ./Sample001_cufflinks_out_No_N_2.0.2 -u -g ../genes.gtf -p 2 --total-hits-norm ../Sample_001_accepted_hits.bam
            Code:
            cuffdiff -o ./COMPARISON1_SAMPLE1_SAMPLE1BIS_cuffdiff_out/ -L SAMPLE1,SAMPLE1BIS -p 2 -u -v --emit-count-tables --total-hits-norm ../Sample001_cufflinks_out/transcripts.gtf ../Sample_001_accepted_hits.sam ../Sample_001_bis_accepted_hits.sam
            I know it's weird to use cuffdiff to compare one sample to itself but I had no other choice...

            HTH

            Marina

            EDIT: Although the FPKM values from Cufflinks and Cuffdiff are now more similar, I still get unreasonably high FPKM values, especially for very short genes (around 37 nt, regulatory RNAs I guess). Searching for an explanation, I found this thread: http://seqanswers.com/forums/showthread.php?t=20702. It's worth reading; Cole Trapnell gives a good explanation of why small genes can get extremely high FPKM values.
            Last edited by mmanrique; 08-04-2012, 07:25 AM.
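            The effect of length can be illustrated with a simplified effective-length calculation: effective length = transcript length - mean fragment length + 1. Cufflinks integrates over the whole fragment-length distribution rather than using a single mean, so this is only an approximation; the function name eff_fpkm, the 200-nt mean fragment length and the counts are made up. With the same 10 fragments out of 10 million, a 2000-nt transcript gets about 0.56 while a 210-nt one gets about 90.91, and a 37-nt transcript has no positive effective length at all under this approximation.

            ```shell
            # Hypothetical helper:
            #   eff_fpkm FRAGMENTS TRANSCRIPT_LENGTH MEAN_FRAG_LENGTH TOTAL_MAPPED
            # FPKM computed with effective length = length - mean fragment + 1.
            # Prints "NA" when no fragment of the mean length fits the transcript.
            eff_fpkm() {
              awk -v c="$1" -v len="$2" -v mu="$3" -v n="$4" 'BEGIN {
                eff = len - mu + 1
                if (eff <= 0) { print "NA"; exit }
                printf "%.2f\n", 1000000000 * c / (eff * n)
              }'
            }

            eff_fpkm 10 2000 200 10000000   # long transcript  -> 0.56
            eff_fpkm 10 210 200 10000000    # short transcript -> 90.91
            eff_fpkm 10 37 200 10000000     # 37 nt            -> NA
            ```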


            • IBseq
              Member
              • Jul 2012
              • 56

              #7
              Hi all, I had the same problem and was told to run cuffdiff WITHOUT the -N option (--upper-quartile-norm, which performs upper-quartile normalization).

              hope it helps....
              ib
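              For context, the Cufflinks manual describes -N/--upper-quartile-norm as normalizing by the upper quartile of the number of fragments mapping to individual loci instead of the total number of sequenced fragments. Because that upper quartile is usually far smaller than the total, using it in the denominator in place of the total would inflate the values, which may be part of what is being seen in this thread (an interpretation, not something confirmed above). The sketch below, with the hypothetical name uq_and_total, prints both quantities from a file of per-locus fragment counts, one per line; it uses the nearest-rank 75th percentile of the non-zero counts, which may not match cufflinks' exact computation.

              ```shell
              # Hypothetical helper: uq_and_total COUNTS_FILE
              # COUNTS_FILE has one per-locus fragment count per line.
              # Prints the total count and the nearest-rank 75th percentile
              # of the non-zero counts.
              uq_and_total() {
                sort -n "$1" | awk '
                  { total += $1 }
                  $1 > 0 { v[++n] = $1 }                  # ascending, non-zero
                  END {
                    k = int(0.75 * n); if (k < 0.75 * n) k++   # ceil(0.75 * n)
                    printf "total\t%.0f\nupper_quartile\t%.0f\n", total, v[k]
                  }'
              }
              ```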
