  • cufflinks FPKM >>> Cuffdiff FPKM

    I cannot understand why the FPKM estimated in cufflinks is SO much larger than that in cuffdiff:

    Cufflinks
    Code:
    cufflinks -p8 -m320 -u -o /media/hd/working/tuco/17Jan12socialcuff -L social \
    --upper-quartile-norm --max-mle-iterations 20000 \
    /media/hd/working/tuco/b2.social/social.bam
    
    cat transcripts.gtf | grep 'comp14388_c0_seq1'
    
    comp14388_c0_seq1; FPKM "1630419.4581286784";
    I merged the .gtf files from each cufflinks run and fed the result to cuffdiff. I have 5 biological reps for each group.

    Cuffdiff
    Code:
    mkdir /media/hd/working/tuco/17Jan.cuffdiff
    cd /media/hd/working/tuco/17Jan.cuffdiff
    
    cuffdiff -p8 -L social,solitary -N -u \
    --max-mle-iterations 10000 /media/hd/working/tuco/17Jan12cuffcompare/*gtf \
    /media/hd/working/tuco/b2.bams/406A.bam,\
    /media/hd/working/tuco/b2.bams/4262.bam,\
    /media/hd/working/tuco/b2.bams/2354.bam,\
    /media/hd/working/tuco/b2.bams/4241.bam,\
    /media/hd/working/tuco/b2.bams/401C.bam \
    /media/hd/working/tuco/b2.bams/6236.bam,\
    /media/hd/working/tuco/b2.bams/2226.bam,\
    /media/hd/working/tuco/b2.bams/5B5C.bam,\
    /media/hd/working/tuco/b2.bams/255D.bam,\
    /media/hd/working/tuco/b2.bams/4572.bam
    
    cat gene_exp.diff | grep 'comp14388_c0_seq1'
    
    comp14388_c0_seq1:0-1977	social	solitary	10.5437	8.08172

    ok... 1630419.4581286784 >>> 10.5437 Why??

  • #2
    I should note that 'social.bam' is just the product of samtools merge on all the individuals in the social treatment. Those BAM files are listed individually in cuffdiff to indicate that there are biological replicates.

    So, in essence, the FPKM from social.bam from cufflinks should be the average value from all the individuals in that group.
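A quick way to check that expectation, as a minimal sketch using the simplified formula FPKM = fragments × 10^9 / (transcript length × total mapped fragments) and ignoring cufflinks' effective-length and bias corrections: pooling reads with samtools merge gives the library-size-weighted mean of the per-replicate FPKMs, which is close to, but not the same as, the plain average. All counts and library sizes below are hypothetical.

```python
def fpkm(fragments, length_bp, total_fragments):
    """Simplified FPKM: fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)


def pooled_fpkm(counts, totals, length_bp):
    """Simplified model of a samtools-merged BAM: pool fragments and library sizes, then compute FPKM."""
    return fpkm(sum(counts), length_bp, sum(totals))


def weighted_mean_fpkm(counts, totals, length_bp):
    """Per-replicate FPKMs averaged with library size as the weight."""
    per_rep = [fpkm(c, length_bp, n) for c, n in zip(counts, totals)]
    return sum(f * n for f, n in zip(per_rep, totals)) / sum(totals)


def simple_mean_fpkm(counts, totals, length_bp):
    """Unweighted average of the per-replicate FPKMs."""
    per_rep = [fpkm(c, length_bp, n) for c, n in zip(counts, totals)]
    return sum(per_rep) / len(per_rep)


if __name__ == "__main__":
    # Hypothetical fragment counts and library sizes for one 2,000 bp transcript.
    counts = [120, 80, 200, 95, 150]
    totals = [20e6, 15e6, 30e6, 18e6, 25e6]
    print("merged BAM:   ", pooled_fpkm(counts, totals, 2000))
    print("weighted mean:", weighted_mean_fpkm(counts, totals, 2000))
    print("simple mean:  ", simple_mean_fpkm(counts, totals, 2000))
```

Algebraically the merged value and the weighted mean are identical, so a large gap between a merged-BAM cufflinks run and per-replicate values cannot come from pooling alone.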



    • #3
      Just at first glance, in your cufflinks run you specify two different parameters that will affect the FPKM calculation.
      Code:
      --upper-quartile-norm --max-mle-iterations 20000
      I would try changing --max-mle-iterations to match cuffdiff, disabling quartile normalization, and running the biological replicates through cufflinks separately to see whether the difference persists. Then I would try cufflinks on the merged BAM. Internally, the same code does the quantification in both cufflinks and cuffdiff.
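The Cufflinks manual describes -N/--upper-quartile-norm as normalizing by the upper quartile of the number of fragments mapping to individual loci instead of the total number of sequenced fragments. The sketch below assumes, without having checked the cufflinks 1.x source, that this upper-quartile count is substituted directly into the FPKM denominator; under that assumption every FPKM is inflated by the factor (total fragments / upper quartile), which grows with library size. The per-locus counts are hypothetical.

```python
import math


def upper_quartile(counts):
    """75th percentile (nearest-rank method) of the non-zero per-locus fragment counts."""
    nonzero = sorted(c for c in counts if c > 0)
    if not nonzero:
        raise ValueError("no loci with fragments")
    return nonzero[math.ceil(0.75 * len(nonzero)) - 1]


def fpkm(fragments, length_bp, denominator):
    """fragments * 1e9 / (length * denominator); denominator is normally total fragments."""
    return fragments * 1e9 / (length_bp * denominator)


if __name__ == "__main__":
    # Hypothetical per-locus fragment counts for one small library.
    counts = [0, 3, 10, 25, 40, 60, 150, 400, 1200, 8000]
    total, uq = sum(counts), upper_quartile(counts)
    by_total = fpkm(400, 2000, total)
    by_uq = fpkm(400, 2000, uq)  # ASSUMPTION: literal substitution of the UQ count
    print(f"total={total} uq={uq} inflation={by_uq / by_total:.2f}")
```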

      Also, I noticed you're looking in transcripts.gtf for cufflinks and gene_exp.diff for cuffdiff. It would be better to look in isoforms.fpkm_tracking for both cufflinks and cuffdiff, as gene_exp.diff lists quantification at the locus level while transcripts.gtf is at the isoform level.
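To compare like with like, the two isoforms.fpkm_tracking files can be joined on transcript ID. This sketch assumes both runs report the same tracking IDs (for example because both were given the same GTF), that the cufflinks file has an FPKM column, and that cuffdiff names its per-condition columns <label>_FPKM (social_FPKM here); check the headers of your own files first. The file paths in the usage comment are hypothetical.

```python
import csv
import io


def read_fpkm(handle, fpkm_column="FPKM"):
    """Return {tracking_id: FPKM} from a tab-separated *.fpkm_tracking file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["tracking_id"]: float(row[fpkm_column]) for row in reader}


def compare(cufflinks, cuffdiff):
    """Pair FPKMs for IDs present in both tables, with the cufflinks/cuffdiff ratio."""
    rows = []
    for tid in sorted(cufflinks.keys() & cuffdiff.keys()):
        a, b = cufflinks[tid], cuffdiff[tid]
        rows.append((tid, a, b, a / b if b > 0 else None))
    return rows


if __name__ == "__main__":
    # Real usage (hypothetical paths):
    # with open("cufflinks_out/isoforms.fpkm_tracking") as f1, \
    #      open("cuffdiff_out/isoforms.fpkm_tracking") as f2:
    #     rows = compare(read_fpkm(f1), read_fpkm(f2, "social_FPKM"))
    demo_links = io.StringIO("tracking_id\tFPKM\ncomp1\t200.0\ncomp2\t50.0\n")
    demo_diff = io.StringIO("tracking_id\tsocial_FPKM\ncomp1\t100.0\ncomp3\t7.0\n")
    for row in compare(read_fpkm(demo_links), read_fpkm(demo_diff, "social_FPKM")):
        print(*row, sep="\t")
```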



      • #4
        Also, I just realized that log10(1630419.4581286784) is about 6, which is pretty close to 10... I wonder if the explanation is that simple.
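A quick arithmetic check suggests a log transform does not explain the gap: log10(1630419.46) ≈ 6.21, ln ≈ 14.30 and log2 ≈ 20.64, none of which is near 10.5437, while the raw ratio is roughly 1.5 × 10^5. The snippet just reproduces those numbers from the values in the first post.

```python
import math

CUFFLINKS_FPKM = 1630419.4581286784  # transcripts.gtf value from the first post
CUFFDIFF_FPKM = 10.5437              # gene_exp.diff value from the first post


def candidate_transforms(x):
    """Common log transforms of x, to compare against the cuffdiff value."""
    return {"log10": math.log10(x), "ln": math.log(x), "log2": math.log2(x)}


if __name__ == "__main__":
    for name, value in candidate_transforms(CUFFLINKS_FPKM).items():
        print(f"{name}: {value:.3f}  (cuffdiff reported {CUFFDIFF_FPKM})")
    print(f"ratio cufflinks/cuffdiff: {CUFFLINKS_FPKM / CUFFDIFF_FPKM:,.0f}")
```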



        • #5
          Did you ever find a solution to this? We're running into the same problem.

          Our pipeline is as follows:
          We map reads with TopHat for each sample.
          We run cufflinks on each sample to generate a transcriptome assembly.

          The command looks something like this:
          Code:
          cufflinks --label tax-Pre-R5 \
              --num-threads 4 \
              --library-type fr-secondstrand \
              --frag-bias-correct /ifs/mirror/genomes/bowtie/hg19.fa \
              --multi-read-correct \
              --upper-quartile-norm \
              /ifs/projects/proj004/rnaseq4/tax-Pre-R5.accepted.bam
          We then run cuffmerge and cuffcompare to generate merged gene sets.

          We also run cuffdiff to test for differences.

          Our cuffdiff command looks like this:

          Code:
          cuffdiff --output-dir abinitio.cuffdiff.dir \
              --library-type fr-secondstrand \
              --upper-quartile-norm \
              --frag-bias-correct /ifs/mirror/genomes/bowtie/hg19.fa \
              --multi-read-correct \
              --verbose \
              --num-threads 16 \
              --labels Prostate-Pre-agg,Prostate-Post-agg,tax-Pre-agg,tax-Post-agg \
              --FDR 0.050000 \
              abinitio.gtf \
              Prostate-Pre-R7.accepted.bam,Prostate-Pre-R1.accepted.bam,Prostate-Pre-R4.accepted.bam,Prostate-Pre-R2.accepted.bam,Prostate-Pre-R8.accepted.bam,Prostate-Pre-R5.accepted.bam,Prostate-Pre-R3.accepted.bam,Prostate-Pre-R6.accepted.bam \
              Prostate-Post-R7.accepted.bam,Prostate-Post-R8.accepted.bam,Prostate-Post-R6.accepted.bam,Prostate-Post-R3.accepted.bam,Prostate-Post-R5.accepted.bam,Prostate-Post-R2.accepted.bam,Prostate-Post-R4.accepted.bam,Prostate-Post-R1.accepted.bam \
              tax-Pre-R1.accepted.bam,tax-Pre-R3.accepted.bam,tax-Pre-R2.accepted.bam,tax-Pre-R6.accepted.bam,tax-Pre-R4.accepted.bam,tax-Pre-R5.accepted.bam \
              tax-Post-R6.accepted.bam,tax-Post-R1.accepted.bam,tax-Post-R4.accepted.bam,tax-Post-R5.accepted.bam,tax-Post-R2.accepted.bam,tax-Post-R3.accepted.bam
          If we compare the FPKMs coming out of cuffcompare and cuffdiff, they are not even within two or three orders of magnitude of each other: the cuffcompare FPKMs are in the millions or tens of millions, while the cuffdiff values are in the more sensible range of 0 to several hundred.

          We're using cufflinks 1.3.1.
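One way to narrow this down: if the discrepancy comes from a single global scale factor (such as a different normalization constant), log10(cufflinks FPKM / cuffdiff FPKM) should be roughly the same for every transcript; if it varies a lot, a per-transcript cause is more likely. A minimal sketch, assuming you already have matched (cufflinks, cuffdiff) FPKM pairs for the same transcripts; the example pairs are made up.

```python
import math
import statistics


def log10_ratio_summary(pairs):
    """Median and standard deviation of log10(cufflinks / cuffdiff) over transcripts.

    Pairs where either FPKM is zero are skipped. A small spread around a large
    median points to one global scale factor rather than per-transcript effects.
    """
    logs = [math.log10(a / b) for a, b in pairs if a > 0 and b > 0]
    if len(logs) < 2:
        raise ValueError("need at least two transcripts with non-zero FPKM in both runs")
    return statistics.median(logs), statistics.stdev(logs)


if __name__ == "__main__":
    # Made-up pairs where cufflinks is uniformly 10^5 times larger.
    constant = [(x * 1e5, x) for x in (0.5, 3.0, 42.0, 900.0)]
    # Made-up pairs with no consistent factor.
    scattered = [(10.0, 1.0), (1000.0, 1.0), (1e5, 1.0)]
    print(log10_ratio_summary(constant))
    print(log10_ratio_summary(scattered))
```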



          • #6
            Hi,

            We had the same problem and tried the new Cufflinks version 2.0.2, and it seems the values from Cufflinks and Cuffdiff are now the same (I still have to check this more carefully).

            These are the commands I used:

            Code:
            cufflinks -o ./Sample001_cufflinks_out_No_N_2.0.2 -u -g ../genes.gtf -p 2 --total-hits-norm ../Sample_001_accepted_hits.bam
            Code:
            cuffdiff -o ./COMPARISON1_SAMPLE1_SAMPLE1BIS_cuffdiff_out/ -L SAMPLE1,SAMPLE1BIS -p 2 -u -v --emit-count-tables --total-hits-norm ../Sample001_cufflinks_out/transcripts.gtf ../Sample_001_accepted_hits.sam ../Sample_001_bis_accepted_hits.sam
            I know it's weird to use cuffdiff to compare one sample to itself, but I had no other choice...
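Because both runs above use the same -p, -u and --total-hits-norm settings, one way to keep that true in a scripted pipeline is to build both command lines from a single list of quantification flags. A minimal sketch: the helper names and file paths are hypothetical, only flags that appear in the commands above are used, and the script prints the commands instead of running them.

```python
import shlex


def quant_flags(norm="--total-hits-norm", threads=2):
    """Quantification-related flags shared by both tools (taken from the commands above)."""
    return ["-p", str(threads), "-u", norm]


def cufflinks_cmd(outdir, guide_gtf, alignments, **flags):
    """Hypothetical helper: argv for one cufflinks run guided by a reference GTF."""
    return ["cufflinks", "-o", outdir, "-g", guide_gtf, *quant_flags(**flags), alignments]


def cuffdiff_cmd(outdir, labels, gtf, groups, **flags):
    """Hypothetical helper; groups holds one list of replicate files per condition, ordered like labels."""
    return ["cuffdiff", "-o", outdir, "-L", ",".join(labels), *quant_flags(**flags),
            gtf, *(",".join(reps) for reps in groups)]


if __name__ == "__main__":
    # Hypothetical paths; print the commands rather than running them.
    print(shlex.join(cufflinks_cmd("s1_cufflinks", "genes.gtf", "s1.bam")))
    print(shlex.join(cuffdiff_cmd("s1_vs_s2", ["S1", "S2"], "merged.gtf",
                                  [["s1.bam"], ["s2a.bam", "s2b.bam"]])))
```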

            HTH

            Marina

            EDIT: Although the FPKM values from Cufflinks and Cuffdiff are now more similar, I still get unreasonably high FPKM values, especially for very short genes (around 37 nt; regulatory RNAs, I guess). While searching for an explanation I found this thread: http://seqanswers.com/forums/showthread.php?t=20702. It's worth reading; Cole Trapnell gives a good explanation of why small genes can get extremely high FPKM values.
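The basic arithmetic behind high FPKMs for very short transcripts: FPKM divides by the transcript's (effective) length, so a few fragments on a ~37 nt RNA give a much larger value than the same fragments on a 2 kb transcript. This sketch assumes a single fixed fragment length and floors the effective length at 1; cufflinks itself works from an estimated fragment-length distribution, so its numbers will differ. The library size and fragment length are hypothetical.

```python
def effective_length(length_bp, mean_frag_len):
    """Simplified: start positions for a fixed-length fragment, floored at 1."""
    return max(length_bp - mean_frag_len + 1, 1)


def fpkm(fragments, length_bp, total_fragments, mean_frag_len):
    """FPKM using the simplified effective length in place of the raw length."""
    return fragments * 1e9 / (effective_length(length_bp, mean_frag_len) * total_fragments)


if __name__ == "__main__":
    # Hypothetical library: 20 million fragments, fixed fragment length of 30 nt.
    for length in (37, 2000):
        print(length, "nt:", fpkm(10, length, 20e6, 30))
```

With these made-up numbers the same 10 fragments give an FPKM about 246 times higher on the 37 nt transcript (effective length 8 vs 1971).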
            Last edited by mmanrique; 08-04-2012, 07:25 AM.



            • #7
              Hi all, I had the same problem, and I was told to run cuffdiff WITHOUT the -N option (upper-quartile normalization).

              Hope it helps...
              ib
