  • SEQond
    Member
    • Jul 2010
    • 27

    Cuffcompare/cuffdiff changes FPKM values of same BAM in sequential runs

    Dear all,

    Question 1.

    For an RNA-Seq experiment I am comparing 3 BAM files (single-end reads) using the TopHat-Cufflinks pipeline.

    B vs A, C vs A (no replicates)

    In the first step, I run TopHat-Cufflinks sequentially across each of my files to create all the BAM files.

    In the second step, I run the comparisons sequentially with cuffcompare-cuffdiff.

    For most XLOCs the FPKM of "setA" differs slightly from one comparison to the next, as in the example below.
    Why might that be? It is the same BAM file in both cases. In certain rare cases the FPKM of "setA" differs considerably across the comparisons.

    Comparison B vs A

    XLOC_002887 setB setA NOTEST 200,206 186,633
    XLOC_003705 setB setA LOWDATA 152,669 223,705


    Comparison C vs A

    XLOC_002887 setC setA LOWDATA 201,762 185,595
    XLOC_003705 setC setA LOWDATA 253,098 222,461
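To put a number on how far setA moves between the two runs, a small sketch along these lines could be used. It assumes the commas in the values above are decimal separators (so 186,633 means 186.633), which the post does not state; the function names are made up for illustration.

```python
# Rows copied from the two comparisons above: gene id -> setA FPKM as printed.
b_vs_a = {"XLOC_002887": "186,633", "XLOC_003705": "223,705"}
c_vs_a = {"XLOC_002887": "185,595", "XLOC_003705": "222,461"}


def parse_fpkm(text):
    """Parse a printed value, ASSUMING the comma is a decimal separator."""
    return float(text.replace(",", "."))


def relative_change(first, second):
    """Relative difference of second versus first, as a fraction."""
    return (second - first) / first


def compare_set_a(run1, run2):
    """Return {gene: relative change of the setA FPKM between two runs}."""
    shared = sorted(set(run1) & set(run2))
    return {
        gene: relative_change(parse_fpkm(run1[gene]), parse_fpkm(run2[gene]))
        for gene in shared
    }


changes = compare_set_a(b_vs_a, c_vs_a)
for gene, change in changes.items():
    print(f"{gene}\tsetA FPKM change: {change:+.2%}")
```

Under that assumption both genes shift by roughly the same fraction (about -0.56%), which is the pattern a run-wide scaling factor would produce rather than a gene-specific effect.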


    The basic workflow, with options where applicable, follows:

    Tophat -i 30 -I 20000 --segment-length 16 --segment-mismatches 1 (since my reads are 32 bp long, I used half the read length as the segment length)
    Cufflinks -N
    Cuffcompare
    Cuffdiff -N -L B,A --FDR 0.1



    Question 2.

    In a rerun of this experiment, as an exercise, I used "--segment-length 20" instead of "--segment-length 16", which is more than half the length of my reads. In this case, ALL FPKMs were similar to those from the segment-length 16 run, but multiplied by 100.

    How can this happen?

    Thanks for your input.
  • zeam
    Member
    • Oct 2010
    • 43

    #2
    For question 1, did you use an annotation in cufflinks or cuffcompare? If so, I think the following might explain it:
    Cuffdiff and Cufflinks now accept new options controlling whether all hits are counted towards the FPKM denominator, or only those compatible with some transcript in the reference annotation. Counting only compatible hits avoids certain types of bias that arise when one sample contains far more hits that aren't compatible with any transcript than the other sample does. For example, if one sample contains vastly more mapped ribosomal RNA hits, FPKM values will appear lower in that sample, potentially leading to false positive differential expression calls. Cuffdiff by default now uses only compatible hits. Cufflinks still uses total hits by default, as using compatible hit accounting requires a reference GTF.
    So just set the --compatible-hits-norm parameter identically when you run cufflinks and cuffdiff.
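As a rough illustration of why the choice of denominator matters, here is a sketch using the textbook FPKM formula (fragments per kilobase of transcript per million mapped fragments). It is not Cufflinks' actual estimator, and every count below is a made-up number chosen only to show the effect.

```python
def fpkm(fragments, transcript_length_bp, denominator_fragments):
    """Textbook FPKM: 1e9 * fragments / (length in bp * fragments in denominator)."""
    return 1e9 * fragments / (transcript_length_bp * denominator_fragments)


# Hypothetical values for one gene in one sample.
fragments_on_gene = 500
gene_length_bp = 2000
total_mapped = 10_000_000       # every mapped fragment (total-hits style)
compatible_mapped = 8_000_000   # only fragments compatible with the annotation

total_norm = fpkm(fragments_on_gene, gene_length_bp, total_mapped)
compatible_norm = fpkm(fragments_on_gene, gene_length_bp, compatible_mapped)

print(f"total-hits denominator:      {total_norm:.2f}")
print(f"compatible-hits denominator: {compatible_norm:.2f}")
```

With these invented counts the same gene reads 25.00 with the total-hits denominator and 31.25 with the compatible-hits denominator, so two tools using different denominators will report different FPKMs for identical alignments.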

    Also, I have one question. You said you have three BAM files: A, B and C. Did you run cuffdiff for each pair, or just run cuffdiff once? If the latter, did you get cuffdiff tests between all pairs of samples? I have four tissues, and the command is as follows:
    ======================================
    cuffdiff -o DEG_cuffdiff -p 20 cuffcmp.combined.gtf ./tophat_sd/accepted_hits.bam ./tophat_em/accepted_hits.bam tophat_en1/accepted_hits.bam ./tophat_en2/accepted_hits.bam
    ======================================
    But I only got one cuffdiff test file, for the first pair of samples. According to the manual, there should be 6 files, one for each pair of samples.
    Thanks.
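The figure of 6 comes from counting unordered pairs of four samples. A quick check, with sample labels taken from the tophat directory names in the command above (the labels are only for illustration and say nothing about how cuffdiff names its output):

```python
from itertools import combinations

# Labels derived from the four tophat output directories in the command above.
samples = ["sd", "em", "en1", "en2"]

# Every unordered pair of distinct samples.
pairs = list(combinations(samples, 2))

for first, second in pairs:
    print(f"{first} vs {second}")
print(f"{len(pairs)} pairwise comparisons")
```

Four samples give 4 * 3 / 2 = 6 pairs, which matches the number of pairwise tests expected here.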


    • SEQond
      Member
      • Jul 2010
      • 27

      #3
      Reply to issue 1.
      Yes, I did use a GTF annotation file I had already created. I used the -G option:
      -G/--GTF <reference_annotation.(gtf/gff)> Tells Cufflinks to use the supplied reference annotation (a GFF file) to estimate isoform expression. It will not assemble novel transcripts, and the program will ignore alignments not structurally compatible with any reference transcript.


      I will check what --compatible-hits-norm returns coupled with the -G option and will get back to you.

      Reply to issue 2.
      Regarding multiple comparisons in one run: what I did was run cuffdiff separately for each pair. At some point I did try to do 4 comparisons at once by putting the control as the first sample in cuffdiff, but the software returned only the first comparison and not the other 3.
      I'll check again whether there is a way around this.


      New question, issue 3, possibly related to issue 1.
      With the way I am doing the analysis, I am getting multiple NMs per XLOC.

      What I would prefer is a single NM per line.
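If the output cannot be changed at the source, one workaround is to expand the rows afterwards. The sketch below assumes each row pairs an XLOC id with a comma-separated list of NM accessions; that layout, the function name, and the ids are all placeholders, not the actual cuffcompare/cuffdiff output format.

```python
def one_nm_per_line(rows):
    """Expand (xloc, 'NM_a,NM_b') rows into one (xloc, nm) row per accession."""
    expanded = []
    for xloc, nm_field in rows:
        for nm in nm_field.split(","):
            nm = nm.strip()
            if nm:  # skip empty entries left by stray commas
                expanded.append((xloc, nm))
    return expanded


# Placeholder ids, invented for this example only.
rows = [
    ("XLOC_000001", "NM_000001,NM_000002"),
    ("XLOC_000002", "NM_000003"),
]

for xloc, nm in one_nm_per_line(rows):
    print(f"{xloc}\t{nm}")
```

Each XLOC is repeated once per NM, so downstream tools that expect a single accession per line can read the result directly.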
