  • fcchau
    Junior Member
    • Feb 2017
    • 1

    Very large FPKM from cuffdiff that doesn't match read counts

    Hi Everyone,

    I am relatively new to RNA-seq and hope to get some help from the more experienced people here.

    I used cuffdiff to look at differential gene expression. I have 2 conditions with 3 replicates each. In the genes_read_group_tracking file, I found that for some genes one of the replicates has a very large FPKM value that doesn't match its raw fragment count.

    For example this is what I see:

    tracking_id  condition  replicate  raw_frags  internal_scaled_frags  external_scaled_frags  FPKM      effective_length  status
    XLOC_009487  WT         0          4          4.21548                4.21548                1150.43   -                 OK
    XLOC_009487  WT         1          5          5.39083                5.39083                1.14629   -                 OK
    XLOC_009487  WT         2          8          7.56804                7.56804                1.60234   -                 OK
    XLOC_009487  OE         1          2          2.29124                2.29124                0.476229  -                 OK
    XLOC_009487  OE         0          5          4.84785                4.84785                0.995213  -                 OK
    XLOC_009487  OE         2          5          4.07315                4.07315                0.77989   -                 OK

    You can see that the FPKM for WT replicate 0 is 1150, whereas its raw fragment count is only 4 (4.2 after scaling). The other samples look fine. I see this in multiple genes, and it doesn't always happen in the same sample. I also checked the FPKMs from cufflinks, and they look normal, so it seems to be a problem with cuffdiff.

    Does anyone know why this happens and how to solve it? I appreciate the help!
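
    To find every affected gene rather than spotting them by eye, one option is to compare the FPKM-per-fragment ratio across the replicates of each gene. The sketch below is not part of cuffdiff; it uses only the rows from the table above and a hypothetical helper, `flag_inconsistent`. It rests on one assumption: within a single gene, FPKM should scale roughly with the scaled fragment count. This only holds if the effective length is similar across replicates, so treat a flag as a prompt to look closer, not as proof of a bug. The 10-fold cutoff is arbitrary.

```python
from statistics import median

# Rows copied from the genes_read_group_tracking excerpt in the post:
# (condition, replicate, internal_scaled_frags, FPKM)
ROWS = [
    ("WT", 0, 4.21548, 1150.43),
    ("WT", 1, 5.39083, 1.14629),
    ("WT", 2, 7.56804, 1.60234),
    ("OE", 1, 2.29124, 0.476229),
    ("OE", 0, 4.84785, 0.995213),
    ("OE", 2, 4.07315, 0.77989),
]


def flag_inconsistent(rows, fold=10.0):
    """Return (condition, replicate) pairs for one gene whose
    FPKM / scaled-fragment ratio differs from the gene's median ratio
    by more than `fold` in either direction.

    Hypothetical helper for illustration; `fold` is an arbitrary cutoff.
    """
    # Skip rows with zero fragments to avoid dividing by zero.
    ratios = [fpkm / frags for _, _, frags, fpkm in rows if frags > 0]
    if not ratios:
        return []
    mid = median(ratios)

    flagged = []
    for cond, rep, frags, fpkm in rows:
        if frags <= 0:
            continue
        ratio = fpkm / frags
        if ratio > fold * mid or ratio < mid / fold:
            flagged.append((cond, rep))
    return flagged


print(flag_inconsistent(ROWS))  # prints [('WT', 0)]
```

    For the gene above, five replicates have a ratio of about 0.2 FPKM per fragment, while WT replicate 0 has a ratio of about 273, so it is the only one flagged. Running the same check per `tracking_id` over the whole file would list all genes showing this pattern.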
  • fanli
    Senior Member
    • Jul 2014
    • 197

    #2
    I think there is a decent amount of evidence out there that cuffdiff is less than optimal for gene-level DE analysis.

    See:
    A large number of computational methods have been developed for analyzing differential gene expression in RNA-seq data. We describe a comprehensive evaluation of common methods using the SEQC benchmark dataset and ENCODE data. We consider a number of key features, including normalization, accuracy of differential expression detection and differential expression analysis when one condition has no detectable expression. We find significant differences among the methods, but note that array-based methods adapted to RNA-seq data perform comparably to methods designed for RNA-seq. Our results demonstrate that increasing the number of replicate samples significantly improves detection power over increased sequencing depth.

    Recent advances in next-generation sequencing technology allow high-throughput cDNA sequencing (RNA-Seq) to be widely applied in transcriptomic studies, in particular for detecting differentially expressed genes between groups. Many software packages have been developed for the identification of differentially expressed genes (DEGs) between treatment groups based on RNA-Seq data. However, there is a lack of consensus on how to approach an optimal study design and choice of suitable software for the analysis. In this comparative study we evaluate the performance of three of the most frequently used software tools: Cufflinks-Cuffdiff2, DESeq and edgeR. A number of important parameters of RNA-Seq technology were taken into consideration, including the number of replicates, sequencing depth, and balanced vs. unbalanced sequencing depth within and between groups. We benchmarked results relative to sets of DEGs identified through either quantitative RT-PCR or microarray. We observed that edgeR performs slightly better than DESeq and Cuffdiff2 in terms of the ability to uncover true positives. Overall, DESeq or taking the intersection of DEGs from two or more tools is recommended if the number of false positives is a major concern in the study. In other circumstances, edgeR is slightly preferable for differential expression analysis at the expense of potentially introducing more false positives.



    If you are only doing differential gene expression (DGE) analysis, I'd recommend edgeR or DESeq2.
