
  • Very large FPKM from cuffdiff that doesn't match read counts

    Hi Everyone,

    I am relatively new to RNA-seq and hope to get some help from more experienced people here.

    I used cuffdiff to look at differential gene expression. I have 2 conditions and 3 replicates for each condition. When I looked at the genes_read_group_tracking file, I found that for some genes, one of the replicates has a very large FPKM value that doesn't match its raw fragment count.

    For example this is what I see:

    tracking_id condition replicate raw_frags internal_scaled_frags external_scaled_frags FPKM effective_length status
    XLOC_009487 WT 0 4 4.21548 4.21548 1150.43 - OK
    XLOC_009487 WT 1 5 5.39083 5.39083 1.14629 - OK
    XLOC_009487 WT 2 8 7.56804 7.56804 1.60234 - OK
    XLOC_009487 OE 1 2 2.29124 2.29124 0.476229 - OK
    XLOC_009487 OE 0 5 4.84785 4.84785 0.995213 - OK
    XLOC_009487 OE 2 5 4.07315 4.07315 0.77989 - OK

    You can see that the FPKM for WT 0 is 1150, whereas raw_frags is only 4 (4.2 after scaling). The other samples are fine. I observed this in multiple genes, and it doesn't always happen in the same sample. I also checked the FPKMs from cufflinks, and they look normal. So it seems to be a cuffdiff problem.

    Does anyone know why this happens and how to solve it? I appreciate the help!
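One way to list every affected row, rather than spotting them by eye, is to compare each replicate's FPKM with its scaled fragment count. The sketch below is only a rough screen. It assumes that, within one gene, the ratio FPKM / external_scaled_frags should be similar across replicates, and it flags any replicate whose ratio is more than 10-fold away from the gene's median ratio. The function name `flag_inconsistent_fpkm` and the 10-fold cutoff are illustrative choices, not part of cuffdiff. The sample text is the excerpt from the post; a real genes_read_group_tracking file could be read from disk and passed in the same way.

```python
from collections import defaultdict
from statistics import median

# Excerpt from the post, used here as sample input.
SAMPLE = """\
tracking_id condition replicate raw_frags internal_scaled_frags external_scaled_frags FPKM effective_length status
XLOC_009487 WT 0 4 4.21548 4.21548 1150.43 - OK
XLOC_009487 WT 1 5 5.39083 5.39083 1.14629 - OK
XLOC_009487 WT 2 8 7.56804 7.56804 1.60234 - OK
XLOC_009487 OE 1 2 2.29124 2.29124 0.476229 - OK
XLOC_009487 OE 0 5 4.84785 4.84785 0.995213 - OK
XLOC_009487 OE 2 5 4.07315 4.07315 0.77989 - OK
"""


def flag_inconsistent_fpkm(text, fold=10.0):
    """Return (gene, condition, replicate) tuples whose FPKM-per-fragment
    ratio differs from the gene's median ratio by more than `fold`.

    `fold` is an arbitrary illustrative cutoff, not a cuffdiff setting.
    """
    rows = [line.split() for line in text.strip().splitlines()]
    header, records = rows[0], rows[1:]
    col = {name: i for i, name in enumerate(header)}

    # Collect FPKM / scaled-fragment ratios per gene.
    ratios_by_gene = defaultdict(list)
    for rec in records:
        frags = float(rec[col["external_scaled_frags"]])
        fpkm = float(rec[col["FPKM"]])
        if frags > 0:  # skip rows with no fragments to avoid dividing by zero
            key = (rec[col["condition"]], rec[col["replicate"]])
            ratios_by_gene[rec[col["tracking_id"]]].append((key, fpkm / frags))

    flagged = []
    for gene, entries in ratios_by_gene.items():
        mid = median(ratio for _, ratio in entries)
        if mid == 0:
            continue  # nothing to compare against
        for (condition, replicate), ratio in entries:
            if ratio > fold * mid or ratio < mid / fold:
                flagged.append((gene, condition, replicate))
    return flagged


if __name__ == "__main__":
    for gene, condition, replicate in flag_inconsistent_fpkm(SAMPLE):
        print(gene, condition, replicate)
```

On the excerpt above, the only row that stands out is WT replicate 0, where the ratio is roughly 273 against about 0.2 for the other five replicates. This screen only locates suspicious rows; it does not explain why cuffdiff produced them.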

  • #2
    I think there is a decent amount of evidence out there that cuffdiff is less than optimal for gene-level DE analysis.

    See:
    A large number of computational methods have been developed for analyzing differential gene expression in RNA-seq data. We describe a comprehensive evaluation of common methods using the SEQC benchmark dataset and ENCODE data. We consider a number of key features, including normalization, accuracy of differential expression detection and differential expression analysis when one condition has no detectable expression. We find significant differences among the methods, but note that array-based methods adapted to RNA-seq data perform comparably to methods designed for RNA-seq. Our results demonstrate that increasing the number of replicate samples significantly improves detection power over increased sequencing depth.

    Recent advances in next-generation sequencing technology allow high-throughput cDNA sequencing (RNA-Seq) to be widely applied in transcriptomic studies, in particular for detecting differentially expressed genes between groups. Many software packages have been developed for the identification of differentially expressed genes (DEGs) between treatment groups based on RNA-Seq data. However, there is a lack of consensus on how to approach an optimal study design and choice of suitable software for the analysis. In this comparative study we evaluate the performance of three of the most frequently used software tools: Cufflinks-Cuffdiff2, DESeq and edgeR. A number of important parameters of RNA-Seq technology were taken into consideration, including the number of replicates, sequencing depth, and balanced vs. unbalanced sequencing depth within and between groups. We benchmarked results relative to sets of DEGs identified through either quantitative RT-PCR or microarray. We observed that edgeR performs slightly better than DESeq and Cuffdiff2 in terms of the ability to uncover true positives. Overall, DESeq or taking the intersection of DEGs from two or more tools is recommended if the number of false positives is a major concern in the study. In other circumstances, edgeR is slightly preferable for differential expression analysis at the expense of potentially introducing more false positives.



    If you are only doing differential gene expression (DGE) analysis, I'd recommend edgeR or DESeq2.
