
  • Very large FPKM from cuffdiff that doesn't match read counts

    Hi Everyone,

    I am relatively new to RNA-seq and hope to get some help from more experienced people here.

    I used cuffdiff to look at differential gene expression. I have 2 conditions with 3 replicates each. When I look at the genes_read_group_tracking file, I find that for some genes one of the replicates has a very large FPKM value that doesn't match its raw fragment count.

    For example this is what I see:

    tracking_id  condition  replicate  raw_frags  internal_scaled_frags  external_scaled_frags  FPKM      effective_length  status
    XLOC_009487  WT         0          4          4.21548                4.21548                1150.43   -                 OK
    XLOC_009487  WT         1          5          5.39083                5.39083                1.14629   -                 OK
    XLOC_009487  WT         2          8          7.56804                7.56804                1.60234   -                 OK
    XLOC_009487  OE         1          2          2.29124                2.29124                0.476229  -                 OK
    XLOC_009487  OE         0          5          4.84785                4.84785                0.995213  -                 OK
    XLOC_009487  OE         2          5          4.07315                4.07315                0.77989   -                 OK

    You can see that the FPKM for WT 0 is 1150, whereas raw_frags is only 4 (4.2 after scaling). The other samples look fine. I observed this in multiple genes, and it doesn't always happen in the same sample. I also checked the FPKMs from cufflinks, and they look normal, so it seems to be a cuffdiff problem.

    Does anyone know why this happens and how to solve it? I appreciate the help!

  • #2
    I think there is a decent amount of evidence out there that cuffdiff is less than optimal for gene-level DE analysis.

    See:
    A large number of computational methods have been developed for analyzing differential gene expression in RNA-seq data. We describe a comprehensive evaluation of common methods using the SEQC benchmark dataset and ENCODE data. We consider a number of key features, including normalization, accuracy of differential expression detection and differential expression analysis when one condition has no detectable expression. We find significant differences among the methods, but note that array-based methods adapted to RNA-seq data perform comparably to methods designed for RNA-seq. Our results demonstrate that increasing the number of replicate samples significantly improves detection power over increased sequencing depth.

    Recent advances in next-generation sequencing technology allow high-throughput cDNA sequencing (RNA-Seq) to be widely applied in transcriptomic studies, in particular for detecting differentially expressed genes between groups. Many software packages have been developed for the identification of differentially expressed genes (DEGs) between treatment groups based on RNA-Seq data. However, there is a lack of consensus on how to approach an optimal study design and choice of suitable software for the analysis. In this comparative study we evaluate the performance of three of the most frequently used software tools: Cufflinks-Cuffdiff2, DESeq and edgeR. A number of important parameters of RNA-Seq technology were taken into consideration, including the number of replicates, sequencing depth, and balanced vs. unbalanced sequencing depth within and between groups. We benchmarked results relative to sets of DEGs identified through either quantitative RT-PCR or microarray. We observed that edgeR performs slightly better than DESeq and Cuffdiff2 in terms of the ability to uncover true positives. Overall, DESeq or taking the intersection of DEGs from two or more tools is recommended if the number of false positives is a major concern in the study. In other circumstances, edgeR is slightly preferable for differential expression analysis at the expense of potentially introducing more false positives.



    If you are only doing differential gene expression (DGE), I'd recommend edgeR or DESeq2.
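To see how many genes are affected before deciding what to do, one option is a quick consistency check on the read-group tracking table. The sketch below is not part of cuffdiff. It assumes the file is tab-delimited with the column names shown in the original post, and it relies on the idea that for a single gene the ratio FPKM / external_scaled_frags should be roughly the same in every replicate. The function name `flag_inconsistent_fpkm` and the `max_fold` threshold of 10 are hypothetical choices for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def flag_inconsistent_fpkm(tracking_text, max_fold=10.0):
    """Flag rows whose FPKM-per-fragment ratio is far from the gene's median.

    tracking_text: contents of a tab-delimited read-group tracking table
                   (assumed layout, matching the columns in the post).
    max_fold:      hypothetical threshold; a row is flagged when its ratio
                   differs from the per-gene median by more than this factor.
    Returns a list of (tracking_id, condition, replicate, fold) tuples.
    """
    reader = csv.DictReader(io.StringIO(tracking_text), delimiter="\t")
    by_gene = defaultdict(list)
    for row in reader:
        frags = float(row["external_scaled_frags"])
        if frags <= 0:
            continue  # no fragments, so the ratio is undefined
        ratio = float(row["FPKM"]) / frags
        by_gene[row["tracking_id"]].append((row, ratio))

    flagged = []
    for gene, entries in by_gene.items():
        typical = median(ratio for _, ratio in entries)
        if typical <= 0:
            continue  # cannot compare against a zero median
        for row, ratio in entries:
            fold = ratio / typical
            if fold > max_fold or fold < 1.0 / max_fold:
                flagged.append((gene, row["condition"], row["replicate"], fold))
    return flagged


# Rows copied from the original post, re-joined with tabs.
ROWS = [
    "tracking_id condition replicate raw_frags internal_scaled_frags "
    "external_scaled_frags FPKM effective_length status",
    "XLOC_009487 WT 0 4 4.21548 4.21548 1150.43 - OK",
    "XLOC_009487 WT 1 5 5.39083 5.39083 1.14629 - OK",
    "XLOC_009487 WT 2 8 7.56804 7.56804 1.60234 - OK",
    "XLOC_009487 OE 1 2 2.29124 2.29124 0.476229 - OK",
    "XLOC_009487 OE 0 5 4.84785 4.84785 0.995213 - OK",
    "XLOC_009487 OE 2 5 4.07315 4.07315 0.77989 - OK",
]
SAMPLE = "\n".join("\t".join(line.split()) for line in ROWS)

if __name__ == "__main__":
    for gene, condition, replicate, fold in flag_inconsistent_fpkm(SAMPLE):
        print(f"{gene} {condition} rep {replicate}: ratio is {fold:.0f}x the gene median")
```

On the rows from the post, only the WT replicate 0 entry is flagged. This only locates suspicious rows; it does not explain why cuffdiff reports them or correct the values.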
