  • Minimal FPKM values for analysis in Cufflinks

    I'm using Cufflinks with Cuffdiff to compare some RNA-sequencing datasets. Is there a lower limit below which FPKM values should not be compared?

    Some of my genes have FPKMs below 10 but are still called significantly differentially expressed between samples. Can I trust this result?

    There were approximately 30M total reads per sample.

    Edit: also, why do some genes with apparently more mapped reads and a visible difference in expression show no significance? E.g. FPKMs 17.86 vs 68.89 (0 vs 1, not significant).
    Last edited by AdamB; 09-14-2010, 06:11 AM.
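For orientation, the standard definition is FPKM = fragments × 10^9 / (transcript length in bp × total mapped fragments). The sketch below inverts that formula to estimate how many fragments a given FPKM implies at roughly 30M mapped fragments per sample. The 1,000 bp transcript length is a hypothetical value. Cufflinks actually normalizes by an effective length and its own library-size estimate, so treat these numbers as rough orientation only.

```python
def fpkm(fragments: float, length_bp: float, total_fragments: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)


def fragments_for_fpkm(value: float, length_bp: float, total_fragments: float) -> float:
    """Invert the FPKM formula to get the implied fragment count."""
    return value * length_bp * total_fragments / 1e9


TOTAL = 30e6    # ~30M mapped fragments per sample, as stated in the question
LENGTH = 1_000  # hypothetical transcript length in bp (assumption)

# FPKM values taken from the question: the "<10" threshold and the 17.86 / 68.89 pair.
for value in (1.0, 10.0, 17.86, 68.89):
    n = fragments_for_fpkm(value, LENGTH, TOTAL)
    print(f"FPKM {value:>6}: ~{n:,.0f} fragments on a {LENGTH} bp transcript")
```

Under these assumptions, an FPKM of 10 on a 1 kb transcript corresponds to about 300 fragments. Whether FPKM < 10 counts as "low" therefore depends strongly on transcript length.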

  • #2
    I also thought such thresholding would be a sensible idea. For me, it turned out not to be. Applying more or less arbitrary thresholds to FPKM values is, first, difficult to justify in a subsequent paper and, second, yields only questionable improvements. In my opinion, if anything, you should sort by adjusted p-values. In the current Cufflinks release this is difficult, because adjP == 0 for any gene/transcript/etc. for which one FPKM value equals 0 while the other does not, regardless of the other value's magnitude (it could be 1e+1 or even 1e+6).
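Here is a minimal sketch of the suggestion to rank by adjusted p-values. It applies the standard Benjamini-Hochberg procedure to raw p-values and sorts by the result. Ties, such as the many adjP == 0 rows described above, are broken by absolute log2 fold change. That tie-break is a hypothetical choice, not something cuffdiff does. Whether cuffdiff's own q_value column matches this adjustment exactly for your version is an assumption to verify. In practice you could simply sort on that column.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_genes(rows: Sequence[Tuple[str, float, float]]) -> List[str]:
    """Rank (gene, p_value, log2_fold_change) rows.

    Primary key: BH-adjusted p-value, ascending.
    Tie-break (hypothetical): larger |log2 fold change| first.
    """
    adj = benjamini_hochberg([p for _, p, _ in rows])
    ranked = sorted(zip(rows, adj), key=lambda item: (item[1], -abs(item[0][2])))
    return [row[0] for row, _ in ranked]
```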

    Furthermore, cuffdiff's significance flag is not derived from any estimate of FPKM dispersion (as could be done by taking replicates into account). So the question of whether you can trust a significance flag of yes/no remains, whatever post-processing you apply to the cuffdiff statistics.
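To make the point about replicates concrete, here is a minimal method-of-moments estimate of negative binomial dispersion from replicate counts for a single gene. It is a simplified per-gene version of the idea behind count-based tools such as DESeq, and it is not cuffdiff's procedure. Under a negative binomial model, variance = mu + phi * mu^2, so phi can be estimated as (s^2 - mean) / mean^2, clipped at zero. The counts in the test are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence


def nb_dispersion(counts: Sequence[float]) -> float:
    """Method-of-moments NB dispersion for one gene across replicates.

    Assumes var = mu + phi * mu**2 and returns phi clipped at 0
    (phi == 0 means the replicates look no noisier than Poisson).
    """
    if len(counts) < 2:
        raise ValueError("need at least two replicates to estimate dispersion")
    mu = mean(counts)
    if mu == 0:
        return 0.0
    s2 = variance(counts)  # sample variance, n - 1 denominator
    return max(0.0, (s2 - mu) / mu ** 2)
```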

    As I see it (honestly, this is my personal opinion, which may well be inaccurate), Cufflinks is currently very limited when it comes to drawing statistical conclusions. It is, however, the only tool out there (and that's really a big deal!) that can estimate FPKM in regions without reference annotation (i.e., de novo) so easily.

    Cheers,
    Uwe



    • #3
      Dear Uwe,

      We have just released Cufflinks 0.9, which addresses some of the issues you raised, and we would appreciate any feedback.



      • #4
        Hi,
        I have the same question about how Cufflinks performs the statistical test for DE genes. I am using Cufflinks version 2.1.1. My problem is that many genes show very large fold changes in RPKM values but are still NOT significant:
        test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
        ENSMUSG00000047139 ENSMUSG00000047139 Cd24a 10:43579168-43584262 q1 q2 OK 96.2585 2700.55 4.8102 1.6486 0.03995 0.078237 no
        ENSMUSG00000066975 ENSMUSG00000066975 Cryba4 5:112246492-112252518 q1 q2 OK 424.582 46190.2 6.7654 0.598327 0.3408 0.442128 no
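The two rows above can be checked directly. Computing log2(value_2 / value_1) reproduces the log2(fold_change) column, while the significant flag tracks q_value rather than fold change. The sketch assumes cuffdiff's default false discovery rate of 0.05 (set with --FDR). It also assumes a q_value <= FDR comparison, which may differ in detail from cuffdiff's internals.

```python
import math
from typing import List, Tuple

# The two gene_exp.diff rows quoted in the post (tab-separated in the real file).
ROWS = """\
test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
ENSMUSG00000047139 ENSMUSG00000047139 Cd24a 10:43579168-43584262 q1 q2 OK 96.2585 2700.55 4.8102 1.6486 0.03995 0.078237 no
ENSMUSG00000066975 ENSMUSG00000066975 Cryba4 5:112246492-112252518 q1 q2 OK 424.582 46190.2 6.7654 0.598327 0.3408 0.442128 no
"""

FDR = 0.05  # assumed cuffdiff default false discovery rate


def check(text: str, fdr: float = FDR) -> List[Tuple[str, float, float, bool, str]]:
    lines = text.strip().splitlines()
    header = lines[0].split()  # split() handles both tabs and spaces
    results = []
    for line in lines[1:]:
        rec = dict(zip(header, line.split()))
        log2fc = math.log2(float(rec["value_2"]) / float(rec["value_1"]))
        q = float(rec["q_value"])
        results.append((rec["gene"], round(log2fc, 4), q, q <= fdr, rec["significant"]))
    return results


for gene, log2fc, q, passes, flag in check(ROWS):
    print(f"{gene}: log2FC={log2fc} q={q} passes FDR? {passes} (cuffdiff says {flag})")
```

Note that Cd24a has a raw p_value (0.03995) below 0.05, but its multiple-testing-corrected q_value (0.078237) is not. Fold changes of about 28x and 109x do not by themselves change that.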

        I checked the read_group_tracking file; there is not much variation in RPKM values between replicates within each group. Does anyone know how to explain this? Any suggestions would be appreciated.
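One way to quantify "not much variation between replicates" is the coefficient of variation (CV = standard deviation / mean) of FPKM per gene and condition. The sketch assumes a tab-separated read_group_tracking file whose header includes tracking_id, condition and FPKM columns, as in Cufflinks 2.x; check this against your file's header. The toy input is hypothetical. Keep in mind that cuffdiff does not test directly on these per-replicate FPKMs, so a low CV here need not translate into a large cuffdiff test statistic.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Tuple


def replicate_cv(tsv_text: str) -> Dict[Tuple[str, str], float]:
    """Coefficient of variation of FPKM per (tracking_id, condition)."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        groups[(row["tracking_id"], row["condition"])].append(float(row["FPKM"]))
    cv = {}
    for key, values in groups.items():
        m = mean(values)
        # CV is undefined for a single replicate or a zero mean.
        cv[key] = stdev(values) / m if len(values) > 1 and m > 0 else float("nan")
    return cv


# Hypothetical toy input, not real data.
TOY = "\n".join([
    "tracking_id\tcondition\treplicate\tFPKM",
    "geneA\tq1\t0\t90",
    "geneA\tq1\t1\t110",
    "geneA\tq2\t0\t2000",
    "geneA\tq2\t1\t2000",
])
print(replicate_cv(TOY))
```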
        Thanh



        • #5
          The explanation is potentially complex. Unlike other DE testing tools such as DESeq, cuffdiff takes into account the certainty of the alignments, along with the dispersion estimates it uses to model the variance associated with each gene. Whether this is a good method isn't clear. On one hand, it looks at more of the available information than any other DE tool I've used. On the other hand, I find it tends to generate very different results from other tools (seemingly illogically at times, as in your example) and even from different versions of itself.
          /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
          Salk Institute for Biological Studies, La Jolla, CA, USA */
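Here is a toy illustration of how an extra variance term for uncertain read assignment can shrink a test statistic even when the fold change is large. This is not cuffdiff's actual model. It uses a Wald-style statistic on the log2 fold change. Each group's variance comes from the delta-method approximation var(log mean) ≈ (1/mu + phi + a) / n, where phi is negative binomial dispersion and a is a hypothetical stand-in for assignment uncertainty. All parameter values are invented for illustration.

```python
import math


def wald_z(mu1: float, mu2: float, phi: float, assign_var: float, n: int) -> float:
    """Toy Wald statistic for log2(mu2 / mu1).

    Per-group variance of log(mean count) via the delta method:
        (1/mu + phi + assign_var) / n
    converted to the log2 scale by dividing by ln(2)**2.
    assign_var is a hypothetical extra term for read-assignment uncertainty.
    """
    def var_log2(mu: float) -> float:
        return (1.0 / mu + phi + assign_var) / (n * math.log(2) ** 2)

    log2fc = math.log2(mu2 / mu1)
    return log2fc / math.sqrt(var_log2(mu1) + var_log2(mu2))


# Same ~28x fold change, increasing (hypothetical) assignment uncertainty.
for a in (0.0, 0.5, 2.0, 8.0):
    z = wald_z(mu1=100, mu2=2800, phi=0.05, assign_var=a, n=3)
    print(f"assign_var={a}: z={z:.2f}")
```

The fold change is identical in every line of output. Only the variance attributed to each gene changes, which is one way a large fold change can end up "not significant".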
