  • ruben6um
    Junior Member
    • Jul 2011
    • 1

    Relatively large proportion of "LOWDATA" and "FAIL" FPKM_status calls when running cufflinks

    Hello,

    I am relatively new to RNA-seq analysis. I am currently analyzing human blood data from 22 biological samples using the TopHat and Cufflinks pipeline. The cufflinks command I used to analyze the *.bam files (generated by TopHat using the hg19 reference) is

    cufflinks -p 16 -o S01 -G Homo_sapiensPlusChr.GRCh37.63.gtf -b hg19.fa -u -N --compatible-hits-norm ./Sample1_accepted_hits.bam

    Both the genes.fpkm_tracking and the isoforms.fpkm_tracking output files seem to have a relatively large proportion of "LOWDATA" and "FAIL" calls for the FPKM_status attribute.

    The proportion of these calls seems similar (~30%) across the samples, and the genes receiving these calls also seem to be almost the same across samples.
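
    To put a number on that proportion, the FPKM_status column can be tallied directly. The following is only a minimal sketch: it assumes the tracking file is tab-delimited with a header row containing an FPKM_status column, and the helper names and the tiny example table are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_fpkm_status(handle):
    """Tally the FPKM_status column of a tab-delimited *.fpkm_tracking file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["FPKM_status"] for row in reader)


def status_fractions(counts):
    """Convert raw counts into fractions of all rows (genes or isoforms)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: n / total for status, n in counts.items()}


# Made-up rows (not real data) standing in for a genes.fpkm_tracking file.
example = io.StringIO(
    "tracking_id\tgene_short_name\tFPKM\tFPKM_status\n"
    "G1\tAAA\t12.5\tOK\n"
    "G2\tBBB\t0\tLOWDATA\n"
    "G3\tCCC\t40.1\tFAIL\n"
    "G4\tDDD\t3.2\tOK\n"
)
counts = count_fpkm_status(example)
fractions = status_fractions(counts)
print(counts)
print(fractions)
```

    For a real run, the same function could be given an open file instead, for example `open("S01/genes.fpkm_tracking", newline="")`, and the per-sample fractions compared across all 22 samples.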

    I am not sure whether I am doing something wrong or whether this is the expected behavior of the algorithms. I am hoping that I am (or the algorithms are) doing something incorrect.

    We have matching microarray data generated from these samples. Some of the genes that are highly expressed in the microarray data get this "FAIL" status even though their FPKM values seem relatively high.
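
    One way to pull out these cases for comparison with the microarray data is to list the rows whose status is "FAIL" but whose reported FPKM is above some cutoff. This is a rough sketch only: the function name, the cutoff of 10, and the example rows are assumptions for illustration, and it relies on the file being tab-delimited with gene_short_name, FPKM and FPKM_status columns.

```python
import csv
import io


def failed_but_expressed(handle, min_fpkm=10.0):
    """Return (gene, FPKM) pairs with status FAIL and FPKM >= min_fpkm.

    The min_fpkm cutoff is an arbitrary illustrative value.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        value = float(row["FPKM"])
        if row["FPKM_status"] == "FAIL" and value >= min_fpkm:
            hits.append((row["gene_short_name"], value))
    # Highest FPKM first, so the most surprising failures are at the top.
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Made-up rows (not real data) standing in for a genes.fpkm_tracking file.
example = io.StringIO(
    "tracking_id\tgene_short_name\tFPKM\tFPKM_status\n"
    "G1\tAAA\t12.5\tOK\n"
    "G2\tBBB\t25.0\tFAIL\n"
    "G3\tCCC\t40.1\tFAIL\n"
    "G4\tDDD\t3.2\tFAIL\n"
)
suspicious = failed_but_expressed(example, min_fpkm=10.0)
print(suspicious)
```

    The resulting gene list could then be checked against the genes that are highly expressed on the arrays.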

    Any help would be appreciated.

    Thanks,
    -Reuben
  • Jon_Keats
    Senior Member
    • Mar 2010
    • 279

    #2
    I've encountered the same issue recently with the newest version. I'm currently testing whether it is a package distribution issue, as an old cufflinks version produced values. In my case I came across a diagnosis/relapse pair where TP53 went from an FPKM of 50 to 0. Expression is really identical, but for some reason the status is "FAIL" in the relapse sample. This occurred using the pre-compiled Linux binary, but in a test with 10 million reads on my Mac laptop, using a version compiled from source, it worked. So I am now running the entire set of reads on our Mac workstation to compare the output with that from our Linux HPC resource.

    • Jon_Keats
      Senior Member
      • Mar 2010
      • 279

      #3
      In my hands the "FAIL" result is completely inconsistent. For TP53, with the full dataset (37.34 million read pairs) you get a "FAIL", but with the first 10 million pairs you get an FPKM of 42 with an "OK". The odd thing is that there is a clear correlation with the number of reads: 5546 genes were listed as "FAIL" using 10 million reads and 8059 with the full dataset. To make matters worse, the number of "FAIL" genes moves around as you change the normalization parameters. Right now I'm not too impressed, and I think I'll go back to calculating FPKM by hand and using other packages. I can understand there being computational issues with fragment handling, but at the gene level it seems pretty straightforward.
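
      For reference, a by-hand gene-level FPKM along these lines can be sketched as fragments per kilobase of gene length per million mapped fragments. This is the plain textbook definition, not a reproduction of what cufflinks computes internally, and the function name and the input numbers below are made up for illustration; the fragment counts would have to come from a separate counting step.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of gene length per million mapped fragments.

    fragments: fragments (read pairs) assigned to the gene
    length_bp: gene (or summed exon) length in base pairs
    total_mapped_fragments: all mapped fragments in the sample
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Made-up numbers: 1000 fragments on a 2000 bp gene, 10 million mapped pairs.
value = fpkm(fragments=1000, length_bp=2000, total_mapped_fragments=10_000_000)
print(value)
```

      Because the total fragment count is in the denominator, a gene with stable expression should give roughly the same value on a 10 million pair subset as on the full dataset, which makes this a simple sanity check against the reported values.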

      • oliviera
        Member
        • Apr 2010
        • 31

        #4
        Dear Jon,
        Which package would you suggest as an alternative? I am not convinced I can use DESeq or edgeR with paired-end data. What do you think?

        Olivier
