
  • Cufflinks and CuffDiff bugs?

    (I originally posted the following as a reply in another thread, but I am starting a new thread so that the problems get more attention. Thanks!)

    1. In the newly released 1.3.0, the FPKM column contains all 0 after Cuffcompare; the FPKM values are missing even though the tracking files have them.

    2. In all versions of CuffDiff, if you compare different conditions against the same control samples, the FPKM of the control samples differs between comparisons. For example:
    CuffDiff I: condition 1 vs. control;
    CuffDiff II: condition 2 vs. control.

    After CuffDiff, when the FPKM values are tracked, the FPKM of Gene X in the control condition in CuffDiff I differs from the FPKM of Gene X in the control condition in CuffDiff II. Genes like Gene X make up roughly 20-30% of all annotated genes; the rest have identical values.
    Does anybody have an explanation or suggestions for this? Thanks!
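One way to put a number on the "20-30%" figure is to join the two CuffDiff outputs on gene ID and count how many control FPKMs disagree. The following is only a sketch: the function name, the file names, the sample label `control`, and the 1% relative tolerance are all hypothetical, and it assumes tab-separated tracking files whose header has a `tracking_id` first column and a `<label>_FPKM` column.

```shell
# compare_control_fpkm RUN1 RUN2 LABEL [REL_TOL]   (hypothetical helper)
# Prints "<genes compared> <genes whose LABEL FPKM differs by more than REL_TOL>".
compare_control_fpkm() {
  awk -F '\t' -v col="${3}_FPKM" -v tol="${4:-0.01}" '
    FNR == 1 {                                # header of each file: find the column
      c = 0
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { err = 1; exit }
      next
    }
    NR == FNR { first[$1] = $c; next }        # first run: remember FPKM per ID
    ($1 in first) {                           # second run: compare shared IDs
      n++
      a = first[$1] + 0; b = $c + 0
      d = (a > b) ? a - b : b - a             # absolute difference
      m = (a > b) ? a : b                     # larger of the two values
      if (m > 0 && d / m > tol) differ++
    }
    END {
      if (err) { print "column " col " not found" > "/dev/stderr"; exit 2 }
      printf "%d %d\n", n, differ
    }
  ' "$1" "$2"
}

# Example call (paths are placeholders):
# compare_control_fpkm run1/genes.fpkm_tracking run2/genes.fpkm_tracking control
```

Looking the column up by header name rather than by position keeps the sketch independent of how many samples each run contains.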
    Last edited by lewewoo; 06-06-2011, 08:18 AM.

  • #2
    i've also been hoping for a response to this thread (as well as the other thread you posted this question in).

    cufflinks 1.0.3 is not giving FPKM values other than zero for paired-end reads from SOLiD. 1.0.3 works fine with single-end data. the same paired-end data runs fine through cufflinks 0.9.3 and FPKM values are calculated just fine.

    Code:
    cufflinks --output-dir $out --num-threads 8 --GTF-guide $gtf --multi-read-correct --library-type fr-secondstrand --upper-quartile-norm --label l --frag-bias-correct $hg19All.fa $bam
    
    (assume my variable references are correct)
    anyone have any ideas to try as a workaround? anyone else having similar issues?


    • #3
      There seem to be a lot of confusing changes in cufflinks. I have not been able to find a fix yet.


      • #4
        I was having a similar problem as stated in #1 by lewewoo -- cufflinks was not generating accurate FPKMs. Specifically, they were all zero.

        Cufflinks 1.0.3 using SOLiD paired-end reads (50 bp x 35 bp) mapped using Bioscope.

        1. Add the XS tag as per the Cufflinks manual
        Code:
        samtools view -F 0x04 -h unedited.bam | awk 'BEGIN{OFS="\t"} (!/^@/){minus=and($2, 0x10); print $0"\tXS:A:"(minus ? "-":"+") } (/^@/){ print }' | samtools view -bhS - > xs.bam
        This runs through Cufflinks and gives FPKM = 0 for everything.

        2. Increment the NH tag by 1, as per Cufflinks developer Adam Roberts
        Code:
        samtools view -F 0x04 -h xs.bam | awk 'BEGIN{OFS="\t"}(! /^@/){ split($12,a,":"); $12 = a[1]":"a[2]":"a[3]+1; print $0 } (/^@/){ print }' > xs.nh.sam
        This seems to be working, but I don't have the output of a full run yet.
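The NH one-liner above relies on NH being the 12th column. A variant that searches all optional fields for the tag might look like the sketch below; the function name `bump_nh` is hypothetical, and it assumes plain SAM text on stdin in which NH appears as `NH:i:<n>`. It only changes where the tag is looked up, not what is done to it.

```shell
# bump_nh (hypothetical helper): read SAM text on stdin and add 1 to the
# NH:i tag wherever it occurs among the optional fields.
bump_nh() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }                  # header lines pass through unchanged
    {
      for (i = 12; i <= NF; i++)          # optional fields start at column 12
        if ($i ~ /^NH:i:/) {
          $i = "NH:i:" (substr($i, 6) + 1)
          break
        }
      print                               # reads without NH are printed as-is
    }'
}

# Example call (file names taken from the commands above):
# samtools view -F 0x04 -h xs.bam | bump_nh > xs.nh.sam
```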


        • #5
          I'm seeing the same thing as point 2 in lewewoo's post.

          I am getting different FPKM values for the same control used against two different samples in two different cuffdiff runs. Is this expected? Does cuffdiff consider all samples provided when calculating the FPKM? If so, what is the best workflow for getting FPKM values for samples that you want to perform further analysis on outside of the cufflinks suite? Should I run cufflinks on individual samples and work with those FPKMs, or should I put all the samples I'm interested in analyzing into cuffdiff and use those FPKMs, since they might be normalized across samples?

          Any suggestions or ideas as to what is happening would be great!!
          Thanks!


          • #6
            The FPKMs should have normal ranges included. Do those ranges overlap?
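The overlap question can be checked per gene across two runs. This is a rough sketch under stated assumptions: the function name, file names and the sample label are hypothetical, and it assumes the ranges are reported in tab-separated columns named `<label>_conf_lo` and `<label>_conf_hi`, with `tracking_id` as the first column.

```shell
# ci_overlap RUN1 RUN2 LABEL   (hypothetical helper)
# For every ID present in both files, prints "<id><TAB>overlap" or
# "<id><TAB>disjoint" for the LABEL sample's reported ranges.
ci_overlap() {
  awk -F '\t' -v lab="$3" '
    FNR == 1 {                                # header of each file: find columns
      lo = hi = 0
      for (i = 1; i <= NF; i++) {
        if ($i == (lab "_conf_lo")) lo = i
        if ($i == (lab "_conf_hi")) hi = i
      }
      if (!lo || !hi) { err = 1; exit }
      next
    }
    NR == FNR { lo1[$1] = $lo; hi1[$1] = $hi; next }
    ($1 in lo1) {
      # two closed intervals overlap when each one starts before the other ends
      ok = (lo1[$1] + 0 <= $hi + 0) && ($lo + 0 <= hi1[$1] + 0)
      print $1 "\t" (ok ? "overlap" : "disjoint")
    }
    END {
      if (err) { print "conf_lo/conf_hi columns not found" > "/dev/stderr"; exit 2 }
    }
  ' "$1" "$2"
}

# Example call (paths are placeholders):
# ci_overlap run1/genes.fpkm_tracking run2/genes.fpkm_tracking control | grep -c disjoint
```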


            • #7
              Good point. Thanks, gringer - quick look and the ranges do seem to overlap. I did a scatter plot and there is concordance between values with a very tight spread at extremes but quite a big spread at the middle. I guess I just expected much more agreement across the range - especially since it is the same sample.

              Sorry to ask again, but does this mean that cuffdiff does not consider both samples when calculating FPKM? (I assume so, but I'm not 100% sure this assumption is correct.) What would be the recommended workflow to just get FPKM values for further analysis? Can I use cuffdiff (maybe with all the samples analyzed together, if some cross-sample normalization is occurring) or should I use cufflinks? BTW, I should mention that I was not using the -N option (upper-quartile normalization) in cuffdiff.

              Thanks so much for the help!! This has been a big source of discussion - that is which approach to take to get FPKMs. Really appreciate it!
              Last edited by jaldrich; 07-14-2011, 09:21 AM.


              • #8
                I would recommend using cuffdiff for analysing FPKM, because the FPKM calculations may make assumptions that are not obvious to the people who didn't write the cufflinks/cuffdiff code.

                It's probably worth having a look at a couple of runs to see the difference with and without the -N (upper-quartile) normalisation. I would expect that cufflinks is "good enough" without this, because they haven't included it as a default option even though it's relatively simple to calculate.

                There's a bit of information on how things are calculated on the cufflinks website:

                Cuffdiff calculates the FPKM of each transcript, primary transcript, and gene in each sample. Primary transcript and gene FPKMs are computed by summing the FPKMs of transcripts in each primary transcript group or gene group.

                Cuffdiff requires that transcripts in the input GTF be annotated with certain attributes in order to look for changes in primary transcript expression, splicing, coding output, and promoter use.... The above attributes, along with the gene_id required by the GTF specification, make each transcript a member of a "gene group", "primary transcript group", and "CDS group".
                And later...
                Cuffdiff pools the fragments before calculating the individual isoform abundances and then examines the likelihood surface of the replicate pool via importance sampling.
                Note the magic word right at the end of that, sampling. This suggests that you should expect slightly different results by running cuffdiff on the same data (it is unlikely that the sampling will be done in exactly the same way on each run).
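The first quoted passage says gene FPKMs are the sum of the transcript FPKMs in each gene group. That sum can be reproduced from an isoform-level table and compared against the gene-level file. A minimal sketch, assuming a tab-separated file with a `gene_id` column and a `<label>_FPKM` column; the function name, file name and label are hypothetical.

```shell
# gene_fpkm_from_isoforms FILE LABEL   (hypothetical helper)
# Sums the LABEL FPKM of all isoforms sharing a gene_id and prints
# "<gene_id><TAB><summed FPKM>", sorted by gene_id.
gene_fpkm_from_isoforms() {
  awk -F '\t' -v col="${2}_FPKM" '
    NR == 1 {                                 # header: locate the two columns
      for (i = 1; i <= NF; i++) {
        if ($i == "gene_id") g = i
        if ($i == col) f = i
      }
      if (!g || !f) { err = 1; exit }
      next
    }
    { sum[$g] += $f }                         # accumulate isoform FPKM per gene
    END {
      if (err) exit 2
      for (id in sum) printf "%s\t%g\n", id, sum[id]
    }
  ' "$1" | sort
}

# Example call (path and label are placeholders):
# gene_fpkm_from_isoforms isoforms.fpkm_tracking control
```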
