Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Cuffdiff v2.2.1 differential expression test bug?

    Hi,

    Can anyone spot if I am doing something wrong. At the moment I am convinced there is a bug. In gene_exp.diff, my control gene (the one that is knocked out) is being reported as not differentially expressed between conditions.

    I have two conditions mutant and wild-type, with 6 repeats for each. ~20 million reads in each repeat. My general workflow has been: use tophat2 to map reads to mm10 using enesmbl annotation -> cufflinks using annotation as guide -> cuffmerge -> cuffdiff using merged.gtf. My cuffdiff command was:

    cuffdiff -o $outdir -p 16 --frag-bias-correct $genome --multi-read-correct --mask-file $maskfile --library-type fr-firststrand $gtf $mut_bams,$wt_bams

    Here is the row for my knocked out gene in gene_exp.dff:
    Code:
    test_id	XLOC_025059
    gene_id	XLOC_025059
    gene	Bhlhe23
    locus	2:180766213-180776900
    sample_1	q1
    sample_2	q2
    status	OK
    value_1	0.0178216
    value_2	20.7402
    log2(fold_change)	10.1846
    test_stat	1.27179
    p_value	0.254
    q_value	0.999841
    significant	no
    Here is the row for the same gene in genes.fpkm_tracking:
    Code:
    tracking_id	XLOC_025059
    class_code	-
    nearest_ref_id	-
    gene_id	XLOC_025059
    gene_short_name	Bhlhe23,Gm14343
    tss_id	TSS46660,TSS46661
    locus	2:180766213-180776900
    length	-
    coverage	-
    q1_FPKM	0.0178216
    q1_conf_lo	0
    q1_conf_hi	0.215611
    q1_status	OK
    q2_FPKM	20.7402
    q2_conf_lo	15.1469
    q2_conf_hi	26.3336
    q2_status	OK
    Here are the relevant columns from genes.group_read_tracking:
    Code:
    condition	replicate	FPKM	status
    q1	1	0.036805	OK
    q1	0	0	OK
    q1	2	0	OK
    q1	3	0	OK
    q1	5	0	OK
    q1	4	0.0701245	OK
    q2	0	19.4827	OK
    q2	1	19.5316	OK
    q2	2	19.6337	OK
    q2	3	16.9464	OK
    q2	4	26.398	OK
    q2	5	22.4491	OK
    As you can see the reported p-value is 0.254 despite large differences between the FPKMs for the different conditions. A simple t-test of these FPKMs gives p-value = 2.025e-05.

    Also, if I use htseq-count (using merged.gtf) -> DESeq2 then I get p-value = 4.6259856864892E-109. What is causing this error? Is it a known bug?

    Thanks for any info.

    I have posted the same question over on google groups: https://groups.google.com/forum/#!to...rs/pWJx6XUB_Bg
    Last edited by edm1; 10-09-2014, 04:39 AM. Reason: Added cross-ref

  • #2
    You really need to assess whether your replicates are as close to each other as you imagine they are. A large empirical change does not mean a statistically significant one necessarily. Within group variation can easily confound the statistical tests.
    Last edited by Bukowski; 10-09-2014, 08:47 AM.

    Comment


    • #3
      Hi, thanks for the reply. Within group variances are very low. In mutants the (95%?) CI is 0-0.216 and wild type is 15.15-26.33. We would expect this to be highly significant.

      Comment


      • #4
        In my opinion, CuffDiff2 is just overly conservative. I have seen this type of counter-intuitive result often. I would put more trust in DESeq2 / edgeR / limma. Also, just about all DE analysis benchmarking papers have concluded that CuffDiff2 performs worse than the above based on ROC curves comparing to gold standard DE genes (from qPCR, prior knowledge, or simulations).

        Edit: Also see the Ballgown paper (http://biorxiv.org/content/early/2014/03/30/003665) for a demonstration and discussion of the excessive conservativeness off CuffDiff2.

        Comment


        • #5
          ./cuffdiff transcripts.gtf tophat_bam_sorted.bam
          You are using Cufflinks v2.2.1, which is the most recent release.
          Error: cuffdiff requires at least 2 SAM files

          I have only 1 bam file...

          Comment


          • #6
            Originally posted by Nanu View Post
            ./cuffdiff transcripts.gtf tophat_bam_sorted.bam
            You are using Cufflinks v2.2.1, which is the most recent release.
            Error: cuffdiff requires at least 2 SAM files

            I have only 1 bam file...
            Then you need one more. Cuffdiff is for differential transcript expression testing. Hence you need two things to look for differences between. So you need two BAM files.

            This should have been posted as a separate question it does not relate to the original topic.

            Comment


            • #7
              As I understood that it needs two .bam file. When I got the data I had .fna and .qual file. I converted into fastq. After that I have one bam file. from where I should get another .bam file.

              Comment


              • #8
                Originally posted by Nanu View Post
                As I understood that it needs two .bam file. When I got the data I had .fna and .qual file. I converted into fastq. After that I have one bam file. from where I should get another .bam file.
                "Differential" part refers to comparing two (or more) samples/conditions/treatments etc. If you only have a single sample, you can't do differential analysis unless you have something else (control/reference) to compare the sample to.

                Comment


                • #9
                  Thank you to all of you..Now I would like to know about the next step what should i perform? On other side I would like to do denovo assembly Will you tell me the softwares to generate the assembly.

                  Comment


                  • #10
                    Cufflinks.

                    You should start new threads for new subjects however.
                    The first post about the apparent bug in Cuffdiff was on a totally different subject.

                    Comment


                    • #11
                      I executed the cufflink, cuffcompare, cuffquant, cuffmerge..then i went onto the cuffdiff..?? I dont have the data for comparison or differential..So, I would like to know about the next step after these steps. because cuffnorm is also need two or more bam/sam files. So, I couldnt exceute it. Now what should be the next step.

                      Comment


                      • #12
                        @Nanu: Perhaps you should start a new thread describing the aim of your experiment and what you have done so far. Otherwise this is becoming a case of blind trying to lead the blind.

                        We need to get an idea of what exactly you are trying to do.

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Current Approaches to Protein Sequencing
                          by seqadmin


                          Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                          04-04-2024, 04:25 PM
                        • seqadmin
                          Strategies for Sequencing Challenging Samples
                          by seqadmin


                          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                          03-22-2024, 06:39 AM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, 04-11-2024, 12:08 PM
                        0 responses
                        30 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 10:19 PM
                        0 responses
                        32 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-10-2024, 09:21 AM
                        0 responses
                        28 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 04-04-2024, 09:00 AM
                        0 responses
                        52 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X