Hi,
Can anyone spot if I am doing something wrong. At the moment I am convinced there is a bug. In gene_exp.diff, my control gene (the one that is knocked out) is being reported as not differentially expressed between conditions.
I have two conditions mutant and wild-type, with 6 repeats for each. ~20 million reads in each repeat. My general workflow has been: use tophat2 to map reads to mm10 using enesmbl annotation -> cufflinks using annotation as guide -> cuffmerge -> cuffdiff using merged.gtf. My cuffdiff command was:
cuffdiff -o $outdir -p 16 --frag-bias-correct $genome --multi-read-correct --mask-file $maskfile --library-type fr-firststrand $gtf $mut_bams,$wt_bams
Here is the row for my knocked out gene in gene_exp.dff:
Here is the row for the same gene in genes.fpkm_tracking:
Here are the relevant columns from genes.group_read_tracking:
As you can see the reported p-value is 0.254 despite large differences between the FPKMs for the different conditions. A simple t-test of these FPKMs gives p-value = 2.025e-05.
Also, if I use htseq-count (using merged.gtf) -> DESeq2 then I get p-value = 4.6259856864892E-109. What is causing this error? Is it a known bug?
Thanks for any info.
I have posted the same question over on google groups: https://groups.google.com/forum/#!to...rs/pWJx6XUB_Bg
Can anyone spot if I am doing something wrong. At the moment I am convinced there is a bug. In gene_exp.diff, my control gene (the one that is knocked out) is being reported as not differentially expressed between conditions.
I have two conditions mutant and wild-type, with 6 repeats for each. ~20 million reads in each repeat. My general workflow has been: use tophat2 to map reads to mm10 using enesmbl annotation -> cufflinks using annotation as guide -> cuffmerge -> cuffdiff using merged.gtf. My cuffdiff command was:
cuffdiff -o $outdir -p 16 --frag-bias-correct $genome --multi-read-correct --mask-file $maskfile --library-type fr-firststrand $gtf $mut_bams,$wt_bams
Here is the row for my knocked out gene in gene_exp.dff:
Code:
test_id XLOC_025059 gene_id XLOC_025059 gene Bhlhe23 locus 2:180766213-180776900 sample_1 q1 sample_2 q2 status OK value_1 0.0178216 value_2 20.7402 log2(fold_change) 10.1846 test_stat 1.27179 p_value 0.254 q_value 0.999841 significant no
Code:
tracking_id XLOC_025059 class_code - nearest_ref_id - gene_id XLOC_025059 gene_short_name Bhlhe23,Gm14343 tss_id TSS46660,TSS46661 locus 2:180766213-180776900 length - coverage - q1_FPKM 0.0178216 q1_conf_lo 0 q1_conf_hi 0.215611 q1_status OK q2_FPKM 20.7402 q2_conf_lo 15.1469 q2_conf_hi 26.3336 q2_status OK
Code:
condition replicate FPKM status q1 1 0.036805 OK q1 0 0 OK q1 2 0 OK q1 3 0 OK q1 5 0 OK q1 4 0.0701245 OK q2 0 19.4827 OK q2 1 19.5316 OK q2 2 19.6337 OK q2 3 16.9464 OK q2 4 26.398 OK q2 5 22.4491 OK
Also, if I use htseq-count (using merged.gtf) -> DESeq2 then I get p-value = 4.6259856864892E-109. What is causing this error? Is it a known bug?
Thanks for any info.
I have posted the same question over on google groups: https://groups.google.com/forum/#!to...rs/pWJx6XUB_Bg
Comment