Hello everybody,
I'm writing here because I've noticed that cuffdiff 1.1.0 and 1.0.2 give very different results when it comes to DE testing.
This is what I get with 1.1.0:
Code:
cuffdiff -p 4 -o . -N --emit-count-tables \
    --frag-bias-correct /disk1/tl344/rep/iGenome/Mus_musculus/Ensembl/NCBIM37/Sequence/WholeGenomeFasta/genome.fa \
    -M /disk1/tl344/rep/UCSC.chrM.renamed.gtf \
    /disk1/tl344/cuff/illumina/cuffcompare/cuffcompare.combined.gtf \
    /disk1/tl344/mapped_reads/sample1_properlyPaired2.bam \
    /disk1/tl344/mapped_reads/sample2_properlyPaired2.bam \
    /disk1/tl344/mapped_reads/sample3_properlyPaired2.bam &
tl344@ram--bio:~/cuff/illumina/cuffdiff$ awk '$5=="q1" && $6=="q2"' gene_exp.diff | cut -f7 | sort | uniq -c
    434 FAIL
      1 HIDATA
   8338 LOWDATA
  34844 NOTEST
   4474 OK
tl344@ram--bio:~/cuff/illumina/cuffdiff$ awk '$5=="q1" && $6=="q2" && $14=="yes"' gene_exp.diff | wc -l
175
While this is what I get with 1.0.2:
Code:
tl344@ram--bio:~/cuff/illumina/cuffdiff_1.0.2$ awk '$5=="q1" && $6=="q2"' gene_exp.diff | cut -f7 | sort | uniq -c
    379 FAIL
     32 LOWDATA
  16460 NOTEST
  31220 OK
tl344@ram--bio:~/cuff/illumina/cuffdiff_1.0.2$ awk '$5=="q1" && $6=="q2" && $14=="yes"' gene_exp.diff | wc -l
1534
All default options for the two versions appear to be the same, the command-line options are exactly the same, and the input files (gtf, fa, bam) are the same.
As you can see, the number of significant genes changes quite a lot: 175 with 1.1.0 versus 1534 with 1.0.2. The number of tests with status OK also drops from 31220 to 4474, while LOWDATA rises from 32 to 8338 and NOTEST from 16460 to 34844.
I think this might be due to the following change (from the 1.1.0 release notes):
"Cuffdiff now includes a more sophisticated check for sufficient sequencing depth prior to testing for differences, which substantially improves the accuracy of differential expression analysis in loci with low to medium depth."
but still I wouldn't expect such a big difference.
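One way to dig further would be to cross-tabulate the per-gene test status between the two gene_exp.diff files, to see where the genes that were tested in 1.0.2 end up in 1.1.0. Below is a minimal sketch, not something I have validated on the real data. It assumes that column 1 of gene_exp.diff is an identifier shared by both runs (the same combined GTF was used, so this seems plausible); the function name status_crosstab is purely illustrative.

```shell
# Cross-tabulate the q1-vs-q2 test status of each gene between two cuffdiff runs.
# ASSUMPTION: column 1 holds an identifier that is the same in both output files.
# Columns 5, 6 and 7 (sample_1, sample_2, status) are used as in the awk
# commands in this post.
status_crosstab() {
    # $1: gene_exp.diff from the older run, $2: gene_exp.diff from the newer run
    awk -F'\t' '
        $5 == "q1" && $6 == "q2" {
            if (FNR == NR)
                old[$1] = $7                    # first file: remember the old status
            else if ($1 in old)
                n[old[$1] " -> " $7]++          # second file: count each transition
        }
        END { for (k in n) print k "\t" n[k] }
    ' "$1" "$2" | LC_ALL=C sort
}
```

With the directory layout shown in the prompts, it could be called as `status_crosstab ~/cuff/illumina/cuffdiff_1.0.2/gene_exp.diff ~/cuff/illumina/cuffdiff/gene_exp.diff`. Each output line is a transition such as "OK -> LOWDATA" followed by the number of genes that made it.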
Has anyone run into similar problems? Any idea whether this is expected behaviour or whether something is wrong?
Many thanks in advance for any feedback!
tom