I ran TopHat/Cufflinks on Illumina reads with and without quality filtering, and noticed that the results of the differential expression analysis differ substantially. For example, a gene that was highly DE when the reads were not pre-filtered showed no change at all when the reads were pre-filtered.
Why would the removal of BAD quality reads affect the results so dramatically?
Thanks!
Note: prefiltering = remove the pairs where one of the mates has all bases with quality < 2.
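
For concreteness, the prefiltering rule above can be sketched roughly as follows. This is only an illustration, not the actual script that was used: the function names, the in-memory (name, sequence, quality) tuples, and the Phred+33 offset are all assumptions (older Illumina pipelines use Phred+64, in which case pass offset=64).

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical read representation: (name, sequence, quality string)
Read = Tuple[str, str, str]


def all_bases_below(quality: str, threshold: int = 2, offset: int = 33) -> bool:
    """Return True if every base in the quality string scores below threshold.

    The offset is assumed to be Phred+33. Note that an empty quality
    string also returns True, so empty reads are treated as bad.
    """
    return all(ord(ch) - offset < threshold for ch in quality)


def filter_pairs(
    pairs: Iterable[Tuple[Read, Read]], threshold: int = 2, offset: int = 33
) -> Iterator[Tuple[Read, Read]]:
    """Yield only the pairs in which neither mate is entirely low quality."""
    for mate1, mate2 in pairs:
        if all_bases_below(mate1[2], threshold, offset) or all_bases_below(
            mate2[2], threshold, offset
        ):
            continue  # drop the whole pair if either mate is all-bad
        yield mate1, mate2


if __name__ == "__main__":
    example = [
        (("r1/1", "ACGT", "IIII"), ("r1/2", "ACGT", "!!!!")),  # mate 2 is all Q0
        (("r2/1", "ACGT", "II!I"), ("r2/2", "ACGT", "#!!!")),  # both mates have a base >= Q2
    ]
    for m1, m2 in filter_pairs(example):
        print(m1[0], m2[0])
```

A pair is dropped only when at least one mate has no base at or above the threshold, so a read with even a single Q2 base is kept.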