Many biological replicates: 'traditional' statistics vs. Cuffdiff or DESeq/edgeR?

I've read the discussions by Simon Anders and Lior Pachter on these forums closely and understood them as best I can without extensive graduate training in mathematics or statistics :-) I grasp that correctly modeling the dispersion among biological replicates within a treatment condition is vital for comparing gene expression between treatment conditions, and that this is more challenging with fewer replicates.

Before I was aware of Cuffdiff or DESeq (and actually before Cuffdiff came out), I aligned RNA-Seq reads with TopHat and ran Cufflinks to obtain gene-level FPKM values (I wasn't interested in individual isoforms). I had 6-8 biological replicates per condition and used a traditional statistics package (Partek Genomics Suite in this case) to compute p-values for differences between conditions and to estimate the false discovery rate.
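A per-gene 'traditional' comparison of the kind described above can be sketched roughly as follows. This is a minimal plain-Python illustration under stated assumptions, not what Partek Genomics Suite actually does: it log2-transforms FPKM with a pseudocount of 1 (an assumed choice), computes a Welch t-statistic per gene, obtains a p-value by shuffling condition labels (a permutation test, swapped in for a parametric t-test or ANOVA so the sketch needs only the standard library), and then applies the Benjamini-Hochberg procedure to control the false discovery rate. The function names, gene names and FPKM values are hypothetical.

```python
import math
import random


def log2_fpkm(values, pseudocount=1.0):
    """log2(FPKM + pseudocount); the pseudocount of 1 is an assumed choice."""
    return [math.log2(v + pseudocount) for v in values]


def welch_t(a, b):
    """Welch t-statistic for two groups of replicate values."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    if se == 0.0:
        # Both groups are constant: either no difference or an "infinite" one.
        return 0.0 if mean_a == mean_b else math.copysign(math.inf, mean_a - mean_b)
    return (mean_a - mean_b) / se


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided p-value from randomly shuffling the condition labels."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:n_a], pooled[n_a:])) >= observed:
            hits += 1
    # Adding 1 to numerator and denominator keeps the p-value above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def compare_conditions(fpkm_a, fpkm_b, n_perm=2000, seed=0):
    """Map gene -> (p-value, BH q-value) for two dicts of replicate FPKMs."""
    genes = list(fpkm_a)
    pvals = [
        permutation_pvalue(log2_fpkm(fpkm_a[g]), log2_fpkm(fpkm_b[g]), n_perm, seed)
        for g in genes
    ]
    return dict(zip(genes, zip(pvals, benjamini_hochberg(pvals))))


# Hypothetical gene-level FPKMs, six biological replicates per condition.
condition_a = {"geneA": [50, 55, 48, 60, 52, 58], "geneB": [10, 12, 11, 9, 13, 10]}
condition_b = {"geneA": [10, 12, 9, 14, 11, 13], "geneB": [11, 10, 12, 10, 9, 13]}
results = compare_conditions(condition_a, condition_b)
for gene, (p, q) in results.items():
    print(f"{gene}: p={p:.4f}, q={q:.4f}")
```

With only six replicates per group there are just 924 distinct ways to split the twelve samples, so the smallest achievable permutation p-value is limited; this is one concrete way the number of replicates constrains what a purely per-gene test can detect.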

I am now comparing this approach with edgeR / DESeq to see whether the comparisons between groups are broadly similar. However, is there an a priori statistical or mathematical reason why comparing FPKM-type data between conditions would be invalid or misleading? In my hands, the FPKM values across biological replicates appear normally distributed, and the variance appears to be higher when the FPKM is lower, which reasonably suggests more variance at lower read counts.
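The observation that replicate variability is relatively larger at low expression is what count models predict. As a hedged sketch (plain Python, working on read counts rather than FPKM, with a made-up dispersion of 0.04): under the negative binomial model used by edgeR, the variance is μ + φμ², so the coefficient of variation (CV) is sqrt(1/μ + φ). The 1/μ term is Poisson sampling noise that dominates at low counts; at high counts the CV levels off near sqrt(φ), the biological variability between replicates. This only illustrates the mean-variance relationship; it is not how edgeR or DESeq actually estimate dispersions, and the helper names are hypothetical.

```python
import math
import statistics


def empirical_cv(values):
    """Coefficient of variation (sample SD / mean) across biological replicates."""
    return statistics.stdev(values) / statistics.mean(values)


def negative_binomial_cv(mean, dispersion):
    """CV implied by a negative binomial with var = mean + dispersion * mean**2.

    With dispersion = 0 this reduces to Poisson (var = mean), so CV = 1 / sqrt(mean).
    """
    return math.sqrt(1.0 / mean + dispersion)


# Hypothetical dispersion; in practice it is estimated from the replicates.
DISPERSION = 0.04
for mean_count in (5, 50, 500, 5000):
    print(mean_count, round(negative_binomial_cv(mean_count, DISPERSION), 3))

# Observed CV for one gene's replicate counts (made-up numbers).
print(round(empirical_cv([9, 10, 11]), 3))
```

With these assumed values the CV drops from about 0.49 at a mean of 5 counts to about 0.20 at 5000, which is one reason count-based tools model dispersion as a function of expression level rather than assuming a single variance for all genes.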

Thanks in advance for any thoughts.
