Regarding the replicates, you are thinking about that correctly. The important thing with replicates is to replicate across your largest source of experimental variation, which is usually (though not always) biological. On the comment about 2-4 replicates, I would change that to 3-5. Two, IMO, is never an option and really is no better than one.
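One way to see why two replicates buy so little: with one you have no estimate of variance at all, and with two the variance estimate itself is extremely noisy. The quick Python simulation below is an illustration, not a proof. It assumes normally distributed measurements with a known true SD and reports how much the sample SD bounces around (its coefficient of variation) for n = 2 to 5. The function name is made up for this example.

```python
import random
import statistics

def sd_estimate_spread(n: int, trials: int = 10000, true_sd: float = 1.0, seed: int = 1) -> float:
    """Coefficient of variation of the sample SD when drawing n replicates from N(0, true_sd)."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, true_sd) for _ in range(n)]
        estimates.append(statistics.stdev(sample))  # needs n >= 2; n = 1 gives no estimate
    return statistics.stdev(estimates) / statistics.mean(estimates)

for n in range(2, 6):
    # Smaller numbers mean a more trustworthy variance estimate
    print(n, round(sd_estimate_spread(n), 2))
```

The spread drops sharply going from 2 to 3 and keeps shrinking toward 5, which is the intuition behind "3-5".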

For read length and paired vs. single end, there are a few publications out there now stating that short single-end reads are sufficient; the RSEM paper describes this as well. We did a small study where we took 101-cycle PE data from mouse and created, in silico, a series of data sets ranging from 36-cycle SE and 36-cycle PE up to the full data set, including subsets of the reads to explore multiplexing possibilities. We looked at our sensitivity to splice variants and at detection of known transcript d/dx. We found the optimum to be somewhere between 50- and 76-cycle SE, which includes a little personal bias toward longer reads. The multiplexing question is more ambiguous, so we don't (yet? not sure) have a good handle on that. What we have been telling people is that if they have to choose between longer reads and more reads, choose more reads.
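If you want to try the same kind of in silico comparison on your own data, here is a minimal Python sketch of the read manipulation: truncate 101-cycle mates to a shorter cycle count, optionally drop mate 2 to mimic SE, and randomly keep a fraction of fragments to mimic a smaller multiplexed share of a lane. This is not the pipeline we used. It assumes reads are already parsed into (header, sequence, plus, quality) tuples with mates in matching order, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (header, sequence, plus line, quality string)
Record = Tuple[str, str, str, str]

def truncate(record: Record, cycles: int) -> Record:
    """Keep only the first `cycles` bases (and their quality scores) of a read."""
    header, seq, plus, qual = record
    return header, seq[:cycles], plus, qual[:cycles]

def simulate_design(
    mate1: Sequence[Record],
    mate2: Sequence[Record],
    cycles: int,
    paired: bool,
    fraction: float = 1.0,
    seed: int = 0,
) -> List[Tuple[Record, ...]]:
    """Derive a shorter, optionally single-end and subsampled, data set from PE reads."""
    rng = random.Random(seed)  # fixed seed so subsets are reproducible
    out: List[Tuple[Record, ...]] = []
    for r1, r2 in zip(mate1, mate2):
        # Keep each fragment with probability `fraction` to mimic a smaller share of a lane
        if rng.random() >= fraction:
            continue
        if paired:
            out.append((truncate(r1, cycles), truncate(r2, cycles)))
        else:
            out.append((truncate(r1, cycles),))  # SE: keep mate 1 only
    return out

# Toy 101-cycle PE input, just to show the shapes involved
m1 = [(f"@r{i}/1", "A" * 101, "+", "I" * 101) for i in range(1000)]
m2 = [(f"@r{i}/2", "C" * 101, "+", "I" * 101) for i in range(1000)]

for cycles, paired in [(36, False), (36, True), (50, False), (76, False), (101, True)]:
    subset = simulate_design(m1, m2, cycles, paired, fraction=0.5)
    print(cycles, "PE" if paired else "SE", len(subset))
```

Each derived set would then go through whatever alignment and quantification you normally use, so the only things that differ between comparisons are read length, pairing and depth.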

On MiSeq vs. GA: on the MiSeq you will be doing 2-3 samples at a time for 2-3M reads per replicate, whereas if Yongde has a good run, you should be able to do all 6 (assuming triplicates) in one go and get 2-3M+ per replicate. Tell your core you want >30M reads.
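The per-replicate numbers come down to simple division of a run's output across the pooled libraries. A tiny Python sketch, assuming the output splits evenly across samples (real pools rarely balance perfectly, so treat the result as an upper bound). The function names are hypothetical, and the 3M target is just an illustrative figure in the range discussed above.

```python
def reads_per_sample(total_reads: int, n_samples: int) -> int:
    """Reads each library gets if a run's output is split evenly (an idealized assumption)."""
    if n_samples <= 0:
        raise ValueError("need at least one sample")
    return total_reads // n_samples

def max_samples(total_reads: int, target_per_sample: int) -> int:
    """How many libraries fit on one run while still meeting a per-sample read target."""
    if target_per_sample <= 0:
        raise ValueError("target must be positive")
    return total_reads // target_per_sample

# Six libraries (triplicates of two conditions) sharing 30M reads
print(reads_per_sample(30_000_000, 6))     # 5000000
print(max_samples(30_000_000, 3_000_000))  # 10
```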

