Hi Alex,

To my knowledge, the Illumina pipeline performs its crosstalk matrix and phasing/prephasing calibration during the first 4 cycles by default, and this can be changed with --matrix-cycles=n. As with using many cycles for cluster detection, this will probably mean that the workstation PC has to store more images until the initial calibration calculations are done, at which point the real-time analysis starts. Using many cycles will cause a backlog on the workstation, but I would think this is manageable for at least 10 or so cycles (at least on a GA; I'm less sure about the HiSeq, as it generates so much more data).
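
To get a feel for how the backlog grows, here is a rough back-of-envelope sketch. It assumes one image per tile, channel and cycle has to be held until calibration finishes. The lane, tile, channel and image-size numbers below are placeholders, not GA or HiSeq specifications, so plug in the values for your own instrument.

```python
def buffered_images(calibration_cycles: int, lanes: int,
                    tiles_per_lane: int, channels: int) -> int:
    """Images held before calibration finishes (assumes one image per tile, channel and cycle)."""
    return calibration_cycles * lanes * tiles_per_lane * channels


def buffered_gigabytes(calibration_cycles: int, lanes: int, tiles_per_lane: int,
                       channels: int, image_mb: float) -> float:
    """Approximate disk space for those images; image_mb is a placeholder size per image."""
    images = buffered_images(calibration_cycles, lanes, tiles_per_lane, channels)
    return images * image_mb / 1024


if __name__ == "__main__":
    # Placeholder instrument values: 8 lanes, 120 tiles per lane, 4 channels, 8 MB per image.
    for cycles in (4, 10):
        print(cycles, "cycles:", buffered_gigabytes(cycles, 8, 120, 4, 8.0), "GB")
```

The estimate scales linearly with the number of calibration cycles, so going from 4 to 10 cycles means buffering 2.5 times as many images, whatever the per-image size turns out to be.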

You can avoid these problems by specifying a control lane with a relatively normal base composition (--control-lane=..), such as a lane of PhiX or whole-genome shotgun sequencing. Alternatively, you can skip calibration on the sample altogether and use a pre-formatted calibration table (probably slightly different ones for the GA and the HiSeq).

Something else to consider is that you may lose a certain amount of data, because cluster detection does not work normally when there is low diversity at the start of the sequences, and this is completely independent of a skewed base composition. The loss depends mainly on the number of barcodes in your sample and on the cluster density: the fewer barcodes and the higher your cluster density, the more data you are likely to lose. Please refer to this post for more information, or send me an email if you have any further questions.
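
If you want a quick check of how diverse the first cycles will be, a minimal sketch like the one below tabulates the per-cycle base composition of a barcode set. It assumes in-line barcodes read at the very start of the read and equal representation of every barcode. The 0.5 cut-off for "low diversity" is a hypothetical threshold for illustration, not an Illumina specification, and the sketch does not model the effect of cluster density.

```python
from collections import Counter


def per_cycle_composition(barcodes: list[str]) -> list[dict[str, float]]:
    """Fraction of each base at each position, assuming all barcodes are equally represented."""
    if not barcodes:
        raise ValueError("need at least one barcode")
    length = len(barcodes[0])
    if any(len(b) != length for b in barcodes):
        raise ValueError("all barcodes must have the same length")
    composition = []
    for pos in range(length):
        counts = Counter(b[pos] for b in barcodes)
        composition.append({base: n / len(barcodes) for base, n in counts.items()})
    return composition


def low_diversity_cycles(barcodes: list[str], max_fraction: float = 0.5) -> list[int]:
    """1-based cycles where one base exceeds max_fraction (hypothetical threshold)."""
    return [cycle
            for cycle, comp in enumerate(per_cycle_composition(barcodes), start=1)
            if max(comp.values()) > max_fraction]


if __name__ == "__main__":
    # Two barcodes that share their first two bases: cycles 1 and 2 have no diversity at all.
    print(low_diversity_cycles(["ACGT", "ACTG"]))
    # Four barcodes chosen so that no base dominates any cycle.
    print(low_diversity_cycles(["ACGT", "CAGT", "GTCA", "TGAC"]))
```

With only a handful of barcodes, every cycle can contain at most that many distinct bases, which is one way to see why fewer barcodes makes the problem worse.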
