I think there's a simpler way to do it: just take all the reads from a particular set of tiles (I don't know how that translates to HiSeq machines). Since the first few and last few tiles are bad, pick some tiles in the middle by grepping the read names, which have the cluster coordinates embedded in them. Or use zgrep if the fastqs are already gzipped. That should be random with respect to where the reads align to the genome, and of better quality than taking the first and last tiles with head or tail.
But yes, this won't work on multi-line fastqs.
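For anyone who would rather not do this with grep, here is a rough Python sketch of the same tile filter. It makes a few assumptions of mine: records are strictly 4 lines each (so it has the same multi-line limitation), the read name is colon-separated, and the tile number is the third field, as in older Illumina names like `@HWUSI-EAS100R:6:73:941:1973#0/1`. For Casava 1.8+ style names the tile should be the fifth field, so you would pass `tile_field=4`. The function names are just illustrative.

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a plain or gzipped FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def reads_from_tiles(handle, tiles, tile_field=2):
    """Yield 4-line FASTQ records whose read name carries a tile in `tiles`.

    `tiles` is a set of tile numbers as strings, e.g. {"50", "51"}.
    `tile_field` is the 0-based position of the tile in the colon-separated
    read name (assumed 2 for older Illumina names, 4 for Casava 1.8+).
    """
    while True:
        record = list(islice(handle, 4))  # header, sequence, '+', qualities
        if len(record) < 4:
            break  # end of file, or a truncated final record
        # Drop anything after the first whitespace, then split the name.
        fields = record[0].split()[0].split(":")
        if len(fields) > tile_field and fields[tile_field] in tiles:
            yield record
```

To use it, open the file with `open_fastq(...)`, loop over `reads_from_tiles(handle, {"50", "51"})`, and write each record out with `writelines`. Check a few read names from your own run first to confirm which field holds the tile.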