We just completed a run where an interesting artifact of GA sequencing surprised us a little. A researcher wanted to sequence a reduced representation of a genome. He digested genomic DNA with AluI, ran the digest out on a gel and isolated fragments in a particular size range. That DNA was prepared for sequencing without further fragmentation. Since AluI cuts AG^CT, the 5' end of every DNA strand started with 'CT'. The consequences of this did not occur to us until we saw the images from the first cycle of sequencing. The extreme bias in base composition in the first two cycles completely flummoxed the cluster identification routines of the Illumina image analysis software. No problem though, as I can rerun the analysis from scratch with the pipeline using only cycles 3-36.
This got me thinking. Suppose you have designed your own multiplex adapter oligos, placing the bar code immediately downstream of the primer site. Depending on the base representation in the mixture of bar codes you are using, this could lead to the same problem we encountered above. In that case you can't simply ignore those cycles, since you need those bases to sort the samples.
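To make the concern concrete, here is a rough Python sketch of how one might check a bar code set for this kind of bias before ordering oligos. It tallies the base composition at each bar code position and flags cycles dominated by a single base. The function names, the example bar codes and the 0.5 threshold are all my own assumptions for illustration, not anything taken from the Illumina pipeline, and it assumes every bar code is present in equal amounts in the pool.

```python
from collections import Counter

def per_cycle_composition(barcodes):
    """Return, for each barcode position, the fraction of A/C/G/T across the set.

    Assumes every barcode is equally represented in the pooled library.
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("all barcodes must have the same length")
    n = len(barcodes)
    composition = []
    # zip(*barcodes) walks the set column by column, i.e. cycle by cycle.
    for column in zip(*barcodes):
        counts = Counter(base.upper() for base in column)
        composition.append({base: counts.get(base, 0) / n for base in "ACGT"})
    return composition

def biased_cycles(barcodes, max_fraction=0.5):
    """List 1-based cycles where a single base exceeds max_fraction.

    The 0.5 default is an arbitrary illustrative threshold, not a vendor spec.
    """
    return [
        cycle
        for cycle, fractions in enumerate(per_cycle_composition(barcodes), start=1)
        if max(fractions.values()) > max_fraction
    ]

# Hypothetical 4-base barcodes, made up for illustration only.
balanced_barcodes = ["ACGT", "CGTA", "GTAC", "TACG"]
skewed_barcodes = ["CTAG", "CTGA", "CTTC", "CTCT"]  # all start with 'CT'

print(biased_cycles(balanced_barcodes))
print(biased_cycles(skewed_barcodes))
```

With the made-up sets above, the balanced set flags no cycles, while the set in which every bar code starts with 'CT' flags cycles 1 and 2, which mirrors what happened with the AluI fragments.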
Have I made any sense at all? Has anyone encountered this situation using bar codes or have you developed protocols to make sure you avoid it?