Applied Biosystems has just launched their instrument, which supports their version of high-throughput sequencing chemistry, termed “SOLiD™” (little “i”, please). Acquired from Agencourt Personal Genomics in late 2006, SOLiD is a unique parallel chemistry which enables simultaneous sequencing of thousands of individual DNA molecules.
Here I will present a brief overview of the technology, aimed at those who haven’t had time to become intimately familiar with the chemistry. Figures and information are taken directly from this presentation on ABI’s website.
Sequencing on the SOLiD machine starts with library preparation. In the simplest fragment library, two different adapters are ligated to sheared genomic DNA (left panel of Fig. 1). If more rigorous structural analysis is desired, a “mate-pair” library can be generated in a similar fashion, by incorporating a circularization/cleavage step prior to adapter ligation (right panel of Fig. 1).
Figure 1. Library generation schematic.
Once the adapters are ligated to the library, emulsion PCR is conducted using the common primers to generate “bead clones” which each contain a single nucleic acid species.
Figure 2. Clonal bead library generation via emulsion PCR.
Each bead is then attached to the surface of a flow cell via 3’ modifications to the DNA strands.
Figure 3. Depositing beads into flow cell via end modifications.
At this point, we have a flow cell (basically a microscope slide that can be serially exposed to any liquids desired) whose surface is coated with thousands of beads, each containing a single genomic DNA species with unique adapters on either end. Each microbead can be considered a separate sequencing reaction, and all of them are monitored simultaneously via sequential digital imaging. Up to this point all next-gen sequencing technologies are very similar; this is where ABI/SOLiD diverges dramatically (see Figure 4).
Figure 4. Schematic of ABI SOLiD v2.0 sequencing chemistry. SOLiD 2.0 chemistry utilizes 1/2 encoding (meaning bases 1 and 2 of the probe are the specific bases linked to the colorspace calls). The original version of the chemistry used probes encoded at bases 4 and 5.
The actual base detection is no longer done by the polymerase-driven incorporation of labeled dideoxy terminators. Instead, SOLiD uses a mixture of labeled oligonucleotides and queries the input strand with ligase. Understanding the labeled oligo mixture is key to understanding SOLiD technology.
Each oligo has degenerate positions at bases 3-5 (N’s), and one of 16 specific dinucleotides at positions 1-2 (numbered from the 3' end). Positions 6 through the 5’ end are also degenerate (likely inosine, not confirmed), and carry one of four fluorescent dyes. The sequencing involves:
- Anneal a primer, then hybridize and ligate a mixture of fluorescent oligos (8-mers) whose 1st & 2nd 3' bases match those of the template
- Cap unextended fragments with the same mixture of nonfluorescent probes
- Treat with phosphatase to prevent any remaining unextended strands from contributing to out-of-phase ligation events
- Detect the specific fluor
- Remove the fluor via two-step chemical cleavage of the three 5' bases. This leaves behind a 5-base ligated probe with a 5' phosphate
- Repeat, this time querying the 6th & 7th bases
- After 5-7 cycles of this, perform a “reset”, in which the initial primer and all ligated portions are melted from the template and discarded.
- Next a new initial primer is used that is N-1 in length. Repeating the initial cycling (steps 1-5) now generates an overlapping data set (bases 1/2, 6/7, etc, see Fig 4, Step 8 above).
Thus, the 5-7 ligation cycles are repeated over 5 primer rounds, each started by a reset and shifted by one base, generating sequence data for ~35 contiguous bases, in which each base has been queried by two different oligonucleotides.
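To see how the primer rounds interleave, here is a minimal Python sketch of the interrogation schedule. It is a simplified model, not ABI's actual bookkeeping: it assumes 5 primer rounds each shifted one base toward the adapter, 7 ligation cycles per round, and a 5-base step per ligation. The function name and the coordinate system (position 1 is the first template base, 0 and below are adapter bases) are made up for illustration.

```python
def interrogated_dinucleotides(primer_rounds=5, ligation_cycles=7, probe_step=5):
    """Return {primer offset: [(pos, pos + 1), ...]} for a simplified SOLiD run.

    Assumed coordinates: position 1 is the first template base after the
    adapter; positions 0, -1, ... lie inside the adapter.
    """
    schedule = {}
    for offset in range(primer_rounds):          # primer n, n-1, n-2, ...
        pairs = []
        for cycle in range(ligation_cycles):     # each ligation advances 5 bases
            start = 1 - offset + probe_step * cycle
            pairs.append((start, start + 1))
        schedule[offset] = pairs
    return schedule


schedule = interrogated_dinucleotides()
for offset, pairs in schedule.items():
    print(f"primer n-{offset}: {pairs[:3]} ...")

# Every dinucleotide start position, across all primer rounds
starts = sorted(pair[0] for pairs in schedule.values() for pair in pairs)
print(len(starts), "color calls, starting at positions", starts[0], "to", starts[-1])
```

Under these assumptions the five rounds tile the template with no gaps, so every interior base sits in two overlapping dinucleotides. The real instrument's offsets may differ from this toy numbering.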
If you’re doing the math you’ve realized there are 16 possible dinucleotides (4^2) and only 4 dyes. So data from a single color call does not tell you what base is at a given position. This is where the brilliance (and potential confusion) comes about with regard to SOLiD. There are 4 oligos for every dye, meaning there are four dinucleotides that are encoded by each dye.
Figure 5. Schematic of dibase encoding, and how it relates to calling the actual template sequence
For example (see Fig. 5), the dinucleotides CA, AC, TG, and GT are all encoded by the green dye. Because each base is queried twice it is possible, using the two colors, to determine which bases were at which positions. This two color query system (known as “color space” in ABI-speak) has some interesting consequences with regard to the identification of errors. A detailed explanation of color space and its unique issues can be found in the PDF files attached to this post (“2Base_Pair_SOLiD_Data_V1.pdf” and "SOLiD_Dibase_Sequencing_and_Color_Space_Analysis.pdf").
One of the side effects of this dual encoding is that, when aligning to a reference and attempting to determine variants, true variants will follow specific color change "rules" as defined below in Figure 6.
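To make the encoding concrete, here is a minimal Python sketch of the dibase code. It assumes the commonly published SOLiD assignment, in which numbering the bases A=0, C=1, G=2, T=3 makes each color the XOR of the two base numbers. Only the green group (CA, AC, TG, GT) is spelled out in this post, so the other three groups and the color order should be checked against Fig. 5. The function names are illustrative, and decoding assumes the first base is already known (for example, the last base of the adapter).

```python
BASES = "ACGT"                               # assumed numbering: A=0, C=1, G=2, T=3
COLORS = ["blue", "green", "yellow", "red"]  # assumed order; green is from the post


def color_of(dinucleotide):
    """Color index of one dinucleotide under the assumed XOR code."""
    first, second = dinucleotide
    return BASES.index(first) ^ BASES.index(second)


def encode(sequence):
    """Base space -> color space: one color per overlapping dinucleotide."""
    return [color_of(sequence[i:i + 2]) for i in range(len(sequence) - 1)]


def decode(first_base, colors):
    """Color space -> base space, anchored on a known first base."""
    sequence = first_base
    for color in colors:
        previous = BASES.index(sequence[-1])
        sequence += BASES[previous ^ color]  # XOR is its own inverse
    return sequence


# Group all 16 dinucleotides by dye: four per color
groups = {name: [] for name in COLORS}
for a in BASES:
    for b in BASES:
        groups[COLORS[color_of(a + b)]].append(a + b)
print(groups)

colors = encode("ATGGA")
print(colors, "->", decode("A", colors))
```

Note that a color alone never identifies a base. The decode step only works because each color is combined with the base before it, which is why a known starting base is needed.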
Figure 6. Colorspace valid variant rules
Detection of a true SNP is reflected by changes in two adjacent colorspace calls, which must follow the rules above. Figure 7 below gives some examples of this principle in examining alignments.
Figure 7. Colorspace examples
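The adjacent-color principle can be sketched in a few lines of Python. The rules of Figure 6 are not reproduced in text here, so this sketch checks only one consequence of the encoding, assuming the common SOLiD code in which each color is the XOR of base numbers A=0, C=1, G=2, T=3. A single base change alters both colors that overlap it, and the XOR of that color pair stays the same because the flanking bases are unchanged. The helper names and labels are hypothetical.

```python
BASES = "ACGT"  # assumed numbering: A=0, C=1, G=2, T=3


def encode(sequence):
    """Color-space encoding under the assumed XOR code."""
    return [BASES.index(a) ^ BASES.index(b) for a, b in zip(sequence, sequence[1:])]


def classify_mismatches(ref_colors, read_colors):
    """Label each run of consecutive color mismatches (hypothetical helper)."""
    if len(ref_colors) != len(read_colors):
        raise ValueError("reference and read must have the same number of colors")
    calls = []
    i, n = 0, len(ref_colors)
    while i < n:
        if ref_colors[i] == read_colors[i]:
            i += 1
            continue
        j = i                                  # extend over the mismatching run
        while j < n and ref_colors[j] != read_colors[j]:
            j += 1
        ref_run, read_run = ref_colors[i:j], read_colors[i:j]
        if j - i == 1:
            label = "likely color error"       # a SNP would change two colors
        elif j - i == 2 and (ref_run[0] ^ ref_run[1]) == (read_run[0] ^ read_run[1]):
            label = "candidate SNP"            # flanking bases are consistent
        else:
            label = "needs closer inspection"
        calls.append((i, label))
        i = j
    return calls


ref = encode("ATGGACT")
snp_read = encode("ATGCACT")                   # G -> C at the fourth base
noisy_read = ref[:3] + [3] + ref[4:]           # one miscalled color
print(classify_mismatches(ref, snp_read))
print(classify_mismatches(ref, noisy_read))
```

This toy check ignores read ends (where a real variant overlaps only one color), indels, and adjacent multi-base changes, all of which a real aligner has to handle.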
Hopefully that gives you a brief introduction to ABI’s SOLiD technology.
____________________________________________
EDIT May 2008: SOLiD 2.0 has been released.
EDIT Sept 2008: This post has been updated entirely for v2.0 chemistry