11-02-2017, 07:15 AM   #3
Senior Member
Location: Stanford

Join Date: Jun 2009
Posts: 181

Both great questions.

Originally Posted by luc
Why should the Smart oligos be 5' biotinylated?
Well, first of all, it has nothing to do with streptavidin. That always confuses people, which is why our diagram (fig. S1 in Supplemental File 1) just calls it a blocking group. Biotin is simply a very cheap modification to put on custom oligos. At the 5' end of the primers in template-switching reverse transcription, it prevents the formation of concatemers. Without it, the ends of your ds-cDNA, which already carry both adapters, could be extended with additional C-tails, and then additional G-overhang primers could come in and anneal again, until by the end your molecule has a series of duplicated adapters at both ends. Biotin seems to be sufficient to stop this, presumably just by steric hindrance. Another solution people have used in the literature is to put unnatural deoxynucleotides (iso-dC and iso-dG) at the 5' end of the oligo: there are no complementary dNTPs to pair with them, so the opposite strand is left with a 3' underhang, which prevents the unwanted 3' C-tail from being added. However, those are a lot more expensive.

Originally Posted by luc
How much of a balanced library do you need to spike in to sequence through the template-switching Cs without running into read quality problems?
Actually we have some degenerate bases (N) between the sequencing adapter and the G-overhang, so the resulting reads all begin with NNNNNGGG before the unique cDNA sequence (fig. S3A). This guarantees sequence diversity in the cycles used for cluster registration on Illumina sequencers, so we only use a 1% PhiX spike-in, and that's just in case the sequencer fails and we need to troubleshoot. The NNNNN also serves as a short UMI.
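To make the read layout concrete, here is a minimal Python sketch that splits a read of the form NNNNNGGG + cDNA into its UMI and cDNA parts. The function name split_read and its parameters are made up for illustration and are not part of our pipeline; it assumes exactly 5 degenerate bases followed by exactly GGG.

```python
def split_read(read, umi_len=5, linker="GGG"):
    """Split a read laid out as UMI + linker + cDNA.

    Hypothetical helper: assumes a fixed-length UMI followed by an exact
    linker. Returns (umi, cdna), or None if the read is too short or the
    linker is not found where expected.
    """
    start = umi_len
    end = umi_len + len(linker)
    if len(read) < end or read[start:end] != linker:
        return None
    return read[:umi_len], read[end:]


if __name__ == "__main__":
    # Toy read: 5-base UMI, GGG from template switching, then cDNA.
    print(split_read("ACGTAGGGTTCAGC"))
```

A read without GGG at positions 6 to 8 comes back as None, so this strict version would throw away reads with a miscalled G, which you may or may not want.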

There can still be some funny business at the beginning of the read, because for the three cycles after cluster registration every cluster shows only G. On the two-color Illumina sequencers in particular, G is read as no signal, so the per-cycle quality graphs look weird. I'm not sure what consequences this has for the 9th cycle, which is the first cDNA base. Another thing that can happen is that you get more than 3 G's at the beginning, because the C-tailing activity of MMLV RTase doesn't guarantee a particular length. This again screws up the quality metrics, but in practice it doesn't do much harm to the final data, because modern read aligners (we used STAR) "soft-clip" non-matching bases at the ends of the read.
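If you would rather trim the variable-length G run yourself than rely on soft-clipping, something like the following Python sketch would do it. This is not what we did (we let STAR soft-clip); trim_g_run and its parameters are hypothetical names, and it assumes a 5-base UMI followed by at least 3 G's.

```python
def trim_g_run(read, umi_len=5, min_g=3):
    """Remove the UMI and the whole leading run of G's after it.

    Hypothetical helper, not the method used in the post. Returns
    (umi, number_of_gs, cdna), or None if the read is shorter than the
    UMI or has fewer than min_g G's after it.
    """
    umi = read[:umi_len]
    rest = read[umi_len:]
    # Count consecutive G's at the start of the remainder.
    n_g = len(rest) - len(rest.lstrip("G"))
    if len(umi) < umi_len or n_g < min_g:
        return None
    return umi, n_g, rest[n_g:]


if __name__ == "__main__":
    # Toy read with 5 G's instead of the expected 3.
    print(trim_g_run("ACGTAGGGGGTTCAGC"))
```

The catch is that a cDNA that genuinely starts with G loses those bases too, since they can't be told apart from extra tailing. That is one reason soft-clipping against the reference is the gentler option.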
jwfoley