SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Sample Prep / Library Generation (http://seqanswers.com/forums/forumdisplay.php?f=25)
-   -   Illumina Unique Molecular Identifier Adaptor (http://seqanswers.com/forums/showthread.php?t=78554)

Andersen 10-11-2017 05:45 AM

Illumina Unique Molecular Identifier Adaptor
 
Hi All,

I want to generate RRBS libraries where we can track each unique molecule with a UMI. To do that, I have made modified versions of the TruSeq adaptors that we normally use.

The regular Truseq primers look like this:
A1_P5: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T
A1_P7 (AR005): ℗-GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG

What we did was to add 8 random nucleotides to the P5 so that it looks like this:
A1_P5 UMI8: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNN*T

From what I have heard, I should anneal the two adaptors. The annealing efficiency can be seen here: https://imgur.com/a/70LZU

What I did now was to try our regular protocol with old adaptors vs new adaptors.

The libraries look like this: https://imgur.com/a/mHZeF

Stupidly, I did not generate another P5 without the 8 UMI bases.

It seems like the adaptors form adaptor dimers, but they do not seem to bind to the library fragments.

Suggestions of other designs or ways to get it to work would be highly appreciated.

I have looked into the dual-index adaptors, with the idea of replacing one index with a UMI, but I could not find the sequences. Do any of you have them?

Best regards
Emil

pmiguel 10-11-2017 09:01 AM

Is this a Y-adapter design? If so, the P7 and P5 need to anneal over the last 12 bases of the P5/first 12 bases of the P7, with a 3' "T" overhang to work with many Illumina workflows. When you add 8 N's and a T to the P5, you extend that 3' overhang by 9 bases. No chance you can ligate that to anything using a double-stranded DNA ligase.

I suppose the opposite design, where you put the 8 N's at the 5' end of P7, could work. But you would have to anneal the P5 and P7 by their 12 bases of complementarity and then extend the 3' end of the P5 strand using Klenow, for example. But that would give you a blunt adapter, so your inserts would need to be blunt as well, which would allow chimeric inserts and the formation of massive amounts of adapter dimers.

BTW, Illumina asks for the following to be added to any sequence information derived from their adapter sequences if it is distributed or published outside one's institution:
Oligonucleotide sequences © 2017 Illumina, Inc. All rights reserved.
and
Oligonucleotide sequences © 2017 Illumina, Inc. All rights reserved. Derivative works created by Illumina customers are authorized for use with Illumina instruments and products only. All other uses are strictly prohibited.

The correct position to place a UMI in the P5 index site would be after the TCTACAC. That is:
AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT

But because 8 N's can give at most 4^8 = 65,536 (~64K) sequence combinations, they would not be "unique" in the context of a sample producing millions of sequences. And I don't know how far you can push the length of the i5 index. Probably 12, at least. But going beyond 8 bases, you would need to take cycles away from the sequence reads to make up for the reagents used to read a longer UMI.

--
Phillip
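
The arithmetic in the post above can be checked with a minimal sketch. It counts the possible UMIs for a given length and, assuming every UMI is equally likely (an idealisation; real oligo synthesis is not perfectly uniform), estimates how many distinct UMIs would be expected among a given number of molecules. The function names are made up for illustration, and the flanking sequences are the ones quoted in the post.

```python
def umi_space(length):
    """Number of possible UMIs of a given length (4 bases per position)."""
    return 4 ** length


def expected_distinct(n_molecules, length):
    """Expected number of distinct UMIs among n_molecules random draws.

    Assumes all UMIs are equally likely, which is an idealisation.
    """
    space = umi_space(length)
    return space * (1 - (1 - 1 / space) ** n_molecules)


def i5_umi_oligo(length):
    """P5 oligo with N's in the i5 index position (flanks as quoted in the post)."""
    return ("AATGATACGGCGACCACCGAGATCTACAC" + "N" * length
            + "ACACTCTTTCCCTACACGACGCTCTTCCGATCT")


for n in (8, 12):
    # UMI length, size of the UMI space, expected distinct UMIs in 1M molecules
    print(n, umi_space(n), round(expected_distinct(1_000_000, n)))
```

With 8 N's the space is 65,536 UMIs; with 12 N's it is 16,777,216. How many collisions matter in practice depends on how many molecules share the same fragment ends, which this sketch does not model.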

Andersen 10-11-2017 10:23 AM

Thank you so much for your answer Phillip.

1. I am actually not really sure if it is a Y-adapter design. But I am pretty sure that it is. I tried to find information about it but couldn't, but it is the same adapter as used for TruSeq LT.

2. Thank you for suggesting not to go with blunt inserts. ;)

3. Thank you for that suggestion. Do you know if any of the current kits from Illumina use dual indexing together with the Y-adaptor setup that I presumably have? I believe this would be the best setup to run, since I would be sure to keep the annealing intact at the 12 complementary bases.

Thank you for your answers
Kind regards
Emil


pmiguel 10-11-2017 01:09 PM

Hi Emil,
Most Illumina TruSeq kits use Y-adapters. The common TruSeq DNA and RNA-seq ones offer a normal single-index kit option (usually going up to 24 indexes) or a "high throughput" one with dual indexes, allowing you to multiplex 96 samples in a lane. Well, as long as you don't need them to be "unique dual".

The parts of the dual-index Y-adapters containing the indexes do not anneal! Only the 12 bases of the Y-adapter most proximate to the insert are annealed. The rest of the adapter has non-complementary sequence that won't anneal and hangs off like a forked tail. Get it? "Y", where the double-stranded part is the stem of the "Y" and the single-stranded tails are the top part. It is only during subsequent PCR that this tail region becomes double-stranded.

This is one of those brilliant solutions to a bunch of amplicon construction issues when using ligation. You need to have a forward and a reverse adapter. But if you just make them both double-stranded, so that T4 DNA ligase will use them as a substrate, you create 2 problems. First, some of your constructs will get forward adapters on both ends or reverse adapters on both ends and won't be suitable templates for clustering. Second, with a double-stranded molecule you have two ends that can be ligated -- possibly to each other.

So some fiendish genius came up with the idea of annealing single-stranded versions of both the left and right adapters to each other such that only one end was actually annealed. This way each double-stranded insert was guaranteed to have an R adapter on one end and an F adapter on the other end. Actually, each strand of the double-stranded insert would have a single-stranded R on one end and F on the other -- the top strand in one orientation (say, F-insert-R), the bottom strand in the other orientation (R-insert-F).

--
Phillip
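
The point about each strand ending up with one adapter arm at each end can be illustrated with a small sketch. It assumes an A-tailed insert, treats the P5 oligo as the "F" arm and the P7 oligo as the "R" arm, and uses the oligo sequences from the first post with the modification symbols stripped. The insert sequence and function names are made up for illustration.

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMP)[::-1]


# Oligos as posted at the top of the thread, modification symbols removed.
P5 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P7 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG"


def ligate_y_adapters(insert_top):
    """Return (top, bottom) strands, both written 5'->3', after a Y-adapter
    has been ligated to each end of an A-tailed insert.

    The 3' T of the P5 oligo joins the 5' end of an insert strand, and the
    5' end of the P7 oligo joins that strand's 3' A tail.
    """
    top = P5 + insert_top + "A" + P7
    bottom = P5 + revcomp(insert_top) + "A" + P7
    return top, bottom


top, bottom = ligate_y_adapters("CGGTTACGATTGCA")  # hypothetical insert
for name, strand in (("top", top), ("bottom", bottom)):
    # Both strands read 5'-P5 ... P7-3', i.e. F on one end and R on the other.
    print(name, strand.startswith(P5), strand.endswith(P7))
```

Only the stem, the T/A pairs, and the insert are complementary between the two strands; the long arms stay single-stranded until PCR copies them.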

nucacidhunter 10-11-2017 01:33 PM

1 Attachment(s)
The sequence and structure of the TruSeq HT adapters are attached.

You would need to substitute the i5 sequences with N's to use them as a UMI.

Another option you might consider is: http://www.nugen.com/products/ovatio...hyl-seq-system

It has a 6-base UMI that follows the index, so the index 1 read has to be 12 cycles to use the UMI, or 6 cycles for the index alone. Another advantage is that they have included diversity nucleotides, so libraries can be sequenced with a 1% PhiX spike-in. With the conventional protocol, a higher PhiX fraction (>30%) is required.
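
As a rough sketch of what a 12-cycle index 1 read implies downstream, the read can be split into the sample index and the UMI. The layout assumed here (6 index bases first, then 6 UMI bases) is taken only from the description above and should be checked against the kit documentation; the function name and example read are made up.

```python
def split_index_read(index_read, index_len=6, umi_len=6):
    """Split an index read into (sample index, UMI).

    Assumes the sample index comes first and the UMI follows it directly.
    """
    if len(index_read) < index_len + umi_len:
        raise ValueError("index read is too short to contain index plus UMI")
    sample_index = index_read[:index_len]
    umi = index_read[index_len:index_len + umi_len]
    return sample_index, umi


print(split_index_read("ACAGTGTTGCAA"))  # hypothetical 12-cycle index read
```

If only 6 index cycles are run, the UMI bases are never sequenced, so the read would fail the length check here.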

torben 10-12-2017 01:11 AM

For a way to create TruSeq adapters with UMI at the end see Kennedy SR, et al. Detecting ultralow-frequency mutations by Duplex Sequencing. Nat Protoc. 2014 Nov;9(11):2586-606. doi: 10.1038/nprot.2014.170. https://www.nature.com/nprot/journal....2014.170.html

Andersen 10-13-2017 03:09 AM

Thanks again Phillip!

Indeed, they must be some fiendish geniuses. :D

I have already generated the TruSeq DNA LT adapter piece with a 6 nt index. Do you think it would work to anneal the i5 adapter to this adapter, or should I generate new i7 adapters as well?

Also to nucacidhunter and torben, thanks for the suggestions!

Cheers
Emil


pmiguel 10-13-2017 07:48 AM

Hi Emil,
I would strongly recommend that you verify this yourself by aligning your p5 and the reverse (in the 3' - 5' direction) of your p7 sequence. You will see the terminal 12 bases on one side are complements of each other with just a 3' "T" overhang provided by the p5 oligo.
Once you have done that, you will understand how a Y-adapter is structured to function as it does.
--
Phillip
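
The alignment suggested above can also be done in a few lines, as a minimal sketch. It takes the oligos from the first post with the modification symbols stripped, reverse-complements the first 12 bases of the P7 oligo, and looks for that stem near the 3' end of the P5 oligo. The function name and the fixed 12-base stem length are assumptions made for illustration.

```python
COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMP)[::-1]


def stem_and_overhang(p5, p7, stem_len=12):
    """Find the stretch of p5 that pairs with the first stem_len bases of p7.

    Returns (stem sequence on p5, number of p5 bases 3' of the stem),
    or None if p5 does not contain the complementary stretch.
    """
    stem = revcomp(p7[:stem_len])
    start = p5.rfind(stem)
    if start == -1:
        return None
    return stem, len(p5) - (start + stem_len)


# Oligos as posted at the top of the thread, modification symbols removed.
P5 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P5_UMI8 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNT"
P7 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG"

print(stem_and_overhang(P5, P7))
print(stem_and_overhang(P5_UMI8, P7))
```

For the regular P5 this reports the single-base "T" overhang. For the UMI8 oligo exactly as written in the first post, the tail counted this way includes the original T as well as the eight N's and the added T.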

Andersen 10-15-2017 11:01 PM

Thank you for all your help! :)

I have now ordered the adapters and hope they will work!

nucacidhunter 10-15-2017 11:33 PM

I hope that you have asked for all C residues to be synthesized as mC (which is very expensive) to prevent conversion of C to U during bisulfite treatment, unless you are using a technique that does not require mC in the adapters.
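
To make the reason concrete, here is a deliberately simplified sketch of what bisulfite treatment does to an adapter sequence. It assumes an all-or-nothing model in which every C in the oligo is either mC (protected) or unmethylated (converted to U and read as T after PCR). The function name is made up, and the example is the 3' end of the P5 oligo from the first post.

```python
def bisulfite_convert(seq, methylated=True):
    """Sequence as it would read after bisulfite treatment and PCR.

    Simplified model: unmethylated C is converted to U and read as T,
    while 5-methyl-C (mC) is left unchanged.
    """
    return seq if methylated else seq.replace("C", "T")


P5_3PRIME_END = "GCTCTTCCGATCT"  # 3' end of the P5 oligo quoted in the thread

print(bisulfite_convert(P5_3PRIME_END, methylated=True))
print(bisulfite_convert(P5_3PRIME_END, methylated=False))
```

In the unmethylated case the converted adapter no longer matches primers designed against the original adapter sequence, which is why adapters ligated before bisulfite treatment are usually ordered with mC.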

Andersen 10-15-2017 11:36 PM

Indeed expensive, but yes it is synthesized with mC. Thanks for the heads up.

