SEQanswers

Old 10-11-2017, 06:45 AM   #1
Andersen
Member
 
Location: Copenhagen

Join Date: Oct 2015
Posts: 15
Illumina Unique Molecular Identifier Adaptor

Hi All,

I want to generate RRBS libraries in which we can track each unique molecule with a UMI. To do this I have designed modified versions of the TruSeq adaptors that we normally use.

The regular TruSeq adaptors look like this:
A1_P5: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T
A1_P7 (AR005): ℗-GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG

What we did was to add 8 random nucleotides to the P5 so that it looks like this:
A1_P5 UMI8: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNN*T

From what I have heard, the two adaptor oligos should be annealed before use. The annealing efficiency can be seen here: https://imgur.com/a/70LZU

I then ran our regular protocol with the old adaptors and with the new adaptors.

The libraries look like this: https://imgur.com/a/mHZeF

Stupidly, I did not generate another P5 without the 8 UMI bases.

The new adaptors seem to form adaptor dimers, but they do not seem to attach to the library fragments.

Suggestions of other designs or ways to get it to work would be highly appreciated.

I have also looked into using the dual-index adaptors and replacing one index with a UMI, but I could not find the sequences. Do any of you have them?

Best regards
Emil
Old 10-11-2017, 10:01 AM   #2
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,238

Is this a Y-adapter design? If so, the P7 and P5 need to anneal over the last 12 bases of the P5/first 12 bases of the P7, with a 3' "T" overhang, to work with many Illumina workflows. When you add 8 N's and a T to the P5, you create a 9 base 3' overhang.
No chance you can ligate that to anything using a double-stranded DNA ligase.

I suppose the opposite design, where you put the 8 N's at the 5' end of the P7, could work. But you would have to anneal the P5 and P7 by their 12 bases of complementarity and then extend the 3' end of the P5 strand using Klenow, for example. That would give you a blunt adapter, so your inserts would need to be blunt as well, which would allow chimeric inserts and the formation of massive amounts of adapter dimer.

BTW, Illumina asks for the following:
Oligonucleotide sequences © 2017 Illumina, Inc. All rights reserved.
and
Oligonucleotide sequences © 2017 Illumina, Inc. All rights reserved. Derivative works created by Illumina customers are authorized for use with Illumina instruments and products only. All other uses are strictly prohibited.

to be added to any sequence information derived from their adapter sequences if it is distributed or published outside one's institution.

The correct position to place a UMI in the P5 index site would be after the TCTACAC. That is:
AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT

But, because 8 N's can compose, at most, 4^8 = 65,536 (about 64K) sequence combinations, they would not be "unique" in the context of a sample producing millions of sequences. And I don't know how far you can push the length of the i5 index. Probably 12, at least. But going beyond 8 bases, you would need to take cycles away from the sequence reads to make up for the reagents used to sequence a longer UMI.
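To put rough numbers on the diversity argument, here is a minimal sketch. It assumes every UMI is drawn uniformly at random, which is optimistic because real oligo synthesis is biased. The function names are made up for illustration. UMIs are usually interpreted together with the mapping position, so the molecule count you pass in may be per start site rather than per sample.

```python
def umi_space(length: int) -> int:
    """Number of distinct sequences a run of `length` random bases can take."""
    return 4 ** length


def expected_distinct_umis(n_molecules: int, length: int) -> float:
    """Expected number of different UMIs observed when each of n_molecules
    draws one UMI uniformly at random (assumption: no synthesis bias)."""
    space = umi_space(length)
    return space * (1.0 - (1.0 - 1.0 / space) ** n_molecules)


if __name__ == "__main__":
    n = 1_000_000  # illustrative molecule count, not taken from any real run
    for length in (8, 12):
        space = umi_space(length)
        distinct = expected_distinct_umis(n, length)
        print(f"{length} N's: {space:,} possible UMIs, "
              f"about {distinct:,.0f} distinct among {n:,} molecules")
```

When the expected number of distinct UMIs is far below the number of molecules, many different molecules necessarily share a UMI.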

--
Phillip

Last edited by pmiguel; 10-11-2017 at 10:05 AM.
Old 10-11-2017, 11:23 AM   #3
Andersen
Member
 
Location: Copenhagen

Join Date: Oct 2015
Posts: 15

Thank you so much for your answer Phillip.

1. I am not completely sure whether it is a Y-adapter design, but I am fairly sure that it is. I could not find information about it, but it is the same adapter as used for TruSeq LT.

2. Thank you for suggesting not to go with blunt inserts.

3. Thank you for that suggestion. Do you know if any of the current kits from Illumina use dual indexing together with the Y-adaptor setup that I presumably have? I believe this would be the best setup to run, since I would be sure to have intact annealing at the 12 complementary bases.

Thank you for your answers
Kind regards
Emil


Old 10-11-2017, 02:09 PM   #4
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,238

Hi Emil,
Most TruSeq Illumina kits use the Y-adapters. The common "TruSeq" DNA and RNAseq ones will offer a normal single index kit option (usually going up to 24 indexes) or a "high throughput" one with dual indexes allowing you to multiplex 96 samples in a lane. Well, as long as you don't need them to be "unique dual".

The parts of the dual index Y-adapters containing the indexes do not anneal! Only the 12 bases of the Y-adapter most proximate to the insert are annealed. The rest of the adapter has non-complementary sequence that won't anneal and hangs off like a forked tail. Get it? "Y" where the double-stranded part is the stem of the "Y" and the single-stranded tails are the top part. It is only during subsequent PCR that this tail region becomes double-stranded.

This is one of those brilliant solutions to a bunch of amplicon construction issues when using ligation. You need to have a forward and a reverse adapter. But if you just make them both double-stranded, so that T4 DNA ligase will use them as a substrate, you create 2 problems. First, some of your constructs will get forward adapters on both ends or reverse adapters on both ends and won't be suitable templates for clustering. Second, with a double-stranded molecule you have two ends that can be ligated -- possibly to each other.

So some fiendish genius came up with the idea of annealing single-stranded versions of both the left and right adapters to each other such that only one end was actually annealed. This way each double-stranded insert was guaranteed to have an R adapter on one end and an F adapter on the other end. Actually, each strand of the double-stranded insert would have a single-stranded R on one end and F on the other -- the top strand in one orientation (say, F-insert-R), the bottom strand in the other orientation (R-insert-F).
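The outcome can be sketched in a few lines. This is a toy model, assuming an A-tailed insert and the single-index P5/P7 oligos posted at the top of the thread. The function name `ligate_y_adapter` is hypothetical, and the model ignores the phosphate and the phosphorothioate bond.

```python
# Oligonucleotide sequences © 2017 Illumina, Inc. All rights reserved.
# (sequences copied from the first post of this thread)
P5 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P7 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def ligate_y_adapter(insert_top: str, p5: str = P5, p7: str = P7):
    """Return (top, bottom) strands, each written 5'->3', after a Y-adapter
    is ligated to both ends of an A-tailed double-stranded insert.

    The 3' end of the P5 oligo joins the 5' end of an insert strand, and the
    5' end of the P7 oligo joins the A-tailed 3' end of the same strand.
    """
    top = p5 + insert_top + "A" + p7
    bottom = p5 + reverse_complement(insert_top) + "A" + p7
    return top, bottom


if __name__ == "__main__":
    top, bottom = ligate_y_adapter("GGATTC")  # toy insert, far too short to be real
    for name, strand in (("top", top), ("bottom", bottom)):
        print(name, strand.startswith(P5), strand.endswith(P7))
```

Both strands come out as P5-insert-P7 when read 5' to 3', which is the same thing as F-insert-R on the top strand and R-insert-F on the bottom strand when the duplex is drawn in one fixed orientation.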

--
Phillip
Old 10-11-2017, 02:33 PM   #5
nucacidhunter
Senior Member
 
Location: Iran

Join Date: Jan 2013
Posts: 1,083

The sequence and structure of the TruSeq HT adapters are attached.

You would need to substitute the i5 index sequence with N's to use it as a UMI.

Another option you might consider is: http://www.nugen.com/products/ovatio...hyl-seq-system

It has a 6-base UMI that follows the index, so the index 1 read has to be 12 cycles to utilize the UMI, or 6 cycles for just the index. Another advantage is that they have included diversity nucleotides, so libraries can be sequenced with a 1% PhiX spike-in. The conventional protocol requires a higher PhiX spike-in (>30%).
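For illustration, a 12-cycle index 1 read laid out as described here could be split like this. This is a sketch that assumes the first 6 bases are the sample index and the next 6 are the UMI. The function name and the example read are hypothetical, so check the kit documentation for the actual layout.

```python
def split_index_read(index_read: str, index_len: int = 6, umi_len: int = 6):
    """Split an index read into (sample_index, umi).

    Assumes the sample index comes first and the UMI follows it directly.
    """
    needed = index_len + umi_len
    if len(index_read) < needed:
        raise ValueError(
            f"index read has {len(index_read)} bases, need at least {needed}"
        )
    sample_index = index_read[:index_len]
    umi = index_read[index_len:needed]
    return sample_index, umi


if __name__ == "__main__":
    # Made-up 12-base read: 6-base index followed by 6-base UMI.
    print(split_index_read("ACAGTGTTGCAA"))
```

With only 6 index cycles the UMI bases are never read, which is why the longer index read is needed to use them.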
Attached Files
File Type: pdf TruSeq HT Adapter structure.pdf (308.0 KB, 8 views)
Old 10-12-2017, 02:11 AM   #6
torben
Junior Member
 
Location: Norway

Join Date: Oct 2012
Posts: 8

For a way to create TruSeq adapters with UMI at the end see Kennedy SR, et al. Detecting ultralow-frequency mutations by Duplex Sequencing. Nat Protoc. 2014 Nov;9(11):2586-606. doi: 10.1038/nprot.2014.170. https://www.nature.com/nprot/journal....2014.170.html
Old 10-13-2017, 04:09 AM   #7
Andersen
Member
 
Location: Copenhagen

Join Date: Oct 2015
Posts: 15

Thanks again Phillip!

Indeed they must be some fiendish geniuses.

I have already generated the TruSeq DNA LT adapter piece with a 6 nt index. Do you think it would work to anneal the i5 adapter to this adapter, or should I generate new i7 adapters as well?

Also to nucacidhunter and torben, thanks for the suggestions!

Cheers
Emil

Old 10-13-2017, 08:48 AM   #8
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,238

Hi Emil,
I would strongly recommend that you verify this yourself by aligning your p5 and the reverse (in the 3' - 5' direction) of your p7 sequence. You will see the terminal 12 bases on one side are complements of each other with just a 3' "T" overhang provided by the p5 oligo.
Once you have done that, you will understand how a Y-adapter is structured to function as it does.
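That check can also be done in a few lines of code. The sketch below uses the sequences posted earlier in this thread; `stem_length` is a made-up helper name, and it assumes the stem starts right after a fixed number of 3' overhang bases on the P5 oligo.

```python
# Oligonucleotide sequences © 2017 Illumina, Inc. All rights reserved.
# (sequences copied from the posts above, with the * and 5' phosphate marks removed)
P5 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P7 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG"
# UMI in the i5 index position, as suggested in post #2
P5_I5_UMI = "AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT"
# UMI at the 3' end, the design from post #1
P5_3PRIME_UMI = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNT"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    # N is left untouched because it is not in the translation table
    return seq.translate(COMPLEMENT)[::-1]


def stem_length(p5: str, p7: str, overhang: int = 1) -> int:
    """Count consecutive base pairs between the 3' end of p5 (after skipping
    `overhang` unpaired bases) and the 5' end of p7."""
    paired_end = p5[: len(p5) - overhang]
    stem = 0
    for k in range(1, min(len(paired_end), len(p7)) + 1):
        if paired_end[-k:] == reverse_complement(p7[:k]):
            stem = k
        else:
            break
    return stem


if __name__ == "__main__":
    print(stem_length(P5, P7))             # 12
    print(stem_length(P5_I5_UMI, P7))      # 12
    print(stem_length(P5_3PRIME_UMI, P7))  # 0
```

Moving the N's into the index position leaves the 12-base stem and the single "T" overhang untouched, while placing them at the 3' end leaves nothing next to the overhang for the P7 to pair with.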
--
Phillip
Old 10-16-2017, 12:01 AM   #9
Andersen
Member
 
Location: Copenhagen

Join Date: Oct 2015
Posts: 15

Thank you for all your help!

I have now ordered the adapters and hope they will work!
Old 10-16-2017, 12:33 AM   #10
nucacidhunter
Senior Member
 
Location: Iran

Join Date: Jan 2013
Posts: 1,083

I hope that you have asked for all C residues to be synthesized as mC (which is very expensive) to prevent conversion of C to U during bisulfite treatment, unless you are using a technique that does not require mC in the adapters.
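A toy model of why this matters, assuming complete conversion and that converted bases read as T after PCR. The function name is made up, and the example is just the 3' end of the P5 oligo posted at the top of the thread.

```python
def bisulfite_convert(seq: str, methylated: bool) -> str:
    """Model complete bisulfite conversion of one strand.

    Unmethylated C is deaminated to U and reads as T after PCR;
    5-methyl-C (mC) is protected and still reads as C.
    """
    return seq if methylated else seq.replace("C", "T")


if __name__ == "__main__":
    p5_end = "GCTCTTCCGATCT"  # last 13 bases of the P5 oligo from post #1
    print("mC positions needed:", p5_end.count("C"))
    print("with mC:   ", bisulfite_convert(p5_end, methylated=True))
    print("without mC:", bisulfite_convert(p5_end, methylated=False))
```

An adapter synthesized without mC would no longer match the PCR and sequencing primers after conversion, and every C in the oligo adds to the synthesis cost.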
Old 10-16-2017, 12:36 AM   #11
Andersen
Member
 
Location: Copenhagen

Join Date: Oct 2015
Posts: 15

Quote:
Originally Posted by nucacidhunter View Post
I hope that you have asked for all C residues to be synthesized as mC (which is very expensive), unless you are using a technique that does not require mC in the adapters.

Indeed expensive, but yes, they are synthesized with mC. Thanks for the heads up.