SEQanswers

Old 02-16-2012, 04:57 AM   #1
kentk
Member
 
Location: Philippines

Join Date: Dec 2011
Posts: 17
Rules for making your own index

We're planning to add a second index to the TruSeq v2 kit because we need more multiplexing than just 24. Are there any rules on making your own index?

I asked Illumina and they said to make sure that A, T, C, and G are all represented at each cycle, because the MiSeq has to "focus" or else the cycle is lost. So this means I can't use a universal tag, since there would be cycles where every read consists of the same single nucleotide! Is this accurate?
Old 02-16-2012, 05:10 AM   #2
ETHANol
Senior Member
 
Location: Western Australia

Join Date: Feb 2010
Posts: 308
Default

Not really sure if I am getting what you are saying, but it's best to keep the Hamming distance >1 between every pair of barcodes.
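
A minimal sketch of the Hamming-distance check suggested here, in Python. The barcodes in `example` are made-up placeholders for illustration, not a recommended set, and the function names are hypothetical.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(barcodes):
    """Smallest Hamming distance over all pairs in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))

# Placeholder barcodes, for illustration only.
example = ["ACGTAC", "TGCATG", "GATCGA", "CTAGCT"]
print(min_pairwise_distance(example))
```

A set passes the rule of thumb above when `min_pairwise_distance` returns 2 or more, meaning no single miscalled base can turn one barcode into another.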
__________________
--------------
Ethan
Old 02-16-2012, 05:23 AM   #3
kentk
Member
 
Location: Philippines

Join Date: Dec 2011
Posts: 17
Default

Yes, I should maximize Hamming distance.
But for example I have 4 indexes...

5' ATGCAT
5' TGAACG
5' GCTGTC
5' AGCTGC

An Illumina representative mentioned that I can't use that index set because the first, second, and last positions do not include all four bases. So on the flowcell at cycle 1, I'll have signals from A, T, and G clusters but not from C, so our machine (MiSeq) will trash that cycle. Well, this is what I understood from our conversation.
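
The per-cycle check described here can be sketched in a few lines of Python. The function name is hypothetical; the four indexes are the ones listed in this post.

```python
def missing_bases_per_cycle(indexes):
    """Map each 1-based cycle to the sorted list of bases absent at that cycle."""
    report = {}
    # zip(*indexes) yields one column of bases per sequencing cycle.
    for cycle, column in enumerate(zip(*indexes), start=1):
        absent = sorted(set("ACGT") - set(column))
        if absent:
            report[cycle] = absent
    return report

indexes = ["ATGCAT", "TGAACG", "GCTGTC", "AGCTGC"]
print(missing_bases_per_cycle(indexes))
# {1: ['C'], 2: ['A'], 6: ['A']}
```

This agrees with the representative's comment: cycles 1, 2, and 6 each lack one base, while cycles 3 to 5 contain all four.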

Any thoughts?
Old 02-16-2012, 05:29 AM   #4
ETHANol
Senior Member
 
Location: Western Australia

Join Date: Feb 2010
Posts: 308
Default

On the HiSeq, you need balanced nucleotide composition at the beginning of the sequencing read but not in the barcode read. Otherwise you could only multiplex in multiples of four, which was one of the drawbacks of putting the barcode at the beginning of the sequencing read. Maybe the MiSeq is pickier about the barcode read; I don't know.
__________________
--------------
Ethan
Old 02-16-2012, 05:52 AM   #5
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509
Default

Actually, ETHANol's statement is not 100% accurate. On the HiSeq, high cluster densities (900-1000K) have a more deleterious effect on index reads than inserts. We've had several flow cells with good cluster calling (80-90% PF) and high quality scores (mean ~38), yet fewer than 50% of the indices were called accurately. In some cases, pseudotiles at the inflow side (which contain higher cluster densities) have completely dropped out (i.e., no basecalling) during the index read after producing high-quality insert reads. The problem can be mitigated by balancing the ratio of index bases that are excited by the same laser (A/C or G/T).

If your second index is at the start of read one, then you absolutely have to use all four bases in roughly equal proportions for the first four cycles (which is when cluster calling occurs).
Old 02-16-2012, 06:01 AM   #6
ETHANol
Senior Member
 
Location: Western Australia

Join Date: Feb 2010
Posts: 308
Default

HESmith, Thanks for the correction. I'm curious here. How do you determine that the indices are called incorrectly? How do I go about performing QC on the index read?
__________________
--------------
Ethan
Old 02-16-2012, 06:09 AM   #7
ETHANol
Senior Member
 
Location: Western Australia

Join Date: Feb 2010
Posts: 308
Default

It's funny, I found this on the internet some time ago and follow it, but Illumina hasn't followed up on it so I just assumed it wasn't a problem. Apparently, it can be. Which leads one to ask: why is this not mentioned in any of the library preparation manuals?

I think pmiguel has said that balanced base composition for the index read is important on the HiScan.


1. Some sequencing experiments require the use of fewer than 12 index sequences in a lane with a high cluster density. In such cases, select indexes carefully to ensure optimum base calling and demultiplexing by having different bases at each cycle of the index read. Illumina recommends the following sets of indexes for low-level pooling experiments.

Pool of 2 samples:
• Index #6 GCCAAT
• Index #12 CTTGTA

Pool of 3 samples:
• Index #4 TGACCA
• Index #6 GCCAAT
• Index #12 CTTGTA

Pool of 6 samples:
• Index #2 CGATGT
• Index #4 TGACCA
• Index #5 ACAGTG
• Index #6 GCCAAT
• Index #7 CAGATC
• Index #12 CTTGTA
__________________
--------------
Ethan
Old 02-16-2012, 06:18 AM   #8
kentk
Member
 
Location: Philippines

Join Date: Dec 2011
Posts: 17
Default

Thanks guys. Yes, I was planning to introduce a 5' index. Being able to multiplex only in multiples of 4 isn't a problem. I just need to multiplex into the hundreds.

I think I've read the same post by pmiguel mentioning that index reads should always contain an A/C and a G/T at each position, which is why I was curious why all bases should be in equal proportion.
Old 02-16-2012, 06:24 AM   #9
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509
Default

Quote:
Originally Posted by ETHANol View Post
HESmith, Thanks for the correction. I'm curious here. How do you determine that the indices are called incorrectly? How do I go about performing QC on the index read?
I examined the frequencies of different indices in the Undetermined directory. The most common were one-base mismatches with the correct indices (we required perfect matches for demultiplexing), but there were nearly as many with two or more mismatches. Also, there were some pseudotiles with all Ns in the index despite high quality insert reads. Note that we observed this problem only at very high cluster densities.

You can use SAV or HCS to visualize the Q-scores for the index cycles. They are usually a bit lower than read one; if they're a lot lower, be concerned. A high fraction (>3-4%) of reads in the Undetermined directory is another indication of poor index reads.
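
The tally described above could be sketched as follows. This assumes the index sequences have already been extracted from the Undetermined reads into a plain list (how to do that depends on the pipeline and is not shown). The function names, the toy data, and the read counts are hypothetical; the 4% cutoff is just the upper end of the rule of thumb given above.

```python
from collections import Counter

def hamming(a, b):
    """Count mismatching positions between two equal-length index reads."""
    return sum(x != y for x, y in zip(a, b))

def classify_index_reads(observed, expected):
    """Tally index reads by their distance to the nearest expected index."""
    tally = Counter()
    for read in observed:
        if set(read) == {"N"}:
            tally["all_N"] += 1
            continue
        nearest = min(hamming(read, idx) for idx in expected)
        if nearest == 0:
            tally["exact"] += 1
        elif nearest == 1:
            tally["one_mismatch"] += 1
        else:
            tally["two_or_more"] += 1
    return tally

def undetermined_fraction(n_undetermined, n_total):
    """Fraction of all reads that could not be assigned to a sample."""
    return n_undetermined / n_total

# Toy data for illustration only.
expected = ["GCCAAT", "CTTGTA"]
observed = ["GCCAAA", "CTTGTN", "NNNNNN", "AAGGTT", "GCCAAA"]
print(classify_index_reads(observed, expected))
print(undetermined_fraction(50_000, 1_000_000) > 0.04)
```

A tally dominated by `one_mismatch` suggests mostly usable index reads with occasional miscalls, while many `two_or_more` or `all_N` entries point to the kind of index-read failure described above.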

Harold
Old 02-16-2012, 06:25 AM   #10
ETHANol
Senior Member
 
Location: Western Australia

Join Date: Feb 2010
Posts: 308
Default

http://www.plosone.org/article/info%...l.pone.0016607

With a 5' index you have the invariant T required for adapter ligation in all libraries. I guess it doesn't cause too much of a problem because people use this strategy, but it is something to think about nonetheless. Has this caused problems for anyone?
__________________
--------------
Ethan
Old 02-16-2012, 06:27 AM   #11
ETHANol
Senior Member
 
Location: Western Australia

Join Date: Feb 2010
Posts: 308
Default

Thanks Harold!
__________________
--------------
Ethan
Old 02-16-2012, 06:37 AM   #12
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509
Default

Quote:
Originally Posted by kentk View Post
Thanks guys. Yes, I was planning to introduce a 5' index. Being able to multiplex only in multiples of 4 isn't a problem. I just need to multiplex into the hundreds.

I think I've read the same post by pmiguel mentioning that index reads should always contain an A/C and a G/T at each position, which is why I was curious why all bases should be in equal proportion.
There's a distinction between the Illumina index read (which is a separate read) and barcodes that are incorporated at the start of your insert. In addition to cluster calling, I believe that the measured signal intensities for the first four cycles are used to calibrate values that are utilized for the remainder of the run (e.g., signal-to-noise), which would obviously affect the data if the bases are not equally represented in those cycles.

For the index read, A/C vs. G/T is usually sufficient to discriminate between a small number of barcodes.

Harold

Old 02-16-2012, 06:41 AM   #13
kentk
Member
 
Location: Philippines

Join Date: Dec 2011
Posts: 17
Default

Quote:
Originally Posted by ETHANol View Post
http://www.plosone.org/article/info%...l.pone.0016607
With a 5' index you have the invariant T required for adapter ligation in all libraries. I guess it doesn't cause too much of a problem because people use this strategy, but it is something to think about nonetheless. Has this caused problems for anyone?
Article looks interesting. I'll have to read it through first. Thanks again, ETHANol.

You mean the T for the T-A ligation, right? No, I don't think it's a problem, because that T (or actually its complement A) anneals to the last base of the sequencing primer, so essentially it's not part of the read.
Old 02-16-2012, 06:42 AM   #14
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509
Default

Quote:
Originally Posted by ETHANol View Post
http://www.plosone.org/article/info%...l.pone.0016607

With a 5' index you have the invariant T required for adapter ligation in all libraries. I guess it doesn't cause too much of a problem because people use this strategy, but it is something to think about nonetheless. Has this caused problems for anyone?
I assume you mean barcodes that are part of the adapter. Unless your index is only three bases long, you should be okay (but I haven't done the experiment). You could also resolve the problem by using indices of different length so the T is phase-shifted, and balance the other nucleotides for that cycle.
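
To illustrate the phase-shifting idea, here is a small sketch that reports which read cycle the invariant T lands on. It assumes the read starts at the first barcode base and that the T immediately follows the barcode; the barcodes are made-up placeholders and the function name is hypothetical.

```python
from collections import Counter

def invariant_t_cycles(barcodes):
    """Fraction of libraries whose ligation T falls on each 1-based read cycle."""
    # Under the stated assumption, the T sits one cycle past the barcode.
    counts = Counter(len(bc) + 1 for bc in barcodes)
    total = len(barcodes)
    return {cycle: n / total for cycle, n in sorted(counts.items())}

# Placeholder barcodes, for illustration only.
same_length = ["ACGT", "TGCA", "GATC", "CTAG"]
staggered = ["ACGT", "TGCAG", "GATCCA", "CTAGTCA"]

print(invariant_t_cycles(same_length))  # {5: 1.0}
print(invariant_t_cycles(staggered))    # {5: 0.25, 6: 0.25, 7: 0.25, 8: 0.25}
```

With equal-length barcodes every library shows T at the same cycle, whereas staggered lengths spread it over several cycles, leaving room for other bases at each of them.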
Old 03-15-2012, 07:00 AM   #15
TonyBrooks
Senior Member
 
Location: London

Join Date: Jun 2009
Posts: 298
Default

On a related note: I have a bunch of indexes (the Sanger 96-plex ones) that I'd like to use for low-level plexing too (4-plex). These indexes are 8 bases long. Is there anything stopping me from reading just the first 6 bases (as per standard Illumina indexing) on the GAIIx/HiSeq, as long as there is AC/GT balance at all 6 positions? I only want to do this on one lane, so I don't need to read 8 cycles on the other 7 lanes.
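
On the design side of this question, one thing worth checking is whether the indexes stay distinguishable once truncated. A minimal sketch, using made-up 8-base placeholders rather than the actual Sanger sequences (which are not given in this thread) and a hypothetical function name; it says nothing about whether the instrument can be set up to read only 6 index cycles.

```python
from itertools import combinations

def truncate_and_check(indexes, keep=6):
    """Truncate indexes to `keep` bases; report uniqueness and minimum pairwise distance."""
    short = [idx[:keep] for idx in indexes]
    unique = len(set(short)) == len(short)
    min_dist = min(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(short, 2)
    )
    return short, unique, min_dist

# Placeholder 8-mers, for illustration only.
placeholders = ["ACGTACGT", "TGCATGCA", "GATCGATC", "CTAGCTAG"]
print(truncate_and_check(placeholders))

# Two indexes that differ only in their last two bases collide after truncation.
print(truncate_and_check(["ACGTACGT", "ACGTACAA"]))
```

If `unique` is false or the minimum distance drops to 1, the truncated set no longer tolerates a miscalled base, even if the full 8-base set did.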
Old 04-05-2012, 04:42 PM   #16
PabloMarin-Garcia
Junior Member
 
Location: Cambridge(uk)

Join Date: Jun 2010
Posts: 7
Illumina recommendation for index selection

Quote:
Originally Posted by ETHANol View Post
It's funny, I found this on the internet some time ago and follow it, but Illumina hasn't followed up on it so I just assumed it wasn't a problem. Apparently, it can be. Which leads one to ask: why is this not mentioned in any of the library preparation manuals?
They explain it here for Nextera:
http://www.illumina.com/documents/pr...guidelines.pdf