SEQanswers

Old 09-23-2011, 11:53 AM   #1
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Which indexes to pool.

[Note added 12/12/2013: Illumina fixed this issue. The info below is, at best, of historical interest, at least for HiSeqs and MiSeqs, and probably HiScanSQs as well. Even the worst-case scenario of a single index in a HiSeq lane generally results in >95% demultiplexing. So, if most of your reads got thrown into the "unknown" folder, there is probably some problem other than "color balance" that caused it.]

I find it maddening when our Illumina/CASAVA chokes on a TruSeq index pool and throws all the reads into the "unknown" directory. How does one avoid this?

Disclaimer: we have not tested these out yet. Your mileage may vary, etc...

Illumina sequencers do not handle low diversity sequence well. Using a low number of indexes (bar codes) in a single lane is likely to give you low diversity. In the most extreme case an entire scanner pass yields no visible clusters. There are two types of scanner passes -- the A/C pass and the G/T pass. If your index pool has an A or C and a G or T at all 6 positions, you can avoid a blank scanner pass.

The IUPAC code for A or C is "M" while the code for G or T is "K". So I like to think of a given index pool as "MK compatible" if there will be an M and a K at each of the 6 bases of the index. Which indexes are MK compatible?
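The MK test described above can be sketched in a few lines of Python. This is only an illustration, not the script used in the lab (which was not posted); the name `is_mk_compatible` is made up here, and the two sequences are the ones given for indexes 5 and 19 in a table further down this thread.

```python
# Hypothetical helper for illustration; not the lab's actual script.
M_BASES = set("AC")  # bases imaged in the A/C scanner pass (IUPAC "M")
K_BASES = set("GT")  # bases imaged in the G/T scanner pass (IUPAC "K")


def is_mk_compatible(index_seqs):
    """Return True if every index position has at least one M and one K base."""
    if not index_seqs:
        return False
    length = len(index_seqs[0])
    if any(len(seq) != length for seq in index_seqs):
        raise ValueError("all index sequences must have the same length")
    for column in zip(*index_seqs):
        bases = {base.upper() for base in column}
        if not (bases & M_BASES and bases & K_BASES):
            return False  # this cycle would leave one scanner pass blank
    return True


# Sequences for indexes 5 and 19 as listed later in the thread.
print(is_mk_compatible(["ACAGTG", "GTGAAA"]))  # True
print(is_mk_compatible(["ACAGTG"]))            # False
```

A single index can never pass this test, since each position then holds only one base and one of the two passes is necessarily blank.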

A programmer in the lab, Dave, whipped up a script to analyze some groupings of indexes to find MK compatible pools within them. If you have a TruSeq RNA or DNA library prep kit 'version 2', you have 12 indexes. There are two types of kits, the "A" and the "B" kits, distinguished only by the indexes they use:
Box A:
2, 4, 5, 6, 7, 12, 13, 14, 15, 16, 18, 19
Box B:
1, 3, 8, 9, 10, 11, 20, 21, 22, 23, 25, 27

If Dave has it right, then these are the groupings that should minimize issues. That is, if you only have 2 or 3 libraries going into a single lane, here are the ones that, when pooled by row, are MK compatible. (For example, if you mix indexes 5 and 19 in a lane, every position will have an M and a K. Get it? Each row is a good pool.)

A indexes only:
5 19
6 12
2 4 13
5 7 18

B indexes only:
3 11 21
8 20 25
9 10 21
10 22 25
10 25 27

A and B indexes:
5 19
6 12
7 17
18 25
3 5 8
1 5 9
1 4 12
2 3 12
7 10 12
1 3 13
2 4 13
8 9 13
2 11 14
7 9 14
2 11 15
7 9 15
5 11 16
10 13 16
6 10 17
1 14 18
1 15 18
5 7 18
17 18 19
3 17 20
3 11 21
4 18 21
8 16 21
9 10 21
12 14 21
12 15 21
5 6 22
12 19 22
2 5 23
8 12 23
14 16 23
15 16 23
4 17 24
14 22 24
15 22 24
19 21 24
5 17 25
7 19 25
8 20 25
10 22 25
1 11 26
2 18 26
7 23 26
8 10 26
9 16 26
13 21 26
20 22 26
5 6 27
10 25 27
12 19 27
14 24 27
15 24 27
20 26 27
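
For anyone who wants to run this kind of search themselves, here is a rough sketch of how such a search could be done with `itertools.combinations`. It is not Dave's script. The names `INDEXES` and `find_mk_pools` are invented for the example, and the dictionary only holds the eight index sequences that are written out further down this thread, so the remaining indexes would have to be filled in from the kit documentation.

```python
from itertools import combinations

# Index number -> sequence, copied from the table posted later in this thread.
# The other kit indexes are deliberately left out rather than guessed.
INDEXES = {
    2: "CGATGT", 3: "TTAGGC", 4: "TGACCA", 5: "ACAGTG",
    12: "CTTGTA", 13: "AGTCAA", 18: "GTCCGC", 19: "GTGAAA",
}


def is_mk_compatible(seqs):
    """True if every position has both an A/C (M) base and a G/T (K) base."""
    return all(
        bool(set(column) & set("AC")) and bool(set(column) & set("GT"))
        for column in zip(*seqs)
    )


def find_mk_pools(indexes, pool_size):
    """Return tuples of index numbers whose pooled sequences are MK compatible."""
    return [
        combo
        for combo in combinations(sorted(indexes), pool_size)
        if is_mk_compatible([indexes[i] for i in combo])
    ]


print(find_mk_pools(INDEXES, 2))  # [(5, 19)]
```

With only these eight sequences loaded, the one 2-index pool it reports is 5 + 19. Calling `find_mk_pools(INDEXES, 3)` runs the same test on every 3-index combination.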

How about if you have the Small RNA kit with all 48 indexes? There are a lot of possibilities. I'll just give you the 2 index MK compatible pools:

4 35
5 19
6 12
7 17
10 39
18 25
18 33
20 30
21 29
22 45
24 31
26 42
27 45
37 45


One final caveat. I did not mention to Dave that Illumina specifies reading 7 cycles for the index read -- so it can do CAFIE corrections on the first 6. So that throws a bit of a wrench in the mix...

--
Phillip

Last edited by pmiguel; 12-12-2013 at 06:30 AM. Reason: note that this thread is obsolete.
Old 09-23-2011, 12:33 PM   #2
clostridium40
Member
 
Location: United States

Join Date: Jun 2011
Posts: 22
Default

Phillip

This is great information, very helpful for planning out your RNA-seq experiments (which I'm learning is critical). I was wondering if you have observed a number of indexes (bar codes) that serves as the cut-off point between low diversity and high diversity with the Illumina sequencers. Is it only a problem with 2 or 3 indexes, since those are the combinations that you provided, or did you still see issues with 4, 5 or even 6 samples? Thanks again for the very useful information.

Kerry
Old 09-23-2011, 01:24 PM   #3
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

I think the critical factor is having a fair number of clusters "lit up" during any given scanning pass. Since there are 2 passes, one for "G" and "T" and another for "A" and "C" -- getting at least 5% or so of your clusters to "glow" should do the trick. Or most of it.
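
To put numbers on that, here is a small sketch that estimates, for each index cycle, what fraction of clusters would light up in the A/C pass and in the G/T pass, given how much of the lane each index makes up. The function names and the example pools are invented for illustration, and the 5% default is just the rough guess from the paragraph above, not an Illumina specification.

```python
def pass_fractions(pool):
    """pool maps index sequence -> share of clusters carrying that index.

    Returns one (ac_fraction, gt_fraction) tuple per index cycle.
    Assumes sequences contain only A, C, G and T.
    """
    total = sum(pool.values())
    length = len(next(iter(pool)))
    fractions = []
    for pos in range(length):
        ac = sum(share for seq, share in pool.items() if seq[pos] in "AC") / total
        fractions.append((ac, 1.0 - ac))
    return fractions


def dim_cycles(pool, threshold=0.05):
    """1-based cycles where either scanner pass falls below `threshold`."""
    return [
        cycle
        for cycle, (ac, gt) in enumerate(pass_fractions(pool), start=1)
        if min(ac, gt) < threshold
    ]


# Indexes 5 and 19 (sequences from later in the thread), pooled evenly.
print(dim_cycles({"ACAGTG": 0.5, "GTGAAA": 0.5}))    # []
# The same pair pooled very unevenly.
print(dim_cycles({"ACAGTG": 0.97, "GTGAAA": 0.03}))  # [1, 2, 3, 4, 5, 6]
```

The second call is a reminder that an MK compatible pair only helps if both libraries actually contribute a reasonable share of the clusters.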

I think the scanner upon seeing a blank flow cell presumes something must be wrong and may attempt to change its focal depth. Not good!

I guess I could ask Dave to run the analysis to look for bad index pools of 4 or more indexes. But, as long as you had an MK compatible pair or triple in the pools, I would think you would be okay.

I should add I have not looked back to see where we got into problems exactly. But my sense was that after you got above 4 indexes, things seemed okay. But that may just be because I did not look carefully enough...
--
Phillip
Old 09-23-2011, 01:49 PM   #4
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

Quote:
Originally Posted by pmiguel View Post
One final caveat. I did not mention to Dave that Illumina specifies reading 7 cycles for the index read -- so it can do CAFIE corrections on the first 6. So that throws a bit of a wrench in the mix...

--
Phillip
Careful Phillip. CAFIE is the other guys. Illumina does phasing correction.
Old 09-23-2011, 02:06 PM   #5
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

Quote:
Originally Posted by pmiguel View Post
I find it maddening when our Illumina/CASAVA chokes on a TruSeq index pool and throws all the reads into the "unknown" directory. How does one avoid this?

Disclaimer: we have not tested these out yet. Your mileage may vary, etc...

Illumina sequencers do not handle low diversity sequence well. Using a low number of indexes (bar codes) in a single lane is likely to give you low diversity. In the most extreme case an entire scanner pass yields no visible clusters. There are two types of scanner passes -- the A/C pass and the G/T pass. If your index pool has an A or C and a G or T at all 6 positions, you can avoid a blank scanner pass.
Is low diversity really the source of your problem though? We just completed a run where in 4 of the lanes we had run single libraries prepared with TruSeq. This meant that for the index read there was a single base at each cycle, the ultimate in low diversity. All libraries had the correct barcode read for >98% of the passed filter reads. We have also successfully run lanes with only two barcodes (I don't know what the MK breakdown was for those).

On the other hand, we have had complete failure of index reads even when there was a much higher order of pooling and thus very diverse barcodes. Our best guess in these cases is that the index primer failed to anneal efficiently.
Old 09-24-2011, 10:27 AM   #6
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Speculation on my part: but it looked like a diversity issue to me. We are novices at this, with only a handful of Illumina runs under our belts though. But lanes with several indexes in them seem to invariably demultiplex without incident. Whereas lanes with one index in them generally have 2 or 3 cycles with all the base calls given very low quality values and all the reads end up in the "undetermined" category.

It is possible that a recent CASAVA upgrade has fixed the issue. Under v1 the instrument software would choke so badly on single indexes that tech support had us reboot the control software before doing read2 so that it would re-calibrate its focus. But as of v3 chemistry (and whatever software version went along with that), the instrument would automatically recalibrate focus before read2. So I am sure there is action behind the scenes.

I am demultiplexing our most recent run now. Will let you know how it looks.

--
Phillip
Old 10-21-2011, 12:19 PM   #7
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

So, we had exactly the same issues I describe above with the run I mention. Lanes with low diversity in their index sequences always failed to demultiplex. Lanes with >3 indexes always succeeded.

However after going round and round with Illumina tech support about this, I think this may no longer be an issue for HiSeqs and only be an issue for HiScanSQs. Apparently HiSeqs do a separate scan for each base? I don't have a HiSeq, so I don't know for sure. The HiScanSQ definitely scans A and C together and G and T together. That might be the issue.

Anyway, I did find out that the

--use-bases-mask

parameter can be used during demultiplexing to skip bases where the instrument has clearly defocused itself and is not collecting usable data.
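
As a hedged illustration of what such a mask might look like: as far as I understand the mask grammar, each read is described by `Y` (use the cycle), `I` (use it as index) and `n` (skip it), with an optional repeat count after each letter and commas between reads. The helper below only assembles such a string from a per-cycle description. `build_bases_mask` is a made-up name, the run layout is hypothetical, and whether a given CASAVA version accepts a skipped cycle in the middle of an index read should be checked against its manual.

```python
from itertools import groupby


def build_bases_mask(reads):
    """Build a mask string from one per-cycle string per read.

    Each character describes one cycle: 'Y' = use, 'I' = index, 'n' = skip.
    Runs of the same character are collapsed into letter + count.
    """
    parts = []
    for read in reads:
        chunks = []
        for char, run in groupby(read):
            count = len(list(run))
            chunks.append(char if count == 1 else f"{char}{count}")
        parts.append("".join(chunks))
    return ",".join(parts)


# Hypothetical layout: 101-cycle read, 7-cycle index read whose third cycle
# is unusable (the seventh is skipped as well), then a second 101-cycle read.
print(build_bases_mask(["Y" * 101, "IInIIIn", "Y" * 101]))  # Y101,I2nI3n,Y101
```

The helper does not call any Illumina software; it just saves counting cycles by hand.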

--
Phillip
Old 10-27-2011, 05:10 PM   #8
BIG_SNP
Member
 
Location: CA

Join Date: Jul 2009
Posts: 14
Default

We have had good success running low (or no) diversity samples on the HiSeq using the Nugen kit which uses in-line barcodes. We simply use multiple inline barcodes for each sample which tricks the machine into thinking the libraries are diverse for the first critical cycles to pass phasing, etc.
Old 10-27-2011, 08:33 PM   #9
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279
Default

One thing we do to deal with this issue is, whenever possible, to create large pools and spread them across multiple lanes instead of using 2-3 samples per lane. Beyond avoiding the issue of low-complexity barcodes in small pools, there is less of a risk of losing all the data from a set of samples if one lane fails.
Old 07-27-2012, 05:04 AM   #10
biochembug
Member
 
Location: New York

Join Date: Mar 2011
Posts: 26
Default

Hi folks,

How does the following multiplexing look for a single TruSeq lane (barcodes 2, 3, 4, 5, 12, 13, 18 and 19; 8 libraries)? If the problem only arises when pooling 2, 3 or 4 libraries, I may be safe.

B.No. is the barcode number.

B.No.  Composition
18     G T C C G C
19     G T G A A A
12     C T T G T A
13     A G T C A A
2      C G A T G T
3      T T A G G C
4      T G A C C A
5      A C A G T G
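
One way to check a pool like this against the "MK compatible" idea from the first post is to count, for each index position, how many barcodes carry an A/C base and how many carry a G/T base. The sketch below does that for the eight sequences in the table. The names `POOL` and `mk_counts` are made up for the example, and it assumes the libraries are pooled in roughly equal amounts.

```python
# Barcode number -> sequence, as listed in the table above.
POOL = {
    18: "GTCCGC", 19: "GTGAAA", 12: "CTTGTA", 13: "AGTCAA",
    2: "CGATGT", 3: "TTAGGC", 4: "TGACCA", 5: "ACAGTG",
}


def mk_counts(seqs):
    """Per index position, return (number of A/C bases, number of G/T bases)."""
    counts = []
    for column in zip(*seqs):
        m = sum(base in "AC" for base in column)
        counts.append((m, len(column) - m))
    return counts


counts = mk_counts(POOL.values())
print(counts)  # [(4, 4), (1, 7), (5, 3), (4, 4), (3, 5), (6, 2)]
print(all(m > 0 and k > 0 for m, k in counts))  # True
```

By this count every position has both an M and a K base. Position 2 is the thinnest: its only A/C base comes from barcode 5, which would be about 1 in 8 clusters if the pool is even.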

Biochembug
Old 12-03-2013, 03:23 PM   #11
BIG_SNP
Member
 
Location: CA

Join Date: Jul 2009
Posts: 14
Default question

Phillip

Many of the combinations you have listed require index 17:

for example...

A and B indexes:
7 17
3 17 20
4 17 24
5 17 25
6 10 17
17 18 19

but from the list you stated of what is included in Box A and Box B there is no index 17 included. Could you please help.

Thank you!




Old 12-04-2013, 08:58 AM   #12
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by BIG_SNP View Post
Phillip

Many of the combinations you have listed require index 17:

for example...

A and B indexes:
7 17
3 17 20
4 17 24
5 17 25
6 10 17
17 18 19

but from the list you stated of what is included in Box A and Box B there is no index 17 included. Could you please help.

Thank you!
Hi Big_SNP,
Yes, I can help -- pool any indexes you like together. It doesn't matter any more. Illumina fixed this issue that caused low % demultiplexing due to unequal base representation around the time they got around to mentioning in the manuals it was a problem. Now we have manuals that warn against a non-existent problem.
Ah well...

--
Phillip
Old 12-11-2013, 06:56 AM   #13
HeinKey
Member
 
Location: Wageningen, Netherlands

Join Date: May 2009
Posts: 21
Default

Hi Phillip,
For MiSeq runs I agree, but is RTA for HiSeq also changed? I was told the improvement for biased libraries was only for MiSeq RTA.

Hein
Old 12-11-2013, 08:52 AM   #14
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by HeinKey View Post
Hi Phillip,
For MiSeq runs I agree, but is RTA for HiSeq also changed? I was told the improvement for biased libraries was only for MiSeq RTA.

Hein
I don't understand what you mean. This is not a library bias issue, is it? We are talking index reads, right?

Our HiSeq has never cared about index balance. It just isn't an issue. With other HiSeqs? I don't know.

Our previous sequencer: HiScanSQ -- there index balance mattered. But not for HiSeq or MiSeq. Not ever, in my experience.

--
Phillip
Old 12-11-2013, 08:59 AM   #15
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,087
Default

The only time problems with indexes show up on HiSeq is when one over-clusters samples. Regular reads are resistant, but the index reads tend to start accumulating N's (when samples are over-clustered), leading to losses.
Old 12-11-2013, 09:53 AM   #16
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by GenoMax View Post
The only time problems with indexes show up on HiSeq is when one over-clusters samples. Regular reads are resistant, but the index reads tend to start accumulating N's (when samples are over-clustered), leading to losses.
At what threshold do you start to see this? We get very high demultiplex results (>95% demultiplex of PF reads) at 1000/mm^2.

--
Phillip
Old 12-11-2013, 10:08 AM   #17
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,087
Default

Quote:
Originally Posted by pmiguel View Post
At what threshold do you start to see this? We get very high demultiplex results (>95% demultiplex of PF reads) at 1000/mm^2.

--
Phillip
Some point north of 1000/mm^2. The tipping point is rather abrupt (as you may have experienced). We try to stay in 900-950 range to be safe.
Old 12-26-2013, 04:33 PM   #18
colinkingswood
Junior Member
 
Location: Spain

Join Date: Apr 2010
Posts: 1
Default

I am working on software to calculate this. Hopefully I will release it soon.
Old 01-24-2014, 12:00 PM   #19
massspecgeek
Junior Member
 
Location: USA

Join Date: Jan 2011
Posts: 7
Default

Quote:
Originally Posted by pmiguel View Post
Hi Big_SNP,
Yes, I can help -- pool any indexes you like together. It doesn't matter any more. Illumina fixed this issue that caused low % demultiplexing due to unequal base representation around the time they got around to mentioning in the manuals it was a problem. Now we have manuals that warn against a non-existent problem.
Ah well...

--
Phillip
Just for the record, as of today (1/24/14) Illumina tech support says that there haven't been any changes made that address this issue and they still recommend having color balance in the index read whenever possible.

Roger
Old 01-24-2014, 12:40 PM   #20
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by massspecgeek View Post
Just for the record, as of today (1/24/14) Illumina tech support says that there haven't been any changes made that address this issue and they still recommend having color balance in the index read whenever possible.

Roger
Roger,
Well, I can only comment as someone who did have issues with what later was called "color balance" on our old HiScanSQ. These were severe issues that would make demultiplexing either impossible, or require special tweaking of CASAVA to get them to work.

Just as a metric, any lane with only a single index in it rarely yielded more than 50% of the reads after automatic demultiplexing. Not that it mattered, as there was only one sample in the lane. Still, it gave an indication of how much of an issue "color balance" was.

Now, at this time, there was nothing in Illumina literature indicating this was a problem. But it clearly was. So we figured out how to balance our indexes on our own.

At some later point, color balance became much less of an issue. Again this is based on the assay "I just ran a single library in a lane, did it demultiplex?" The answer used to be "No, not very well." But it became, "yes".

Since we upgraded to the HiSeq we have been unable to detect any loss of reads due to lack of "color balance". So, I took that to mean "Illumina fixed the problem".

However, around this time, there were additions to many of the library construction protocols specifying the importance of color balance. Those sections, near as I can tell, were now completely unnecessary. If they could be sent back in time a year, they would have been great though.

Anyway, if you want to color balance your library pools, feel free. I don't think it actually yields you more reads these days, but I don't think it hurts anything either. Maybe as an aesthetic statement it retains its importance. Sort of like wearing a waistcoat to carry your pocket watch.

--
Phillip

Last edited by pmiguel; 01-24-2014 at 01:04 PM.
All times are GMT -8.