  • pmiguel
    Senior Member
    • Aug 2008
    • 2328

    Which indexes to pool.

    [Note added 12/12/2013: Illumina fixed this issue. The info below is, at best, of historical interest, at least for HiSeqs and MiSeqs, and probably HiScanSQs as well. Even the worst-case scenario of a single index in a HiSeq lane generally results in >95% demultiplexing. So, if most of your reads got thrown into the "unknown" folder, there is probably some problem other than "color balance" that caused it.]

    I find it maddening when our Illumina/CASAVA chokes on a TruSeq index pool and throws all the reads into the "unknown" directory. How does one avoid this?

    Disclaimer: we have not tested these out yet. Your mileage may vary, etc...

    Illumina sequencers do not handle low diversity sequence well. Using a low number of indexes (bar codes) in a single lane is likely to give you low diversity. In the most extreme case an entire scanner pass yields no visible clusters. There are two types of scanner passes -- the A/C pass and the G/T pass. If your index pool has an A or C and a G or T at all 6 positions, you can avoid a blank scanner pass.

    The IUPAC code for A or C is "M" while the code for G or T is "K". So I like to think of a given index pool as "MK compatible" if there will be an M and a K at each of the 6 bases of the index. Which indexes are MK compatible?
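As a rough illustration of the MK test just described, the check could be sketched in Python along these lines. This is not Dave's script. The names `INDEX_SEQS` and `is_mk_compatible` are made up for this sketch, and the table holds only the eight index sequences quoted in post #10 of this thread; other indexes are left out rather than guessed.

```python
# Index sequences as quoted in post #10 of this thread (assumed correct there).
INDEX_SEQS = {
    "2": "CGATGT",
    "3": "TTAGGC",
    "4": "TGACCA",
    "5": "ACAGTG",
    "12": "CTTGTA",
    "13": "AGTCAA",
    "18": "GTCCGC",
    "19": "GTGAAA",
}

M_BASES = set("AC")  # IUPAC "M": read in the A/C scanner pass
K_BASES = set("GT")  # IUPAC "K": read in the G/T scanner pass


def is_mk_compatible(seqs):
    """Return True if every index position has at least one M and one K base."""
    seqs = list(seqs)
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("need one or more index sequences of equal length")
    for column in zip(*seqs):  # one column per index cycle
        bases = set(column)
        if not (bases & M_BASES and bases & K_BASES):
            return False
    return True


# Indexes 5 and 19 (the pair used as an example in this post).
print(is_mk_compatible([INDEX_SEQS["5"], INDEX_SEQS["19"]]))
# Indexes 2 and 4 both have G at position 2, so the A/C pass is blank there.
print(is_mk_compatible([INDEX_SEQS["2"], INDEX_SEQS["4"]]))
```

The check only asks whether each pass sees at least one index; it says nothing about how many clusters each library contributes.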

    A programmer in the lab, Dave, whipped up a script to analyze some groupings of indexes to find MK compatible pools within them. If you have a TruSeq RNA or DNA library prep kit 'version 2', you have 12 indexes. There are two types of kits, the "A" and the "B" kits, distinguished only by the indexes they use:
    Box A:
    2, 4, 5, 6, 7, 12, 13, 14, 15, 16, 18, 19
    Box B:
    1, 3, 8, 9, 10, 11, 20, 21, 22, 23, 25, 27

    If Dave has it right, then these are the groupings that should minimize issues. That is, if you only have 2 or 3 libraries going into a single lane, here are the ones that, when pooled by row, are MK compatible. (For example, if you mix indexes 5 and 19 in a lane, every position will have an M and a K. Get it? Each row is a good pool.)

    A indexes only:
    5 19
    6 12
    2 4 13
    5 7 18

    B indexes only:
    3 11 21
    8 20 25
    9 10 21
    10 22 25
    10 25 27

    A and B indexes:
    5 19
    6 12
    7 17
    18 25
    3 5 8
    1 5 9
    1 4 12
    2 3 12
    7 10 12
    1 3 13
    2 4 13
    8 9 13
    2 11 14
    7 9 14
    2 11 15
    7 9 15
    5 11 16
    10 13 16
    6 10 17
    1 14 18
    1 15 18
    5 7 18
    17 18 19
    3 17 20
    3 11 21
    4 18 21
    8 16 21
    9 10 21
    12 14 21
    12 15 21
    5 6 22
    12 19 22
    2 5 23
    8 12 23
    14 16 23
    15 16 23
    4 17 24
    14 22 24
    15 22 24
    19 21 24
    5 17 25
    7 19 25
    8 20 25
    10 22 25
    1 11 26
    2 18 26
    7 23 26
    8 10 26
    9 16 26
    13 21 26
    20 22 26
    5 6 27
    10 25 27
    12 19 27
    14 24 27
    15 24 27
    20 26 27

    How about if you have the Small RNA kit with all 48 indexes? There are a lot of possibilities. I'll just give you the 2 index MK compatible pools:

    4 35
    5 19
    6 12
    7 17
    10 39
    18 25
    18 33
    20 30
    21 29
    22 45
    24 31
    26 42
    27 45
    37 45
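For readers who want to repeat this kind of search on their own index set, a brute-force version might look like the sketch below. It is an assumption about the approach, not Dave's actual script: it simply tries every pair and triple and keeps those with an A/C base and a G/T base at every position. `find_mk_pools` is a hypothetical name, and the table contains only the eight sequences quoted in post #10, so its output covers far fewer indexes than the lists above and need not agree with them.

```python
from itertools import combinations

# Only the index sequences quoted in post #10 of this thread.
INDEX_SEQS = {
    "2": "CGATGT", "3": "TTAGGC", "4": "TGACCA", "5": "ACAGTG",
    "12": "CTTGTA", "13": "AGTCAA", "18": "GTCCGC", "19": "GTGAAA",
}


def is_mk_compatible(seqs):
    """True if each position has at least one A/C base and one G/T base."""
    return all(
        set(col) & set("AC") and set(col) & set("GT")
        for col in zip(*seqs)
    )


def find_mk_pools(index_seqs, sizes=(2, 3)):
    """Enumerate pools of the given sizes that pass the MK test."""
    names = sorted(index_seqs, key=int)  # numeric order, e.g. 2 < 12
    pools = []
    for size in sizes:
        for combo in combinations(names, size):
            if is_mk_compatible([index_seqs[n] for n in combo]):
                pools.append(combo)
    return pools


for pool in find_mk_pools(INDEX_SEQS):
    print(" ".join(pool))
```

With 48 indexes there are 1,128 pairs and 17,296 triples, so exhaustive enumeration is still cheap.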


    One final caveat. I did not mention to Dave that Illumina specifies reading 7 cycles for the index read -- so it can do CAFIE corrections on the first 6. So that throws a bit of a wrench in the mix...

    --
    Phillip
    Last edited by pmiguel; 12-12-2013, 06:30 AM. Reason: note that this thread is obsolete.
  • clostridium40
    Member
    • Jun 2011
    • 22

    #2
    Phillip

    This is great information, very helpful for planning out your RNA-seq experiments (which I'm learning is critical). I was wondering if you have observed a number of indexes (bar codes) that serves as the cut-off point between low diversity and high diversity with the Illumina sequencers. Is it only a problem with 2 or 3 indexes, since those are the combinations you provided, or did you still see issues with 4, 5 or even 6 samples? Thanks again for the very useful information.

    Kerry

    Comment

    • pmiguel
      Senior Member
      • Aug 2008
      • 2328

      #3
      I think the critical factor is having a fair number of clusters "lit up" during any given scanning pass. Since there are 2 passes, one for "G" and "T" and another for "A" and "C" -- getting at least 5% or so of your clusters to "glow" should do the trick. Or most of it.
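To make the "at least 5% or so" rule of thumb concrete, here is a small Python sketch that estimates, for each index cycle, what fraction of clusters would light up in the A/C pass and in the G/T pass. The function names and the assumption that cluster counts are proportional to the pooled amounts are mine, and the 5% threshold is only the guess given in this post, not an Illumina specification.

```python
def pass_fractions(pool):
    """pool: list of (index_sequence, relative_amount) tuples.

    Returns one (A/C fraction, G/T fraction) tuple per index cycle,
    assuming cluster counts are proportional to the pooled amounts.
    """
    total = sum(amount for _, amount in pool)
    n_cycles = len(pool[0][0])
    fractions = []
    for cycle in range(n_cycles):
        ac = sum(amount for seq, amount in pool if seq[cycle] in "AC")
        fractions.append((ac / total, 1 - ac / total))
    return fractions


def dim_cycles(pool, min_fraction=0.05):
    """1-based cycles where either scanner pass falls below min_fraction."""
    return [
        cycle
        for cycle, (ac, gt) in enumerate(pass_fractions(pool), start=1)
        if min(ac, gt) < min_fraction
    ]


# Indexes 5 (ACAGTG) and 19 (GTGAAA), pooled equally and then at 97:3.
print(dim_cycles([("ACAGTG", 1), ("GTGAAA", 1)]))
print(dim_cycles([("ACAGTG", 97), ("GTGAAA", 3)]))
```

Under this model an MK compatible pair can still leave a pass nearly blank if one library is badly under-represented.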

      I think the scanner upon seeing a blank flow cell presumes something must be wrong and may attempt to change its focal depth. Not good!

      I guess I could ask Dave to run the analysis to look for bad index pools of 4 or more indexes. But, as long as you had an MK compatible pair or triple in the pools, I would think you would be okay.

      I should add I have not looked back to see where we got into problems exactly. But my sense was that after you got above 4 indexes, things seemed okay. But that may just be because I did not look carefully enough...
      --
      Phillip

      Comment

      • kmcarr
        Senior Member
        • May 2008
        • 1181

        #4
        Originally posted by pmiguel View Post
        One final caveat. I did not mention to Dave that Illumina specifies reading 7 cycles for the index read -- so it can do CAFIE corrections on the first 6. So that throws a bit of a wrench in the mix...

        --
        Phillip
        Careful Phillip. CAFIE is the other guys. Illumina does phasing correction.

        Comment

        • kmcarr
          Senior Member
          • May 2008
          • 1181

          #5
          Originally posted by pmiguel View Post
          I find it maddening when our Illumina/CASAVA chokes on a TruSeq index pool and throws all the reads into the "unknown" directory. How does one avoid this?

          Disclaimer: we have not tested these out yet. Your mileage may vary, etc...

          Illumina sequencers do not handle low diversity sequence well. Using a low number of indexes (bar codes) in a single lane is likely to give you low diversity. In the most extreme case an entire scanner pass yields no visible clusters. There are two types of scanner passes -- the A/C pass and the G/T pass. If your index pool has an A or C and a G or T at all 6 positions, you can avoid a blank scanner pass.
          Is low diversity really the source of your problem though? We just completed a run where in 4 of the lanes we had run single libraries prepared with TruSeq. This meant that for the index read there was a single base at each cycle, the ultimate in low diversity. All libraries had the correct barcode read for >98% of the passed filter reads. We have also successfully run lanes with only two barcodes (I don't know what the MK breakdown was for those).

          On the other hand, we have had complete failure of index reads even when there was a much higher order of pooling and thus very diverse barcodes. Our best guess in these cases is that the index primer failed to anneal efficiently.

          Comment

          • pmiguel
            Senior Member
            • Aug 2008
            • 2328

            #6
            Speculation on my part: but it looked like a diversity issue to me. We are novices at this, with only a handful of Illumina runs under our belts though. But lanes with several indexes in them seem to invariably demultiplex without incident. Whereas lanes with one index in them generally have 2 or 3 cycles with all the base calls given very low quality values and all the reads end up in the "undetermined" category.

            It is possible that a recent CASAVA upgrade has fixed the issue. Under v1 the instrument software would choke so badly on single indexes that tech support had us reboot the control software before doing read2 so that it would re-calibrate its focus. But as of v3 chemistry (and whatever the software version that went along with that), the instrument would automatically recalibrate focus before read2. So I am sure there is action behind the scenes.

            I am demultiplexing our most recent run now. Will let you know how it looks.

            --
            Phillip

            Comment

            • pmiguel
              Senior Member
              • Aug 2008
              • 2328

              #7
              So, we had exactly the same issues I describe above with the run I mention. Lanes with low diversity in their index sequences always failed to demultiplex. Lanes with >3 indexes always succeeded.

              However after going round and round with Illumina tech support about this, I think this may no longer be an issue for HiSeqs and only be an issue for HiScanSQs. Apparently HiSeqs do a separate scan for each base? I don't have a HiSeq, so I don't know for sure. The HiScanSQ definitely scans A and C together and G and T together. That might be the issue.

              Anyway, I did find out that the

              --use-bases-mask

              parameter can be used during demultiplexing to skip bases where the instrument has clearly defocused itself and is not collecting usable data.
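As a hedged sketch of how such a mask might be put together, the Python helper below builds a mask string that skips chosen index cycles. It assumes the commonly documented mask syntax (`Y` = use, `I` = index, `n` = ignore, with an optional repeat count after each letter). The helper name `index_mask`, the 101-cycle read length, and the skipped cycles 3 and 7 are purely illustrative. Whether the sample sheet index must be shortened to match is something to confirm in the CASAVA documentation for your version.

```python
from itertools import groupby


def index_mask(n_cycles, skip_cycles):
    """Index-read portion of a bases mask: 'I' = use cycle, 'n' = skip it.

    skip_cycles holds 1-based cycle numbers within the index read.
    Runs of the same letter are collapsed to letter+count (e.g. 'I3').
    """
    letters = ["n" if c in skip_cycles else "I" for c in range(1, n_cycles + 1)]
    parts = []
    for letter, run in groupby(letters):
        count = len(list(run))
        parts.append(letter if count == 1 else f"{letter}{count}")
    return "".join(parts)


# Hypothetical paired-end run: 101-cycle reads, 7-cycle index read,
# skipping index cycle 3 (defocused) and cycle 7 (the extra phasing cycle).
mask = ",".join(["Y101", index_mask(7, {3, 7}), "Y101"])
print("--use-bases-mask", mask)
```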

              --
              Phillip

              Comment

              • BIG_SNP
                Member
                • Jul 2009
                • 14

                #8
                We have had good success running low (or no) diversity samples on the HiSeq using the Nugen kit which uses in-line barcodes. We simply use multiple inline barcodes for each sample which tricks the machine into thinking the libraries are diverse for the first critical cycles to pass phasing, etc.

                Comment

                • Jon_Keats
                  Senior Member
                  • Mar 2010
                  • 279

                  #9
                  One thing we do to deal with this issue is to create large pools whenever possible and spread them across multiple lanes, instead of using 2-3 samples per lane. Beyond avoiding the issue of low-complexity barcodes in small pools, there is less risk of losing all the data from a set of samples if one lane fails.

                  Comment

                  • biochembug
                    Member
                    • Mar 2011
                    • 26

                    #10
                    Hi folks,

                    How does the following pool look for a TruSeq lane: 8 libraries with barcodes 2, 3, 4, 5, 12, 13, 18 and 19? If the problem only arises when pooling 2, 3 or 4 libraries, I may be safe.

                    B.No. is barcode number.

                    B.No.  Composition
                    18     G T C C G C
                    19     G T G A A A
                    12     C T T G T A
                    13     A G T C A A
                    2      C G A T G T
                    3      T T A G G C
                    4      T G A C C A
                    5      A C A G T G

                    Biochembug
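One way to look at a pool like this is to tally, position by position, how many of its indexes fall in the A/C pass and how many in the G/T pass. The short Python sketch below does that for the eight sequences listed in this post. The names `POOL` and `pass_tally` are made up for illustration, and the tally treats every library as equally represented, which is an assumption.

```python
from collections import Counter

# The eight barcodes exactly as listed in this post.
POOL = {
    "18": "GTCCGC", "19": "GTGAAA", "12": "CTTGTA", "13": "AGTCAA",
    "2": "CGATGT", "3": "TTAGGC", "4": "TGACCA", "5": "ACAGTG",
}


def pass_tally(seqs):
    """Per position, count indexes read in the A/C pass and in the G/T pass."""
    tally = []
    for column in zip(*seqs):
        counts = Counter(column)
        tally.append((counts["A"] + counts["C"], counts["G"] + counts["T"]))
    return tally


for pos, (ac, gt) in enumerate(pass_tally(POOL.values()), start=1):
    print(f"position {pos}: A/C={ac} G/T={gt}")
```

For this pool no position comes out blank in either pass. The thinnest case is position 2, where only index 5 contributes to the A/C pass.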

                    Comment

                    • BIG_SNP
                      Member
                      • Jul 2009
                      • 14

                      #11
                      question

                      Phillip

                      Many of the combinations you have listed require index 17:

                      for example...

                      A and B indexes:
                      7 17
                      3 17 20
                      4 17 24
                      5 17 25
                      6 10 17
                      17 18 19

                      but from the list you stated of what is included in Box A and Box B there is no index 17 included. Could you please help.

                      Thank you!





                      Comment

                      • pmiguel
                        Senior Member
                        • Aug 2008
                        • 2328

                        #12
                        Originally posted by BIG_SNP View Post
                        Phillip

                        Many of the combinations you have listed require index 17:

                        for example...

                        A and B indexes:
                        7 17
                        3 17 20
                        4 17 24
                        5 17 25
                        6 10 17
                        17 18 19

                        but from the list you stated of what is included in Box A and Box B there is no index 17 included. Could you please help.

                        Thank you!
                        Hi Big_SNP,
                        Yes, I can help -- pool any indexes you like together. It doesn't matter any more. Illumina fixed this issue that caused low % demultiplexing due to unequal base representation around the time they got around to mentioning in the manuals it was a problem. Now we have manuals that warn against a non-existent problem.
                        Ah well...

                        --
                        Phillip

                        Comment

                        • HeinKey
                          Member
                          • May 2009
                          • 21

                          #13
                          Hi Phillip,
                          For MiSeq runs I agree, but is RTA for HiSeq also changed? I was told the improvement for biased libraries was only for MiSeq RTA.

                          Hein

                          Comment

                          • pmiguel
                            Senior Member
                            • Aug 2008
                            • 2328

                            #14
                            Originally posted by HeinKey View Post
                            Hi Phillip,
                            For MiSeq runs I agree, but is RTA for HiSeq also changed? I was told the improvement for biased libraries was only for MiSeq RTA.

                            Hein
                            I don't understand what you mean. This is not a library bias issue, is it? We are talking index reads, right?

                            Our HiSeq has never cared about index balance. It just isn't an issue. With other HiSeqs? I don't know.

                            Our previous sequencer: HiScanSQ -- there index balance mattered. But not for HiSeq or MiSeq. Not ever, in my experience.

                            --
                            Phillip

                            Comment

                            • GenoMax
                              Senior Member
                              • Feb 2008
                              • 7142

                              #15
                              The only time problems with indexes show up on a HiSeq is when one overclusters samples. Regular reads are resistant, but the index reads tend to start accumulating N's (when samples are overclustered), leading to losses.

                              Comment
