  • Which indexes to pool.

    [Note added 12/12/2013: Illumina fixed this issue, at least for HiSeqs and MiSeqs, and probably for HiScanSQs as well. The info below is, at best, of historical interest. Even the worst-case scenario of a single index in a HiSeq lane generally results in >95% demultiplexing. So, if most of your reads got thrown into the "unknown" folder, there is probably some problem other than "color balance" that caused it.]

    I find it maddening when our Illumina/CASAVA chokes on a TruSeq index pool and throws all the reads into the "unknown" directory. How does one avoid this?

    Disclaimer: we have not tested these out yet. Your mileage may vary, etc...

    Illumina sequencers do not handle low diversity sequence well. Using a low number of indexes (bar codes) in a single lane is likely to give you low diversity. In the most extreme case an entire scanner pass yields no visible clusters. There are two types of scanner passes -- the A/C pass and the G/T pass. If your index pool has an A or C and a G or T at all 6 positions, you can avoid a blank scanner pass.

    The IUPAC code for A or C is "M" while the code for G or T is "K". So I like to think of a given index pool as "MK compatible" if there will be an M and a K at each of the 6 bases of the index. Which indexes are MK compatible?
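A minimal sketch of the MK test as defined above (this is not Dave's script, which is not shown in this thread). A pool passes if every index position contains at least one A or C and at least one G or T. The function name `is_mk_compatible` is made up for illustration, and the index sequences are copied from the table in post #10 further down.

```python
# Illustrative sketch only; the names are hypothetical, not from any Illumina tool.
M_BASES = set("AC")  # IUPAC M: bases imaged in the A/C pass
K_BASES = set("GT")  # IUPAC K: bases imaged in the G/T pass

def is_mk_compatible(index_seqs):
    """Return True if every position has at least one M base and one K base."""
    seqs = [s.upper() for s in index_seqs]
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least one index, all of the same length")
    for column in zip(*seqs):
        bases = set(column)
        if not (bases & M_BASES and bases & K_BASES):
            return False
    return True

# Sequences as listed in post #10 of this thread.
INDEX_5, INDEX_19 = "ACAGTG", "GTGAAA"
INDEX_2, INDEX_4 = "CGATGT", "TGACCA"

print(is_mk_compatible([INDEX_5, INDEX_19]))  # True
print(is_mk_compatible([INDEX_2, INDEX_4]))   # False: position 2 is G in both
```

A single index can never pass this test, since each position then holds only one base.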

    A programmer in the lab, Dave, whipped up a script to analyze some groupings of indexes to find MK compatible pools within them. If you have a TruSeq RNA or DNA library prep kit 'version 2', you have 12 indexes. There are two types of kits, the "A" and the "B" kits, distinguished only by the indexes they use:
    Box A:
    2, 4, 5, 6, 7, 12, 13, 14, 15, 16, 18, 19
    Box B:
    1, 3, 8, 9, 10, 11, 20, 21, 22, 23, 25, 27
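For readers who want to repeat this kind of search themselves, here is a rough sketch of how such an enumeration could be done. It is an assumption about the approach, not Dave's actual script. It only knows the eight index sequences that appear in the table in post #10 of this thread, so it covers a subset of the kit; `mk_compatible` and `compatible_pools` are hypothetical names.

```python
from itertools import combinations

M_BASES, K_BASES = set("AC"), set("GT")

# Index number -> sequence, copied from the table in post #10 (a subset of the kit).
INDEXES = {
    2: "CGATGT", 3: "TTAGGC", 4: "TGACCA", 5: "ACAGTG",
    12: "CTTGTA", 13: "AGTCAA", 18: "GTCCGC", 19: "GTGAAA",
}

def mk_compatible(seqs):
    """True if every index position shows both an A/C base and a G/T base."""
    return all(
        set(col) & M_BASES and set(col) & K_BASES
        for col in zip(*seqs)
    )

def compatible_pools(indexes, pool_size):
    """List every combination of `pool_size` index numbers that passes the MK test."""
    return [
        combo
        for combo in combinations(sorted(indexes), pool_size)
        if mk_compatible([indexes[i] for i in combo])
    ]

print(compatible_pools(INDEXES, 2))  # [(5, 19)]
```

Other pool sizes work the same way; the result depends entirely on the sequences supplied, so fill in the full index list from your own kit documentation before relying on it.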

    If Dave has it right, then these are the groupings that should minimize issues. That is, if you only have 2 or 3 libraries going into a single lane, here are the ones that, when pooled by row, are MK compatible. (For example, if you mix indexes 5 and 19 in a lane, every position will have an M and a K. Get it? Each row is a good pool.)

    A indexes only:
    5 19
    6 12
    2 4 13
    5 7 18

    B indexes only:
    3 11 21
    8 20 25
    9 10 21
    10 22 25
    10 25 27

    A and B indexes:
    5 19
    6 12
    7 17
    18 25
    3 5 8
    1 5 9
    1 4 12
    2 3 12
    7 10 12
    1 3 13
    2 4 13
    8 9 13
    2 11 14
    7 9 14
    2 11 15
    7 9 15
    5 11 16
    10 13 16
    6 10 17
    1 14 18
    1 15 18
    5 7 18
    17 18 19
    3 17 20
    3 11 21
    4 18 21
    8 16 21
    9 10 21
    12 14 21
    12 15 21
    5 6 22
    12 19 22
    2 5 23
    8 12 23
    14 16 23
    15 16 23
    4 17 24
    14 22 24
    15 22 24
    19 21 24
    5 17 25
    7 19 25
    8 20 25
    10 22 25
    1 11 26
    2 18 26
    7 23 26
    8 10 26
    9 16 26
    13 21 26
    20 22 26
    5 6 27
    10 25 27
    12 19 27
    14 24 27
    15 24 27
    20 26 27

    How about if you have the Small RNA kit with all 48 indexes? There are a lot of possibilities. I'll just give you the 2 index MK compatible pools:

    4 35
    5 19
    6 12
    7 17
    10 39
    18 25
    18 33
    20 30
    21 29
    22 45
    24 31
    26 42
    27 45
    37 45


    One final caveat. I did not mention to Dave that Illumina specifies reading 7 cycles for the index read -- so it can do CAFIE corrections on the first 6. So that throws a bit of a wrench in the mix...

    --
    Phillip
    Last edited by pmiguel; 12-12-2013, 06:30 AM. Reason: note that this thread is obsolete.

  • #2
    Phillip

    This is great information, very helpful for planning out RNA-seq experiments (which I'm learning is critical). I was wondering if you have observed a number of indexes (bar codes) that serves as the cut-off point between low diversity and high diversity with the Illumina sequencers. Is it only a problem with 2 or 3 indexes, since those are the combinations you provided, or did you still see issues with 4, 5 or even 6 samples? Thanks again for the very useful information.

    Kerry


    • #3
      I think the critical factor is having a fair number of clusters "lit up" during any given scanning pass. Since there are 2 passes, one for "G" and "T" and another for "A" and "C" -- getting at least 5% or so of your clusters to "glow" in each pass should do the trick. Or most of it.
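To put a number on that rule of thumb, here is a small sketch that computes, for each index cycle, the share of clusters lit in the A/C pass and in the G/T pass, and reports the dimmest one. It assumes the two-pass imaging described in this thread and equimolar pooling unless weights are given. The function name is hypothetical and the 5% threshold is only the rough figure mentioned above. The sequences are the eight indexes from the table in post #10.

```python
# Hypothetical helper; assumes the A/C and G/T two-pass imaging described in this thread.
def dimmest_pass_fraction(index_seqs, weights=None):
    """For each index cycle, compute the share of clusters lit in the A/C pass
    and in the G/T pass, and return the smallest share seen in any cycle.
    Sequences are assumed to contain only A, C, G and T."""
    if weights is None:
        weights = [1.0] * len(index_seqs)  # assume equimolar pooling
    total = sum(weights)
    worst = 1.0
    for column in zip(*index_seqs):
        ac = sum(w for base, w in zip(column, weights) if base in "AC") / total
        worst = min(worst, ac, 1.0 - ac)
    return worst

# The eight indexes from post #10 (18, 19, 12, 13, 2, 3, 4, 5).
pool = ["GTCCGC", "GTGAAA", "CTTGTA", "AGTCAA",
        "CGATGT", "TTAGGC", "TGACCA", "ACAGTG"]

print(dimmest_pass_fraction(pool))          # 0.125 (position 2: only index 5 has an A/C base)
print(dimmest_pass_fraction(pool) >= 0.05)  # True
print(dimmest_pass_fraction(["ACAGTG"]))    # 0.0: a single index always has a blank pass
```

Passing `weights` lets you model unequal pooling, for example when one library makes up most of the lane.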

      I think the scanner upon seeing a blank flow cell presumes something must be wrong and may attempt to change its focal depth. Not good!

      I guess I could ask Dave to run the analysis to look for bad index pools of 4 or more indexes. But, as long as you had an MK compatible pair or triple in the pools, I would think you would be okay.

      I should add I have not looked back to see where we got into problems exactly. But my sense was that after you got above 4 indexes, things seemed okay. But that may just be because I did not look carefully enough...
      --
      Phillip


      • #4
        Originally posted by pmiguel
        One final caveat. I did not mention to Dave that Illumina specifies reading 7 cycles for the index read -- so it can do CAFIE corrections on the first 6. So that throws a bit of a wrench in the mix...

        --
        Phillip
        Careful Phillip. CAFIE is the other guys. Illumina does phasing correction.


        • #5
          Originally posted by pmiguel
          I find it maddening when our Illumina/CASAVA chokes on a TruSeq index pool and throws all the reads into the "unknown" directory. How does one avoid this?

          Disclaimer: we have not tested these out yet. Your mileage may vary, etc...

          Illumina sequencers do not handle low diversity sequence well. Using a low number of indexes (bar codes) in a single lane is likely to give you low diversity. In the most extreme case an entire scanner pass yields no visible clusters. There are two types of scanner passes -- the A/C pass and the G/T pass. If your index pool has an A or C and a G or T at all 6 positions, you can avoid a blank scanner pass.
          Is low diversity really the source of your problem though? We just completed a run where in 4 of the lanes we had run single libraries prepared with TruSeq. This meant that for the index read there was a single base at each cycle, the ultimate in low diversity. All libraries had the correct barcode read for >98% of the passed filter reads. We have also successfully run lanes with only two barcodes (I don't know what the MK breakdown was for those).

          On the other hand we have had complete failure of index reads even when there was a much higher order of pooling and thus very diverse barcodes. Our best guess in these cases is that the index primer failed to anneal efficiently.


          • #6
            Speculation on my part: but it looked like a diversity issue to me. We are novices at this, with only a handful of Illumina runs under our belts though. But lanes with several indexes in them seem to invariably demultiplex without incident. Whereas lanes with one index in them generally have 2 or 3 cycles with all the base calls given very low quality values and all the reads end up in the "undetermined" category.

            It is possible that a recent CASAVA upgrade has fixed the issue. Under v1 the instrument software would choke so badly on single indexes that tech support had us reboot the control software before doing read2 so that it would re-calibrate its focus. But as of v3 chemistry (and whatever the software version that went along with that), the instrument would automatically recalibrate focus before read2. So I am sure there is action behind the scenes.

            I am demultiplexing our most recent run now. Will let you know how it looks.

            --
            Phillip


            • #7
              So, we had exactly the same issues I describe above with the run I mention. Lanes with low diversity in their index sequences always failed to demultiplex. Lanes with >3 indexes always succeeded.

              However after going round and round with Illumina tech support about this, I think this may no longer be an issue for HiSeqs and only be an issue for HiScanSQs. Apparently HiSeqs do a separate scan for each base? I don't have a HiSeq, so I don't know for sure. The HiScanSQ definitely scans A and C together and G and T together. That might be the issue.

              Anyway, I did find out that the

              --use-bases-mask

              parameter can be used during demultiplexing to skip bases where the instrument has clearly defocused itself and is not collecting usable data.

              --
              Phillip


              • #8
                We have had good success running low (or no) diversity samples on the HiSeq using the Nugen kit which uses in-line barcodes. We simply use multiple inline barcodes for each sample which tricks the machine into thinking the libraries are diverse for the first critical cycles to pass phasing, etc.


                • #9
                  One thing we do to deal with this issue is, whenever possible, to create large pools and spread them across multiple lanes instead of using 2-3 samples per lane. Beyond the issue of low-complexity barcodes in small pools, there is less risk of losing all the data from a set of samples if one lane fails.


                  • #10
                    Hi folks,

                    How does the following 8-library pool (barcodes 2, 3, 4, 5, 12, 13, 18, 19) look for a single TruSeq lane? If the problem only arises when pooling 2, 3 or 4 libraries, I may be safe.

                    B.No. is barcode number.
                    B.No.  Composition (positions 1-6)
                    18     G T C C G C
                    19     G T G A A A
                    12     C T T G T A
                    13     A G T C A A
                    2      C G A T G T
                    3      T T A G G C
                    4      T G A C C A
                    5      A C A G T G

                    Biochembug


                    • #11
                      question

                      Phillip

                      Many of the combinations you have listed require index 17:

                      for example...

                      A and B indexes:
                      7 17
                      3 17 20
                      4 17 24
                      5 17 25
                      6 10 17
                      17 18 19

                      but from the list you stated of what is included in Box A and Box B there is no index 17 included. Could you please help.

                      Thank you!






                      • #12
                        Originally posted by BIG_SNP
                        Phillip

                        Many of the combinations you have listed require index 17:

                        for example...

                        A and B indexes:
                        7 17
                        3 17 20
                        4 17 24
                        5 17 25
                        6 10 17
                        17 18 19

                        but from the list you stated of what is included in Box A and Box B there is no index 17 included. Could you please help.

                        Thank you!
                        Hi Big_SNP,
                        Yes, I can help -- pool any indexes you like together. It doesn't matter any more. Illumina fixed this issue that caused low % demultiplexing due to unequal base representation around the time they got around to mentioning in the manuals it was a problem. Now we have manuals that warn against a non-existent problem.
                        Ah well...

                        --
                        Phillip


                        • #13
                          Hi Phillip,
                          For MiSeq runs I agree, but is RTA for HiSeq also changed? I was told the improvement for biased libraries was only for MiSeq RTA.

                          Hein


                          • #14
                            Originally posted by HeinKey
                            Hi Phillip,
                            For MiSeq runs I agree, but is RTA for HiSeq also changed? I was told the improvement for biased libraries was only for MiSeq RTA.

                            Hein
                            I don't understand what you mean. This is not a library bias issue, is it? We are talking index reads, right?

                            Our HiSeq has never cared about index balance. It just isn't an issue. With other HiSeqs? I don't know.

                            Our previous sequencer: HiScanSQ -- there index balance mattered. But not for HiSeq or MiSeq. Not ever, in my experience.

                            --
                            Phillip


                            • #15
                              The only time problems with indexes show up on HiSeq is when one overclusters samples. Regular reads are resistant, but the index reads tend to start accumulating N's (when samples are overclustered), leading to losses.
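To illustrate why N's in the index read cost you reads, here is a generic sketch of index assignment with a mismatch limit. It is not CASAVA's actual algorithm; the function name and the default limit of one mismatch are assumptions for illustration, and the barcodes are the eight from the table in post #10. An N never matches a barcode base, so every N uses up part of the mismatch budget.

```python
# Generic sketch, not CASAVA's actual algorithm; barcodes are the ones from post #10.
BARCODES = {
    2: "CGATGT", 3: "TTAGGC", 4: "TGACCA", 5: "ACAGTG",
    12: "CTTGTA", 13: "AGTCAA", 18: "GTCCGC", 19: "GTGAAA",
}

def assign_index(index_read, barcodes, max_mismatches=1):
    """Return the barcode number that is the only match within `max_mismatches`.
    An N counts as a mismatch. Return None for an 'undetermined' read."""
    hits = []
    for number, seq in barcodes.items():
        mismatches = sum(1 for r, b in zip(index_read, seq) if r != b)
        if mismatches <= max_mismatches:
            hits.append(number)
    return hits[0] if len(hits) == 1 else None

print(assign_index("ACAGTG", BARCODES))  # 5
print(assign_index("ACNGTG", BARCODES))  # 5 (one N is tolerated)
print(assign_index("ANNGTG", BARCODES))  # None (two N's exceed the mismatch limit)
```

With a limit of zero mismatches, even a single N sends the read to "undetermined".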
