  • miseq multiplex fail

    Hi,

    I'm doing metabarcoding of bat diet by amplifying a 157bp region of CO1.

    I'm using Illumina's protocol for 16S metagenomic libraries, but with tag-modified primers: each PCR primer is composed of adaptor + 5 bp tag + primer. I amplified 12 plates using the same primer but a different tag for each plate. I then cleaned and quantified every plate, normalized, and pooled the 12 plates into one. Afterwards I did the indexing PCR with Illumina's Nextera XT kit (96 combinations). This allowed multiplexing of 12 x 96 samples.

    However, I only got decent results from 6 plates (~5000 reads per sample); the other 6 failed (~50 reads per sample).

    Does anyone have any idea why this might happen? Could my tags affect the MiSeq's ability to read the DNA?

    Thanks in advance!

    Vanessa

    PS: for some reason my cluster density was also low (400), although the library was at 4 nM (quantified using qPCR and TapeStation) and diluted to 12 pM.

  • #2
    It could be that the second indexing failed. Can you check some of those samples and see if they are the correct size (the adapters and indices should add ~50 bp on each end)?

    50 seqs is basically noise; you can put in totally random sequences and often get a couple hundred seqs demultiplexed to that sample.
    Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.


    • #3
      Reads coming from only half of your plates sounds like there might have been a protocol-specific problem but ~3 million reads is pretty low so you may have had low-diversity and/or clustering problems as well.

      I have recently had problems with low-diversity and clustering issues with Illumina technology as well. A few additional pieces of information that might help with a diagnosis are:
      1. What percentage of clusters passed filter?
      2. Do you have access to Sequencing Analysis Viewer, so that you can look at the flow cell density chart, density box plots and thumbnail images to assess whether overclustering took place?
      3. Some FastQC reports of your sequences that worked might help diagnose diversity/clustering issues.
      4. Did you use PhiX, and if so, what percentage was used and what percentage of reads mapped to PhiX?

      We were only able to overcome our low-diversity/overclustering problems on a HiSeq by increasing the PhiX from 5% to 15%, lowering cluster density to ~650, and using barcodes or tags that were 4, 5 and 6 bp long in order to stagger the low-diversity region of the sequence (our libraries were for ddRAD-seq).
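
To see why mixing tag lengths helps, here is a minimal Python sketch. It counts how many different bases the pool shows at each sequencing cycle when every read is a tag followed by the same primer. The primer and all tag sequences below are made up for illustration (they are not from anyone's actual libraries), and a real amplicon pool will behave less cleanly than this.

```python
# Hypothetical sequences, for illustration only.
PRIMER = "GATTACAGGCTTCA"  # stand-in for a constant amplicon primer

FIXED_TAGS = ["ACGTA", "CATGC", "GTCAG", "TGACT"]      # all 5 bp long
STAGGERED_TAGS = ["ACGT", "CATGC", "GTCAGC", "TGAC"]   # 4, 5 and 6 bp long


def distinct_bases_per_cycle(tags, primer, n_cycles):
    """For each cycle, count how many different bases appear across the pool."""
    reads = [(tag + primer)[:n_cycles] for tag in tags]
    return [len({read[i] for read in reads}) for i in range(n_cycles)]


if __name__ == "__main__":
    n = 12
    print("fixed    :", distinct_bases_per_cycle(FIXED_TAGS, PRIMER, n))
    print("staggered:", distinct_bases_per_cycle(STAGGERED_TAGS, PRIMER, n))
```

With fixed-length tags every read enters the primer at the same cycle, so from that cycle on the pool shows a single base per cycle. With mixed tag lengths the primer is offset between reads, so several bases are present at most cycles.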


      • #4
        Originally posted by thermophile
        It could be that the second indexing failed. Can you check some of those samples and see if they are the correct size (the adapters and indices should add ~50 bp on each end)?

        50 seqs is basically noise; you can put in totally random sequences and often get a couple hundred seqs demultiplexed to that sample.
        I checked every index PCR reaction on an agarose gel and they all gained ~100 bp.
        However, since every reaction had 12 different samples in it, it's impossible to know whether all samples incorporated the indices or not (I'm guessing the ones that failed didn't). What I'm going to do now is run the indexing PCR independently for some samples amplified with the tags that failed, and see whether they get longer or not...

        Originally posted by ATϟGC
        Reads coming from only half of your plates sounds like there might have been a protocol-specific problem but ~3 million reads is pretty low so you may have had low-diversity and/or clustering problems as well.

        I have recently had problems with low-diversity and clustering issues with Illumina technology as well. A few additional pieces of information that might help with a diagnosis are:
        1. What percentage of clusters passed filter?
        2. Do you have access to Sequencing Analysis Viewer, so that you can look at the flow cell density chart, density box plots and thumbnail images to assess whether overclustering took place?
        3. Some FastQC reports of your sequences that worked might help diagnose diversity/clustering issues.
        4. Did you use PhiX, and if so, what percentage was used and what percentage of reads mapped to PhiX?

        We were only able to overcome our low-diversity/overclustering problems on a HiSeq by increasing the PhiX from 5% to 15%, lowering cluster density to ~650, and using barcodes or tags that were 4, 5 and 6 bp long in order to stagger the low-diversity region of the sequence (our libraries were for ddRAD-seq).
        Yes, I'm afraid I might also have suffered some problems due to a low-diversity library. Nevertheless, I don't think I had overclustering.

        1. 81% of the clusters passed the filters.
        2. Cluster density was 380 on average.
        3. I got results from all 96 Illumina index combinations, but within each Illumina index, only tags/plates 1, 2, 3, 4, 5 and 7 worked. For the rest I only got ~50 reads (after filtering). The whole run produced ~6 million reads.
        4. We did use PhiX, but I think only 1%. I'll have to confirm with the technician.

        What puzzles me the most is that I had 4 DNA plates, amplified with the exact same primers, just with different tags, and did 3 replicates of each plate. Plates 1-4 (1st replicate) worked, plates 5 and 7 worked but 6 and 8 failed (2nd replicate), and the 3rd replicate failed. The PCRs themselves were OK, and I cleaned, quantified, and normalized them all the same way. How can the tag affect incorporation of the Illumina index (if that's what failed anyway...)?
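
For a sense of scale, the read counts quoted in this thread can be put into a rough budget. This is only back-of-the-envelope arithmetic in Python using the approximate figures given above (~5000 and ~50 reads per sample, ~6 million reads for the run); it assumes the 6 million is a run-wide total, which the thread does not spell out.

```python
# Approximate figures taken from the posts above.
PLATES_TOTAL = 12
PLATES_WORKED = 6
INDEX_COMBINATIONS = 96
READS_PER_GOOD_SAMPLE = 5000
READS_PER_FAILED_SAMPLE = 50
TOTAL_RUN_READS = 6_000_000  # assumed to be the run-wide total

good_samples = PLATES_WORKED * INDEX_COMBINATIONS
failed_samples = (PLATES_TOTAL - PLATES_WORKED) * INDEX_COMBINATIONS

good_reads = good_samples * READS_PER_GOOD_SAMPLE
failed_reads = failed_samples * READS_PER_FAILED_SAMPLE

# Reads per sample if the whole run were split evenly over all 12 x 96 samples.
even_share = TOTAL_RUN_READS / (PLATES_TOTAL * INDEX_COMBINATIONS)

print(f"samples that worked: {good_samples}, reads: {good_reads:,}")
print(f"samples that failed: {failed_samples}, reads: {failed_reads:,}")
print(f"even share per sample: {even_share:.0f}")
```

The reads assigned to working samples come to roughly 2.9 million, which matches the "~3 million" mentioned earlier in the thread.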


        • #5
          The couple of things that jump out at me are that your cluster density is *very* low and your PF rate is rather low considering how few sequences you actually had. A MiSeq should be able to handle ~600k/mm^2, even when dealing with low diversity libraries and at the low density the PF rate should be in the 90%s. Assuming you haven't already, you may want to double check the cluster density with a look at a thumbnail image to be sure. Sometimes overclustering manifests as low cluster density with low PF rates (the software can't resolve all the clusters so it keeps only those it can recognize with confidence).

          To be clear, is the 5bp tag on the plate the first thing you read in read 1? If the tags represent the first 5 bases of read 1, then there's definitely a potential for low diversity issues (do you have equal representation of A, C, G, and T at each base in the tag sequences?) and you might be able to solve the issue by spiking in more PhiX and underloading a bit or staggering the tags like AT-GC suggested.

          One last series of questions: how did you quantify your libraries? You said qPCR, but KAPA kit, I'm guessing? And you used the tapestation data to size correct the qPCR results? Did the failed tags/plates look okay on the tapestation prior to pooling? Have you tried re-quantifying any of the samples?


          • #6
            Update: I tested doing the indexing PCR independently (before pooling) for 3 primer combinations that failed sequencing. They all incorporated the index, as they went from ~280 bp to ~380 bp. So the problem is not a missing index.

            The only explanation that I can find is that, for some strange reason, the combination of my primer and these specific tags inhibits the sequencing reaction. Maybe they form some kind of loop?

            Originally posted by Jessica_L
            The couple of things that jump out at me are that your cluster density is *very* low and your PF rate is rather low considering how few sequences you actually had. A MiSeq should be able to handle ~600k/mm^2, even when dealing with low diversity libraries and at the low density the PF rate should be in the 90%s. Assuming you haven't already, you may want to double check the cluster density with a look at a thumbnail image to be sure. Sometimes overclustering manifests as low cluster density with low PF rates (the software can't resolve all the clusters so it keeps only those it can recognize with confidence).
            I looked at some pictures and they all seem normal to me (but I don't have much experience with this). They all kind of look like this:

            [image not preserved]

            Originally posted by Jessica_L
            To be clear, is the 5bp tag on the plate the first thing you read in read 1? If the tags represent the first 5 bases of read 1, then there's definitely a potential for low diversity issues (do you have equal representation of A, C, G, and T at each base in the tag sequences?) and you might be able to solve the issue by spiking in more PhiX and underloading a bit or staggering the tags like AT-GC suggested.
            Yes, the 5 bp tag is the first thing that is read. However, the tags start with different bases:

            Tag 1 - ACGAC
            Tag 2 - AGCAC
            Tag 3 - AGTAT
            Tag 4 - ATGCG
            Tag 5 - CAGTA
            Tag 6 - CTAGT
            Tag 7 - GACGT
            Tag 8 - TACTC
            Tag 9 - TCGCA
            Tag 10 - TCTCT
            Tag 11 - TGCGT
            Tag 12 - TGTAC
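
To check how balanced these 12 tags are, the bases can be counted position by position. A minimal Python sketch using only the tag sequences listed above (the function names are just illustrative):

```python
from collections import Counter

# The 12 tags as listed above, in order (tag 1 first).
TAGS = ["ACGAC", "AGCAC", "AGTAT", "ATGCG", "CAGTA", "CTAGT",
        "GACGT", "TACTC", "TCGCA", "TCTCT", "TGCGT", "TGTAC"]


def base_counts_by_position(tags):
    """Return one Counter per tag position, counting the bases seen there."""
    length = len(tags[0])
    if any(len(tag) != length for tag in tags):
        raise ValueError("all tags must have the same length")
    return [Counter(tag[i] for tag in tags) for i in range(length)]


def format_table(counts):
    """Render the per-position counts as a small text table."""
    lines = ["pos   A   C   G   T"]
    for pos, c in enumerate(counts, start=1):
        lines.append(f"{pos:>3} {c['A']:>3} {c['C']:>3} {c['G']:>3} {c['T']:>3}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(base_counts_by_position(TAGS)))
```

This only describes the tags as designed, assuming all 12 are present in equal amounts. The balance actually seen by the instrument also depends on how many reads each tag contributes.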

            Originally posted by Jessica_L
            One last series of questions: how did you quantify your libraries? You said qPCR, but KAPA kit, I'm guessing? And you used the tapestation data to size correct the qPCR results? Did the failed tags/plates look okay on the tapestation prior to pooling? Have you tried re-quantifying any of the samples?
            Before pooling everything I always used an agarose gel (to check amplification success and size) and NanoDrop (to quantify, after finding that PicoGreen gave results fairly consistent with NanoDrop once the products had been cleaned with beads). I quantified every plate, normalized, and pooled the 12 plates into one. Afterwards I did the indexing PCR, cleaned, quantified, normalized to 10 nM and pooled all 96 libraries into one. I then ran the pool on the TapeStation and saw a very nice peak at 380 bp. qPCR gave a concentration of 7 nM (using the TapeStation size). If half of my plates had failed to incorporate the index, I should see a second peak on the TapeStation at ~280 bp, right?
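
For reference, normalizing to a molar concentration depends on the fragment size used in the conversion. A minimal Python sketch of the usual mass-to-molarity conversion for double-stranded DNA, assuming the common approximation of ~660 g/mol per base pair; the 2.508 ng/µL input is an invented example value, not a measurement from this run.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def ng_per_ul_to_nM(conc_ng_per_ul, fragment_bp):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * fragment_bp)


if __name__ == "__main__":
    conc = 2.508  # ng/uL, hypothetical example value
    for size in (380, 280):  # indexed vs. un-indexed amplicon sizes from the thread
        print(f"{conc} ng/uL at {size} bp = {ng_per_ul_to_nM(conc, size):.2f} nM")
```

The same mass concentration corresponds to a higher molarity for shorter fragments, so a pool normalized with the wrong size would be off by the ratio of the two sizes.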

            I got results for all Illumina indexes, but for each index I only got sequences from 6 of the plates/tags. I have tried re-quantifying some samples from the plates that failed (before pooling) and the values are the same. I've double-checked the normalization calculations and they are all correct.

            I'm completely lost here...


            • #7
              Have you (or whoever ran this flowcell) talked with Illumina tech support? If not that would be the first thing you should do. Let them examine this run and see what their take is on this. It may be indicative of some other issue with kit/instrument, who knows.

              That said, this was n=1. If you were to re-run this pool again things may work fine (but do the above first if you have not done so).


              • #8
                Originally posted by GenoMax
                Have you (or whoever ran this flowcell) talked with Illumina tech support? If not that would be the first thing you should do. Let them examine this run and see what their take is on this. It may be indicative of some other issue with kit/instrument, who knows.

                That said, this was n=1. If you were to re-run this pool again things may work fine (but do the above first if you have not done so).
                Yeah, we've tried tech support and they have no idea. Since the problem is plate-specific, they say there seems to be no problem with the run itself, but with our library prep.


                • #9
                  If you feel your libraries are perfect then you could invest in a small kit (50 cycles) and run the problem plates separately to see if the problem can be replicated. If it can, then you don't have a lot of options (except perhaps remaking the libraries for those plates).


                  • #10
                    Originally posted by GenoMax
                    If you feel your libraries are perfect then you could invest in a small kit (50 cycles) and run the problem plates separately to see if the problem can be replicated. If it can, then you don't have a lot of options (except perhaps remaking the libraries for those plates).
                    Yes, what I'm going to try next is to sequence some samples from the failed plates separately and see whether the problem persists or not... but I was hoping someone else had seen the same problem of apparently good libraries failing to sequence due to some strange incompatibility between the MiSeq and specific tag/primer combinations.


                    • #11
                      I think it's more likely that there was a mistake when setting up some of the plates rather than anything primer related (we've all had those days where we forget one of the primers or the enzyme...)

                      On Jessica_L's point, have you looked at all 4 bases for that tile/cycle? With low-diversity samples you can easily have 3 bases that look like they have acceptable clustering density while the one dominant base shows massive overclustering.
                      Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.


                      • #12
                        Originally posted by thermophile
                        I think it's more likely that there was a mistake when setting up some of the plates rather than anything primer related (we've all had those days where we forget one of the primers or the enzyme...)

                        On Jessica_L's point, have you looked at all 4 bases for that tile/cycle? With low-diversity samples you can easily have 3 bases that look like they have acceptable clustering density while the one dominant base shows massive overclustering.
                        I know, I know... I just can't think of any mistake that would lead to this kind of result... I was expecting some of the samples to possibly fail or be over-represented after pooling 12 x 96 samples, each one with a different volume. But for 6 entire plates to fail would require a big, big, big mistake!


                        • #13
                          Just based on a quick glance at your 5bp tags, your first base is heavy on A and T and there are only two C and one G. The second base contains only two T but is otherwise decently distributed. If I had to guess, I think you might be having an issue with low library complexity. For things like index reads, this is generally less of a problem, but the first several bases are used for cluster registration/identification so it tends to be a bit more critical to have a good balance. Depending on which tags failed, that might upset the base pair balance even further, so that might partially explain the low-ish PF rate.

                          I'm curious, if you go back and look at the failed plates/tags, whether or not there's a predictable pattern. (i.e. Based on the scarcity of certain nucleotides at each position, I'd guess that tags 4, 6, 7 would fail and maybe even 5 and 9 would, too?) If that's the case, I'd increase your PhiX spike-in by a healthy amount, maybe go as high as 20% or so? You could start at Illumina's recommendation for low diversity (which I think is at least 5%, but realistically you may need to go as high as 15% or 20%).

                          I'm happy to look at some SAV or FastQC data if you want to share it; otherwise, it sounds like you're more or less on the right track as far as investigating the failed plates.
                          Last edited by Jessica_L; 05-20-2016, 11:40 AM. Reason: reworded for clarity


                          • #14
                            Hi everyone!

                            Just a quick update.

                            I took 3 PCR products amplified with some of the tags that failed (6, 8, 9, 10) and one that didn't (7), and redid the indexing PCR, this time in separate reactions (no pooling). They all seemed to incorporate the index (the library got bigger). Everything was quantified and homogenized.

                            I used ~1% of a MiSeq run to test whether this tag failure was consistent or not.

                            Once again, tag 7 worked (10-14 MB of data per sample), but tags 6, 8, 9 and 10 failed (14-76 KB per sample).

                            Any ideas why sequences with these tags are not being read by the MiSeq?

                            Theories are welcome!!
                            Last edited by vanessamata; 06-30-2016, 02:00 PM.


                            • #15
                              Originally posted by Jessica_L
                              I'm curious, if you go back and look at the failed plates/tags, whether or not there's a predictable pattern. (i.e. Based on the scarcity of certain nucleotides at each position, I'd guess that tags 4, 6, 7 would fail and maybe even 5 and 9 would, too?)
                              Tags 1,2,3,4,5, and 7 worked
                              Tags 6,8,9,10,11, and 12 failed

                              I can't really see any pattern...
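
One way to look for a pattern is to tabulate the bases at each tag position separately for the tags that worked and the tags that failed. A minimal Python sketch using the tag sequences and outcomes reported in this thread (the helper names are illustrative). Any difference it shows is only a lead to follow up, not an explanation.

```python
from collections import Counter

# Tag sequences as listed earlier in the thread, keyed by tag number.
TAGS = {
    1: "ACGAC", 2: "AGCAC", 3: "AGTAT", 4: "ATGCG",
    5: "CAGTA", 6: "CTAGT", 7: "GACGT", 8: "TACTC",
    9: "TCGCA", 10: "TCTCT", 11: "TGCGT", 12: "TGTAC",
}
WORKED = [1, 2, 3, 4, 5, 7]
FAILED = [6, 8, 9, 10, 11, 12]


def counts_at(position, tag_ids):
    """Count the bases at a 0-based tag position for the given tag numbers."""
    return Counter(TAGS[i][position] for i in tag_ids)


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    for pos in range(5):
        worked = dict(sorted(counts_at(pos, WORKED).items()))
        failed = dict(sorted(counts_at(pos, FAILED).items()))
        print(f"position {pos + 1}: worked {worked} | failed {failed}")
    for label, ids in (("worked", WORKED), ("failed", FAILED)):
        mean_gc = sum(gc_fraction(TAGS[i]) for i in ids) / len(ids)
        print(f"mean GC fraction, {label}: {mean_gc:.2f}")
```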
