  • MiSeq amplicon sequencing of STRs: Low PF and effective reads due to short reads

    Dear all,

    we are currently working on the development and validation of an in-house, MiSeq-based NGS assay that uses amplicons targeting forensic STR loci. We use paired-end sequencing with dual index reads and the MiSeq Reagent Kit v3.

    For the last few NGS runs, we changed neither the characteristics of the input samples (a mix of intact and degraded DNA) nor the workflow. However, we increased the loading concentration of the final (pooled) library from 4 pM to 12 pM and the number of samples in the final pooled library from 30 to 96.

    As a result, we had an overclustered run from which we obtained no results.
    For the next runs, we decreased the input of final library to 8-10 pM but still got densities too high for amplicon sequencing (1200-1300 K/mm2), as well as a poor ≥ Q30 (47-60%). The number of PF reads was low, and the number of effective reads (PF reads that were trimmed and used for calling STR alleles) was lower still. We thought this was due to reduced read lengths (shown by FastQC analysis), but that does not explain why the number of PF reads is so low.

    So we reduced the input to 6 pM and got a lower density (730 K/mm2), but the ≥ Q30 was still too low (56%) and there were still too few PF and effective reads. Many PF reads were still too short to be used for allele calling.

    Do you have an idea why the number of PF reads could be too low and why the reads are still too short? Or another explanation why the quality is so bad?
    I have read about a too high sample-to-cell-arrangement (i.e., the number of samples loaded on a flow cell). For the three runs described here, we loaded 96 different samples on the flow cell. Before, we used around 30 samples. Could this have an impact on the read lengths? It is not clear to me why it should: I think it doesn't matter for the read length whether 6 pM of the same or of different DNA is loaded on a flow cell (I know about the complexity problem, but that does not account for read length).

    Could loading 6 pM of final pooled library (containing 96 different samples and PhiX) lead to a too high sample-to-cell-arrangement, resulting in overclustering that leads to fewer PF reads and short reads?

    I am looking forward to reading interesting answers. If you have questions, I am happy to answer and hope that we will find an explanation together.
    Last edited by SarahAurora; 07-25-2018, 01:52 AM.

  • #2
    FastQC report of a bad and good run will be helpful for troubleshooting.



    • #3
      What's your %PhiX?



      • #4
        Originally posted by nucacidhunter View Post
        FastQC report of a bad and good run will be helpful for troubleshooting.
        They are not finished yet. I have FastQC reports only for single samples, not for the whole run. But the last 3 runs showed shorter reads than before.



        • #5
          Originally posted by Bukowski View Post
          What's your %PhiX?
          15% PhiX. We also mix the libraries with amplicons of other genetic loci (not STR amplicons).
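To put the change from 30 to 96 samples in perspective, here is a rough back-of-the-envelope sketch of how the PF reads are shared out once the 15% PhiX spike-in is subtracted. The total PF read count used below is a made-up placeholder, not a value from these runs, and the function name is only illustrative. The non-STR amplicons mixed into the pool are not modeled, so the real number per STR sample would be lower still.

```python
def reads_per_sample(pf_reads: int, phix_fraction: float, n_samples: int) -> float:
    """Average number of PF reads per sample after removing the PhiX share.

    Assumes an evenly balanced pool; real pools are rarely perfectly even.
    """
    if not 0.0 <= phix_fraction < 1.0:
        raise ValueError("phix_fraction must be in [0, 1)")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return pf_reads * (1.0 - phix_fraction) / n_samples


if __name__ == "__main__":
    HYPOTHETICAL_PF_READS = 10_000_000  # placeholder, not taken from the runs
    for n in (30, 96):
        avg = reads_per_sample(HYPOTHETICAL_PF_READS, 0.15, n)
        print(f"{n} samples: about {avg:,.0f} PF reads per sample")
```

With the same PF yield, going from 30 to 96 samples cuts the average per-sample depth to roughly a third, which affects the number of effective reads per sample but not the read length.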



          • #6
            Sorry, I meant to ask for %Base in Data by Cycle plot from SAV.



            • #7
              Yeah, I've noticed with v2 chemistry 500 cycle runs that going above 800 K/mm2 results in higher loss of read quality towards the ends of the reads when using high bias (low complexity) samples.
              I don't think the number of samples in a pool is significant, only the cluster density.
              You mention you are using v3 chemistry? There are 2 v3 MiSeq kits -- 150 cycles and 600 cycles. Which are you running?
              --
              Phillip



              • #8
                Originally posted by nucacidhunter View Post
                Sorry, I meant to ask for %Base in Data by Cycle plot from SAV.
                Run 5: 12 pM input - was OK (Q30 58%) but not as many reads as expected.
                Run 7: 10 pM input - bad quality and fewer PF and effective reads
                Run 9: 10 pM input - overclustered, no results due to too short reads
                Run 10: 6 pM input but still bad quality and low PF and effective reads

                They look like low diversity and unbalanced bases (ATCG), but this applies to all the libraries we use because we always use the same amplicons from the same people. So why does Run 5 look a little better than the other runs?
                Attached Files
                Last edited by SarahAurora; 07-25-2018, 04:41 AM.



                • #9
                  Originally posted by pmiguel View Post
                  Yeah, I've noticed with v2 chemistry 500 cycle runs that going above 800 K/mm2 results in higher loss of read quality towards the ends of the reads when using high bias (low complexity) samples.
                  I don't think the number of samples in a pool is significant, only the cluster density.
                  You mention you are using v3 chemistry? There are 2 v3 MiSeq kits -- 150 cycles and 600 cycles. Which are you running?
                  --
                  Phillip
                  600 cycles. This explains the poor quality at the read ends, but it doesn't explain why, in some runs, the quality is better and the reads are longer than in the last runs (Run 7 and 10).



                  • #10
                    Those mainly look like your read length was overrunning your amplicon lengths. Did you check your libraries or the library pool on a Bioanalyzer?

                    --
                    Phillip



                    • #11
                      The following are the usual causes of low PF:
                      1- Over-clustering
                      2- Low diversity
                      3- Sequencing primer quality
                      4- Adapter and primer quality

                      I think #4 would be the most likely cause in this case if you have not used custom sequencing primers, and it would explain the low Q scores as well.

                      Illumina instruments produce a base call for every cycle of a PF read, so reads shorter than the number of sequencing cycles indicate trimming, either because the MiSeq was set up for automatic adapter trimming or because the user trimmed them after the run. As pmiguel has mentioned, Runs 5, 7 and 10 look like the sequencing has run into the adapters and into the flow cell oligo lawn.
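The read-through point can be illustrated with a small sketch. It assumes 300-cycle reads (the 600-cycle kit run as 2 x 300) and uses a placeholder adapter string; the actual adapter of the in-house assay is not given in this thread, so `EXAMPLE_ADAPTER` and both function names are purely illustrative. The trimming is exact-match only, which is much simpler than what dedicated trimming tools or the instrument software do.

```python
def adapter_readthrough(insert_len: int, read_len: int = 300) -> int:
    """Number of cycles that read past the end of the insert (0 if none)."""
    return max(0, read_len - insert_len)


def trim_adapter(read: str, adapter: str, min_overlap: int = 8) -> str:
    """Trim a 3' adapter using exact matching only.

    First looks for the full adapter anywhere in the read; if it is absent,
    looks for a prefix of the adapter (at least min_overlap bases) at the
    very end of the read.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    longest = min(len(adapter), len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


if __name__ == "__main__":
    EXAMPLE_ADAPTER = "AGATCGGAAGAGC"  # placeholder; substitute your own adapter
    # A 180 bp amplicon read for 300 cycles leaves 120 cycles of non-target sequence.
    print(adapter_readthrough(180))
    print(trim_adapter("ACGTACGTAC" + EXAMPLE_ADAPTER + "TTTT", EXAMPLE_ADAPTER))
```

After this kind of trimming, a read from a short amplicon ends up much shorter than the number of cycles, which matches the short reads seen in the FastQC reports.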



                      • #12
                        What did your library look like when you did the quant before you loaded it?
                        We find that if there are lots of small fragments they will preferentially cluster and produce poor quality results. I've also seen a similar issue when there are fragments significantly longer than those we are looking at. Basically anything outside of the 400bp-800bp range is an issue.



                        • #13
                          Unfortunately, we don't have a Bioanalyzer. We used the MultiNA device, and on it we saw more or less correct bands (in any case, nothing outside the expected length range). Yes, reads running into the adapters makes sense; I didn't think of that, thank you. But this still doesn't explain why Runs 7, 9 and 10 look even worse than Run 5, because we always use the same multiplex reaction containing the same amplicons (from the same people). Low diversity and very short reads, as well as adapter and primer quality, are always the same. This would mean that the bad results are due to overclustering, which is not convincing to me because Run 10 showed a density of 730 K/mm2...



                          • #14
                            The density reported in SAV can sometimes be incorrect if the cluster density is high and the software is unable to identify individual clusters. To confirm this is not the case, you can examine the images of a few cycles for each base.

                            Sequencing through adapters will reduce the overall Q score. Runs 7 and 10 seem to have more small fragments, and Run 9, as you have mentioned, was overclustered. The Q score of the target amplicon region after trimming would be a good indicator of read quality.

                            Oligo quality can vary with each synthesis, and even a good-quality oligo can go off. Oligo quality here means the proportion of full-length primers with the correct sequence. Some vendors provide high yields of oligos that can contain a high level of truncated oligos.
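As a minimal sketch of the suggestion to judge quality on the target amplicon region only, the fraction of bases at or above Q30 can be computed directly from a FASTQ quality string. This assumes Phred+33 encoding, which is what current Illumina FASTQ files use; the function name and the region coordinates are hypothetical and would have to come from your own amplicon design.

```python
from typing import Optional


def fraction_q30(quality: str, start: int = 0, end: Optional[int] = None,
                 offset: int = 33, threshold: int = 30) -> float:
    """Fraction of bases in quality[start:end] with Phred score >= threshold.

    quality is the fourth line of a FASTQ record; offset 33 means Phred+33.
    Returns 0.0 for an empty region.
    """
    region = quality[start:end]
    if not region:
        return 0.0
    passing = sum((ord(ch) - offset) >= threshold for ch in region)
    return passing / len(region)


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    qual = "IIIIIIII########"
    print(fraction_q30(qual))        # whole read, adapter read-through included
    print(fraction_q30(qual, 0, 8))  # hypothetical target region only
```

Comparing the two numbers for the same reads shows how much of the low run-level ≥ Q30 comes from cycles beyond the amplicon rather than from the amplicon itself.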



                            • #15
                              The density reported in SAV can sometimes be incorrect if the cluster density is high and the software is unable to identify individual clusters. To confirm this is not the case, you can examine the images of a few cycles for each base.

                              The %base per cycle plots indicate that the sequence composition of the libraries in Run 5 is not the same as in Runs 7 and 10.
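For anyone who wants to reproduce the %base per cycle comparison outside SAV, here is a small sketch that computes it from read sequences already loaded into memory. It is not how SAV derives its plot (SAV works from the instrument's run metrics); the function name is illustrative, and reading the FASTQ file is left out.

```python
from collections import Counter
from typing import Dict, List, Sequence


def base_composition_by_cycle(reads: Sequence[str]) -> List[Dict[str, float]]:
    """Percentage of A, C, G and T at each cycle (read position).

    Reads shorter than a given cycle are ignored for that cycle. Other
    symbols such as N count toward the total, so the four percentages
    may sum to less than 100.
    """
    n_cycles = max((len(r) for r in reads), default=0)
    composition = []
    for i in range(n_cycles):
        counts = Counter(r[i] for r in reads if len(r) > i)
        total = sum(counts.values())
        composition.append({b: 100.0 * counts.get(b, 0) / total for b in "ACGT"})
    return composition


if __name__ == "__main__":
    toy_reads = ["ACGT", "ACGA", "AGGT", "ATG"]  # toy data, not from the runs
    for cycle, pct in enumerate(base_composition_by_cycle(toy_reads), start=1):
        print(cycle, pct)
```

Running this on a subsample of reads from Run 5 and from Run 7 or 10 would let the two profiles be compared cycle by cycle.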

                              Sequencing through adapters will reduce the overall Q score. Runs 7 and 10 seem to have smaller fragments, and Run 9, as you have mentioned, was overclustered. The Q score of the target amplicon region after trimming would be a good indicator of read quality.

                              Oligo quality can vary with each synthesis, and even a good-quality oligo can go off. Oligo quality here means the proportion of full-length primers with the correct sequence. Some vendors provide high yields of oligos that can contain a high level of truncated oligos.
