Dear all,
we are currently working on the development and validation of an in-house MiSeq-based NGS assay, using amplicons targeting forensic STR loci. We are using paired-end sequencing with dual-index reads and the MiSeq Reagent Kit v3.
For the last few NGS runs, we changed neither the characteristics of the input samples (a mix of intact and degraded DNA) nor the workflow. However, we increased the concentration of the final (pooled) library from 4 pM to 12 pM and the number of samples in the final pool from 30 to 96.
As a result, the run was overclustered and we obtained no usable results.
For the next runs, we decreased the input concentration of the final library to 8-10 pM, but the densities were still too high for amplicon sequencing (1200-1300 K/mm²) and the ≥ Q30 score was poor (47-60%). The number of PF reads was small, and the number of effective reads (PF reads that were trimmed and used for calling STR alleles) was even smaller. We thought this was due to reduced read lengths (shown by FastQC analysis), but that does not explain why the number of PF reads is so low.
So we reduced the input concentration to 6 pM and obtained a lower density (730 K/mm²), but the ≥ Q30 score was still too low (56%) and the numbers of PF and effective reads were still too small. Many PF reads were still too short to be used for allele calling.
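For reference, here is the simple C1·V1 = C2·V2 arithmetic we use when stepping the loading concentration down (e.g. to 6 pM). This is just a sketch with illustrative numbers, not the actual volumes from the runs above:

```python
# Dilution helper based on C1*V1 = C2*V2.
# Concentrations in pM, volumes in µL; the example values are hypothetical.

def stock_volume_needed(stock_pm, target_pm, final_volume_ul):
    """Volume of stock library needed to reach target_pm in final_volume_ul."""
    return target_pm * final_volume_ul / stock_pm

# e.g. diluting a 20 pM denatured pool to 6 pM in a 600 µL final volume:
vol = stock_volume_needed(20, 6, 600)
print(vol)  # 180.0 µL of stock, brought up to 600 µL with diluent
```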
Do you have any idea why the number of PF reads could be so low and why the reads are still too short? Or another explanation for the poor quality?
I was reading about too high a sample-to-cell ratio (i.e., the number of samples loaded on a flow cell). For the three runs described here, we loaded 96 different samples on the flow cell; before, we used around 30. Could this have an impact on the read lengths? It's not clear to me why it should: I think it shouldn't matter for read length whether 6 pM of the same DNA or of different DNA is loaded on a flow cell (I know about the complexity problem, but that does not account for read length).
Could it be that loading 6 pM of final pooled library (containing 96 different samples plus PhiX) leads to too high a sample-to-cell ratio, resulting in overclustering that causes fewer PF reads and shorter reads?
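To make my own reasoning concrete: if cluster density is driven by the total molar loading concentration, then increasing the sample count at a fixed pM should only dilute per-sample depth, not change density or read length. A back-of-the-envelope sketch (the 20 M PF read count below is a hypothetical figure, not from our runs; the v3 kit is specified at up to roughly 25 M reads):

```python
# Sketch: per-sample read depth for an evenly balanced pool.
# Assumption: total PF output depends on loading concentration, not sample count,
# so going from 30 to 96 samples at the same pM only spreads the reads thinner.

def per_sample_reads(total_pf_reads, n_samples, phix_fraction=0.05):
    """Expected reads per sample after subtracting the PhiX spike-in."""
    usable = total_pf_reads * (1 - phix_fraction)
    return usable / n_samples

# Hypothetical run yielding 20 million PF reads with 5% PhiX:
for n in (30, 96):
    print(n, "samples ->", round(per_sample_reads(20_000_000, n)), "reads/sample")
```

So a 96-plex at 6 pM should simply give about a third of the per-sample depth of a 30-plex, which affects coverage but, as far as I can see, not cluster density or read length.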
I am looking forward to reading interesting answers. If you have questions, I am happy to answer and hope that we will find an explanation together.