SEQanswers

Old 07-25-2018, 01:47 AM   #1
SarahAurora
Member
 
Location: Bonn Germany

Join Date: Jun 2017
Posts: 17
MiSeq amplicon sequencing of STRs: Low PF and effective reads due to short reads

Dear all,

We are currently developing and validating an in-house MiSeq-based NGS assay that uses amplicons targeting forensic STR loci. We use paired-end sequencing with dual index reads and the MiSeq Reagent Kit v3.

For the last few NGS runs, we changed neither the characteristics of the input samples (a mix of intact and degraded DNA) nor the workflow. However, we increased the concentration of the final (pooled) library from 4 pM to 12 pM and the number of samples in the final pooled library from 30 to 96.

As a result, we had an overclustered run from which we obtained no results.
For the next runs, we decreased the input of final library to 8-10 pM and obtained densities that were too high for amplicon sequencing (1200-1300 K/mm2), as well as a poor ≥ Q30 (47-60%). The number of PF reads was low, and the number of effective reads (PF reads that were trimmed and used for calling STR alleles) was lower still. We thought this was due to reduced read lengths (shown by FastQC analysis), but that does not explain why the number of PF reads is so low.

So we reduced the input to 6 pM and obtained a lower density (730 K/mm2), but the ≥ Q30 was still too low (56%) and there were still too few PF and effective reads. Many PF reads still showed reduced read lengths and were too short to be used for allele calling.
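To put a number on "too short for allele calling", the read-length distribution of a single FASTQ file can be tabulated. The following is only a minimal sketch: it assumes standard 4-line FASTQ records, and the function names, the example file name and the length cutoff are hypothetical and not part of our pipeline.

```python
import gzip
from collections import Counter


def read_lengths(fastq_path):
    """Yield the length of every read in a FASTQ file (plain or gzipped)."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In a 4-line FASTQ record the sequence is the second line.
            if line_number % 4 == 1:
                yield len(line.strip())


def summarize_lengths(lengths, min_usable_length):
    """Return (total reads, reads shorter than the cutoff, length histogram)."""
    histogram = Counter(lengths)
    total = sum(histogram.values())
    too_short = sum(
        count for length, count in histogram.items() if length < min_usable_length
    )
    return total, too_short, histogram
```

For example, `summarize_lengths(read_lengths("sample_R1.fastq.gz"), 100)` would report how many reads in that (hypothetical) file are shorter than an assumed 100-base cutoff; the cutoff has to be chosen per locus.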

Do you have any idea why the number of PF reads could be so low and why the reads are still too short? Or another explanation for why the quality is so bad?
I have read about a too-high sample-to-cell arrangement (i.e., the number of samples loaded on a flow cell). For the three runs described here, we loaded 96 different samples on the flow cell. Before, we used around 30 samples. Could this have an impact on the read lengths? It is not clear to me why it should: I think it doesn't matter for the read length whether 6 pM of the same or of different DNA is loaded on a flow cell (I know about the complexity problem, but it does not account for read length).

Could loading 6 pM of final pooled library (containing 96 different samples and PhiX) lead to a too-high sample-to-cell arrangement, resulting in overclustering that leads to fewer PF reads and short reads?

I am looking forward to reading interesting answers. If you have questions, I am happy to answer and hope that we will find an explanation together.

Last edited by SarahAurora; 07-25-2018 at 01:52 AM.
Old 07-25-2018, 03:13 AM   #2
nucacidhunter
Senior Member
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,171

FastQC reports of a bad and a good run would be helpful for troubleshooting.
Old 07-25-2018, 03:13 AM   #3
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 352

What's your %PhiX?
Old 07-25-2018, 03:20 AM   #4
SarahAurora
Member
 
Location: Bonn Germany

Join Date: Jun 2017
Posts: 17

Quote:
Originally Posted by nucacidhunter View Post
FastQC reports of a bad and a good run would be helpful for troubleshooting.
They are not finished yet. I have FastQC reports only for single samples, not for the whole run. But the last 3 runs showed shorter reads than before.
Old 07-25-2018, 03:26 AM   #5
SarahAurora
Member
 
Location: Bonn Germany

Join Date: Jun 2017
Posts: 17

Quote:
Originally Posted by Bukowski View Post
What's your %PhiX?
15% PhiX. We also mix the libraries with amplicons of other (non-STR) genetic loci.
Old 07-25-2018, 03:27 AM   #6
nucacidhunter
Senior Member
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,171

Sorry, I meant to ask for %Base in Data by Cycle plot from SAV.
Old 07-25-2018, 04:20 AM   #7
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,287

Yeah, I've noticed with v2 chemistry 500 cycle runs that going above 800 K/mm2 results in higher loss of read quality towards the ends of the reads when using high bias (low complexity) samples.
I don't think the number of samples in a pool is significant, only the cluster density.
You mention you are using v3 chemistry? There are 2 v3 MiSeq kits -- 150 cycles and 600 cycles. Which are you running?
--
Phillip
Old 07-25-2018, 04:35 AM   #8
SarahAurora
Member
 
Location: Bonn Germany

Join Date: Jun 2017
Posts: 17

Quote:
Originally Posted by nucacidhunter View Post
Sorry, I meant to ask for %Base in Data by Cycle plot from SAV.
Run 5: 12 pM input - was OK (Q30 58%), but not as many reads as expected
Run 7: 10 pM input - bad quality and few PF and effective reads
Run 9: 10 pM input - overclustered, no results because the reads were too short
Run 10: 6 pM input - still bad quality and few PF and effective reads

They look like low diversity and unbalanced bases (A, T, C, G), but this applies to all the libraries we use because we always use the same amplicons from the same persons. So why does Run 5 look a little better than the other runs?
Attached Images
File Type: jpg DataByCycle_%Base_Run5,7,9,10_short.jpg (88.4 KB, 10 views)

Last edited by SarahAurora; 07-25-2018 at 04:41 AM.
Old 07-25-2018, 04:53 AM   #9
SarahAurora
Member
 
Location: Bonn Germany

Join Date: Jun 2017
Posts: 17

Quote:
Originally Posted by pmiguel View Post
Yeah, I've noticed with v2 chemistry 500 cycle runs that going above 800 K/mm2 results in higher loss of read quality towards the ends of the reads when using high bias (low complexity) samples.
I don't think the number of samples in a pool is significant, only the cluster density.
You mention you are using v3 chemistry? There are 2 v3 MiSeq kits -- 150 cycles and 600 cycles. Which are you running?
--
Phillip
600 cycles. This explains the bad quality at the read ends, but it doesn't explain why the quality is better and the reads are longer in some runs than in the last runs (Run 7 and 10).
Old 07-25-2018, 06:02 AM   #10
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,287

Those mainly look like your read length was overrunning your amplicon lengths. Did you check your libraries or the library pool on a Bioanalyzer?

--
Phillip
Old 07-25-2018, 04:35 PM   #11
nucacidhunter
Senior Member
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,171

The following are usually the causes of low PF:
1- Over-clustering
2- Low diversity
3- Sequencing primer quality
4- Adapter and primer quality

I think #4 would be the most likely cause in this case if you have not used custom sequencing primers, and it would explain the low Q scores as well.

Illumina instruments produce a base call for every cycle of a PF read, so reads shorter than the number of sequencing cycles indicate trimming, either because the MiSeq was set up for automatic adapter trimming or because the user trimmed after the run. Runs 5, 7 and 10, as pmiguel has mentioned, look like the sequencing has run into the adapters and on into the flow cell oligo lawn.
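One way to check for read-through on untrimmed reads is to look for the start of the adapter inside each read; the position where it begins approximates the insert length. The sketch below is illustrative only. It assumes TruSeq-style adapters, whose common prefix is `AGATCGGAAGAGC`; an in-house assay may use a different adapter, in which case that sequence has to be substituted. The function names are hypothetical, and matching is exact, so adapters containing sequencing errors are missed.

```python
def find_adapter_start(read, adapter, min_overlap=8):
    """Return the index where the adapter begins in the read, or None.

    First looks for the full adapter anywhere in the read, then for a
    partial adapter (at least min_overlap bases) at the 3' end of the read.
    Exact matching only: sequencing errors are not tolerated.
    """
    position = read.find(adapter)
    if position != -1:
        return position
    # Partial adapter hanging off the 3' end of the read.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return len(read) - overlap
    return None


def read_through_fraction(reads, adapter):
    """Fraction of reads in which the adapter (or its 3'-end prefix) was found."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if find_adapter_start(read, adapter) is not None)
    return hits / len(reads)
```

A high read-through fraction with adapter start positions well below the expected amplicon lengths would point to short fragments in the pool rather than to the chemistry.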
Old 07-25-2018, 04:51 PM   #12
ikripp
Member
 
Location: QLD, Aus

Join Date: Jan 2018
Posts: 11

What did your library look like when you did the quant before you loaded it?
We find that if there are lots of small fragments they will preferentially cluster and produce poor quality results. I've also seen a similar issue when there are fragments significantly longer than those we are looking at. Basically anything outside of the 400bp-800bp range is an issue.
Old 07-26-2018, 12:25 AM   #13
SarahAurora
Member
 
Location: Bonn Germany

Join Date: Jun 2017
Posts: 17

Unfortunately, we don't have a Bioanalyzer. We used the MultiNA device, and on it we saw more or less correct bands (in any case, nothing outside the expected length range). Yes, reads running into the adapters makes sense; I didn't think of that, thank you. But this still doesn't explain why runs 7, 9 and 10 look even worse than run 5, because we always use the same multiplex reaction containing the same amplicons (from the same people). Low diversity and very short reads, as well as adapter and primer quality, are always the same. This would mean that the bad results are due to overclustering, which is not convincing to me because Run 10 showed a density of 730 K/mm2...
Old 07-26-2018, 01:38 AM   #14
nucacidhunter
Senior Member
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,171

The density reported in SAV can sometimes be incorrect if the cluster density is high and the software is unable to identify individual clusters. To confirm that this is not the case, you can examine the images of a few cycles for each base.

The %base per cycle plot indicates that the sequence composition of the libraries in run 5 is not the same as in runs 7 and 10.

Sequencing through adapters will reduce the overall Q score. Runs 7 and 10 seem to have smaller fragments, and run 9, as you have mentioned, was over-clustered. The Q score of the target amplicon region after trimming would be a good indicator of read quality.
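As a rough illustration of that last point, the ≥ Q30 fraction can be computed over just the part of each read that covers the target region. This is a sketch only: it assumes Phred+33 quality strings (the encoding of current Illumina FASTQ output), and the function name and the `start`/`end` window are hypothetical placeholders for wherever the amplicon sits in the read.

```python
def fraction_q30(quality_strings, start=0, end=None, threshold=30, phred_offset=33):
    """Fraction of bases with quality >= threshold within [start, end) of each read."""
    passing = 0
    total = 0
    for quality in quality_strings:
        # Slicing tolerates reads shorter than the requested window.
        for symbol in quality[start:end]:
            total += 1
            if ord(symbol) - phred_offset >= threshold:
                passing += 1
    return passing / total if total else 0.0
```

Comparing this value for the amplicon window against the whole-read value shows how much of the low run-level Q30 comes from cycles past the end of the insert.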

Oligo quality can vary with each synthesis, and even a good-quality oligo can go off. Oligo quality here means the proportion of correct-sequence, full-length primers. Some vendors provide high yields of oligos that can contain a high level of truncated oligos.
Old 07-26-2018, 08:40 AM   #15
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,287

Quote:
Originally Posted by ikripp View Post
What did your library look like when you did the quant before you loaded it?
We find that if there are lots of small fragments they will preferentially cluster and produce poor quality results. I've also seen a similar issue when there are fragments significantly longer than those we are looking at. Basically anything outside of the 400bp-800bp range is an issue.
So, I don't know if I should even bring this up. It seems to be pretty far out of most labs' comfort zone.

But native DNA electrophoresis hides a critical issue we see with a substantial fraction of the amplicon pools submitted to us. We see cases where strand-denatured assays (e.g., heat denature the sample and run it on an RNA pico chip) show lots of short fragments that barely show up on a non-denaturing assay (DNA High Sensitivity). So I presume the short fragments are annealed to full-length fragments.

We mainly decided to start using pico chips to check libraries for NovaSeq runs -- index hopping being potentiated by primers/primer-dimers -- or at least that is what we are told. But once we saw that this assay could predict bad amplicon pool runs, we started using it for that purpose as well.

I don't see why you couldn't develop the same denatured-DNA run on an RNA assay for the MultiNA. We just heat the DNA to 96 °C for 2 minutes in a thermal cycler with a heated lid to prevent evaporation. Then we "snap cool" the DNA in a wet ice bath before loading.

--
Phillip
