SEQanswers, Illumina/Solexa forum

tugecko 07-20-2017 07:49 AM

Optimal coverage for sequencing microsatellites with Illumina
My lab is planning a phylogeography study on several different groups of lizards using microsatellites. We are interested in pooling the microsatellite amplicons for all of our individuals and then sequencing that library on an Illumina HiSeq 2500 machine (perhaps not the optimal machine for this project, but the one we have access to).

Right now, we are trying to figure out the logistics of our protocol, and one thing that we are stuck on is how much coverage we want per microsatellite locus per individual (which determines how many we could pool, etc.). I imagine that one would want more coverage than for a RAD protocol, since there are more potential variants one could be detecting, but I am really not sure. We haven't developed our microsatellites yet, so we don't know how much allelic variation we will be dealing with.

Has anyone else done a similar protocol with microsatellites? Does anyone have any advice? The few papers I found used wildly different amounts of coverage (one had ~13x, which they determined was not enough, and the other 2000x, which seems excessive).

Just starting out, any thoughts would be appreciated!

SNPsaurus 07-20-2017 11:13 AM

Amplicons will often have very different read depths given differences in amplicon length and GC content. Different samples will also have different total read counts. So you will want to oversequence to get sufficient depth for your worse-performing samples and worse-performing amplicons. If you can't fit it all in, then you'll have to decide to do fewer samples or be OK with not all amplicons returning data.

At low read depths, sampling probability rules. Let's say two alleles are present at a locus and they amplify equally well. At 10X read depth there is a (1/2)^10, or about 0.1%, chance of not sampling a given allele (not too bad). But let's say one allele is a slightly longer amplicon and the read balance is 7 to 3. Now there is a 0.7^10, or about 3%, chance of not getting a single read from the worse-performing allele. Now imagine you want at least 3 reads to call the allele... the chance you won't achieve that is actually quite high (about 38% at 10X with a 7:3 balance).
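
If it helps to play with these numbers, here is a minimal Python sketch of the sampling argument above. It assumes each read at a heterozygous locus is an independent draw that comes from the allele in question with a fixed probability (a simple binomial model; real amplicon data will be noisier). The function name `prob_fewer_than` and its parameters are made up for illustration.

```python
from math import comb


def prob_fewer_than(min_reads: int, depth: int, allele_fraction: float) -> float:
    """Chance that an allele gets fewer than `min_reads` reads.

    Assumes a binomial model: `depth` independent reads at the locus,
    each coming from this allele with probability `allele_fraction`.
    """
    return sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_reads)
    )


# Balanced alleles at 10X: chance of missing one allele entirely, (1/2)^10
balanced_dropout = prob_fewer_than(1, 10, 0.5)

# 7:3 read balance at 10X: chance of zero reads from the weaker allele, 0.7^10
skewed_dropout = prob_fewer_than(1, 10, 0.3)

# Same 7:3 balance, but requiring at least 3 reads to call the allele
skewed_under_3 = prob_fewer_than(3, 10, 0.3)

print(f"balanced, 0 reads:  {balanced_dropout:.4f}")
print(f"7:3, 0 reads:       {skewed_dropout:.4f}")
print(f"7:3, under 3 reads: {skewed_under_3:.4f}")
```

Raising `depth` in the last call shows how quickly the failure chance drops, which is one way to choose a base depth before adding the other safety factors.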

I'd pick some number, like 20X depth, then add more for different reasons... let's say 50% of the library is off-target amplification, so double the reads needed. Now predict you have a 4-fold variation in read count between samples and you want good coverage of the low ones... multiply by 4. If there is a 10-fold variation in locus coverage, that's 10X more. Now it seems super high (20 x 2 x 4 x 10 = 1600X), but you can decide to drop the very worst loci and multiply by 5 instead of 10 (800X). Anyway, that's the process!
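
The multiplication above can be written out as a small Python sketch. The factors are the example values from this post (20X base depth, 50% on-target, 4-fold sample variation, 10-fold or 5-fold locus variation), not measured numbers, and the function name `mean_depth_needed` is just a label for illustration.

```python
def mean_depth_needed(
    base_depth: float,
    on_target_fraction: float,
    sample_fold_variation: float,
    locus_fold_variation: float,
) -> float:
    """Average reads per locus per individual to budget for.

    Starts from the depth wanted at the worst sample/locus combination,
    then inflates it for off-target reads and for uneven coverage.
    """
    depth = base_depth / on_target_fraction  # 50% on-target means double the reads
    depth *= sample_fold_variation           # cover the lowest-yield samples
    depth *= locus_fold_variation            # cover the lowest-yield loci
    return depth


# Keeping every locus: 20 x 2 x 4 x 10
all_loci = mean_depth_needed(20, 0.5, 4, 10)

# Dropping the very worst loci: 20 x 2 x 4 x 5
trimmed_loci = mean_depth_needed(20, 0.5, 4, 5)

print(all_loci, trimmed_loci)
```

Multiplying the result by the number of loci and the number of individuals gives the total reads to compare against what the sequencing run is expected to yield.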

All times are GMT -8.
