07-20-2017, 08:49 AM   #1
Junior Member
Location: Philadelphia

Join Date: Jun 2017
Posts: 3
Optimal coverage for sequencing microsatellites with Illumina

My lab is planning a phylogeography study on several different groups of lizards using microsatellites. We are interested in pooling the microsatellite amplicons for all of our individuals and then sequencing that library on an Illumina HiSeq 2500 machine (perhaps not the optimal machine for this project, but the one we have access to).

Right now, we are trying to figure out the logistics of our protocol, and one thing that we are stuck on is how much coverage we want per microsatellite locus per individual (which determines how many we could pool, etc.). I imagine that one would want more coverage than for a RAD protocol, since there are more potential variants one could be detecting, but I am really not sure. We haven't developed our microsatellites yet, so we don't know how much allelic variation we will be dealing with.

Has anyone else done a similar protocol with microsatellites? Does anyone have any advice? The few papers I found used wildly different amounts of coverage (one had ~13x, which they determined was not enough, and the other 2000x, which seems excessive).

We are just starting out, so any thoughts would be appreciated!
tugecko
07-20-2017, 12:13 PM   #2
Registered Vendor
Location: Eugene, OR

Join Date: May 2013
Posts: 516

Amplicons will often have very different read depths given differences in amplicon length and GC content. Different samples will also have different total read counts. So you will want to oversequence to get sufficient depth for your worse-performing samples and worse-performing amplicons. If you can't fit it all in, then you'll have to decide to do fewer samples or be OK with not all amplicons returning data.

At low read depths, sampling probability rules. Let's say two alleles are present at a locus and they amplify equally well. At 10X read depth there is a (1/2)^10, or about 0.1%, chance of not sampling a given allele (not too bad). But let's say one allele is a slightly longer amplicon and the read balance is 7 to 3. Now there is a 0.7^10, or about 3%, chance of not getting a single read from the worse-performing allele. Now imagine you want at least 3 reads to call the allele... the chance that you won't achieve that is actually quite high (about 38% at 10X).
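
The sampling argument above can be checked with a few lines of Python. This is only a rough sketch: it assumes every read at a locus is an independent draw (a plain binomial model), and the function names and the 1% miss threshold are made up for illustration, not taken from any pipeline.

```python
from math import comb


def prob_fewer_than(min_reads, depth, allele_fraction):
    """P(an allele gets fewer than min_reads reads) at a given depth.

    Assumption: reads are independent draws, and allele_fraction is the
    share of reads expected from this allele (0.5 = balanced, 0.3 = the
    weaker allele in a 7-to-3 read balance).
    """
    return sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_reads)
    )


def depth_needed(min_reads, allele_fraction, max_miss=0.01, limit=10_000):
    """Smallest depth at which the miss probability drops to max_miss or below.

    max_miss=0.01 is an arbitrary illustrative threshold.
    """
    for depth in range(min_reads, limit + 1):
        if prob_fewer_than(min_reads, depth, allele_fraction) <= max_miss:
            return depth
    raise ValueError("no depth up to limit satisfies max_miss")


# The three cases from the post, all at 10X.
print(f"balanced, 0 reads:   {prob_fewer_than(1, 10, 0.5):.4f}")
print(f"7:3, 0 reads:        {prob_fewer_than(1, 10, 0.3):.4f}")
print(f"7:3, fewer than 3:   {prob_fewer_than(3, 10, 0.3):.4f}")
print(f"depth for >=3 reads of the weak allele, 1% miss: {depth_needed(3, 0.3)}")
```

The first three numbers come out to roughly 0.1%, 2.8% and 38%. `depth_needed` turns the question around and gives a starting per-locus depth for whatever read balance and calling threshold you expect.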

I'd pick some number, like 20X depth, then add more for different reasons... let's say 50% of the library is off-target amplification, so double the reads needed. Now predict you have a 4-fold variation in read count between samples and you want good coverage of the low ones... multiply by 4. If there is a 10-fold variation in locus coverage, that's another factor of 10. Now it seems super high, but you can decide to drop the very worst loci and multiply by 5 instead of 10. Anyway, that's the process!
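
To put numbers on that process, here is a minimal sketch of the arithmetic. The defaults are the example figures from this post. The function names, the number of loci, and the reads available per run are hypothetical placeholders; put in your own values.

```python
def budgeted_depth(base_depth=20, on_target_fraction=0.5,
                   sample_fold=4, locus_fold=10):
    """Average reads to budget per locus per individual.

    base_depth:          depth wanted at the worst locus in the worst sample
    on_target_fraction:  share of reads that are real amplicons (0.5 -> x2)
    sample_fold:         fold variation in read count between samples
    locus_fold:          fold variation in coverage between loci
    """
    return base_depth / on_target_fraction * sample_fold * locus_fold


def samples_per_run(reads_available, n_loci, depth):
    """How many individuals fit in a run at the budgeted depth.

    reads_available and n_loci are assumptions supplied by the user.
    """
    return int(reads_available // (depth * n_loci))


full = budgeted_depth()                    # 20 * 2 * 4 * 10
trimmed = budgeted_depth(locus_fold=5)     # drop the very worst loci
print(full, trimmed)

# Hypothetical example: 20 loci and 160 million usable reads.
print(samples_per_run(160_000_000, n_loci=20, depth=trimmed))
```

With the figures above, the budget is 1600X per locus per individual, or 800X once the worst loci are dropped, which is how a 20X target ends up in the hundreds or thousands.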
Providing nextRAD genotyping and PacBio sequencing services.
SNPsaurus

allele, coverage, illumina, microsatellite, microsatellites

All times are GMT -8.