Old 07-20-2017, 11:13 AM   #2
Registered Vendor
Location: Eugene, OR

Join Date: May 2013
Posts: 489

Amplicons will often have very different read depths given differences in amplicon length and GC content. Different samples will also have different total read counts. So you will want to oversequence to get sufficient depth for your worst-performing samples and worst-performing amplicons. If you can't fit it all in, then you'll have to decide to either run fewer samples or be OK with not all amplicons returning data.

At low read depths, sampling probability rules. Let's say two alleles are present at a locus and they amplify equally well. At 10X read depth there is a (1/2)^10, or about 0.1%, chance of not sampling a given allele (not too bad). But let's say one allele is a slightly longer amplicon and the read balance is 7 to 3. Now there is about a 3% chance (0.7^10) of not getting a single read from the worse-performing allele. Now imagine you want at least 3 reads to call the allele... the chance you won't achieve that is actually quite high, about 38%.
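Here is a rough Python sketch of those numbers, assuming each read is an independent draw from the two alleles (a simple binomial model, which ignores PCR duplicates and sequencing error). The function name and parameters are just made up for illustration:

```python
from math import comb

def prob_fewer_than(min_reads, depth, allele_fraction):
    """Chance that an allele gets fewer than min_reads reads at a locus.

    Assumes reads are independent draws (binomial model), with
    allele_fraction being the share of reads expected from that allele.
    """
    return sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_reads)
    )

# Balanced alleles at 10X: chance of missing one allele entirely
print(f"{prob_fewer_than(1, 10, 0.5):.4f}")  # 0.0010
# 7:3 read balance at 10X: chance of zero reads from the weaker allele
print(f"{prob_fewer_than(1, 10, 0.3):.4f}")  # 0.0282
# 7:3 read balance at 10X: chance of fewer than 3 reads from the weaker allele
print(f"{prob_fewer_than(3, 10, 0.3):.4f}")  # 0.3828
```

You can bump up `depth` to see how much coverage it takes before the last number drops to something you can live with.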

I'd pick some number, like 20X depth, then multiply it up for different reasons... let's say 50% of the library is off-target amplification, so double the reads needed. Now predict you have a 4-fold variation in read count between samples and you want good coverage of the low ones... multiply by 4. If there is a 10-fold variation in locus coverage, that's another factor of 10, so 20 x 2 x 4 x 10 = 1,600X average depth per locus per sample. Now that seems super high, but you can decide to drop the very worst loci and multiply by 5 instead of 10, which brings it down to 800X. Anyway, that's the process!
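The same back-of-the-envelope budget can be written as a small Python sketch. The function names are hypothetical, and the amplicon and sample counts in the last step are made-up placeholders, not numbers from this thread:

```python
def budgeted_depth(min_depth, on_target_fraction, sample_fold, locus_fold):
    """Average depth to budget per locus per sample.

    Starts from the minimum depth wanted at the worst locus in the worst
    sample, then scales up for off-target reads, sample-to-sample
    variation, and locus-to-locus variation.
    """
    return min_depth / on_target_fraction * sample_fold * locus_fold

def total_reads(depth, n_amplicons, n_samples):
    """Total reads for the run, given average depth per locus per sample."""
    return depth * n_amplicons * n_samples

# 20X minimum, 50% on-target, 4-fold sample variation, 10-fold locus variation
full = budgeted_depth(20, 0.5, 4, 10)
print(full)  # 1600.0

# Drop the very worst loci so the locus factor falls to 5
trimmed = budgeted_depth(20, 0.5, 4, 5)
print(trimmed)  # 800.0

# Hypothetical panel: 100 amplicons across 96 samples (placeholder values)
print(total_reads(trimmed, n_amplicons=100, n_samples=96))  # 7680000.0
```

Compare that last number to the output of whatever run you're planning to see whether the panel fits or whether you need to cut samples or loci.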
Providing nextRAD genotyping and PacBio sequencing services.
SNPsaurus