SEQanswers

07-24-2013, 01:24 PM   #1
TKC
Member
 
Location: US

Join Date: Jul 2013
Posts: 10
Novice help: RADseq strategy for population level study

Hi all,

I am new to NGS and was hoping for some advice on the experimental design of a project we are putting together. We want to ID informative SNPs for detecting introgression between multiple species with a convoluted history of hybridization and reticulation. Genome size is approx. 2.2 Gb, with an estimated GC content of 38%. The “closest” genomic resource available is within the same family but a different genus. Other (potentially pertinent?) details: we think that 20X coverage as a minimum would be appropriate, and we are looking to multiplex at least 96 individuals per lane.

Question: Does it make sense to genotype all (1000+) of our samples using RAD (or ddRAD?), or should we use RADseq on a subset of individuals (96 individuals total from “pure” populations?), ID informative SNPs, and then screen the rest of our samples using some genotyping assay (e.g. Sequenom)? Many projects seem to go this direction, but the problem I see is that it requires that there be enough flanking sequence (to the SNP) to develop oligos for Sequenom.

We will be running this on an Illumina HiSeq 2000, which yields reads that I think may be too short (~100 bp) for us to be able to develop flanking oligos for use with Sequenom, unless the SNP happens to be smack in the center of the read, right? Does anyone have any experience with the Sequenom MassArray platform for SNP genotyping who could shed light on this issue?
Alternatively, we may have access to a MiSeq to get longer reads, or we could use an overlapping paired-end method to try to get longer fragments as well (Hohenlohe et al. 2013; Molecular Ecology).

Any help/ direction is much appreciated.
07-24-2013, 02:07 PM   #2
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

"Does it make sense to genotype all (1000+) of our samples using RAD (or ddRAD?)"
Not only does it not make sense, but it will be VERY expensive!

You can get a good estimate of genome wide diversity in a population with 20-30 individuals.



"or should we use RADseq on a subset of individuals (96 individuals total from “pure” populations?), ID informative SNPs, and then screen the rest of our samples using some genotyping assay (e.g. Sequenom)?"

This is counter to the rationale behind RADseq. RADseq negates the need for costly, time-consuming SNP assays. RADseq is genotyping by sequencing: you ID the SNPs and genotype at the same time. Assays are not needed.

"Many projects seem to go this direction, but the problem I see is that it requires that there be enough flanking sequence (to the SNP) to develop oligos for Sequenom. "

Unless you want to develop an assay that will be used a lot, it's not worth it. You need to think carefully about the costs of each. For what you want to do (detect introgression between species) you could probably do this with a minimum of 15 samples from each species. We are !

I would advise doing RAD on a fraction of your samples, and then looking at neutral markers (micro-sats, mtDNA) in the remainder to get the whole picture.
07-25-2013, 08:07 AM   #3
TKC
Member
 
Location: US

Join Date: Jul 2013
Posts: 10

Hi,

Thank you for the response, I'll take all the help I can get!

"You can get a good estimate of genome wide diversity in a population with 20-30 individuals"
-I guess I should clarify: this is a species complex for which we have samples representing multiple populations per species, and the 1000+ samples come from 20-30 individuals per population per species. So we were looking for the most affordable way of looking at every population (as work with msats and mtDNA has already shown that they should be managed as distinct units)... We also know from previous work that hybridization is likely much more extensive in some populations than in others.

As to cost, what would be a good estimate (rough, ballpark, etc.) of cost per individual for RADseq? Would you recommend attempting the library prep in house, or farming that out along with the sequencing? We have talked with Floragenex to some extent, but still don't have a concrete quote for total cost per individual...

"This is counter to the rationale behind RADseq. RADseq negates the need for costly, time-consuming SNP assays. RADseq is genotyping by sequencing: you ID the SNPs and genotype at the same time. Assays are not needed."
-I'm not 100% sure how much RADseq will cost us per individual, but Sequenom should cost us (after some start-up costs) $4-5 per ~30 SNPs per individual (quoted by a commercial outfit; does anyone else know better??). So assuming we need 150 SNPs to have good diagnostic power, that is five panels of ~30 SNPs, which should cost us $20-25 per individual to assay. Since RADseq will give us more information than we really need, and if Sequenom is cheaper, it made sense to me that we could use RAD to identify the SNPs that are useful for diagnosing hybrids, and then run the rest of our samples through the assay for genotyping.
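
The per-individual Sequenom figure can be checked with a few lines of arithmetic. This is only a sketch: the ~30 SNPs per panel and the $4-5 price per panel are the quoted figures from the post, start-up costs are ignored, and the function name `assay_cost_per_individual` is made up for illustration.

```python
import math

def assay_cost_per_individual(n_snps, snps_per_panel, price_per_panel):
    """Cost of genotyping one individual, ignoring start-up costs.

    Assumes every panel is billed in full, even if it is only partly filled.
    """
    n_panels = math.ceil(n_snps / snps_per_panel)
    return n_panels * price_per_panel

# Figures from the post: 150 diagnostic SNPs, ~30 SNPs per panel, $4-5 per panel.
low = assay_cost_per_individual(150, 30, 4.0)
high = assay_cost_per_individual(150, 30, 5.0)
print(f"${low:.0f}-{high:.0f} per individual")
```

Whether this beats RADseq depends on the RADseq cost per individual, which is still unknown at this point in the thread, and on the start-up costs left out here.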

Again, thanks for taking the time to respond!
07-25-2013, 09:18 AM   #4
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 501

Hi TKC,

In the "old" days of a few years ago, lots of people did what you suggest here: use RAD-Seq to identify SNPs in a subset of a population, then convert a subset of those SNPs to a high-throughput genotyping platform. People used RAD PE contigs to get 300-500 bp, or overlapping PE reads. And as you mention, a MiSeq run would now serve the same purpose.

But as sequencing costs drop, the population size at which that strategy is appropriate keeps getting higher. Setting up the genotyping is an investment, and you are then dealing only with previously known SNPs, so you lose the ability to discern new alleles that may be of interest.

It sounds like the first question is whether you need to get information on all 1000 individuals. When someone contacts SNPsaurus or my lab about a genotyping project (disclosure: my academic lab developed RAD-Seq, I have equity in Floragenex, which offers RAD-Seq, and I founded SNPsaurus, which offers nextRAD), we ask three things: how many markers are needed; whether you need perfect information at each locus (this is a little tricky: some applications, such as a genetic map, prefer high quality genotypes that reliably call each allele of a heterozygote, while others want good quality calls but missing alleles are OK); and what the SNP rate in the population is, if known (i.e. how many sequenced loci will have a SNP in the population).

From that, you can design the experiment. In your case, since cost is a factor for a large population, you'd hope to get by with fewer markers and lower coverage if you can get away with it. So a project assaying 30,000 tags per genome (10,000 markers if 1/3 of tags have SNPs) at 5x coverage (you will have good quality calls but miss a portion of heterozygous alleles) can fit >600 samples per lane. Usually the project is constrained by index availability at that kind of multiplexing (for nextRAD we use dual indexing and typically multiplex at 192 samples per lane), in which case your 1000+ samples need 6 lanes of sequencing, and you will get higher coverage than planned because you aren't multiplexing as much as possible.

For this kind of low coverage sequencing, then, the library cost will dominate. Most people peg the cost of materials at $15-20 per sample. I think the biggest unplanned-for cost is labor (again, I'm an outsourcing service provider so I'll make that argument, but in my academic lab we help people with RAD projects and sometimes it drags on for months with run failure after run failure, so we see the ugly side as well!).

Oops, I went back and saw your 20X coverage... it actually still fits in 6 lanes, so the project would be the same.
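
The lane arithmetic in this post can be sketched as follows. The tag count (30,000), the coverages (5x and 20x), the 192-sample index limit and the 1000-sample project size come from the thread; `READS_PER_LANE` is an assumed round number for usable reads from one HiSeq 2000 lane and is not a figure anyone in the thread quoted. The function name `lanes_needed` is hypothetical, and the model is idealized: every read is usable and coverage is perfectly even across tags and samples.

```python
import math

def lanes_needed(n_samples, tags_per_genome, coverage,
                 reads_per_lane, max_indexed_samples):
    """Return (samples_per_lane, lanes) under idealized, uniform coverage."""
    reads_per_sample = tags_per_genome * coverage
    # Limit set by sequencing yield alone.
    by_yield = reads_per_lane // reads_per_sample
    # Actual multiplexing is capped by the number of available indexes.
    samples_per_lane = min(by_yield, max_indexed_samples)
    if samples_per_lane < 1:
        raise ValueError("one lane cannot cover a single sample at this depth")
    return samples_per_lane, math.ceil(n_samples / samples_per_lane)

READS_PER_LANE = 150_000_000  # ASSUMPTION: illustrative yield, not from the thread

for coverage in (5, 20):
    per_lane, lanes = lanes_needed(1000, 30_000, coverage, READS_PER_LANE, 192)
    print(f"{coverage}x: {per_lane} samples/lane, {lanes} lanes")
```

Under that assumed yield, the index limit rather than read count sets the multiplexing level at both 5x and 20x, which is why the lane count does not change between the two.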
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

Last edited by SNPsaurus; 07-25-2013 at 09:20 AM.
07-25-2013, 09:44 AM   #5
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381

5x coverage is way too low in my opinion. 20-30x minimum.

If cost is no issue, go with one of the Oregon providers (they ain't cheap!..but will deliver SNPs hassle free).... if you do not have tens or hundreds of thousands of dollars to spend, I would suggest collaborating with a lab that has the expertise.
07-25-2013, 12:17 PM   #6
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 501

JackieBadger, 5X may be low, but it really depends on the application. GBS (the Elshire method) is designed to sample many loci at sub-1X coverage, for example. If TKC just wants to see if introgression is happening, and there are hundreds of strain-specific SNPs, then having some missing data won't be a problem. But, you are right that it is better to be conservative about it!
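
To put rough numbers on the 5X versus 20X disagreement, here is a minimal sketch of how often one allele of a heterozygote goes unseen at a given read depth. It assumes a locus is covered by exactly `depth` reads, that each read samples either allele with probability 1/2, and that there are no sequencing errors; real depth varies from locus to locus, and genotype callers usually want more than a single read per allele, so the actual miss rate at a given mean coverage will be higher. The function name `prob_miss_het_allele` is made up for illustration.

```python
def prob_miss_het_allele(depth):
    """Probability that all `depth` reads at a heterozygous site show the same allele."""
    if depth < 1:
        return 1.0  # no reads, so nothing can be observed
    # Either allele can be the one that is missed: 2 * (1/2)^depth.
    return 2 * 0.5 ** depth

for depth in (1, 5, 10, 20):
    print(f"{depth:>2}x: {prob_miss_het_allele(depth):.6f}")
```

Under these simplifying assumptions about 6% of heterozygous sites would look homozygous at exactly 5 reads, while at 20 reads the figure is negligible.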
All times are GMT -8.

