  • Novice help: RADseq strategy for population level study

    Hi all,

    I am new to NGS and was hoping to get some advice on the experimental design for a project we are putting together. We want to identify informative SNPs for detecting introgression between multiple species with a convoluted history of extensive hybridization/reticulation. Genome size is approx. 2.2 Gb, with an estimated 38% GC content. The “closest” genomic resource available is within the same family but a different genus. Other (potentially pertinent?) details: we think that 20X coverage as a minimum would be appropriate, and we are looking to multiplex at least 96 individuals per lane.

    Question: Does it make sense to genotype all (1000+) of our samples using RAD (or ddRAD?), or should we use RADseq on a subset of individuals (96 individuals total from “pure” populations?), ID informative SNPs, and then screen the rest of our samples using some genotyping assay (e.g. Sequenom)? Many projects seem to go this direction, but the problem I see is that it requires that there be enough flanking sequence (to the SNP) to develop oligos for Sequenom.

    We will be running this on an Illumina HiSeq 2000, which yields reads that I think may be too short (~100 bp) for us to be able to develop flanking oligos for use with Sequenom, unless the SNP happens to be smack in the center of the read, right? Does anyone have any experience with the Sequenom MassArray platform for SNP genotyping and could shed light on this issue?
    Alternatively, we may have access to a MiSeq to yield longer reads, or we could use an overlapping paired-end method to try to get longer fragments as well (Hohenlohe et al. 2013; Molecular Ecology).
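
The flanking-sequence concern above can be made concrete with a small check. This is only a sketch: `min_flank` (the number of bases an assay design would need on each side of the SNP) is a hypothetical parameter, not a figure from Sequenom or from this thread, and the function names are made up for illustration.

```python
def has_enough_flank(read_length: int, snp_pos: int, min_flank: int) -> bool:
    """True if a SNP at 0-based position snp_pos has at least
    min_flank bases on both sides within a read of read_length."""
    left = snp_pos                      # bases before the SNP
    right = read_length - snp_pos - 1   # bases after the SNP
    return left >= min_flank and right >= min_flank


def usable_positions(read_length: int, min_flank: int) -> list:
    """All SNP positions in the read that keep enough flank on both sides."""
    return [p for p in range(read_length)
            if has_enough_flank(read_length, p, min_flank)]


# Example: 100 bp reads and a hypothetical requirement of 40 bp per side.
positions = usable_positions(100, 40)
print(len(positions), "of 100 positions are usable")
```

The longer the required flank, the narrower the usable window in the middle of the read, which is why longer reads or paired-end contigs help here.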

    Any help/ direction is much appreciated.

  • #2
    "Does it make sense to genotype all (1000+) of our samples using RAD (or ddRAD?)"
    Not only does it not make sense but will be VERY expensive!

    You can get a good estimate of genome wide diversity in a population with 20-30 individuals



    "or should we use RADseq on a subset of individuals (96 individuals total from “pure” populations?), ID informative SNPs, and then screen the rest of our samples using some genotyping assay (e.g. Sequenom)?"

    This is counter to the rationale behind RADseq. RADseq negates the need for costly, time-consuming SNP assays. RADseq is genotyping by sequencing. You ID the SNPs and genotype at the same time. Assays not needed.

    "Many projects seem to go this direction, but the problem I see is that it requires that there be enough flanking sequence (to the SNP) to develop oligos for Sequenom. "

    Unless you want to develop an assay that will be used a lot, then it's not worth it. You need to think carefully about the costs of each. For what you want to do (detect introgression between species) you could do this probably with a minimum of 15 samples from each species. We are !

    I would advise doing RAD on a fraction of your samples, and then looking at neutral markers (microsatellites, mtDNA) in the remainder to get the whole picture.


    • #3
      Hi,

      Thank you for the response, I'll take all the help I can get!

      "You can get a good estimate of genome wide diversity in a population with 20-30 individuals"
      -I guess I should clarify: this is a species complex for which we have samples representing multiple populations per species, and the 1000+ samples is 20-30 individuals per population per species. So we were looking for the most affordable way of looking at every population (as work with msats and mtDNA has already shown that they should be managed as distinct units)... We also know from previous work that hybridization is likely much more extensive in some populations than in others.

      As to cost, what would be a good estimate (rough, ballpark, etc) of cost per individual for RADseq? Would you recommend attempting the library prep in house, or farming that out with the sequencing? We have talked with Floragenex to some extent, and still don't have any concrete quote for total cost per individual...

      "This is counter to the rationale behind RADseq. RADseq negates the need for costly, time-consuming SNP assays. RADseq is genotyping by sequencing. You ID the SNPs and genotype at the same time. Assays not needed."
      -I'm not 100% sure how much RADseq will cost us per individual, but Sequenom should cost us (after some start-up costs) $4-5 per ~30 SNPs per individual (quoted from a commercial outfit; does anyone else know better??). So assuming we need 150 SNPs to have good diagnostic power, that is five ~30-SNP panels and should cost us $20-25 per individual to assay. Since RADseq will give us more information than we really need, and if Sequenom is cheaper, it made sense to me that we could identify the SNPs (via RAD) that are useful for diagnosing hybrids, and then run the rest of our samples through the assay for genotyping.
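
The per-individual assay estimate above is simple arithmetic, sketched here. The figures ($4-5 per panel, ~30 SNPs per panel, 150 SNPs) come from the post; the function name is made up for illustration, and start-up costs are left out because the thread gives no number for them.

```python
import math


def assay_cost_per_individual(n_snps: int, snps_per_panel: int,
                              cost_per_panel: float) -> float:
    """Cost to genotype one individual when SNPs are run in fixed-size panels."""
    n_panels = math.ceil(n_snps / snps_per_panel)  # partial panels still cost a full run
    return n_panels * cost_per_panel


low = assay_cost_per_individual(150, 30, 4.0)
high = assay_cost_per_individual(150, 30, 5.0)
print(f"Assay cost per individual: ${low:.0f}-{high:.0f}")
```

Comparing this range with a RADseq quote per individual (library prep plus sequencing share) is what decides which route is cheaper for the remaining samples.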

      Again, thanks for taking the time to respond!


      • #4
        Hi TKC,

        In the "old" days of a few years ago, lots of people did what you suggest here: use RAD-Seq to identify SNPs in a subset of a population, then convert a subset of those SNPs to a high-throughput genotyping platform. People used RAD PE contigs to get 300-500 bp, or overlapping PE reads. And as you mention, a MiSeq run would now serve the same purpose.

        But as sequencing costs drop, the population size where that strategy is appropriate keeps getting higher. It is an investment to set up the genotyping, and you are then dealing with only previously known SNPs so lose the ability to discern new alleles that may be of interest.

        It sounds like the first question is if you need to get information on all 1000 individuals. When someone contacts SNPsaurus or my lab about a genotyping project (disclosure: my academic lab developed RAD-Seq, I have equity in Floragenex which offers RAD-Seq, I founded SNPsaurus which offers nextRAD), we ask how many markers are needed, do you need perfect information at each locus (this is a little tricky, some applications such as a genetic map prefer to have high quality genotypes that reliably call each allele of a heterozygote, others want good quality calls but missing alleles are OK), and what is the SNP rate in the population if known (i.e. how many sequenced loci will have a SNP in the population).

        From that, you can design the experiment. In your case, since cost is a factor for a large population, if you can get away with it you'd hope to get by with fewer markers and lower coverage. So a project assaying 30,000 tags per genome (10,000 markers if 1/3 of tags have SNPs) at 5x coverage (you will have good quality calls but miss a portion of heterozygous alleles) can fit >600 samples per lane. Usually the project is constrained by index availability at that kind of multiplexing (for nextRAD we use dual indexing and typically multiplex at 192 samples per lane), in which case you need 6 lanes of sequencing, and will get higher coverage than planned because you aren't multiplexing as much as possible.
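
The lane arithmetic in the paragraph above can be sketched as follows. The tag count (30,000), target coverage (5x), sample count (1000) and index cap (192 per lane) come from the thread; `reads_per_lane = 150_000_000` is an assumed round figure for a HiSeq 2000 lane and is not stated in the thread, and the function name is hypothetical.

```python
import math


def plan_lanes(n_samples, tags_per_genome, target_coverage,
               reads_per_lane, max_samples_per_lane):
    """Return (samples_per_lane, lanes, realized_coverage)."""
    reads_needed_per_sample = tags_per_genome * target_coverage
    # How many samples one lane could hold if only read depth mattered.
    fit_by_depth = reads_per_lane // reads_needed_per_sample
    if fit_by_depth < 1:
        raise ValueError("one sample needs more reads than a lane provides")
    # Index availability caps the multiplexing level.
    samples_per_lane = min(fit_by_depth, max_samples_per_lane)
    lanes = math.ceil(n_samples / samples_per_lane)
    # Coverage actually obtained if samples are spread evenly over the lanes.
    realized_coverage = (reads_per_lane * lanes) / (n_samples * tags_per_genome)
    return samples_per_lane, lanes, realized_coverage


per_lane, lanes, coverage = plan_lanes(
    n_samples=1000,
    tags_per_genome=30_000,
    target_coverage=5,
    reads_per_lane=150_000_000,   # assumption, not from the thread
    max_samples_per_lane=192,
)
print(per_lane, "samples/lane,", lanes, "lanes,", f"{coverage:.0f}x realized")
```

Under these assumptions the index cap, not read depth, sets the number of lanes, and the realized coverage ends up well above the 5x target.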

        For this kind of low coverage sequencing, the library cost will dominate, then. Most people peg the cost of materials at $15-20 per sample. I think the biggest unplanned for cost is labor (again, I'm an outsourcing service provider so I'll make that argument, but in my academic lab we help people with RAD projects and sometimes it drags on for months with run failure after run failure, so we see the ugly side as well!).

        Oops, I went back and saw your 20X coverage... it actually still fits in 6 lanes, so the project would be the same.
        Last edited by SNPsaurus; 07-25-2013, 09:20 AM.
        Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


        • #5
          5x coverage is way too low in my opinion. 20-30x minimum.

          If cost is no issue, go with one of the Oregon providers (they ain't cheap, but they will deliver SNPs hassle-free). If you do not have tens or hundreds of thousands of dollars to spend, I would suggest collaborating with a lab that has the expertise.


          • #6
            JackieBadger, 5X may be low, but it really depends on the application. GBS (the Elshire method) is designed to sample many loci at sub-1X coverage, for example. If TKC just wants to see if introgression is happening, and there are hundreds of strain-specific SNPs, then having some missing data won't be a problem. But, you are right that it is better to be conservative about it!
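
The coverage trade-off being debated can be illustrated with a simple sampling model. This is a sketch under stated assumptions, not a description of any genotype caller: reads at a heterozygous site are assumed to come from either allele with probability 0.5, independently, with no sequencing error, and the Poisson version additionally assumes read depth varies around the mean as a Poisson variable. Function names are made up for illustration.

```python
import math


def p_het_missed_fixed_depth(depth: int) -> float:
    """Probability that all reads at a heterozygous site come from the same
    allele, given exactly `depth` reads (so the site looks homozygous)."""
    if depth < 1:
        return 1.0  # no reads: neither allele is observed
    return 0.5 ** (depth - 1)


def p_het_missed_poisson(mean_depth: float) -> float:
    """Probability that at least one of the two alleles gets zero reads when
    total depth is Poisson with the given mean (each allele ~ Poisson(mean/2))."""
    p_allele_unseen = math.exp(-mean_depth / 2)
    return 1 - (1 - p_allele_unseen) ** 2


for depth in (1, 5, 20):
    print(depth,
          p_het_missed_fixed_depth(depth),
          round(p_het_missed_poisson(depth), 4))
```

Under this model a noticeable fraction of heterozygotes lose an allele at 5x, while at 20x the loss is negligible, which is the sense in which the right coverage depends on how much missing data the application tolerates.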
            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
