  • Feedback on RAD-seq strategy

    Hi,

    I'm designing an experiment for RAD-seq and I was wondering if I could get some feedback about my experimental design.

    I am looking to RAD-seq ~75 individuals of a highly heterozygous species with no reference genome. All individuals are from natural populations. The goals of my study are to decipher the population structure of my species as well as to detect signatures of selection, population differentiation and allelic diversity. Hence I chose RAD-seq over other methods such as GBS or ddRAD-seq.

    Through an in silico digest, I found an enzyme (BamHI) which produces ~90,000 fragments >300bp in a related species. Is that a sufficient number? I suppose that after sequencing some fragments will be lost because they are repetitive or are not present across all samples. Even if I lost two-thirds, would 30K fragments be enough to achieve my goals?
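    For readers who want to reproduce this kind of count, here is a minimal sketch of an in silico digest. It is not the poster's actual method: the function names are hypothetical, it takes a single sequence string rather than a multi-record FASTA file, and it assumes cutting after the first base of the BamHI recognition site (G^GATCC). The 300 bp threshold comes from the post.

```python
import re

# BamHI recognition sequence; it is palindromic, so scanning one strand is enough.
BAMHI_SITE = "GGATCC"
# Assumed cut offset within the site (G^GATCC).
BAMHI_CUT_OFFSET = 1


def digest_fragment_lengths(sequence, site=BAMHI_SITE, cut_offset=BAMHI_CUT_OFFSET):
    """Return the lengths of the fragments produced by cutting at every site."""
    seq = sequence.upper()
    cuts = [m.start() + cut_offset for m in re.finditer(site, seq)]
    boundaries = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


def count_long_fragments(sequence, min_length=300):
    """Count fragments strictly longer than min_length bases."""
    return sum(1 for n in digest_fragment_lengths(sequence) if n > min_length)


if __name__ == "__main__":
    # Toy sequence with two BamHI sites, giving three fragments.
    toy = "A" * 400 + "GGATCC" + "A" * 100 + "GGATCC" + "A" * 350
    print(digest_fragment_lengths(toy))
    print(count_long_fragments(toy))
```

    For a real genome you would run this per chromosome or scaffold and sum the counts. Fragments that end at a scaffold boundary are not true restriction fragments, so a fragmented assembly will distort the total.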

    Also, I plan to use 2 lanes of HiSeq2000 (~40-plex), possibly giving me ~17X coverage with 100bp PE reads. Is that a good enough coverage? I read somewhere (ddRAD-seq paper) that 7X is sufficient, but then elsewhere it said ~20X. But I'm concerned that 90K fragments may not be sufficient. Any thoughts? What's a good in silico fragment number to start with?

    Penultimate question. I suppose RAD-seq is not a problem for heterozygous species, since the species in previous studies (deer mice, barley, sticklebacks, switchgrass) were heterozygous, but I am not sure how the bioinformatics works. Does the software (e.g. STACKS) distinguish reads from the two alleles at a heterozygous locus? How often do you have to throw something away? What's a good coverage to recover most of the heterozygous sites?

    Finally, I'd like to plug my question about library prep here for better visibility.

    Thanks!
    Last edited by flobpf; 01-29-2014, 07:12 AM. Reason: clarity

  • #2
    Good questions. It sounds like you are working with a small genome, if BamHI digestion yields 90k fragments. What species are you working with? You will see some variation in locus-to-locus coverage when shearing with a frequent cutter, as the smaller restriction fragments shear less efficiently.

    How many RAD tags are needed depends on the system. In Hohenlohe et al 2010 (bias notification: I am EAJ, one of the authors) a less-frequently cutting enzyme, SbfI, was used. But the stickleback system had recent selective sweeps, so the larger blocks of effect could be more easily detected. Remember, also, that each cut site yields 2 RAD tags, so 90,000 cut sites would be 180,000 tags. Of course, if a cut site is lost by mutation then both tags drop out. And if your system is highly heterozygous you are likely to find a SNP in each of the two tags across the population.

    Coverage... with any genotyping system the coverage you actually get falls short of the calculated value. Some loci amplify better than others, and the in silico digest may not reflect sites in repetitive regions that are difficult to assemble. So if you calculate (#lanes * 150M reads)/(#samples * #sites * 2), you will get much lower coverage in the end. It is safer to add 50% more reads to the calculation. I would aim for 20X coverage at least if you want to assay both alleles at a locus (7X coverage means 3-4X coverage of each chromosome, which means a 5-15% chance of not sampling one of them). On the other hand, I think it is possible to calculate Fst and other population statistics with low coverage data, counting one chromosome per sample instead of two, and getting by with 5X coverage. You lose power (half the # of chromosomes), but save money.
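    As a rough sketch of the arithmetic above: the first function is the formula from this post, with an optional divisor for the "add 50% more reads" margin. The second estimates the chance of missing an allele, assuming the reads at a locus split evenly between the two chromosomes and follow a Poisson distribution. That model and the function names are my assumptions, not something stated in the thread.

```python
import math


def expected_coverage(lanes, reads_per_lane, samples, cut_sites,
                      tags_per_site=2, safety_factor=1.0):
    """Per-tag coverage: (lanes * reads) / (samples * sites * tags per site).

    safety_factor > 1 discounts the result for losses; 1.5 corresponds to
    budgeting 50% more reads than the naive calculation suggests.
    """
    total_reads = lanes * reads_per_lane
    total_tags = samples * cut_sites * tags_per_site
    return total_reads / total_tags / safety_factor


def allele_dropout_probability(coverage):
    """Chance that at least one of the two alleles at a locus gets zero reads.

    Assumes each allele receives Poisson-distributed reads with mean coverage / 2.
    """
    p_one_allele_missing = math.exp(-coverage / 2)
    return 1 - (1 - p_one_allele_missing) ** 2


if __name__ == "__main__":
    # Numbers from the thread: 2 lanes, 75 samples, 90,000 cut sites.
    print(round(expected_coverage(2, 120e6, 75, 90_000), 1))  # 17.8
    print(round(expected_coverage(2, 150e6, 75, 90_000), 1))  # 22.2
    print(expected_coverage(2, 150e6, 75, 90_000, safety_factor=1.5))
    for cov in (5, 7, 20):
        print(cov, allele_dropout_probability(cov))
```

    Under this simple model, 7X gives roughly a 6% chance of missing one allele, which falls inside the 5-15% range quoted above. Real data are usually more dispersed than Poisson, so treat the result as optimistic.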

    Sure, Stacks is meant to find reads that are allelic and 'stack' them. You could also try pyRAD, which was designed for analyzing RAD data across related species, rather than within a species.

    As for the adapters, look at the protocols here https://www.wiki.ed.ac.uk/display/RADSequencing/Home.
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


    • #3
      Thanks for the quick reply, Dr. Johnson!

      I am working with a tomato species, genome might be 900-1000Mb with 38% GC, based on what we know for cultivated tomato.

      About coverage...my calculations were based on a conservative 120M reads/lane, and considering 2 tags/site. With 150M reads, I might get 22X. Nevertheless, I didn't know about this formula (my calculations were old school) nor had I considered 2 chromosomes. So thanks for that.
