  • questions for ddRAD seq project on frogs

    Hi everyone,

    I'm working on a population genetics project involving a non-model frog species.
    We decided to use ddRADseq to address some population genetic questions.
    I started to dig into the protocol from the Peterson et al. paper and looked around on the web and in the forum, but I would prefer to ask some specialists for clarification/confirmation.
    First, I don't have a reference genome, but I have access to a first draft of a closely related one, which is useful for in silico analysis of RE digestion. The genome is quite big, probably around 10 Gb.
    As suggested by the paper, I started with the following combinations (SphI+EcoRI, SphI+MluCI, NlaIII+EcoRI, NlaIII+MluCI, MspI+EcoRI and SphI+MspI). Based on the in silico digestion, I'm expecting (roughly) the following numbers of fragments in the 200-400 bp range:
    -SphI+EcoRI: ~33,000
    -SphI+MluCI: ~570,000
    -NlaIII+EcoRI: ~410,000
    -NlaIII+MluCI: ~5,000,000
    -MspI+EcoRI: ~180,000 (~60,000 for 275-400bp and ~51,000 for 375-425bp)
    -SphI+MspI: ~227,000 (~72,000 for 375-425bp)
    But, based on that in silico digestion, I still don't know what the ideal number of markers and fragment size range would be, or which pair of enzymes to use. Maybe some other RE I haven't tried yet?

    If anyone has experience with amphibians in general and frogs in particular, or any suggestions, I would be most grateful. I have read a lot of good advice from SNPsaurus in other threads, and I hope to see him around.
    Thanks for your help
    A.
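For readers who want to reproduce this kind of count, here is a minimal sketch of an in silico double digest in Python. It is not the pipeline used above: the function name `double_digest_fragments` and the toy sequence are hypothetical, cut positions are approximated by the start of each recognition site, and only fragments bounded by one site of each enzyme are counted.

```python
import re

def double_digest_fragments(seq, site_a, site_b, min_len=200, max_len=400):
    """Count fragments flanked by one site_a cut and one site_b cut
    whose length falls within [min_len, max_len] (inclusive)."""
    seq = seq.upper()
    cuts = []
    for label, site in (("A", site_a), ("B", site_b)):
        # A lookahead pattern also finds overlapping occurrences of the site.
        for m in re.finditer("(?=" + re.escape(site) + ")", seq):
            cuts.append((m.start(), label))
    cuts.sort()
    count = 0
    # Only neighbouring cuts define a fragment after complete digestion.
    for (p1, e1), (p2, e2) in zip(cuts, cuts[1:]):
        if e1 != e2 and min_len <= p2 - p1 <= max_len:
            count += 1
    return count

# Toy sequence: EcoRI (GAATTC) at 0, MspI (CCGG) at 300 and 350, EcoRI at 1000.
toy = "GAATTC" + "A" * 294 + "CCGG" + "A" * 46 + "CCGG" + "A" * 646 + "GAATTC"
n_fragments = double_digest_fragments(toy, "GAATTC", "CCGG")
print(n_fragments)
```

On a real draft assembly the same function would be run per scaffold and the counts summed. A draft of a related species only gives a rough estimate for the target genome.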

  • #2
    Hi Alex,

    I've been neglecting SeqAnswers mostly because I've been following the AGBT meeting all day (and reviewing grants, supposedly). Thanks for the shout-out!

    I'd say for a pop gen project you will have an excess of loci compared to what you need, so no need to stretch for lots of loci. However, I think with ddRAD (and other RADs) people can run into problems when trying to get a very small percent of the genome amplified, so that is a countervailing force. How many samples do you want to process?

    I also think people run into problems when the ratio of "good" fragments with the desired 2 cut sites is low compared to all other fragments (with both sides being the more frequent cutter, for example). So combining SbfI with a 4-cutter will allow more artifacts such as chewed back fragments and adapters ligating. Also, with ddRAD a wide size range is better so you don't end up dropping lots of loci with every size selection.

    I'd try the six-cutter enzyme combos to keep the number of loci down, with the caveat that it may be hard to get amplification off the library. In that case, trying some 6-cutters with better frequencies might be the next step. Then see how many loci you can easily amplify and cut back the size range to get the number of samples/lane you need.

    I'd give yourself lots of extra sequencing per sample. You can't just sequence 100,000 loci at 10X and expect to need 1M reads per sample. Some loci can get more reads than others (maybe a 5-fold swing) and some samples more reads than others (maybe a 3-fold swing), so doubling or more the reads needed is good unless you want lots of dropped data.
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
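The read budget described in this post can be written as a short calculation. This is only a sketch: the function name `reads_per_sample` and the default `safety_factor` of 2 are assumptions standing in for the "doubling or more" rule of thumb, not a formal model of locus-to-locus or sample-to-sample variation.

```python
import math

def reads_per_sample(n_loci, coverage, safety_factor=2.0):
    """Reads to budget per sample: nominal need (loci x coverage)
    scaled by a margin for uneven loci and uneven samples."""
    return math.ceil(n_loci * coverage * safety_factor)

nominal = reads_per_sample(100_000, 10, safety_factor=1.0)
budgeted = reads_per_sample(100_000, 10)
print(nominal, budgeted)
```

With 100,000 loci at 10X, the nominal figure is 1M reads and the doubled budget is 2M reads per sample.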



    • #3
      Hi SNPsaurus,
      Thanks for your fast answer.
      Here are some more details about the project:

      Originally posted by SNPsaurus View Post
      I'd say for a pop gen project you will have an excess of loci compared to what you need, so no need to stretch for lots of loci. However, I think with ddRAD (and other RADs) people can run into problems when trying to get a very small percent of the genome amplified, so that is a countervailing force. How many samples do you want to process?
      We have around 200 samples from 14-15 populations, plus extra closely related sister species that we need for calibration (we plan to build a time-calibrated tree), so <250 individuals in total.

      Originally posted by SNPsaurus View Post
      Also, with ddRAD a wide size range is better so you don't end up dropping lots of loci with every size selection.
      What do you consider a wide size range? 400 bp +/- 50, for example, or 200-400 bp? (I guess the latter might give too many fragments, but that depends on the RE combination.)


      Originally posted by SNPsaurus View Post
      I'd give yourself lots of extra sequencing per sample. You can't just sequence 100,000 loci at 10X and expect to need 1M reads per sample. Some loci can get more reads than others (maybe a 5-fold swing) and some samples more reads than others (maybe a 3-fold swing), so doubling or more the reads needed is good unless you want lots of dropped data.
      So, if I want 100,000 loci with 10X coverage, I need at least 2M reads per sample (and 3-5M would be safer, right?)

      More generally, with these RADseq data we would like to find some markers associated with color (our frogs are highly variable, with almost 1 color variant per population), and also some marker(s) that are sex specific.
      I guess we should not cut down the number of markers too much, but do you think that 50,000 to 100,000 markers are enough for that purpose? Based on the planned budget, I think we can't go over 2M reads per sample.

      Any advice/feedback is welcome.
      Cheers,
      A.



      • #4
        Originally posted by alexbenroland View Post
        Hi SNPsaurus,
        We have around 200 samples from 14-15 populations, plus extra closely related sister species that we need for calibration (we plan to build a time-calibrated tree), so <250 individuals in total.

        What do you consider a wide size range? 400 bp +/- 50, for example, or 200-400 bp? (I guess the latter might give too many fragments, but that depends on the RE combination.)
        This is where lots of competing pressures on a project design start to show up. If you are dealing with different species (albeit related), then you will have lots of polymorphisms. That almost certainly rules out the use of a restriction enzyme with a 4-bp recognition sequence. Think about how many 4-bp sequences within a 400-bp fragment are a single mutation away from being the recognition sequence. Probably around 12, but could be as many as 20 depending on the GC content of the genome and cut site. Now add the nucleotides of the two enzymes' own sites (6 + 4), giving 20-30 nucleotides that, if mutated, will cause the locus to drop out of the library. If you have a SNP every 50 bp in your population, you will have loss of heterozygosity at every other locus somewhere in the population.
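As a rough check on the "almost cut site" count, the expected number of windows that sit one mutation away from a recognition sequence can be estimated. This is a back-of-the-envelope sketch, assuming independent bases and a single frequency `base_freq` for every base of the site (0.25 means equal base composition). The function name `expected_near_sites` is hypothetical.

```python
def expected_near_sites(fragment_len, site_len, base_freq=0.25):
    """Expected number of windows in a fragment that differ from a given
    recognition site at exactly one position (one mutation from a new site)."""
    windows = fragment_len - site_len + 1
    # Exactly one of site_len positions mismatches; the others all match.
    p_near = site_len * (1 - base_freq) * base_freq ** (site_len - 1)
    return windows * p_near

near_4bp = expected_near_sites(400, 4)
print(round(near_4bp, 1))
```

Under equal base frequencies this comes out near the upper end of the range quoted above. A site whose bases are rarer in the genome (for example a GC-rich site in an AT-rich genome) would give a lower number, which is where the dependence on GC content comes in.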

        Having a long fragment will exacerbate the problem, since there are more "almost cut sites" in a long fragment. But, what is your method of size selection? Let's say you have 100,000 loci in your 400 bp +- 50 bp library. That is 1,000 loci per bp. With ddRAD, the loci have a fixed size, so if you size select from 355-455 in one library and 345-445 in another, that means 20,000 loci are present in one library and missing in the other. But if you make the size range any bigger you may need to increase the fragment size to make sure reads don't hit the adapter (and then deal with "new site dropouts"). A 100 bp range is probably as good as you can do.
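The size-window arithmetic can be checked with a few lines of Python. This sketch assumes loci are spread evenly across fragment sizes (1,000 loci per bp, as in the example above). The function name `window_mismatch` is hypothetical.

```python
def window_mismatch(loci_per_bp, window_a, window_b):
    """Return (shared, only_a, only_b) locus counts for two size-selection
    windows given as (low_bp, high_bp), assuming a uniform locus density."""
    a_lo, a_hi = window_a
    b_lo, b_hi = window_b
    overlap = max(0, min(a_hi, b_hi) - max(a_lo, b_lo))
    shared = overlap * loci_per_bp
    only_a = (a_hi - a_lo - overlap) * loci_per_bp
    only_b = (b_hi - b_lo - overlap) * loci_per_bp
    return shared, only_a, only_b

shared, only_a, only_b = window_mismatch(1000, (355, 455), (345, 445))
print(shared, only_a, only_b, only_a + only_b)
```

A 10 bp shift between two 100 bp windows leaves 90,000 shared loci and 10,000 loci unique to each library, so 20,000 loci in total are present in one library and missing from the other.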



        So, if I want 100,000 loci with 10X coverage, I need at least 2M reads per sample (and 3-5M would be safer, right?)

        More generally, with these RADseq data we would like to find some markers associated with color (our frogs are highly variable, with almost 1 color variant per population), and also some marker(s) that are sex specific.
        I guess we should not cut down the number of markers too much, but do you think that 50,000 to 100,000 markers are enough for that purpose? Based on the planned budget, I think we can't go over 2M reads per sample.
        10X coverage is really pushing the low end, but on the other hand you are looking at massive loss of loci, so there will be lots of heterozygous dropout anyway. Maybe you should just embrace the lack of completeness and downsample the data to a single allele. You've got a huge genome, so 50,000-100,000 markers are probably not enough to assay each haplotype block for an association study, unless there were recent selective sweeps for color.

        It's tough to get everything you want with budget constraints, especially since more markers = more frequent cut sites = more locus dropouts. I guess downsampling means a loss of power in your analyses, as well.



        • #5
          Originally posted by SNPsaurus View Post
          Having a long fragment will exacerbate the problem, since there are more "almost cut sites" in a long fragment. But, what is your method of size selection?
          We have access to a Pippin prep, so I will use that for size selection

          Originally posted by SNPsaurus View Post

          It's tough to get everything you want with budget constraints, especially since more markers = more frequent cut sites = more locus dropouts. I guess downsampling means a loss of power in your analyses, as well.
          I agree, designing this experiment is starting to get tricky.
          I wonder whether it might be worth doing it in two stages.
          First, a library for the pop-gen part. If I don't need that many markers, would the following be possible?
          - I use one lane of sequencing, which might be around 87.5M reads, for my <250 individuals at 25X coverage. That is about 350,000 reads, or roughly 14,000 loci, per individual. Let's also say we drop 50% of them, so we might have around 7,000 markers/animal. Is that enough?
          Then, I pick some representative samples from my populations (13 populations, 2-3 animals/pop) to reduce the number of individuals, which maximizes the number of reads per sample and allows more markers that might eventually be associated with color (and maybe sex). Does that make sense?

          A.
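The lane budget in this post can be worked through explicitly. In this sketch, dividing the lane's reads by the number of individuals gives reads per individual, and dividing again by the target coverage gives the number of loci that budget supports. The function name `loci_per_individual` is hypothetical, and even read distribution across samples and loci is assumed.

```python
def loci_per_individual(lane_reads, n_individuals, coverage, dropout=0.0):
    """Return (reads per individual, usable loci per individual) for one lane,
    assuming reads are spread evenly over individuals and loci."""
    reads_each = lane_reads / n_individuals
    loci = reads_each / coverage
    return reads_each, loci * (1 - dropout)

reads_each, usable_loci = loci_per_individual(87.5e6, 250, 25, dropout=0.5)
print(reads_each, usable_loci)
```

With 87.5M reads, 250 individuals and 25X coverage, this gives 350,000 reads per individual, enough for 14,000 loci before dropout and 7,000 after a 50% loss.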



          • #6
            I was wondering what could be the best choice in RE combination+size selection, to obtain a good sampling of the genome.
            Let's say for the same number of loci (10,000 as an example), I have the choice between the following:
            - using 2 rare cutters followed by a wide size selection (200-700bp)
            - or using a common cutter in combination with a rare cutter, followed by a narrow size selection (275-425bp).
            Does it make more sense to try a wide size range with rare cutters than the second option? Has anyone ever tried this?
            Any advice is welcome.
            A.



            • #7
              Originally posted by alexbenroland View Post
              I was wondering what could be the best choice in RE combination+size selection, to obtain a good sampling of the genome.
              Let's say for the same number of loci (10,000 as an example), I have the choice between the following:
              - using 2 rare cutters followed by a wide size selection (200-700bp)
              - or using a common cutter in combination with a rare cutter, followed by a narrow size selection (275-425bp).
              Does it make more sense to try a wide size range with rare cutters than the second option? Has anyone ever tried this?
              Any advice is welcome.
              A.
              Narrow size selection would give better results, because with a wider range the larger fragments will drop out during PCR, and these might not be the same fragments in each sample, which reduces the number of overlapping loci.
