
**SNPsaurus** · 01-29-2014, 08:29 AM

Good questions. It sounds like you are working with a small genome, if BamHI digestion yields 90k fragments. What species are you working with? You will see some variation in locus-to-locus coverage when shearing with a frequent cutter, as the smaller restriction fragments shear less efficiently.

How many RAD tags are needed depends on the system. In Hohenlohe et al 2010 (bias notification: I am EAJ, one of the authors) a less-frequently cutting enzyme, SbfI, was used. But the stickleback system had recent selective sweeps, so the larger blocks of effect could be more easily detected. Remember, also, that each cut site yields 2 RAD tags, so 90,000 cut sites would be 180,000 tags. Of course, if a cut site is lost by mutation then both tags drop out. And if your system is highly heterozygous you are likely to find a SNP in each of the two tags across the population.

Coverage... with any genotyping system, the coverage you actually get falls short of the calculated value. Some loci amplify better than others, and the in silico digest may not reflect sites in repetitive regions that are difficult to assemble. So if you calculate, say, (#lanes * 150M reads)/(#samples * #sites * 2), you will get much lower coverage in the end. It is safer to add 50% more reads to the calculation. I would aim for 20X coverage at least if you want to assay both alleles at a locus (7X coverage means 3-4X coverage of each chromosome, which means a 5-15% chance of not sampling one of them). On the other hand, I think it is possible to calculate Fst and other population statistics with low coverage data, counting one chromosome per sample instead of two, and getting by with 5X coverage. You lose power (half the # of chromosomes), but save money.
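To make the arithmetic concrete, here is a rough Python sketch of the calculation above. The function names and the 40-sample figure are made up for illustration. The dropout estimate assumes reads land on each chromosome as an independent Poisson draw, which is a simplification and ignores locus-to-locus variation, so treat it as a ballpark only.

```python
import math


def expected_coverage(lanes, reads_per_lane, samples, cut_sites, tags_per_site=2):
    """Naive per-tag coverage: (#lanes * reads/lane) / (#samples * #sites * 2)."""
    return (lanes * reads_per_lane) / (samples * cut_sites * tags_per_site)


def reads_needed(target_coverage, samples, cut_sites, tags_per_site=2, safety=1.5):
    """Total reads for a target coverage, padded by 50% to allow for losses."""
    return target_coverage * samples * cut_sites * tags_per_site * safety


def p_miss_a_chromosome(per_chromosome_coverage):
    """Chance that at least one of the two chromosomes gets zero reads.

    Assumption: read counts per chromosome are independent Poisson draws.
    """
    p_seen = 1.0 - math.exp(-per_chromosome_coverage)
    return 1.0 - p_seen ** 2


if __name__ == "__main__":
    # Hypothetical run: 1 lane of 150M reads, 40 samples, 90,000 cut sites.
    print(f"naive coverage: {expected_coverage(1, 150e6, 40, 90_000):.1f}X")
    print(f"reads for 20X with 50% margin: {reads_needed(20, 40, 90_000):,.0f}")
    print(f"dropout chance at 3X per chromosome: {p_miss_a_chromosome(3.0):.1%}")
```

Plugging in your own lane count, sample count and in silico site count shows quickly whether the design clears 20X once the 50% margin is included.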

Sure, Stacks is meant to find reads that are allelic and 'stack' them. You could also try pyRAD, which was designed for analyzing RAD data across related species, rather than within a species.

As for the adapters, look at the protocols here https://www.wiki.ed.ac.uk/display/RADSequencing/Home.

**flobpf** · 01-29-2014, 09:26 AM

Thanks for the quick reply, Dr. Johnson!

I am working with a tomato species, genome might be 900-1000Mb with 38% GC, based on what we know for cultivated tomato.

About coverage...my calculations were based on a conservative 120M reads/lane, and considering 2 tags/site. With 150M reads, I might get 22X. Nevertheless, I didn't know about this formula (my calculations were old school) nor had I considered 2 chromosomes. So thanks for that.


Feedback on RAD-seq strategy
