Hi,
I'm designing an experiment for RAD-seq and I was wondering if I could get some feedback about my experimental design.
I am looking to RAD-seq ~75 individuals of a highly heterozygous species with no reference genome. All individuals are from natural populations. The goals of my study are to decipher the population structure of my species as well as to detect signatures of selection, population differentiation and allelic diversity. Hence I chose RAD-seq over other methods such as GBS or ddRAD-seq.
Through an in silico digest, I found an enzyme (BamHI) that produces ~90,000 fragments >300 bp in a related species. Is that a sufficient number? I expect that after sequencing some fragments will be lost, either because they fall in repetitive regions or because they are not recovered in all samples. Even if I lost two-thirds of them, would 30K fragments be enough to achieve my goals?
Also, I plan to use 2 lanes of HiSeq 2000 (~40-plex per lane), possibly giving me ~17X coverage with 100 bp PE reads. Is that enough coverage? I read somewhere (a ddRAD-seq paper) that 7X is sufficient, but elsewhere I saw ~20X recommended. I'm also concerned that 90K fragments may not be sufficient. Any thoughts? What's a good in silico fragment number to start with?
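For anyone who wants to check the coverage arithmetic, here is a minimal sketch of how the estimate can be worked out. It assumes reads are split evenly across individuals and loci, which real libraries never quite achieve. The function names, the `tags_per_fragment` and `usable_fraction` parameters, and the lane yield in the usage lines are all hypothetical placeholders, not figures from my sequencing facility.

```python
def expected_coverage(reads_per_lane, lanes, individuals, fragments,
                      tags_per_fragment=1, usable_fraction=1.0):
    """Mean reads per locus per individual, assuming perfectly even multiplexing.

    tags_per_fragment: loci sequenced per in silico fragment (assumption;
        set to 2 if each cut site is expected to yield a tag on both sides).
    usable_fraction: share of reads surviving demultiplexing and filtering
        (assumption; 1.0 means nothing is discarded).
    """
    if individuals <= 0 or fragments <= 0 or tags_per_fragment <= 0:
        raise ValueError("individuals, fragments and tags_per_fragment must be positive")
    total_reads = reads_per_lane * lanes * usable_fraction
    reads_per_individual = total_reads / individuals
    loci = fragments * tags_per_fragment
    return reads_per_individual / loci


def reads_needed_per_lane(target_coverage, lanes, individuals, fragments,
                          tags_per_fragment=1, usable_fraction=1.0):
    """Invert expected_coverage: lane yield required to hit a target depth."""
    if lanes <= 0 or usable_fraction <= 0:
        raise ValueError("lanes and usable_fraction must be positive")
    loci = fragments * tags_per_fragment
    return target_coverage * loci * individuals / (lanes * usable_fraction)


if __name__ == "__main__":
    LANE_YIELD = 100_000_000  # hypothetical reads per lane; substitute a real figure
    for tags in (1, 2):
        depth = expected_coverage(LANE_YIELD, lanes=2, individuals=75,
                                  fragments=90_000, tags_per_fragment=tags)
        print(f"tags_per_fragment={tags}: ~{depth:.1f}X per locus per individual")
    needed = reads_needed_per_lane(17, lanes=2, individuals=75, fragments=90_000)
    print(f"lane yield needed for 17X: {needed:,.0f} reads")
```

Lowering `usable_fraction` or raising `tags_per_fragment` shows how quickly the nominal depth shrinks once filtering losses and two tags per cut site are taken into account.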
Penultimate question. I suppose RAD-seq is not a problem for heterozygous species, since the species in previous studies (deer mice, barley, sticklebacks, switchgrass) were heterozygous, but I am not sure how the bioinformatics works. Does the software (e.g., Stacks) distinguish the two alleles at heterozygous loci? How often do you have to throw loci away? What's a good coverage to recover most of the heterozygous sites?
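To make the heterozygote question concrete, here is a rough sketch of the sampling problem as I understand it. It assumes each read at a heterozygous locus comes from either allele with probability 0.5 (simple binomial sampling, no sequencing error, no allele bias). This is not the model Stacks or any other caller actually uses; the function name and the `min_count` threshold are hypothetical.

```python
from math import comb


def prob_both_alleles_seen(depth, min_count=1):
    """Probability that both alleles of a heterozygote are each observed
    at least `min_count` times among `depth` reads, under Binomial(depth, 0.5).
    """
    if depth < 0 or min_count < 1:
        raise ValueError("depth must be >= 0 and min_count >= 1")
    # Count outcomes where allele A appears k times and allele B depth - k times,
    # with both counts >= min_count. The range is empty when depth < 2 * min_count.
    favourable = sum(comb(depth, k)
                     for k in range(min_count, depth - min_count + 1))
    return favourable / 2 ** depth


if __name__ == "__main__":
    for depth in (7, 17, 20):
        p = prob_both_alleles_seen(depth, min_count=2)
        print(f"depth {depth}: P(both alleles seen >= 2 times) = {p:.4f}")
```

Under these idealized assumptions the number is an upper bound on how often a heterozygote could be recognized at a given depth. Uneven coverage across loci and sequencing error would both push the real figure lower.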
Finally, I'd like to plug my question about library prep here for better visibility.
Thanks!