Hey,
I'm working on getting SNP calls from Illumina data in a plant with no reference genome in order to infer population structure.
I'm using restriction-site associated DNA (RAD loci) as a complexity reduction step: samples are digested, then size-selected, to get an arbitrary but reproducible fraction of the genome. I'm also multiplexing to get the sample size up.
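To make the digest-then-size-select idea concrete, here is a minimal Python sketch of an in-silico version. The function name `rad_fragments`, the toy sequence, the EcoRI-style site `GAATTC`, and the size window are all illustrative assumptions, not the actual enzyme or protocol used here. It also simplifies by cutting at the start of each recognition site rather than at the enzyme's real offset.

```python
import re


def rad_fragments(sequence, site, min_len, max_len):
    """Cut `sequence` at every occurrence of `site`, keep fragments in the size window.

    Simplification (assumption): the cut is placed at the first base of the
    recognition site, not at the enzyme's true cut offset.
    """
    # A zero-width lookahead finds every site start, including overlapping ones.
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    fragments = [sequence[a:b] for a, b in zip(bounds, bounds[1:])]
    # Size selection: the same window applied to every sample yields the
    # same subset of loci, which is what makes the fraction reproducible.
    return [f for f in fragments if min_len <= len(f) <= max_len]


if __name__ == "__main__":
    toy_genome = "AAGAATTCCCCCGAATTCTT"  # made-up sequence
    print(rad_fragments(toy_genome, "GAATTC", 8, 10))
```

Because the cut sites and the size window are the same for every plant, the surviving fragments should come from the same loci across samples, apart from polymorphisms that create or destroy a restriction site.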
So far I've been using ABySS to assemble the reads into mini-contigs (contig length = read length), then MAQ to map the reads back to the contigs, and finally custom scripts to call the SNPs.
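For the last step, a SNP caller over reads mapped back to mini-contigs could look roughly like the Python sketch below. This is not the custom script referred to above. The function `call_snps`, its input format (a list of `(offset, read)` pairs per contig), and the thresholds `min_depth` and `min_minor_count` are assumptions made for illustration. In practice the alignments would be parsed from the mapper's output instead.

```python
from collections import Counter


def call_snps(contig, aligned_reads, min_depth=4, min_minor_count=2):
    """Report positions where at least two different bases are well supported.

    contig         -- contig sequence (string)
    aligned_reads  -- list of (offset, read_sequence), ungapped alignments (assumption)
    Returns a list of (position, contig_base, {base: count}).
    """
    # One base counter per contig position.
    counts = [Counter() for _ in contig]
    for offset, read in aligned_reads:
        for i, base in enumerate(read):
            pos = offset + i
            if 0 <= pos < len(contig) and base in "ACGT":  # skip N and overhangs
                counts[pos][base] += 1

    snps = []
    for pos, column in enumerate(counts):
        if sum(column.values()) < min_depth:
            continue  # too few reads to say anything
        ranked = column.most_common(2)
        # Require a second allele seen often enough to be unlikely a sequencing error.
        if len(ranked) == 2 and ranked[1][1] >= min_minor_count:
            snps.append((pos, contig[pos], dict(column)))
    return snps


if __name__ == "__main__":
    contig = "ACGTAC"  # made-up mini-contig
    reads = [(0, "ACGTAC")] * 3 + [(0, "ACTTAC")] * 2
    print(call_snps(contig, reads))
```

Fixed count thresholds are the simplest possible filter. They ignore base qualities and do not separate true allelic variation from reads of paralogous loci collapsed into one contig, which is a real risk when assembling without a reference.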
If anyone has suggestions on other bioinformatic strategies, I'd be happy to hear them! (I've been looking at SOAP, MIRA, CLC, and Velvet.)
Thanks for all the useful posts in the archives!
G
Code:
Genome            -----------------------------
Restriction sites * * *
Plant_1           = = = = = =
Reads             = = = = =
Plant_2           = = = = = + = = = + (SNP)