Hello,
I'm new to genomics, but I have assembled a de novo transcriptome (Trinity) and have read alignments for two sister species (two individuals per species, four individuals total), the same species I used to build the transcriptome.
My goal is to screen these species for SNPs using samtools, to see which genes of potential ecological importance differ between these closely related species. I am familiar with the samtools variant calling pipeline, but I'm looking for advice on how I should call the SNPs.
Do I pool my alignments and obtain one set of SNPs for all species, or do I call SNPs separately for each species? I've read literature that discusses both approaches, but I'm not sure why I should choose one over the other. This is a basic question, but it has me stumped. I guess I'm confused about whether the SNPs are called between my reference transcriptome and the alignments, or between the alignments themselves, with the reference serving as some sort of guide.
Thanks in advance.