Hello all,
I'm about to start on a bit of a bioinformatics endeavour for my population genomics study, and before I do I wondered if anyone had any pointers or suggestions.
I have access to the resequenced genomes of ~25 individuals. Further along I want to do some more in-depth analysis, but right now I would just like to randomly sample the genomes for independent loci to get simple estimates of some basic population genomic parameters (e.g. theta). Ideally I would like loci of 500-1000 bp, approximately 100 kb apart (to ensure independence).
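For concreteness, the sampling scheme I have in mind could be sketched roughly as below. This is only a minimal illustration in Python: the function name `sample_loci`, the chromosome names and lengths, and the choice to space locus starts by `spacing + max_len` (so that the gap between neighbouring loci is never less than 100 kb) are all my own assumptions, not part of any existing tool. Coordinates are 0-based, half-open, as in BED.

```python
import random


def sample_loci(chrom_lengths, spacing=100_000, min_len=500, max_len=1000, seed=0):
    """Return (chrom, start, end) tuples, 0-based half-open (BED-style).

    Loci are min_len..max_len bp long, and consecutive loci on the same
    chromosome are separated by at least `spacing` bp.
    """
    rng = random.Random(seed)
    # Stepping by spacing + max_len guarantees a gap of at least `spacing`
    # between the end of one locus and the start of the next.
    step = spacing + max_len
    loci = []
    for chrom, length in chrom_lengths.items():
        # Random starting offset so loci are not tied to fixed coordinates.
        start = rng.randrange(step)
        while True:
            end = start + rng.randint(min_len, max_len)
            if end > length:
                break  # ran off the end of this chromosome
            loci.append((chrom, start, end))
            start += step
    return loci


if __name__ == "__main__":
    # Hypothetical chromosome lengths, for illustration only.
    for chrom, start, end in sample_loci({"chr1": 1_000_000, "chr2": 450_000}):
        print(f"{chrom}\t{start}\t{end}")
```

The printed lines form a BED-like list of intervals on the reference, which could then be used to pull the same regions from every individual.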
At the moment, all of the genomes have been assembled and mapped to a reference genome. So my question is, what is the best way to go about extracting loci? One idea I had was to align the consensus sequences using a whole genome aligner and then use a tool like Phylomarker to extract loci from orthologous blocks.
However, since the genomes have all been aligned to the same reference sequence, that seems a bit computationally wasteful. My other idea was to take the BAM files from each of the alignments and extract loci fitting my requirement from those. For what it's worth, I'm not afraid of scripting in Perl or R (and maybe even Python) if it's required to get the job done.
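To make the BAM idea a bit more concrete, here is a rough sketch of what I imagine, assuming the BAM files are coordinate-sorted and indexed so that `samtools view` can be given region strings. The helper names and the file names (`sample1.bam`, `sample1.loci.bam`) are hypothetical, and the code only builds the command line rather than running it. Note that this would give me the reads overlapping each locus, not a consensus sequence or genotypes.

```python
import shlex


def bed_to_region(chrom, start, end):
    """Convert a 0-based half-open interval to a 1-based inclusive region string."""
    return f"{chrom}:{start + 1}-{end}"


def samtools_view_command(bam_path, regions, out_path):
    """Build a `samtools view` command that writes reads overlapping `regions`.

    -b asks for BAM output and -o names the output file; the input BAM
    must be indexed for region queries to work.
    """
    args = ["samtools", "view", "-b", "-o", out_path, bam_path, *regions]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Hypothetical loci and file names, for illustration only.
    loci = [("chr1", 1000, 1600), ("chr1", 102_000, 102_900)]
    regions = [bed_to_region(*locus) for locus in loci]
    print(samtools_view_command("sample1.bam", regions, "sample1.loci.bam"))
```

Running the same command for each of the ~25 BAM files would give per-individual subsets restricted to the chosen loci.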
Any input would be very much appreciated!