I am very new to bioinformatics and am trying to write a script that will let me take a reference gene set (e.g. MLST loci) and BLAST it against genomes available on GenBank, so that I can build datasets of these gene sets.
For example, if I wish to do a multilocus gene comparison for a given genus, I'd like to be able to input my reference gene set, BLAST it against the available genomes, and pull out of each genome the gene sequences (as FASTA) that match at X% identity or higher.
The reason I thought of a BLAST-based extraction is that some of the genomes I'm looking through are not annotated.
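To make the extraction step concrete, here is a rough sketch of what I have in mind once BLAST has produced results. It assumes the hits are in the standard 12-column tabular format (`-outfmt 6`), and that the genome is already loaded as a dictionary of contig ID to sequence. The function names `extract_hits` and `reverse_complement` and the header layout are my own placeholders, not part of BLAST+ or Biopython.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string; anything unexpected becomes N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def extract_hits(blast_lines, genome, min_identity):
    """Return FASTA records for hits at or above min_identity.

    blast_lines: iterable of tab-separated lines in BLAST outfmt 6
                 (qseqid sseqid pident length mismatch gapopen
                  qstart qend sstart send evalue bitscore).
    genome:      dict mapping subject sequence id -> sequence string
                 (assumed to be loaded elsewhere).
    """
    records = []
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        sstart, send = int(fields[8]), int(fields[9])
        if pident < min_identity:
            continue
        # BLAST coordinates are 1-based and inclusive; sstart > send
        # means the hit lies on the minus strand of the subject.
        lo, hi = min(sstart, send), max(sstart, send)
        seq = genome[sseqid][lo - 1:hi]
        if sstart > send:
            seq = reverse_complement(seq)
        header = ">%s|%s:%d-%d|%.1f%%" % (qseqid, sseqid, lo, hi, pident)
        records.append(header + "\n" + seq)
    return records
```

Joining the returned records with newlines would give one FASTA file per genome. This keeps every hit above the threshold, so a locus with several hits would appear more than once; picking the best hit per locus is not handled here.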
My idea is to use the BLAST+ tools to do this, but I cannot call them from Python. I've downloaded the software, but can only run it from the terminal (I am using a Mac). I am using Python 2.7.5 and have installed NumPy and Biopython.
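Since the tools do run from the terminal, my guess is that I could launch them from Python with the standard `subprocess` module rather than importing anything. A minimal sketch of that idea, assuming `blastn` is on the PATH that Python sees, and using the `-query`, `-subject`, `-perc_identity`, `-outfmt` and `-out` options as I understand them from the BLAST+ help. The helper names and file names are hypothetical.

```python
import subprocess


def build_blastn_command(query_fasta, subject_fasta, out_path, min_identity):
    """Build the argument list for a blastn run of a gene set vs one genome."""
    return [
        "blastn",
        "-query", query_fasta,        # reference gene set (FASTA)
        "-subject", subject_fasta,    # genome to search (FASTA, no database needed)
        "-perc_identity", str(min_identity),
        "-outfmt", "6",               # 12-column tabular output
        "-out", out_path,
    ]


def run_blastn(command):
    """Run the command; raises subprocess.CalledProcessError on failure."""
    subprocess.check_call(command)


# Example (file names are placeholders):
# cmd = build_blastn_command("mlst_loci.fasta", "genome.fasta", "hits.tsv", 90)
# run_blastn(cmd)
```

Passing the arguments as a list avoids shell quoting problems with file names. If Python reports that `blastn` cannot be found, the full path to the executable could be used in place of the bare name.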
Could anyone provide some advice for a novice?
Thanks in advance!