Greetings, I am trying to retrieve the putative taxonomic identifications of 16S/18S rRNA genes from an Illumina HiSeq run using BLASTN (from the BLAST+ package). Thus far I have been relying on Biopython to parse the data. I'm able to retrieve the e-values, alignment lengths and so on for the hits of each query using commands like the ones below.
###############################################
>>> from Bio.Blast import NCBIXML
>>> blast = NCBIXML.parse(open('16SxmlResults'))
>>> for record in blast:
...     print(record.alignments[0].hsps[0].score)
###############################################
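The above prints, for each query, the score of the first high-scoring pair (HSP) of the top hit to standard output (this is the raw score; the bit score is stored in the HSP's bits attribute).
However, the piece of information I can't seem to access is located in the <Hit_def> element. It looks like this: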
<Hit_def>JR951091.270.2233 Bacteria;Proteobacteria;Alphaproteobacteria;Rickettsiales;mitochondria;Pisum sativum (pea)
I have looked through the Biopython Bio.Blast.Record documentation, as well as the tutorial, and can't find any way to retrieve this information. I have also tried using ElementTree to parse the data. This works, but I'm having a hard time looping through the whole file (there are ~4000 entries).
If anyone has any suggestions or can provide some guidance, I would sincerely appreciate it. Thanks,
-Tony
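For reference, here is a minimal sketch of one way to loop over every query and pull out the <Hit_def> text with the standard library's ElementTree, assuming the file is ordinary BLAST+ XML output (-outfmt 5) in which each query is an <Iteration> element. The names iter_top_hit_defs, split_taxonomy and SAMPLE are made up for illustration, and SAMPLE is a hand-written stand-in for the real results file, not actual BLAST output. In Biopython itself the same text should be available as the hit_def attribute of each alignment, e.g. record.alignments[0].hit_def.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, hand-written stand-in for a BLAST XML results file.
SAMPLE = b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>read_1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>JR951091.270.2233 Bacteria;Proteobacteria;Alphaproteobacteria;Rickettsiales;mitochondria;Pisum sativum (pea)</Hit_def>
        </Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>read_2</Iteration_query-def>
      <Iteration_hits></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def iter_top_hit_defs(handle):
    """Yield (query_def, hit_def) for the first hit of each query.

    hit_def is None when a query has no hits.
    """
    # iterparse streams the file, so thousands of entries are handled
    # one <Iteration> at a time instead of loading everything at once.
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag != "Iteration":
            continue
        query = elem.findtext("Iteration_query-def")
        hit = elem.find("Iteration_hits/Hit")  # first <Hit>, or None
        hit_def = hit.findtext("Hit_def") if hit is not None else None
        yield query, hit_def
        elem.clear()  # drop the children of the finished <Iteration>


def split_taxonomy(hit_def):
    """Split 'ACCESSION Rank1;Rank2;...' into (accession, [ranks])."""
    accession, _, lineage = hit_def.partition(" ")
    return accession, lineage.split(";")


if __name__ == "__main__":
    # For a real file, pass open('16SxmlResults', 'rb') instead.
    for query, hit_def in iter_top_hit_defs(io.BytesIO(SAMPLE)):
        if hit_def is None:
            print(query, "no hits")
        else:
            accession, ranks = split_taxonomy(hit_def)
            print(query, accession, ranks[-1])
```

Splitting on the first space and then on ";" assumes the SILVA-style layout shown in the <Hit_def> example above; other databases format this field differently.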