Hi people,
I'd like to resolve the following problem(s):
I'm trying to resolve the structure of reads that are partly of retroviral origin and partly of host genomic origin. In addition, I'd like to detect the binding sites of a particular primer pair.
My approach was to BLAST my sequences against the virus, the genome and the primers. In theory I could build one big database, but for the primers I used blastn, whereas megablast was much faster and more appropriate for the other two databases.
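The three searches can be sketched as BLAST+ command lines, assuming one `blastn` run per database with XML output (`-outfmt 5`), which is what SearchIO's `blastxml` format reads. The database and file names (`virus_db`, `reads_vs_virus.xml`, ...) are hypothetical placeholders; the task split follows the description above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# BLAST+ task per database; database and file names are hypothetical.
my %db_task = (
    virus  => 'megablast',
    genome => 'megablast',
    # megablast's default word size (28) is longer than a typical primer,
    # so the primers are searched with the more sensitive blastn task
    primer => 'blastn',
);

# One blastn argument list per database; -outfmt 5 writes BLAST XML,
# the input expected by Bio::SearchIO's 'blastxml' format.
sub blast_commands {
    my ($query_fasta) = @_;
    my @cmds;
    for my $db (sort keys %db_task) {
        push @cmds, [
            'blastn',
            '-task',   $db_task{$db},
            '-query',  $query_fasta,
            '-db',     "${db}_db",
            '-outfmt', '5',
            '-out',    "reads_vs_${db}.xml",
        ];
    }
    return @cmds;
}

# Print the commands; pass each list to system() to actually run it
for my $cmd (blast_commands('reads.fasta')) {
    print join(' ', @$cmd), "\n";
}
```

Keeping the searches separate means each XML file can be parsed on its own and the hits merged per read afterwards.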
Eyeballing the results, I can "easily" tell which parts of a read come from the virus, the genome or a primer. Now I'm trying to parse my BLAST results to filter out overlapping hits: the genome contains multiple insertions of the virus, which match every viral structure, so I only want to extract genome hits that do not overlap any hit from another database.
The important part is that a genomic hit overlapping any other hit gets sorted out (marked with x).
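The filtering rule can be sketched in plain Perl, assuming the hits of one read have already been collected as hashrefs with a database label and start/end in read (query) coordinates. The `mark_genome_overlaps` name and the `db`/`start`/`end`/`flag` keys are hypothetical. A genomic hit is flagged `x` as soon as it overlaps any viral or primer hit; hits are treated as closed intervals, and start/end are sorted first so reversed coordinates still work.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Flag every genomic hit of one read that overlaps a hit from another
# database. Each hit is a hashref { db => ..., start => ..., end => ... }
# in read coordinates; this layout is a hypothetical example.
sub mark_genome_overlaps {
    my @hits  = @_;
    my @other = grep { $_->{db} ne 'genome' } @hits;
    for my $g (grep { $_->{db} eq 'genome' } @hits) {
        my ($gs, $ge) = sort { $a <=> $b } @{$g}{qw(start end)};
        $g->{flag} = '';    # keep by default
        for my $o (@other) {
            my ($os, $oe) = sort { $a <=> $b } @{$o}{qw(start end)};
            # Closed intervals overlap unless one ends before the other starts
            if ($gs <= $oe && $os <= $ge) {
                $g->{flag} = 'x';
                last;
            }
        }
    }
    return @hits;
}

my @hits = (
    { db => 'virus',  start => 1,   end => 300 },
    { db => 'primer', start => 20,  end => 40  },
    { db => 'genome', start => 250, end => 600 },  # overlaps the viral hit
    { db => 'genome', start => 620, end => 900 },  # clean host flank
);
for my $h (grep { $_->{db} eq 'genome' } mark_genome_overlaps(@hits)) {
    printf "%s\t%d\t%d\t%s\n", @{$h}{qw(db start end flag)};
}
```

Here the hit at 250-600 is flagged `x` and the one at 620-900 is kept. The nested loop is quadratic per read, which is fine for the handful of HSPs a single read produces; with many hits, sorting by start and sweeping once would be the usual alternative.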
Right at the beginning, the following problem came up:
Parsing BLAST XML with BioPerl's SearchIO always gives me only one query! Here is the test code I used:
I would be very glad if anyone could tell me where I went wrong. The script should work with any BLAST XML output.
Code:
    #! /usr/bin/perl -w
    use strict;
    use Bio::Perl;
    use Bio::SearchIO;

    my $hitcountMAN = 0;
    my $test;

    # Get the report
    my $searchio = Bio::SearchIO->new(-format => 'blastxml',
                                      -file   => $ARGV[0]);

    while (my $result = $searchio->next_result) {
        my $algorithm_type    = $result->algorithm;
        my $algorithm_version = $result->algorithm_version;

        # The query belongs to the result, so read it once per result
        # (this also catches queries without any hit)
        my $qseqid = $result->query_description // '';

        # Compare with 'ne': '!~' uses $test as a regex, so 'read1' also
        # matches 'read10', and an empty $test reuses the last pattern
        if (!defined $test || $qseqid ne $test) {
            print "new qseqid: $qseqid\n";
            $test = $qseqid;
        }

        while (my $hit = $result->next_hit) {
            my $sseqid = $hit->name;
            # print "$qseqid\t$sseqid\n";
            $hitcountMAN++;
        }

        # Let's do some statistics
        my $resultcount = $searchio->result_count();
        my $hitcount    = $result->num_hits;
        print "SearchIO parsed result from $ARGV[0]: $algorithm_type\t$algorithm_version\t$resultcount result(s) with $hitcount hit(s)\n";
    }

    # Slurp the raw XML (not used by the checks above)
    open(my $infh, '<', $ARGV[0]) or die $!;
    my @blastxml = <$infh>;
    close $infh;
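To combine the virus, genome and primer reports, the hits can be collected per read across the three XML files. This is a minimal sketch, assuming one XML file per database produced from the same reads file (so query names match between runs), and using only standard SearchIO result/hit/HSP methods (`query_name`, `next_hit`, `name`, `next_hsp`, `start('query')`, `end('query')`). The `collect_hits` helper and the hash layout are hypothetical. Handling the query per result, not per hit, also keeps reads that have no hit at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect HSP coordinates (on the read) from one SearchIO stream into
# %$by_query, keyed by query name and tagged with a database label.
# collect_hits and the hash layout are hypothetical helpers.
sub collect_hits {
    my ($searchio, $db, $by_query) = @_;
    while (my $result = $searchio->next_result) {
        my $q = $result->query_name;
        $by_query->{$q} ||= [];    # keep reads that have no hit at all
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                push @{ $by_query->{$q} }, {
                    db    => $db,
                    name  => $hit->name,
                    start => $hsp->start('query'),
                    end   => $hsp->end('query'),
                };
            }
        }
    }
    return $by_query;
}

# Usage: perl collect.pl reads_vs_virus.xml reads_vs_genome.xml reads_vs_primer.xml
if (@ARGV == 3) {
    require Bio::SearchIO;
    my %by_query;
    my @dbs = qw(virus genome primer);
    for my $i (0 .. $#dbs) {
        my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $ARGV[$i]);
        collect_hits($in, $dbs[$i], \%by_query);
    }
    for my $q (sort keys %by_query) {
        printf "%s\t%d HSP(s)\n", $q, scalar @{ $by_query{$q} };
    }
}
```

Each entry of `%by_query` then holds every HSP of one read from all three databases, ready for an overlap check on the genomic hits.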
If you have other suggestions for solving the general problem, please tell me.
Thanks in advance!