Go Back   SEQanswers > Bioinformatics > Bioinformatics

Similar Threads
Thread Thread Starter Forum Replies Last Post
Installation of rnammer-1.2 for Trinnoate: which version Perl XML/ is requir gwilymh Bioinformatics 1 02-19-2016 01:19 PM
bioperl blast parsing problem Mark Bioinformatics 9 04-27-2013 04:22 PM
parsing blast output using filters using regular perl newbie2this Bioinformatics 0 11-10-2012 05:02 AM
perl bioperl version m_elena_bioinfo Bioinformatics 1 02-21-2011 10:37 AM
Parsing BLAST results using BioPerl Ben Saville Bioinformatics 8 08-24-2010 07:43 AM

Thread Tools
Old 03-16-2018, 08:01 AM   #1
Location: Germany

Join Date: Mar 2013
Posts: 34
Question Parsing Blast xml using Perl (BioPerl)

Hi people,
I'd like to resolve the following problem(s):

I'm trying to resolve the structure of reads, partial of retroviral origin, partial of host genomic origin. In addition, I'd like to detect primer binding sites for a certain primer pair.

My approach was to blast my sequences against the virus, the genome and the primer. In theory I could construct one big database, but for the primers I used blastn, while megablast was much faster and appropriate for the other two databases.

Eyeballing the results, I can "easily" determine virus,genome and primer. Now I tried to parse my blast results to filter out overlapping hit (the genome has multiple insertions of the virus which will match every viral structure and I only want to extract genome hits without any overlap to any other db ref).

important would be, that a genomic hit overlapping with any other hits will be sorted out (marked with x)

Right at the beginning, the following problem came up:
Parsing blast xml with BioPerls SearchIO gives me only one query all the time! Here is the test code I used:

#! /usr/bin/perl -w
use strict;
use Bio::Perl;
use Bio::SearchIO;

my $hitcountMAN=0;
my $test;

  # Get the report 
my  $searchio = Bio::SearchIO  ->new (
                                        -format => 'blastxml',
                                        -file   => $ARGV[0]);

while(my $result = $searchio->next_result){
        my  $algorithm_type =  $result->algorithm;
        my $algorithm_version = $result ->algorithm_version;

        while (my $hit = $result->next_hit) {
                my  $sseqid = $hit->name ;
                my $qseqid= $result->query_description;
#               print "$qseqid\t$sseqid\n";

#initialize test


                        print "new qseqid: $test\n";
# Lets do some statistics
        my $resultcount=$searchio->result_count();
        my $hitcount=$result->num_hits;

        print "SearchIO parsed result from $ARGV[0]: $algorithm_type\t $algorithm_version\t $resultcount result(s) with $hitcount hit(s)\n";

open (INFILE, $ARGV[0]) or die $!;
my @blastxml=<INFILE>;
close INFILE;
I would be very glad if anyone could tell me where I went into the wrong direction. The script should work with any blastxml output.

If you have other suggestions solving the general problem, please tell me.

Thanks in advance!
uloeber is offline   Reply With Quote
Old 03-21-2018, 05:12 AM   #2
Location: Germany

Join Date: Mar 2013
Posts: 34
Default Update

Okay, I gave up on bioperl and wrapped my tabular output.
I got a multidimensional hash structure now.
How may I compare all qstart/qend values of %result{$consensus} and keep only the once which do not overlap or overlap more then a given tolerance?

Any suggestions?

#! /usr/bin/perl -w
use strict;

#input -outfmt "6 qseqid sseqid qlen qstart qend slen sstart send length pident evalue"
open (INPUT, $ARGV[0]) or die $!;
my @blastin=<INPUT>;
close INPUT;

my $lqseqid="";
my %result;
my $count=0;
my @primres;
my @primer=("LTR","pol");
my @virus=("KoRV");
my %subres;

foreach my $line(@blastin){
        chomp $line;
        my @array=split "\t",$line;
        my $qseqid=$array[0];
#       my $sseqid=$array[1];
#       my $qlen=$array[2];
#       my $qstart=$array[3];
#       my $qend=$array[4];
#       my $slen=$array[5];
#       my $sstart=$array[6];
#       my $send=$array[7];
#       my $length=$array[8];
#For later comparison, it's important, that the smaller position always is the start position
#print "Key: $_ and Value: $result{$_}\n" foreach(keys%result);
#access toplevel (qseqid)
foreach my $consensus(keys%result){
                my $hitnum=keys%{$result{$consensus}};
#               print "$consensus\t$hitnum\n";          #print the number of hits for each query
        foreach my $hit(keys %{$result{$consensus}}){

#check primers depending on lenth ratio (alignment length/subject length) and sseqid
                my $lengthratio=0.9;
                for (@primer){

                                if($result{$consensus}{$hit}{sseqid} =~ $_ ){           #compares sseqid to all elements of @primer match TRUE
#                                       push @primres, "$result{$consensus}{$hit}{qseqid}\t$result{$consensus}{$hit}{sseqid}\t$result{$consensus}{$hit}{qstart}\t$result{$consensus}{$hit}{qend}\t$result{$$
#                                       print "$result{$consensus}{$hit}";
# other subjects (not in primer list)                                 
                        else{                    #compares sseqid to all elements of @primer match FALSE
                                        if($result{$consensus}{$hit}{sseqid} =~ $_){
                                                # search non overlapping significant hits

                                                # search uncovered (non viral) regions

uloeber is offline   Reply With Quote

bioperl, blastxml, parser, perl, searchio

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

All times are GMT -8. The time now is 01:24 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO