Hi everybody,
Sorry, this is a beginner question.
I'm learning Perl and I'm starting to use BioPerl.
Here is what I want to do: get the full amino-acid sequences of the hits obtained by TBLASTN against a nucleotide database. Let me explain:
I run TBLASTN with a protein sequence of interest (let's say HumanProt) against a local EST database (let's say FishEST). My BLAST output is then an alignment of the query protein with the matching portions of the FishEST sequences (the classic default BLAST output). Of course this output contains many hits, each containing one or several HSPs.
In this output, each HSP is shown as a translation in the frame that gives the best alignment with the query:
Score = 51.2 bits (121), Expect = 2e-08, Method: Compositional matrix adjust.
Identities = 25/81 (30%), Positives = 44/81 (54%), Gaps = 0/81 (0%)
Frame = +1
Query 513 QIEGPEGCNLFIYHLPQEFTDTDLASTFLPFGNVISAKVFIDKQTSLSKCFGFVSFDNPD 572
Sbjct 607 QEEATKFSRIYVSSIHGDLTDRDVKSVFEAFGHIVSIDLAPDNVPGKHRGWGYVEYDNPK 786
Query 573 SAQVAIKAMNGFQVGTKRLKV 593
Sbjct 787 SAADAIASXNLFDLGGQXLRV 849
At this point, I am able to output only the sequence IDs (-outfmt "6 sallacc") and to extract those sequences from my EST database (blastdbcmd). But what I extract are nucleotide sequences. To get amino-acid sequences, I have to translate each one in all 6 frames and then pick out the translation that corresponds to my BLAST result. This costs me a lot of time.
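To make the manual step concrete, here is a minimal sketch of the translation I mean. It is plain Python rather than BioPerl, it uses only the standard genetic code, and it assumes the frame is given as the signed number BLAST reports (+1 to +3, -1 to -3). The function names and the example ID are my own, not from any library.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_in_frame(nuc, frame):
    """Translate a whole nucleotide sequence in one signed reading frame.

    frame is +1, +2, +3 (forward strand) or -1, -2, -3 (reverse strand).
    Codons containing ambiguous bases such as N are translated as X.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    seq = nuc.upper()
    if frame < 0:
        seq = reverse_complement(seq)
    start = abs(frame) - 1
    # Step through complete codons only; a trailing partial codon is dropped.
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Hypothetical usage: one EST and the frame of its best HSP, printed as FASTA.
print(">EST_example frame=+1")
print(translate_in_frame("ATGGCCTAA", 1))
```

What I am missing is an easy way to get, for each hit, the frame of its best HSP together with the full EST sequence, so that this step does not have to be done by hand.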
This is my dream: I would like the output of the TBLASTN search to be the translation of the retrieved sequences (I mean the entire sequences, not only the matching portion), in the frame corresponding to the best HSP of each hit, and in FASTA format of course.
Do you think this is possible? Is it necessary to use BioPerl for this, or is there a BLAST output format that does what I want? (I checked the BLAST User Manual but did not find one.)
Thank you for your help.
Alex