SEQanswers

Get fasta amino-acid BLAST result (http://seqanswers.com/forums/showthread.php?t=9654)

aliealexandre 02-23-2011 07:56 PM

Get fasta amino-acid BLAST result
 
Hi everybody,

I'm here with a beginner question, sorry.

I'm learning Perl and I'm starting to use BioPerl.

This is what I want to do: get the full amino-acid sequences of the hits obtained by TblastN against a nucleotide database. Let me explain:

I "TblastN" a protein sequence of interest (let's say HumanProt) against a local EST database (let's say FishEST). My BLAST output is then an alignment of the query protein with the matching portions of FishEST sequences (the classic default BLAST output). Of course this output contains many hits, each containing one or several HSPs.

In this output, each HSP is shown as a translation in the frame giving the best alignment with the query.


Score = 51.2 bits (121), Expect = 2e-08, Method: Compositional matrix adjust.
Identities = 25/81 (30%), Positives = 44/81 (54%), Gaps = 0/81 (0%)
Frame = +1

Query 513 QIEGPEGCNLFIYHLPQEFTDTDLASTFLPFGNVISAKVFIDKQTSLSKCFGFVSFDNPD 572

Sbjct 607 QEEATKFSRIYVSSIHGDLTDRDVKSVFEAFGHIVSIDLAPDNVPGKHRGWGYVEYDNPK 786

Query 573 SAQVAIKAMNGFQVGTKRLKV 593

Sbjct 787 SAADAIASXNLFDLGGQXLRV 849
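
The frame reported for an HSP can be checked from the Sbjct coordinates alone. The following is a minimal Python sketch, not part of BLAST or BioPerl. The function names `blast_frame` and `hsp_codons` and the subject length of 900 are made up for illustration, and the minus-strand formula assumes the usual BLAST convention that frame -1 starts at the last base of the sequence.

```python
def blast_frame(start, end, seq_len):
    """Return the reading frame (+1..+3 or -1..-3) of a subject HSP.

    start/end are the 1-based nucleotide coordinates printed on the
    Sbjct lines; seq_len is the full length of the subject sequence
    (only used for minus-strand hits, where start > end).
    """
    if start <= end:                       # plus strand
        return (start - 1) % 3 + 1
    return -((seq_len - start) % 3 + 1)    # minus strand (assumed convention)


def hsp_codons(start, end):
    """Number of codons spanned by the HSP on the subject sequence."""
    return (abs(end - start) + 1) // 3


# Coordinates from the alignment above: Sbjct 607 ... 849.
# seq_len=900 is a made-up value; it is ignored for a plus-strand hit.
frame = blast_frame(607, 849, seq_len=900)
codons = hsp_codons(607, 849)
print(frame, codons)
```

For the HSP shown here this gives frame +1 and 81 codons, which agrees with the "Frame = +1" line and the alignment length of 81 with no gaps.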


At this point, I'm able to display only the sequence IDs (-outfmt "6 sallacc") and to extract those sequences from my EST database (blastdbcmd). But what I extract are nucleotide sequences. If I want amino-acid sequences, I have to translate them in 6 frames and pick the amino-acid sequence corresponding to my BLAST result... I've lost so much time :(

This is my dream: I would like the output of the TblastN search to be the translation of the retrieved sequences (I mean the entire sequences, not only the matching portion), in the frame corresponding to the best HSP of each hit... in fasta format of course ;)

Do you think that's possible? Is it necessary to use BioPerl to do that, or is there a BLAST output format corresponding to what I want? (I checked the BLAST User Manual but haven't found one.)

Thank you for your help.

Alex

anna_ 02-25-2011 06:01 AM

Hi alex,

I just wanted to say: it seems we have very similar problems, and I have no real solution either.

I didn't fully understand your problem, but I have some general advice:

Have you checked all the output format options available in tblastn and blastdbcmd? You can list them by typing tblastn -help and blastdbcmd -help.

Beyond that, there are Perl scripts in the BLAST book by Korf, Yandell and Bedell. With these you can post-process your output, for example to keep only hits with more than ninety percent identity.
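
As a rough illustration of this kind of post-processing (in Python here, not the Perl of the book), the sketch below picks the frame of the best-scoring HSP for each hit. It assumes the search was run with a tabular format such as `-outfmt "6 sseqid sframe bitscore"`, so each line holds the subject ID, the subject frame and the bit score. The function name `best_frame_per_hit` and the IDs in the example are invented.

```python
import csv
import io


def best_frame_per_hit(tabular_text):
    """Map each subject ID to (frame, bitscore) of its best-scoring HSP.

    Expects tab-separated lines with three columns:
    subject ID, subject frame, bit score.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue                       # skip blank and comment lines
        sseqid, sframe, bitscore = row[0], int(row[1]), float(row[2])
        if sseqid not in best or bitscore > best[sseqid][1]:
            best[sseqid] = (sframe, bitscore)
    return best


# Made-up example: two HSPs for EST001, one for EST002.
example = "EST001\t1\t51.2\nEST001\t-2\t33.1\nEST002\t3\t40.0\n"
frames = best_frame_per_hit(example)
print(frames)
```

With a real file, the text could be read with `open(path).read()` and passed to the same function.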

aparna 02-25-2011 06:12 AM

What exactly would you like the TblastN output to be?
If I understand correctly, you are querying a protein sequence against a nucleotide database with tblastn and would like to extract the corresponding amino acid sequences in fasta format?

Any flavour of BLAST can give you XML output (-m 7 in legacy blastall, -outfmt 5 in BLAST+), which is what most of the 'Bio' utilities rely on.
You could write a quick Perl script to read the XML tags you want and write them out in any fashion you like.
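
A minimal sketch of such a parser, written in Python with only the standard library, is shown below. It reads the `Hit_id`, `Hsp_bit-score` and `Hsp_hit-frame` tags of BLAST XML output and keeps the frame of the best-scoring HSP of each hit. The sample XML is a hand-made, heavily trimmed fragment with invented IDs, and `best_hit_frames` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hand-made, trimmed-down fragment in the shape of BLAST XML output.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations><Iteration><Iteration_hits>
    <Hit>
      <Hit_id>EST001</Hit_id>
      <Hit_hsps>
        <Hsp>
          <Hsp_bit-score>51.2</Hsp_bit-score>
          <Hsp_hit-frame>1</Hsp_hit-frame>
        </Hsp>
        <Hsp>
          <Hsp_bit-score>33.1</Hsp_bit-score>
          <Hsp_hit-frame>3</Hsp_hit-frame>
        </Hsp>
      </Hit_hsps>
    </Hit>
    <Hit>
      <Hit_id>EST002</Hit_id>
      <Hit_hsps>
        <Hsp>
          <Hsp_bit-score>40.0</Hsp_bit-score>
          <Hsp_hit-frame>-2</Hsp_hit-frame>
        </Hsp>
      </Hit_hsps>
    </Hit>
  </Iteration_hits></Iteration></BlastOutput_iterations>
</BlastOutput>"""


def best_hit_frames(xml_text):
    """Return {hit ID: subject frame of the HSP with the highest bit score}."""
    root = ET.fromstring(xml_text)
    frames = {}
    for hit in root.iter("Hit"):
        hsps = list(hit.iter("Hsp"))
        if not hsps:
            continue
        top = max(hsps, key=lambda h: float(h.findtext("Hsp_bit-score")))
        frames[hit.findtext("Hit_id")] = int(top.findtext("Hsp_hit-frame"))
    return frames


print(best_hit_frames(SAMPLE_XML))
```

For a real result file, `ET.parse(path).getroot()` could replace `ET.fromstring`. The XML only carries the aligned part of each hit, so the frame found this way still has to be applied to the full nucleotide sequence fetched separately.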

aliealexandre 02-27-2011 03:13 PM

Aparna,

you understood correctly. (I'm sorry about my horrible English, I'm a French frog ;-))

I agree with you about XML output followed by parsing with Perl. However, if I do that (and I did), I can only retrieve the portion of the sequence that matches my query, i.e. the portion that appears in the BLAST output.

What I want is a translation of the entire sequence. I didn't find any script to do that (please keep in mind that I'm a beginner, not able to write a script by myself).

Thanks for your advice.

Alex

aparna 02-28-2011 05:51 AM

Hey Alex,
That's all right.

BLAST only outputs the part of each sequence that it matches. If you want the entire sequences from the database, the only workaround I know is to use the -I T option (legacy blastall), which gives you the 'gi' accessions of the db sequences.
You can use these accessions to fetch the complete sequences from nr.
You could use E-utilities, but that's a little complicated for a beginner; instead you could copy and paste the space-delimited gi accessions into the search bar and get the sequences.

Thx

amango 09-24-2012 10:23 AM

Alex, did you ever find a solution to this? If so, would you mind sharing? I have the exact same problem, and I am also a beginner. I used tblastn to compare my own de novo assembled contigs to a protein database. I would like to extract the full translated sequences that come up as hits, ideally from the blast results themselves. It would be great to avoid doing the translations myself and then finding the correct reading frame.

htetre 10-21-2013 09:54 AM

Hello,

I am trying to translate my DNA sequences to protein, working from fasta files that can contain as many as 7,000 sequences. Instead of translating in all 6 reading frames, I would like to run tBLASTx and extract, for each sequence, the protein translation in the best reading frame according to the BLAST results. Does anyone know the best way of doing this? It sounds very much like what everyone in this thread has done or has tried to do.

Thanks so much for any help!

amarth 10-22-2013 08:43 AM

dude,

1. tblastx your query against the database sequences, and save the output in the standard format (most important: the ID and the frame).
2. Retrieve those sequences by ID and make a fasta file.
3. Translate these sequences in all frames.
4. Compare with the tblastx results.

Depending on the number of sequences, it may take a couple of minutes or hours :s
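
Steps 2 to 4 can be sketched roughly as follows in plain Python, with one shortcut: because the frame of the best HSP is already known from the BLAST results, only that frame is translated, not all six. This is an illustrative sketch under assumptions, not a tested pipeline. It uses the standard genetic code, writes unknown codons as X, and all function names and example sequences are made up.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq, frame):
    """Translate a whole nucleotide sequence in a BLAST-style frame (+-1..3)."""
    seq = seq.upper()
    if frame < 0:                              # minus frames: reverse complement
        seq = seq.translate(COMPLEMENT)[::-1]
    offset = abs(frame) - 1
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def read_fasta(lines):
    """Yield (ID, sequence) pairs; the ID is the first word after '>'."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def translate_hits(fasta_lines, best_frames):
    """Return protein FASTA text for every sequence listed in best_frames."""
    records = []
    for name, seq in read_fasta(fasta_lines):
        if name in best_frames:
            frame = best_frames[name]
            records.append(f">{name} frame={frame:+d}\n{translate(seq, frame)}")
    return "\n".join(records)


# Made-up nucleotide sequences and frames, for illustration only.
fasta = [">EST001 example", "ATGGCCAAATAG", ">EST002", "CTATTTGGCCAT"]
print(translate_hits(fasta, {"EST001": 1, "EST002": -1}))
```

Stop codons are written as `*`. ESTs and contigs often contain frameshifts, so a full-length translation in the best HSP frame may only be meaningful around the aligned region.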

greets

kurban910 03-25-2015 03:09 AM

Hello everyone!
It seems this thread is several years old. So @Alex, did you find a way to solve your problem? If you did, would you mind sharing the process?
I blasted my nucleotide fasta file against a protein db fasta file and got the blast result. Now I also want to extract, from the blastx result file, the translated amino acid sequences corresponding to the blasted nucleotide sequences. If those extracted sequences were in fasta format, that would be great.

