SEQanswers

Old 02-23-2011, 07:56 PM   #1
aliealexandre
Junior Member
 
Location: Japan

Join Date: Feb 2011
Posts: 8
Get fasta amino-acid BLAST result

Hi everybody,

I have a beginner's question, sorry.

I'm learning Perl and have just started using BioPerl.

This is what I want to do: get the full amino-acid sequences of the hits obtained by tblastn against a nucleotide database. Let me explain:

I run tblastn with a protein sequence of interest (let's say HumanProt) against a local EST database (let's say FishEST). The BLAST output is then an alignment of the query protein with the matching portions of the FishEST sequences (the classic default BLAST output). Of course, this output contains many hits, each containing one or several HSPs.

In this output, each HSP is shown as a translation in the frame giving the best alignment with the query.


Score = 51.2 bits (121), Expect = 2e-08, Method: Compositional matrix adjust.
Identities = 25/81 (30%), Positives = 44/81 (54%), Gaps = 0/81 (0%)
Frame = +1

Query 513 QIEGPEGCNLFIYHLPQEFTDTDLASTFLPFGNVISAKVFIDKQTSLSKCFGFVSFDNPD 572

Sbjct 607 QEEATKFSRIYVSSIHGDLTDRDVKSVFEAFGHIVSIDLAPDNVPGKHRGWGYVEYDNPK 786

Query 573 SAQVAIKAMNGFQVGTKRLKV 593

Sbjct 787 SAADAIASXNLFDLGGQXLRV 849


At this point, I can display only the sequence IDs (-outfmt "6 sallacc") and extract the corresponding sequences from my EST database (blastdbcmd). But what I extract are nucleotide sequences. If I want amino-acid sequences, I have to translate them in all six frames and pick out the amino-acid sequence corresponding to my BLAST result... I have lost so much time this way.
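
A minimal sketch of how the frame of the best HSP could be collected for each hit, assuming the search was run with a tabular format such as `-outfmt "6 sseqid sframe bitscore"` (the column choice and the sequence IDs below are illustrative assumptions, not something from the thread):

```python
import csv


def best_frame_per_subject(tabular_lines):
    """Map each subject ID to (frame, bitscore) of its top-scoring HSP.

    Assumes three tab-separated columns per line: sseqid, sframe, bitscore.
    """
    best = {}
    for fields in csv.reader(tabular_lines, delimiter="\t"):
        # Skip blank lines and the comment lines written by -outfmt 7.
        if not fields or fields[0].startswith("#"):
            continue
        subject_id, frame, bitscore = fields[0], int(fields[1]), float(fields[2])
        if subject_id not in best or bitscore > best[subject_id][1]:
            best[subject_id] = (frame, bitscore)
    return best


# Hypothetical rows standing in for a real BLAST result file.
example_rows = ["EST001\t1\t51.2", "EST001\t-2\t30.1", "EST002\t3\t44.0"]
print(best_frame_per_subject(example_rows))
```

With a real result file, the open file handle can be passed in place of the list, since `csv.reader` accepts any iterable of lines.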

This is my dream: I would like the output of the tblastn search to be the translation of the retrieved sequences (I mean the entire sequences, not only the matching portion), in the frame corresponding to the best HSP of each hit... in FASTA format, of course.
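
One way this could be approached outside BLAST itself is to translate each full nucleotide sequence in the frame reported for its best HSP. The sketch below assumes the standard genetic code and the usual BLAST frame convention (+1 to +3 on the given strand, -1 to -3 on the reverse complement); the function name is made up for illustration:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(map("".join, product(BASES, repeat=3)), AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate_in_frame(nucleotides, frame):
    """Translate a whole nucleotide sequence in one BLAST-style frame."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1..+3 or -1..-3")
    seq = nucleotides.upper().replace("U", "T")
    if frame < 0:
        # Negative frames are read from the reverse complement.
        seq = seq.translate(COMPLEMENT)[::-1]
    offset = abs(frame) - 1
    # Trailing bases that do not fill a codon are dropped;
    # codons with ambiguous bases become X.
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate_in_frame("ATGGCCTAA", 1))
```

Stop codons are kept as `*` so that the position of the aligned region inside the full translation stays recognisable.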

Do you think this is possible? Is it necessary to use BioPerl for this, or is there a BLAST output format corresponding to what I want? (I checked the BLAST User Manual but didn't find one.)

Thank you for your help.

Alex
Old 02-25-2011, 06:01 AM   #2
anna_
Member
 
Location: germany

Join Date: Dec 2010
Posts: 15

Hi Alex,

I just wanted to say: it seems we have very similar problems, and I have no real solution either.

I didn't fully understand your problem, but I have some general advice:

Check all the output format options available in tblastn and blastdbcmd. You can list them by typing tblastn -help and blastdbcmd -help.

Besides that, there are Perl scripts in the BLAST book by Korf, Yandell and Bedell. With these you can process your output, for example to keep only hits with more than ninety percent identity.
Old 02-25-2011, 06:12 AM   #3
aparna
Member
 
Location: USA

Join Date: Feb 2009
Posts: 15


What do you mean by the output of the tblastn search?
If I understand correctly, you are querying a protein sequence against a nucleotide database with tblastn and would like to extract the corresponding amino-acid sequences in FASTA format?

Any flavour of BLAST can give you XML output (-m 7 in legacy blastall, -outfmt 5 in BLAST+), which is what most of the 'Bio' utilities parse.
You could write a quick Perl script to read the XML tags you want and write them out in any fashion you like.
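
As a rough illustration of reading those XML tags (shown in Python rather than Perl), the sketch below pulls the hit ID, subject frame and aligned subject string from each hit's top-scoring HSP. It assumes the usual BLAST XML tag names (`Hit_id`, `Hsp_bit-score`, `Hsp_hit-frame`, `Hsp_hseq`); the sample document is made up and heavily trimmed:

```python
import xml.etree.ElementTree as ET


def best_hsp_per_hit(xml_text):
    """Return [(hit_id, subject_frame, aligned_subject)] for each hit's best HSP."""
    root = ET.fromstring(xml_text)
    results = []
    for hit in root.iter("Hit"):
        hsps = list(hit.iter("Hsp"))
        if not hsps:
            continue
        top = max(hsps, key=lambda hsp: float(hsp.findtext("Hsp_bit-score")))
        results.append((
            hit.findtext("Hit_id"),
            int(top.findtext("Hsp_hit-frame")),
            top.findtext("Hsp_hseq"),
        ))
    return results


# Made-up fragment that only mimics the BLAST XML tag names.
SAMPLE_XML = """<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit><Hit_id>EST001</Hit_id><Hit_hsps>
<Hsp><Hsp_bit-score>51.2</Hsp_bit-score><Hsp_hit-frame>1</Hsp_hit-frame><Hsp_hseq>QEEATKF</Hsp_hseq></Hsp>
<Hsp><Hsp_bit-score>30.1</Hsp_bit-score><Hsp_hit-frame>-2</Hsp_hit-frame><Hsp_hseq>SAADAI</Hsp_hseq></Hsp>
</Hit_hsps></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""

for hit_id, frame, aligned in best_hsp_per_hit(SAMPLE_XML):
    print(hit_id, frame, aligned)
```

Note that `Hsp_hseq` only holds the aligned portion of the subject, so the frame is the piece of information worth carrying forward to a full-length translation.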
Old 02-27-2011, 03:13 PM   #4
aliealexandre
Junior Member
 
Location: Japan

Join Date: Feb 2011
Posts: 8

Aparna,

you understood correctly. (Sorry about my horrible English, I'm a French frog ;-))

I agree with you about using the XML format and then parsing it with Perl. However, if I do that (and I did), I can only retrieve the portion of the sequence that matches my query, I mean the portion that appears in the BLAST output.

What I want is a translation of the entire sequence. I didn't find any script that does that (please bear in mind that I'm a beginner and can't write such a script myself).

Thanks for your advice.

Alex
Old 02-28-2011, 05:51 AM   #5
aparna
Member
 
Location: USA

Join Date: Feb 2009
Posts: 15

Hey Alex,
That's all right.

BLAST only outputs the extent of each sequence that matches your query. If you want the entire sequences from the database, the only workaround I know is the -I T option (legacy blastall), which shows the GI identifiers of the database sequences in the output.
You can use these identifiers to fetch the complete sequences from NCBI.
You could use E-utilities, but that is a little complicated for a beginner; alternatively, you could paste the space-delimited GI numbers into the search bar and get the sequences.
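
For the E-utilities route, a request URL for the `efetch` endpoint can be assembled as sketched below. This only applies when the hits come from an NCBI database rather than a purely local one; the `nuccore` default and the identifiers are placeholder assumptions, and nothing is downloaded here:

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(ids, db="nuccore"):
    """Build an efetch URL asking for the given records in FASTA text format."""
    params = {
        "db": db,
        "id": ",".join(ids),   # several IDs are sent as one comma-separated list
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


# Placeholder identifiers, not real records.
print(efetch_fasta_url(["ID1", "ID2"]))
```

The resulting URL can be opened in a browser or fetched with any HTTP client; for large ID lists, NCBI's usage guidelines on request size and rate should be checked first.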

Thx
Old 09-24-2012, 10:23 AM   #6
amango
Member
 
Location: New York

Join Date: Dec 2009
Posts: 17

Alex, did you ever find a solution to this? If so, would you mind sharing it? I have the exact same problem, and I am also a beginner. I used tblastn to compare my own de novo assembled contigs to a protein database. I would like to extract the full translated sequences that come up as hits, ideally from the BLAST results themselves. It would be great to avoid doing the translations myself and then having to find the correct reading frame.
Old 10-21-2013, 09:54 AM   #7
htetre
Member
 
Location: US

Join Date: Jul 2013
Posts: 28

Hello,

I am trying to translate my DNA sequences to protein; my FASTA files can contain as many as 7,000 sequences. Instead of translating in all 6 reading frames, I would like to run tblastx and extract the protein sequence in the best reading frame according to the BLAST results. Does anyone know the best way of doing this? It sounds very much like what everyone in this thread has done or tried to do.

Thanks so much for any help!
Old 10-22-2013, 08:43 AM   #8
amarth
Member
 
Location: Mexico City

Join Date: Dec 2012
Posts: 14

dude,

1. Run tblastx with your query against the database sequences and save the output in the standard format (most important: the ID and the frame).
2. Retrieve those sequences by ID and make a FASTA file.
3. Translate these sequences in all six frames.
4. Compare with the tblastx results.

Depending on the number of sequences, it may take a couple of minutes or hours :s
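
The comparison in step 4 can be sketched as follows, assuming the six-frame translations from step 3 are already available as a dictionary keyed by frame. The function name and the example data are hypothetical:

```python
def frames_matching_hsp(frame_translations, hsp_subject):
    """Return the frames whose full translation contains the aligned HSP peptide.

    frame_translations: dict mapping frame (+1..+3, -1..-3) to a protein string.
    hsp_subject: the subject line of the HSP, possibly with '-' gap characters.
    """
    # Gaps exist only in the alignment, not in the translated sequence.
    peptide = hsp_subject.replace("-", "").upper()
    return [
        frame
        for frame, protein in sorted(frame_translations.items())
        if peptide in protein.upper()
    ]


# Hypothetical translations of one sequence in three of its six frames.
translations = {1: "MAQEEATKFSR", 2: "WPRKKPQNS", -1: "LGHTRSA"}
print(frames_matching_hsp(translations, "QEE-ATKF"))
```

If the result already reports the frame for each HSP, this lookup mostly serves as a sanity check on the translations. An exact substring match can fail when BLAST masks residues in the displayed alignment, so an empty result is not proof of a wrong frame.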

greets
Old 03-25-2015, 03:09 AM   #9
kurban910
Member
 
Location: urumqi

Join Date: Jul 2014
Posts: 58

Hello everyone!
It seems this thread is several years old. So, @Alex, did you find a way to solve your problem? If you did, would you mind sharing the process?
I ran blastx with my nucleotide FASTA file against a protein database and got the BLAST results. Now I also want to extract, from the blastx result file, the translated amino-acid sequences corresponding to the nucleotide sequences that had hits. It would be great if the extracted sequences were in FASTA format.
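
Once a translated sequence and its frame are known for each hit, writing them out as FASTA is the easy part. A small sketch, assuming records are available as (ID, frame, protein) tuples; for blastx the frame would be the query frame, and the header layout is an arbitrary choice:

```python
def to_fasta(records, width=60):
    """Format (seq_id, frame, protein) tuples as FASTA text with wrapped lines."""
    lines = []
    for seq_id, frame, protein in records:
        # Keep the frame in the header so the origin of the translation stays visible.
        lines.append(f">{seq_id} frame={frame:+d}")
        lines.extend(protein[i:i + width] for i in range(0, len(protein), width))
    return "\n".join(lines) + "\n"


# Hypothetical records, e.g. one per nucleotide sequence that had a hit.
print(to_fasta([("contig1", 1, "MAQEEATKF"), ("contig2", -2, "SAADAIAS")], width=5), end="")
```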