
**anna_** · 02-25-2011, 07:01 AM

Hi alex,

I just wanted to say that we seem to have very similar problems, and I have no solution either.

I didn't fully understand your problem, but I have some general advice:

Check all the output format options available in tblastn and blastdbcmd. You can list them by typing tblastn -help and blastdbcmd -help.

Beyond that, there are Perl scripts in the BLAST book by Korf, Yandell and Bedell. With these you can post-process your output, for example to keep only hits with more than ninety percent identity.
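As a rough illustration of that kind of filtering without the book's scripts, here is a minimal Python sketch. It assumes the search was run with the default tabular layout (-outfmt 6), in which the third column is the percent identity. The function name filter_by_identity and the sample lines are made up for this example.

```python
def filter_by_identity(lines, min_identity=90.0):
    """Keep tabular BLAST rows whose percent identity exceeds min_identity.

    Assumes the default -outfmt 6 column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the comment lines produced by -outfmt 7.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if float(fields[2]) > min_identity:
            kept.append(fields)
    return kept


# Hypothetical rows, only to show the call.
example_rows = [
    "HumanProt\tFishEST_1\t95.00\t80\t4\t0\t1\t80\t10\t249\t1e-40\t150",
    "HumanProt\tFishEST_2\t30.86\t81\t56\t0\t513\t593\t607\t849\t2e-08\t51.2",
]
for row in filter_by_identity(example_rows):
    print(row[1], row[2])
```

For a real result file, pass an open file handle instead of the list, since the function only needs an iterable of lines.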

**aparna** · 02-25-2011, 07:12 AM

Originally posted by aliealexandre

Hi everybody,

I come here with a beginner's question. Sorry.

I'm learning Perl and have just started using BioPerl.

This is what I want to do: get the full amino-acid sequences of the hits found by tblastn in a nucleotide database. Let me explain:

I run tblastn with a protein sequence of interest (let's say HumanProt) against a local EST database (let's say FishEST). My BLAST output is then an alignment of the query protein with the matching portions of the FishEST sequences (the classic default BLAST output). Of course this output contains many hits, each containing one or several HSPs.

In this output, each HSP is shown as a translation in the frame that gives the best alignment with the query.

Score = 51.2 bits (121), Expect = 2e-08, Method: Compositional matrix adjust.
Identities = 25/81 (30%), Positives = 44/81 (54%), Gaps = 0/81 (0%)
Frame = +1

Query 513 QIEGPEGCNLFIYHLPQEFTDTDLASTFLPFGNVISAKVFIDKQTSLSKCFGFVSFDNPD 572

Sbjct 607 QEEATKFSRIYVSSIHGDLTDRDVKSVFEAFGHIVSIDLAPDNVPGKHRGWGYVEYDNPK 786

Query 573 SAQVAIKAMNGFQVGTKRLKV 593

Sbjct 787 SAADAIASXNLFDLGGQXLRV 849

At this point, I'm able to display only the sequence IDs (-outfmt "6 sallacc") and to extract them from my EST database (blastdbcmd). But what I extract are nucleotide sequences. If I want amino-acid sequences, I have to translate them in all 6 frames and then pick the amino-acid sequence that corresponds to my BLAST result. I have lost so much time doing this.

This is my dream: I would like the output of the tblastn search to be the translation of the retrieved sequences (I mean the entire sequences, not only the matching portion), in the frame corresponding to the best HSP of each hit, and in FASTA format of course.

Do you think that's possible? Is it necessary to use BioPerl for this, or is there a BLAST output format that corresponds to what I want? (I checked the BLAST User Manual but haven't found one.)

Thank you for your help.

Alex

What do you mean by "I would like the output of TblastN research"?
If I understand correctly, you are querying a protein sequence against a nucleotide database with tblastn and would like to extract the corresponding amino-acid sequences in FASTA format?

Every flavour of BLAST can produce XML output (-m 7 in legacy blastall, -outfmt 5 in BLAST+), which is what most of the 'Bio' toolkits parse.
You could write a quick Perl script to read the XML tags you need and write them out in whatever layout you want.
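The same idea can be sketched in Python with the standard library, as an alternative to the Perl script suggested above. The tag names (Hit, Hit_id, Hsp, Hsp_bit-score, Hsp_hit-frame) are those used in BLAST XML output; the function name best_hsp_per_hit, the hit IDs and the trimmed-down sample document are invented for this example, and a real file contains many more tags.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed fragment in the shape of BLAST XML output.
SAMPLE_XML = """<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit><Hit_id>FishEST_0001</Hit_id><Hit_hsps>
  <Hsp><Hsp_bit-score>30.0</Hsp_bit-score><Hsp_hit-frame>3</Hsp_hit-frame></Hsp>
  <Hsp><Hsp_bit-score>51.2</Hsp_bit-score><Hsp_hit-frame>1</Hsp_hit-frame></Hsp>
</Hit_hsps></Hit>
<Hit><Hit_id>FishEST_0002</Hit_id><Hit_hsps>
  <Hsp><Hsp_bit-score>44.7</Hsp_bit-score><Hsp_hit-frame>-2</Hsp_hit-frame></Hsp>
</Hit_hsps></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""


def best_hsp_per_hit(xml_text):
    """Return (hit_id, subject_frame, bit_score) for the top-scoring HSP of each hit."""
    root = ET.fromstring(xml_text)
    results = []
    for hit in root.iter("Hit"):
        hsps = list(hit.iter("Hsp"))
        if not hsps:
            continue
        # The best HSP is taken to be the one with the highest bit score.
        best = max(hsps, key=lambda h: float(h.findtext("Hsp_bit-score")))
        results.append((
            hit.findtext("Hit_id"),
            int(best.findtext("Hsp_hit-frame")),
            float(best.findtext("Hsp_bit-score")),
        ))
    return results


for hit_id, frame, score in best_hsp_per_hit(SAMPLE_XML):
    print(hit_id, frame, score)
```

For a result file on disk, ET.parse(path).getroot() can replace ET.fromstring. Note that this only reports which frame to use; the XML itself still carries just the aligned portion of each subject sequence.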

**aliealexandre** · 02-27-2011, 04:13 PM

Aparna,

You have understood well. (I'm sorry about my horrible English, I'm a French frog ;-))

I agree with you about using the XML format followed by parsing with Perl. However, if I do that (and I did), I can only retrieve the portion of the sequence that matches my query, that is, the portion that appears in the BLAST output.

What I want is a translation of the entire sequence. I didn't find any script that does that (please bear in mind that I'm a beginner and not able to write a script myself).

Thanks for your advice.
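For what it's worth, translating an entire nucleotide sequence in one chosen frame takes only a few lines. The following Python sketch is not an existing BLAST feature, just one possible way to do it. It assumes the standard genetic code, frames numbered +1 to +3 and -1 to -3 with negative frames read from the start of the reverse complement, and it writes X for any codon containing an ambiguous base. The name translate_frame is made up.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(nucleotides, frame):
    """Translate a whole nucleotide sequence in one frame (+1..+3 or -1..-3)."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of 1, 2, 3, -1, -2, -3")
    seq = nucleotides.upper()
    if frame < 0:
        # Negative frames are read on the reverse complement.
        seq = seq.translate(COMPLEMENT)[::-1]
    start = abs(frame) - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    # Codons with N or other ambiguity codes are not in the table and become X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate_frame("ATGGCCTAA", 1))   # forward strand, first frame
print(translate_frame("ATGGCCTAA", -1))  # reverse complement, first frame
```

Stop codons are kept as * rather than truncating the protein, because EST translations often contain them and the goal here is the full-length translation.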

Alex

**aparna** · 02-28-2011, 06:51 AM

Hey Alex,
That's all right.

BLAST only outputs the part of each database sequence that aligns to your query. If you want the entire sequences from the database, the only workaround I know is the -I T option (legacy blastall; -show_gis in BLAST+), which adds the 'gi' identifiers of the database sequences to the output.
You can use these identifiers to fetch the complete sequences from the source database.
You could use E-utilities, but that is a little complicated for a beginner; alternatively, you can paste the space-delimited gi identifiers into the search bar and download the sequences.
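To give an idea of what the E-utilities route involves, here is a minimal Python sketch that only builds the efetch request URL and does not contact NCBI. The endpoint and the db, id, rettype and retmode parameters are part of E-utilities; the choice of the nuccore database and the identifiers shown are assumptions for illustration, and the function name efetch_fasta_url is made up.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(ids, db="nuccore"):
    """Build an efetch URL that returns the given records as FASTA text."""
    params = {
        "db": db,                          # assumed database; adjust to your data
        "id": ",".join(str(i) for i in ids),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


# Placeholder identifiers, not real records.
print(efetch_fasta_url([123, 456]))
```

The resulting URL can be opened in a browser or fetched with urllib.request.urlopen. For sequences that live only in a local database, blastdbcmd remains the simpler option.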

Thx

**amango** · 09-24-2012, 10:23 AM

Alex, did you ever find a solution to this? If so, would you mind sharing it? I have the exact same problem, and I am also a beginner. I used tblastn to compare my own de novo assembled contigs to a protein database. I would like to extract the full translated sequences that come up as hits, ideally from the BLAST results themselves. It would be great to avoid doing the translations myself and then having to find the correct reading frame.

**htetre** · 10-21-2013, 09:54 AM

Hello,

I am trying to translate my DNA sequences to protein. My FASTA files can contain as many as 7,000 sequences. Instead of translating in all 6 reading frames, I would like to run tblastx and extract, for each sequence, the protein translation in the best reading frame according to the BLAST results. Does anyone know the best way of doing this? It sounds very much like what everyone in this thread has done or has tried to do.

Thanks so much for any help!

**amarth** · 10-22-2013, 08:43 AM

dude,

1. Run tblastx with your query against the database sequences and save the output in the standard format (most important: the ID and frame of each hit).
2. Retrieve those sequences by ID and write them to a FASTA file.
3. Translate these sequences in all six frames.
4. Compare the translations with the tblastx results to pick the matching frame.

Depending on the number of sequences, it may take a couple of minutes or hours :s
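Steps 3 and 4 can be shortened if the frame is read directly from the BLAST output, so that only one translation per sequence is needed. The Python sketch below assumes a BLAST+ search saved with -outfmt "6 sseqid sframe bitscore" (sseqid, sframe and bitscore are BLAST+ format specifiers) and keeps, for each subject, the frame of its highest-scoring HSP. The function name best_frame_per_subject and the sample lines are made up.

```python
def best_frame_per_subject(lines):
    """Map each subject ID to the frame of its highest-scoring HSP.

    Expects tab-separated rows with three columns: sseqid, sframe, bitscore.
    """
    best = {}  # sseqid -> (frame, bit score)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sseqid, sframe, bitscore = line.split("\t")[:3]
        score = float(bitscore)
        # Keep the row only if it beats the best score seen so far for this subject.
        if sseqid not in best or score > best[sseqid][1]:
            best[sseqid] = (int(sframe), score)
    return {sseqid: frame for sseqid, (frame, _) in best.items()}


# Hypothetical rows: two HSPs for the first EST, one for the second.
rows = [
    "FishEST_0001\t3\t30.0",
    "FishEST_0001\t1\t51.2",
    "FishEST_0002\t-2\t44.7",
]
print(best_frame_per_subject(rows))
```

When the sequences to translate are the query rather than the database side, as in the tblastx case above, the same logic applies with the qseqid and qframe columns instead.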

greets

**kurban910** · 03-25-2015, 03:09 AM

Hello everyone!
It seems this thread is several years old. So, Alex, did you find a way to solve your problem? If you did, would you mind sharing the process?
I ran blastx with my nucleotide FASTA file against a protein database and got the BLAST result. Now I also want to extract, from the blastx result file, the translated amino-acid sequences corresponding to my nucleotide sequences. It would be great to have those extracted sequences in FASTA format.


Get fasta amino-acid BLAST result
