  • Get fasta amino-acid BLAST result

    Hi everybody,

    I have a beginner question, sorry.

    I'm learning Perl and I have started using BioPerl.

    Here is what I want to do: get the full amino-acid sequences of the hits obtained with TBLASTN against a nucleotide database. Let me explain:

    I run TBLASTN with a protein sequence of interest (let's say HumanProt) against a local EST database (let's say FishEST). The BLAST output is then an alignment of the query protein with the matching portions of the FishEST sequences (the classic default BLAST output). Of course, this output contains many hits, each containing one or several HSPs.

    In this output, each HSP is shown translated in the frame that gives the best alignment with the query.


    Score = 51.2 bits (121), Expect = 2e-08, Method: Compositional matrix adjust.
    Identities = 25/81 (30%), Positives = 44/81 (54%), Gaps = 0/81 (0%)
    Frame = +1

    Query 513 QIEGPEGCNLFIYHLPQEFTDTDLASTFLPFGNVISAKVFIDKQTSLSKCFGFVSFDNPD 572
    Sbjct 607 QEEATKFSRIYVSSIHGDLTDRDVKSVFEAFGHIVSIDLAPDNVPGKHRGWGYVEYDNPK 786

    Query 573 SAQVAIKAMNGFQVGTKRLKV 593
    Sbjct 787 SAADAIASXNLFDLGGQXLRV 849
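As a worked check on the alignment above, the reported frame and the subject coordinates are consistent: on the plus strand, the frame follows from where the aligned codons start, and the ungapped subject span (607 to 849) covers exactly 81 codons, matching "Identities = 25/81". The minimal Python sketch below assumes the usual BLAST convention that negative frames are counted from the start of the reverse complement; the function names are hypothetical.

```python
def frame_from_coords(sstart, send, slen=None):
    """Subject frame implied by tblastn coordinates (1-based, as BLAST prints them).

    Plus strand (sstart <= send): frame = (sstart - 1) % 3 + 1.
    Minus strand (sstart > send): assumed to be counted on the reverse
    complement, so the subject length slen is needed.
    """
    if sstart <= send:
        return (sstart - 1) % 3 + 1
    if slen is None:
        raise ValueError("minus-strand hits need the subject length (slen)")
    return -((slen - sstart) % 3 + 1)


def aligned_codons(sstart, send):
    """Number of codons spanned by an ungapped subject segment."""
    span = abs(send - sstart) + 1
    if span % 3:
        raise ValueError("segment length is not a multiple of 3")
    return span // 3


print(frame_from_coords(607, 849))  # 1  -> "Frame = +1"
print(aligned_codons(607, 849))     # 81 -> the 81 aligned residues
print(aligned_codons(607, 786))     # 60 -> residues on the first Sbjct line
```

The codon count only equals the alignment length because this HSP has no gaps (Gaps = 0/81).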


    At this point, I can output only the sequence IDs (-outfmt "6 sallacc") and retrieve those sequences from my EST database with blastdbcmd. But what I retrieve are nucleotide sequences. To get amino-acid sequences, I have to translate them in all 6 frames and then pick out the translation that corresponds to my BLAST result. This wastes a lot of time.
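The ID-extraction step can be sketched in Python: read the tabular rows, keep one accession per subject (a subject with several HSPs appears once per HSP), and write one ID per line, which is the layout blastdbcmd -entry_batch reads. This assumes that sallacc separates multiple accessions with ';', as BLAST+ does for sallseqid; the helper names are hypothetical.

```python
def entry_batch_ids(tabular_lines):
    """Unique subject accessions, in first-seen order, from -outfmt "6 sallacc" rows.

    A subject with several HSPs appears once per HSP, and sallacc may list
    several accessions for one subject (assumed ';'-separated); keep the first.
    """
    ids = {}  # dict keys preserve insertion order and drop duplicates
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        first_acc = line.split("\t")[0].split(";")[0]
        ids.setdefault(first_acc, None)
    return list(ids)


def write_entry_batch(ids, path):
    """Write one identifier per line for blastdbcmd -entry_batch."""
    with open(path, "w") as handle:
        handle.write("\n".join(ids) + "\n")
```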

    This is my dream: I would like the TBLASTN output to be the translation of the retrieved sequences (I mean the entire sequences, not only the matching portions), in the frame of the best HSP of each hit, in FASTA format of course.
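One way to get that result, sketched in Python under stated assumptions: run tblastn with a custom tabular format such as -outfmt "6 sseqid sframe bitscore" (a hypothetical column choice), retrieve the full nucleotide sequences as FASTA (for example with blastdbcmd), then translate each whole sequence in the frame of its highest-scoring HSP. The sketch uses the standard genetic code, assumes negative frames read the reverse complement from its 1st, 2nd or 3rd base, and assumes the sseqid values match the first word of the FASTA headers; if they do not (for example gnl|BL_ORD_ID|… style IDs), the IDs need mapping first.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(nt, frame):
    """Translate a whole nucleotide sequence in frame +1..+3 or -1..-3.

    Negative frames read the reverse complement. Codons containing N or
    other IUPAC ambiguity codes become X.
    """
    nt = nt.upper().replace("U", "T")
    if frame < 0:
        nt = nt.translate(COMPLEMENT)[::-1]
    offset = abs(frame) - 1
    return "".join(CODONS.get(nt[i:i + 3], "X") for i in range(offset, len(nt) - 2, 3))


def best_frames(tabular_lines):
    """Map each subject ID to the frame of its highest-scoring HSP."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        sseqid, sframe, bitscore = line.rstrip("\n").split("\t")
        score = float(bitscore)
        if sseqid not in best or score > best[sseqid][1]:
            best[sseqid] = (int(sframe), score)
    return {sid: frame for sid, (frame, _) in best.items()}


def read_fasta(lines):
    """Minimal FASTA reader: the ID is the first word after '>'."""
    seqs, current = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]
            seqs[current] = []
        elif current is not None:
            seqs[current].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def full_length_translations(tabular_lines, fasta_lines, width=60):
    """FASTA text with each hit translated end to end in its best-HSP frame."""
    frames = best_frames(tabular_lines)
    nucleotides = read_fasta(fasta_lines)
    out = []
    for sid, frame in frames.items():
        if sid not in nucleotides:
            continue  # ID format differs between BLAST output and the FASTA file
        protein = translate_frame(nucleotides[sid], frame)
        out.append(f">{sid} frame={frame:+d}")
        out.extend(protein[i:i + width] for i in range(0, len(protein), width))
    return "\n".join(out) + "\n"
```

Stop codons inside an EST show up as `*` in the translation, which is expected for partial or noisy transcripts.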

    Do you think that's possible? Do I need BioPerl for this, or is there a BLAST output format that does what I want? (I checked the BLAST User Manual but didn't find one.)

    Thank you for your help.

    Alex

  • #2
    Hi Alex,

    I just wanted to say that it seems we have very similar problems, and I have no solution either.

    I didn't fully understand your problem, but I have some general advice:

    Check all the output format options available in tblastn and blastdbcmd. You can list them with tblastn -help and blastdbcmd -help.
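For example, a tabular tblastn run that keeps the subject frame, followed by a batch retrieval of the full subject sequences, could be assembled as below. This is a sketch: the file names (HumanProt.fa, FishEST, hits.tsv) are placeholders, the column choice is an assumption, and ID lookup with blastdbcmd assumes the database was built with makeblastdb -parse_seqids.

```python
import shlex


def tblastn_cmd(query, db, out, evalue=1e-5):
    """Argument list for a tabular tblastn run that keeps the subject frame."""
    # Custom tabular columns: subject ID, subject frame, subject coordinates, scores.
    fields = "6 sseqid sframe sstart send evalue bitscore"
    return ["tblastn", "-query", query, "-db", db,
            "-outfmt", fields, "-evalue", str(evalue), "-out", out]


def blastdbcmd_cmd(db, id_file, out):
    """Argument list to pull full-length subject sequences (FASTA) by ID."""
    # -entry_batch reads one identifier per line.
    return ["blastdbcmd", "-db", db, "-entry_batch", id_file, "-out", out]


# Show the command as it would be typed in a shell.
print(shlex.join(tblastn_cmd("HumanProt.fa", "FishEST", "hits.tsv")))
```

With BLAST+ on the PATH, each list can be passed to `subprocess.run(cmd, check=True)`; building lists rather than shell strings avoids quoting problems with the space-separated -outfmt value.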

    Beyond that, the BLAST book by Korf, Yandell and Bedell includes Perl scripts for post-processing BLAST output, for example to keep only hits with more than 90% identity.
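The identity filter mentioned above is easy to reproduce on tabular output. The Python sketch below is not one of the book's scripts; it assumes the 12 default columns of BLAST+ -outfmt 6, where percent identity (pident) is the third column.

```python
import csv

# Column order of BLAST+ -outfmt 6 when no fields are specified.
DEFAULT_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def high_identity_hits(lines, min_pident=90.0):
    """Yield hits (as dicts) whose percent identity is above min_pident."""
    rows = (line for line in lines if line.strip() and not line.startswith("#"))
    for row in csv.reader(rows, delimiter="\t"):
        hit = dict(zip(DEFAULT_COLUMNS, row))
        if float(hit["pident"]) > min_pident:
            yield hit
```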



    • #3
      Originally posted by aliealexandre

      What do you mean by "I would like the output of TblastN research"?
      If I understand correctly, you are querying a protein sequence against a nucleotide database with tblastn and would like to extract the corresponding amino-acid sequences in FASTA format?

      Every BLAST flavour can produce XML output (-m 7 in legacy blastall, -outfmt 5 in BLAST+), which is what most of the 'Bio' toolkits parse.
      You could write a quick Perl script to read the XML tags you need and write them out in whatever format you want.
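A Python equivalent of that quick script, using only the standard library, might look like the sketch below. It reads the BLAST XML element names (Hit, Hit_accession, Hsp, Hsp_bit-score, Hsp_hit-frame, Hsp_hit-from, Hsp_hit-to, Hsp_hseq) and keeps the best HSP per hit. Note that Hsp_hseq contains only the aligned part of the subject, so this gives the frame and coordinates but not the full-length translation; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def best_hsp_per_hit(xml_text):
    """For each Hit in BLAST XML, return the frame, coordinates and aligned
    subject sequence of its highest-scoring HSP."""
    root = ET.fromstring(xml_text)
    results = []
    for hit in root.iter("Hit"):
        best = max(hit.iter("Hsp"),
                   key=lambda hsp: float(hsp.findtext("Hsp_bit-score")),
                   default=None)
        if best is None:
            continue  # a hit without HSPs (should not normally happen)
        results.append({
            "id": hit.findtext("Hit_accession") or hit.findtext("Hit_id"),
            "frame": int(best.findtext("Hsp_hit-frame")),
            "hit_from": int(best.findtext("Hsp_hit-from")),
            "hit_to": int(best.findtext("Hsp_hit-to")),
            "hseq": best.findtext("Hsp_hseq"),  # aligned portion only
        })
    return results
```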



      • #4
        Aparna,

        you understood correctly. (Sorry about my horrible English, I'm a French frog ;-))

        I agree with you about using the XML output and then parsing it with Perl. However, if I do that (and I did), I can only retrieve the portion of the sequence that matches my query, i.e. the portion that appears in the BLAST output.

        What I want is a translation of the entire sequence. I didn't find any script to do that (please keep in mind that I'm a beginner and not able to write such a script myself).

        Thanks for your advice.

        Alex



        • #5
          Hey Alex,
          That's all right.

          BLAST only outputs the region that matches your query. If you want the entire sequences from the database, the only workaround I know is the -I T option (legacy blastall), which shows the 'gi' accessions of the database sequences.
          You can use these accessions to fetch the complete sequences from NCBI.
          You could use the E-utilities, but they are a little complicated for a beginner; instead, you can copy and paste the space-delimited gi accessions into the NCBI search bar and get the sequences.
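For reference, an E-utilities EFetch request for FASTA records is just a URL, so it can be built in a few lines of Python. This is a sketch: the IDs are placeholders, db="nuccore" assumes nucleotide records, and NCBI's usage guidelines (request rate limits, batch sizes) should be checked before fetching many sequences. For a local database such as FishEST, blastdbcmd is the local equivalent.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(ids, db="nuccore"):
    """EFetch URL returning the given records as plain-text FASTA."""
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),  # EFetch takes comma-separated IDs
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


# Placeholder IDs; download with urllib.request.urlopen(url).read().decode()
url = efetch_fasta_url([123, 456])
```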

          Thx



          • #6
            Alex, did you ever find a solution to this? If so, would you mind sharing it? I have exactly the same problem, and I am also a beginner. I used tblastn to compare my own de novo assembled contigs to a protein database. I would like to extract the full translated sequences of the hits, ideally from the BLAST results themselves. It would be great to avoid doing the translations myself and then finding the correct reading frame.



            • #7
              Hello,

              I am trying to translate my DNA sequences to protein, using FASTA files that can contain as many as 7,000 sequences. Instead of translating into all 6 reading frames, I would like to run tblastx and extract the protein sequence in the best reading frame according to the BLAST results. Does anyone know the best way to do this? It sounds very much like what everyone in this thread has done or tried to do.

              Thanks so much for any help!



              • #8
                dude,

                1. Run tblastx with your query against the database sequences and save the output in the standard format (most important: the ID and the frame).
                2. Retrieve those sequences by ID and make a FASTA file.
                3. Translate these sequences in all frames.
                4. Compare the translations with the tblastx results.
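Step 4 can be sketched as a substring check: remove the gap characters from the HSP's aligned subject sequence and see which of the six translations contains it. This is a cross-check under assumptions, not a replacement for the frame BLAST already reports; if BLAST masked low-complexity residues (shown as X) or the translator handles ambiguous bases differently, the test can fail, and the reported frame should be used instead. The function name is hypothetical.

```python
def frames_containing_hsp(translations, aligned_sseq):
    """Return the frames whose translation contains the HSP's subject segment.

    translations: dict mapping frame (+1..+3, -1..-3) to a protein string,
                  e.g. produced by any six-frame translator.
    aligned_sseq: the HSP's subject sequence as BLAST prints it (gaps as '-').
    """
    segment = aligned_sseq.replace("-", "")
    return [frame for frame, protein in translations.items() if segment in protein]
```

Exactly one matching frame confirms the assignment; zero or several matches mean the hit deserves a closer look.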

                Depending on the number of sequences, it may take a couple of minutes or hours :s

                greets



                • #9
                  Hello everyone!
                  It seems this thread is several years old, so @Alex, did you find a way to solve your problem? If you did, would you mind sharing the process?
                  I ran blastx with my nucleotide FASTA file against a protein database FASTA file and got the results. Now I also want to extract, from the blastx result file, the translated amino-acid sequences corresponding to my nucleotide sequences. It would be great to have the extracted sequences in FASTA format.
