Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • blast e-value sorting

    Hi,

    I don't know to sort blastn output compare to the score ( or the e-value ). I've got to sequence ( one with some queries, one with the subjects )

    I executed this command :

    blast2 -i query.fa -j subject.fa -o output.blast -p blastn -W 6

    Here's a little example of my output :

    Code:
                                                                    
    Sequences producing significant alignments:                   score(bits)   E-Value
    
    1_0 hsa-let-7a MIMAT0000062 Homo sapiens let-7a                        36   4e-09
    2_0 hsa-let-7a* MIMAT0004481 Homo sapiens let-7a*                      28   1e-06
    4_0 hsa-let-7b MIMAT0000063 Homo sapiens let-7b                        24   2e-05
    5_0 hsa-let-7b* MIMAT0004482 Homo sapiens let-7b*                      20   3e-04
    6_0 hsa-let-7c MIMAT0000064 Homo sapiens let-7c                        28   1e-06
    8_0 hsa-let-7d MIMAT0000065 Homo sapiens let-7d                        26   4e-06
    9_0 hsa-let-7d* MIMAT0004484 Homo sapiens let-7d*                      12   0.064
    10_0 hsa-let-7e MIMAT0000066 Homo sapiens let-7e                       28   1e-06
    11_0 hsa-let-7e* MIMAT0004485 Homo sapiens let-7e*                     12   0.064
    [COLOR="Red"]12_0 hsa-let-7f MIMAT0000067 Homo sapiens let-7f                       44   2e-11[/COLOR]
    13_0 hsa-let-7f-1* MIMAT0004486 Homo sapiens let-7f-1*                 24   2e-05
    14_0 hsa-let-7f-2* MIMAT0004487 Homo sapiens let-7f-2*                 20   3e-04
    23_0 hsa-miR-19a* MIMAT0004490 Homo sapiens miR-19a*                   12   0.064
    24_0 hsa-miR-19a MIMAT0000073 Homo sapiens miR-19a                     12   0.067
    I highlight in red the best alignment. What's the parameter for sorting the matches compare to the e-value or the score ?

    Thanks a lot,

    Nicolas

  • #2
    Could you provide a sample of your two .fa files and the version of BLAST that you are using?

    Is it possible that those are the best hits from different queries. I'm very not familiar with this style of BLAST output. Have you considered using the tabular output format? If you do not need the actual alignments but only the alignment info (including score) you can specify tabular output using the '-m 8' option.

    Comment


    • #3
      My version of blast is 2.2.21

      here's a sample of the two .fa I use.

      the query :

      >04_x240274
      TGAGGTAGTAGATTGTATAGTTT


      the subject (it's the homo sapiens mature mirna from mirbase 16 ) :

      >hsa-let-7a MIMAT0000062 Homo sapiens let-7a
      TGAGGTAGTAGGTTGTATAGTT
      >hsa-let-7a* MIMAT0004481 Homo sapiens let-7a*
      CTATACAATCTACTGTCTTTC
      >hsa-let-7a-2* MIMAT0010195 Homo sapiens let-7a-2*
      CTGTACAGCCTCCTAGCTTTCC
      >hsa-let-7b MIMAT0000063 Homo sapiens let-7b
      TGAGGTAGTAGGTTGTGTGGTT
      >hsa-let-7b* MIMAT0004482 Homo sapiens let-7b*
      CTATACAACCTACTGCCTTCCC
      >hsa-let-7c MIMAT0000064 Homo sapiens let-7c
      TGAGGTAGTAGGTTGTATGGTT
      >hsa-let-7c* MIMAT0004483 Homo sapiens let-7c*
      TAGAGTTACACCCTGGGAGTTA
      >hsa-let-7d MIMAT0000065 Homo sapiens let-7d
      AGAGGTAGTAGGTTGCATAGTT
      >hsa-let-7d* MIMAT0004484 Homo sapiens let-7d*
      CTATACGACCTGCTGCCTTTCT
      >hsa-let-7e MIMAT0000066 Homo sapiens let-7e
      TGAGGTAGGAGGTTGTATAGTT
      >hsa-let-7e* MIMAT0004485 Homo sapiens let-7e*
      CTATACGGCCTCCTAGCTTTCC
      >hsa-let-7f MIMAT0000067 Homo sapiens let-7f
      TGAGGTAGTAGATTGTATAGTT
      >hsa-let-7f-1* MIMAT0004486 Homo sapiens let-7f-1*
      CTATACAATCTATTGCCTTCCC
      >hsa-let-7f-2* MIMAT0004487 Homo sapiens let-7f-2*
      CTATACAGTCTACTGTCTTTCC

      Comment


      • #4
        Your output is definitely strange... Try something like the following using the binaries within your BLAST installation:

        Create a search database using formatdb
        formatdb -i subject.fa -t subject -p F -o F -n subject

        Now use blastall in a similar command to the one you posted using a word size of 6 and standard output option
        blastall -i query.fa -d subject -p blastn -W 6 -o alignments.txt

        Now try again using the tabular output option
        blastall -m 8 -i query.fa -d subject -p blastn -W 6 -o alignments.tabular.txt

        Sequences producing significant alignments: (bits) Value
        hsa-let-7f MIMAT0000067 Homo sapiens let-7f 44 2e-10
        hsa-let-7a MIMAT0000062 Homo sapiens let-7a 36 5e-08
        hsa-let-7e MIMAT0000066 Homo sapiens let-7e 28 1e-05
        hsa-let-7c MIMAT0000064 Homo sapiens let-7c 28 1e-05
        hsa-let-7a* MIMAT0004481 Homo sapiens let-7a* 28 1e-05
        hsa-let-7d MIMAT0000065 Homo sapiens let-7d 26 5e-05
        hsa-let-7f-1* MIMAT0004486 Homo sapiens let-7f-1* 24 2e-04
        hsa-let-7b MIMAT0000063 Homo sapiens let-7b 24 2e-04
        hsa-let-7f-2* MIMAT0004487 Homo sapiens let-7f-2* 20 0.003
        hsa-let-7e* MIMAT0004485 Homo sapiens let-7e* 12 0.73
        hsa-let-7d* MIMAT0004484 Homo sapiens let-7d* 12 0.73

        In tabular format it looks like this:
        04_x240274 hsa-let-7f 100.00 22 0 0 1 22 1 22 2e-10 44.1
        04_x240274 hsa-let-7a 95.45 22 1 0 1 22 1 22 5e-08 36.2
        04_x240274 hsa-let-7e 90.91 22 2 0 1 22 1 22 1e-05 28.2
        04_x240274 hsa-let-7c 94.44 18 1 0 1 18 1 18 1e-05 28.2
        04_x240274 hsa-let-7a* 100.00 14 0 0 7 20 14 1 1e-05 28.2
        04_x240274 hsa-let-7d 90.48 21 2 0 2 22 2 22 5e-05 26.3
        04_x240274 hsa-let-7f-1* 100.00 12 0 0 9 20 12 1 2e-04 24.3
        04_x240274 hsa-let-7b 93.75 16 1 0 1 16 1 16 2e-04 24.3
        04_x240274 hsa-let-7b 92.86 14 1 0 7 20 36 23 0.003 20.3
        04_x240274 hsa-let-7f-2* 92.86 14 1 0 7 20 14 1 0.003 20.3
        04_x240274 hsa-let-7e* 100.00 6 0 0 15 20 6 1 0.73 12.4
        04_x240274 hsa-let-7d* 100.00 6 0 0 15 20 6 1 0.73 12.4

        As you can see, the sorting does appear to be according to E-value by default...
        Attached Files

        Comment


        • #5
          works fine now !

          Thanks a lot !

          Nicolas

          Comment


          • #6
            other question

            I want to thresold on the e-value to discard the bad alignment. What is the best value for the thresold ? 0.1 , 1 ?

            Thanks

            Comment


            • #7
              How to mark the order of the output by e-value or score?

              I have a question following the thread.
              When the output is in tabular format (-m 8), is there a way to mark the score by order with rank?
              Say the original output:
              Code:
              hsa-let-7f MIMAT0000067 Homo sapiens let-7f 44 2e-10
              hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 5e-08
              hsa-let-7f MIMAT0000066 Homo sapiens let-7e 28 1e-05
              But I want it like this with the extra column (look at the second last):
              Code:
              hsa-let-7f MIMAT0000067 Homo sapiens let-7f 44 1 2e-10
              hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 2 5e-08
              hsa-let-7f MIMAT0000066 Homo sapiens let-7e 28 3 1e-05
              The reason I want this is to pick up the second one only (instead of the first one, sounds silly?!) which could be done by sorting. I tried the b option which seemed not working. Not sure the parameter for this format of output, if any. Thanks a lot!

              Yifang
              Last edited by yifangt; 03-08-2011, 04:35 PM.

              Comment


              • #8
                Hi,
                You want the second best blast hit ? If so, no need to count the rank, just use a shell command :
                Code:
                head -n 2 blast_output.m8 | tail -n 1
                (in fact, change 2 by the position of the line you want)
                Will give you the second line in the blast_output.m8 file
                Code:
                hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 5e-08
                Hope this could help.

                Comment


                • #9
                  How to mark the order of the output by e-value or score?

                  Originally posted by nicolallias View Post
                  Hi,
                  You want the second best blast hit ? If so, no need to count the rank, just use a shell command :
                  Code:
                  head -n 2 blast_output.m8 | tail -n 1
                  (in fact, change 2 by the position of the line you want)
                  Will give you the second line in the blast_output.m8 file
                  Code:
                  hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 5e-08
                  Hope this could help.
                  Thanks,
                  I should have made myself clearer, actually my question is more general as for several thousands queries, each of which may give many hits, than the specific 3 rows example.
                  What I meant is the second best hit (or the first hit, or third hit, etc) of each query in the output file. That is why I need the rank of the e-value or bits score of "each query" so that I can sort them later. There are parameters to ask BLAST only to output the first 3 hits (-v3 -b3), but without an extra rank column that I need. It seems there is a option in the blastall command line. Just could not figure it out myself.
                  Iam aware of the output can be parsed later for this purpose, but the option (if any) could make the job much easier.

                  Comment


                  • #10
                    Sorry, now I get it...
                    Originally posted by yifangt View Post
                    Just could not figure it out myself.
                    Me neither
                    Originally posted by yifangt View Post
                    Iam aware of the output can be parsed later for this purpose, but the option (if any) could make the job much easier.
                    Anyway, you could pipe with a
                    Code:
                    awk '{print $1" "$2" "$3" "$4" "$5" "$6" "NR" "$7}'

                    Comment


                    • #11
                      normal untested disclaimers apply but the pattern is adaptable
                      this is assuming the fields are grouped by first (queries) and ordered by whatever field (E value) you choose. and do not mind the rank number out front

                      awk 'BEGIN{q=$1;c=0}'{if(q != $1){q=$1;c=0};c++;print c $0}'

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Current Approaches to Protein Sequencing
                        by seqadmin


                        Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                        04-04-2024, 04:25 PM
                      • seqadmin
                        Strategies for Sequencing Challenging Samples
                        by seqadmin


                        Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                        03-22-2024, 06:39 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, 04-11-2024, 12:08 PM
                      0 responses
                      18 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-10-2024, 10:19 PM
                      0 responses
                      22 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-10-2024, 09:21 AM
                      0 responses
                      17 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-04-2024, 09:00 AM
                      0 responses
                      49 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X