Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • blast e-value sorting

    Hi,

    I don't know to sort blastn output compare to the score ( or the e-value ). I've got to sequence ( one with some queries, one with the subjects )

    I executed this command :

    blast2 -i query.fa -j subject.fa -o output.blast -p blastn -W 6

    Here's a little example of my output :

    Code:
                                                                    
    Sequences producing significant alignments:                   score(bits)   E-Value
    
    1_0 hsa-let-7a MIMAT0000062 Homo sapiens let-7a                        36   4e-09
    2_0 hsa-let-7a* MIMAT0004481 Homo sapiens let-7a*                      28   1e-06
    4_0 hsa-let-7b MIMAT0000063 Homo sapiens let-7b                        24   2e-05
    5_0 hsa-let-7b* MIMAT0004482 Homo sapiens let-7b*                      20   3e-04
    6_0 hsa-let-7c MIMAT0000064 Homo sapiens let-7c                        28   1e-06
    8_0 hsa-let-7d MIMAT0000065 Homo sapiens let-7d                        26   4e-06
    9_0 hsa-let-7d* MIMAT0004484 Homo sapiens let-7d*                      12   0.064
    10_0 hsa-let-7e MIMAT0000066 Homo sapiens let-7e                       28   1e-06
    11_0 hsa-let-7e* MIMAT0004485 Homo sapiens let-7e*                     12   0.064
    [COLOR="Red"]12_0 hsa-let-7f MIMAT0000067 Homo sapiens let-7f                       44   2e-11[/COLOR]
    13_0 hsa-let-7f-1* MIMAT0004486 Homo sapiens let-7f-1*                 24   2e-05
    14_0 hsa-let-7f-2* MIMAT0004487 Homo sapiens let-7f-2*                 20   3e-04
    23_0 hsa-miR-19a* MIMAT0004490 Homo sapiens miR-19a*                   12   0.064
    24_0 hsa-miR-19a MIMAT0000073 Homo sapiens miR-19a                     12   0.067
    I highlight in red the best alignment. What's the parameter for sorting the matches compare to the e-value or the score ?

    Thanks a lot,

    Nicolas

  • #2
    Could you provide a sample of your two .fa files and the version of BLAST that you are using?

    Is it possible that those are the best hits from different queries. I'm very not familiar with this style of BLAST output. Have you considered using the tabular output format? If you do not need the actual alignments but only the alignment info (including score) you can specify tabular output using the '-m 8' option.

    Comment


    • #3
      My version of blast is 2.2.21

      here's a sample of the two .fa I use.

      the query :

      >04_x240274
      TGAGGTAGTAGATTGTATAGTTT


      the subject (it's the homo sapiens mature mirna from mirbase 16 ) :

      >hsa-let-7a MIMAT0000062 Homo sapiens let-7a
      TGAGGTAGTAGGTTGTATAGTT
      >hsa-let-7a* MIMAT0004481 Homo sapiens let-7a*
      CTATACAATCTACTGTCTTTC
      >hsa-let-7a-2* MIMAT0010195 Homo sapiens let-7a-2*
      CTGTACAGCCTCCTAGCTTTCC
      >hsa-let-7b MIMAT0000063 Homo sapiens let-7b
      TGAGGTAGTAGGTTGTGTGGTT
      >hsa-let-7b* MIMAT0004482 Homo sapiens let-7b*
      CTATACAACCTACTGCCTTCCC
      >hsa-let-7c MIMAT0000064 Homo sapiens let-7c
      TGAGGTAGTAGGTTGTATGGTT
      >hsa-let-7c* MIMAT0004483 Homo sapiens let-7c*
      TAGAGTTACACCCTGGGAGTTA
      >hsa-let-7d MIMAT0000065 Homo sapiens let-7d
      AGAGGTAGTAGGTTGCATAGTT
      >hsa-let-7d* MIMAT0004484 Homo sapiens let-7d*
      CTATACGACCTGCTGCCTTTCT
      >hsa-let-7e MIMAT0000066 Homo sapiens let-7e
      TGAGGTAGGAGGTTGTATAGTT
      >hsa-let-7e* MIMAT0004485 Homo sapiens let-7e*
      CTATACGGCCTCCTAGCTTTCC
      >hsa-let-7f MIMAT0000067 Homo sapiens let-7f
      TGAGGTAGTAGATTGTATAGTT
      >hsa-let-7f-1* MIMAT0004486 Homo sapiens let-7f-1*
      CTATACAATCTATTGCCTTCCC
      >hsa-let-7f-2* MIMAT0004487 Homo sapiens let-7f-2*
      CTATACAGTCTACTGTCTTTCC

      Comment


      • #4
        Your output is definitely strange... Try something like the following using the binaries within your BLAST installation:

        Create a search database using formatdb
        formatdb -i subject.fa -t subject -p F -o F -n subject

        Now use blastall in a similar command to the one you posted using a word size of 6 and standard output option
        blastall -i query.fa -d subject -p blastn -W 6 -o alignments.txt

        Now try again using the tabular output option
        blastall -m 8 -i query.fa -d subject -p blastn -W 6 -o alignments.tabular.txt

        Sequences producing significant alignments: (bits) Value
        hsa-let-7f MIMAT0000067 Homo sapiens let-7f 44 2e-10
        hsa-let-7a MIMAT0000062 Homo sapiens let-7a 36 5e-08
        hsa-let-7e MIMAT0000066 Homo sapiens let-7e 28 1e-05
        hsa-let-7c MIMAT0000064 Homo sapiens let-7c 28 1e-05
        hsa-let-7a* MIMAT0004481 Homo sapiens let-7a* 28 1e-05
        hsa-let-7d MIMAT0000065 Homo sapiens let-7d 26 5e-05
        hsa-let-7f-1* MIMAT0004486 Homo sapiens let-7f-1* 24 2e-04
        hsa-let-7b MIMAT0000063 Homo sapiens let-7b 24 2e-04
        hsa-let-7f-2* MIMAT0004487 Homo sapiens let-7f-2* 20 0.003
        hsa-let-7e* MIMAT0004485 Homo sapiens let-7e* 12 0.73
        hsa-let-7d* MIMAT0004484 Homo sapiens let-7d* 12 0.73

        In tabular format it looks like this:
        04_x240274 hsa-let-7f 100.00 22 0 0 1 22 1 22 2e-10 44.1
        04_x240274 hsa-let-7a 95.45 22 1 0 1 22 1 22 5e-08 36.2
        04_x240274 hsa-let-7e 90.91 22 2 0 1 22 1 22 1e-05 28.2
        04_x240274 hsa-let-7c 94.44 18 1 0 1 18 1 18 1e-05 28.2
        04_x240274 hsa-let-7a* 100.00 14 0 0 7 20 14 1 1e-05 28.2
        04_x240274 hsa-let-7d 90.48 21 2 0 2 22 2 22 5e-05 26.3
        04_x240274 hsa-let-7f-1* 100.00 12 0 0 9 20 12 1 2e-04 24.3
        04_x240274 hsa-let-7b 93.75 16 1 0 1 16 1 16 2e-04 24.3
        04_x240274 hsa-let-7b 92.86 14 1 0 7 20 36 23 0.003 20.3
        04_x240274 hsa-let-7f-2* 92.86 14 1 0 7 20 14 1 0.003 20.3
        04_x240274 hsa-let-7e* 100.00 6 0 0 15 20 6 1 0.73 12.4
        04_x240274 hsa-let-7d* 100.00 6 0 0 15 20 6 1 0.73 12.4

        As you can see, the sorting does appear to be according to E-value by default...
        Attached Files

        Comment


        • #5
          works fine now !

          Thanks a lot !

          Nicolas

          Comment


          • #6
            other question

            I want to thresold on the e-value to discard the bad alignment. What is the best value for the thresold ? 0.1 , 1 ?

            Thanks

            Comment


            • #7
              How to mark the order of the output by e-value or score?

              I have a question following the thread.
              When the output is in tabular format (-m 8), is there a way to mark the score by order with rank?
              Say the original output:
              Code:
              hsa-let-7f MIMAT0000067 Homo sapiens let-7f 44 2e-10
              hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 5e-08
              hsa-let-7f MIMAT0000066 Homo sapiens let-7e 28 1e-05
              But I want it like this with the extra column (look at the second last):
              Code:
              hsa-let-7f MIMAT0000067 Homo sapiens let-7f 44 1 2e-10
              hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 2 5e-08
              hsa-let-7f MIMAT0000066 Homo sapiens let-7e 28 3 1e-05
              The reason I want this is to pick up the second one only (instead of the first one, sounds silly?!) which could be done by sorting. I tried the b option which seemed not working. Not sure the parameter for this format of output, if any. Thanks a lot!

              Yifang
              Last edited by yifangt; 03-08-2011, 04:35 PM.

              Comment


              • #8
                Hi,
                You want the second best blast hit ? If so, no need to count the rank, just use a shell command :
                Code:
                head -n 2 blast_output.m8 | tail -n 1
                (in fact, change 2 by the position of the line you want)
                Will give you the second line in the blast_output.m8 file
                Code:
                hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 5e-08
                Hope this could help.

                Comment


                • #9
                  How to mark the order of the output by e-value or score?

                  Originally posted by nicolallias View Post
                  Hi,
                  You want the second best blast hit ? If so, no need to count the rank, just use a shell command :
                  Code:
                  head -n 2 blast_output.m8 | tail -n 1
                  (in fact, change 2 by the position of the line you want)
                  Will give you the second line in the blast_output.m8 file
                  Code:
                  hsa-let-7f MIMAT0000062 Homo sapiens let-7a 36 5e-08
                  Hope this could help.
                  Thanks,
                  I should have made myself clearer, actually my question is more general as for several thousands queries, each of which may give many hits, than the specific 3 rows example.
                  What I meant is the second best hit (or the first hit, or third hit, etc) of each query in the output file. That is why I need the rank of the e-value or bits score of "each query" so that I can sort them later. There are parameters to ask BLAST only to output the first 3 hits (-v3 -b3), but without an extra rank column that I need. It seems there is a option in the blastall command line. Just could not figure it out myself.
                  Iam aware of the output can be parsed later for this purpose, but the option (if any) could make the job much easier.

                  Comment


                  • #10
                    Sorry, now I get it...
                    Originally posted by yifangt View Post
                    Just could not figure it out myself.
                    Me neither
                    Originally posted by yifangt View Post
                    Iam aware of the output can be parsed later for this purpose, but the option (if any) could make the job much easier.
                    Anyway, you could pipe with a
                    Code:
                    awk '{print $1" "$2" "$3" "$4" "$5" "$6" "NR" "$7}'

                    Comment


                    • #11
                      normal untested disclaimers apply but the pattern is adaptable
                      this is assuming the fields are grouped by first (queries) and ordered by whatever field (E value) you choose. and do not mind the rank number out front

                      awk 'BEGIN{q=$1;c=0}'{if(q != $1){q=$1;c=0};c++;print c $0}'

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Recent Advances in Sequencing Analysis Tools
                        by seqadmin


                        The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                        05-06-2024, 07:48 AM
                      • seqadmin
                        Essential Discoveries and Tools in Epitranscriptomics
                        by seqadmin




                        The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                        04-22-2024, 07:01 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, Yesterday, 02:46 PM
                      0 responses
                      11 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 05-07-2024, 06:57 AM
                      0 responses
                      13 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 05-06-2024, 07:17 AM
                      0 responses
                      17 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 05-02-2024, 08:06 AM
                      0 responses
                      23 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X