  • blastall output: NCBI vs command line

    Sorry for the basic question. Are there any options in blastall that produce output in the same format as the online BLAST on the NCBI website?
    Specifically, I want the "Sequences producing significant alignments:" table with these columns: Accession, Description, Max score, Total score, Query coverage, E value, Max ident.


    But when I run blastall from the command line,
    e.g. blastall -p blastx -i input.fa -d /blast/db/nr -a 4 -b 5 -v 5 -e 1e-20 -o output.file
    I get only the Accession, Description, Score (bits), and E value columns.


    However, I also want the Query coverage and Max ident columns.
    I didn't find a solution in the blastall manual. Perhaps it depends on the -m parameter, but there are many options...
    Thanks in advance!

    UPD: -m 9 (tabular with comment lines; post-processed, sorted view) produces almost what I want, but it gives only an accession ID without a description, and something like gi|66734174|gb|AAY53484.1| isn't very helpful.
    Last edited by ElMichael; 03-09-2011, 02:26 PM.

  • #2
    Originally posted by ElMichael
    Sorry for the basic question. Are there any options in blastall that produce output in the same format as the online BLAST on the NCBI website?
    The NCBI website now uses BLAST+ rather than 'legacy' BLAST, so one option is to switch from the 'legacy' blastall binary to the BLAST+ blastx binary. Note that with BLAST+ you can request many extra columns in the tabular output, which may cover what you want.



    • #3
      I haven't tried BLAST+ yet, but in the past we have used a combination of
      blastx -m 8 (tabular output)
      and blastx without the -m parameter (the default pairwise report)
      to get coverage and protein hit names.
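
A minimal Python sketch of that two-run combination, in case it helps. It assumes the default report lists each hit as a ">ID description" block (possibly wrapped over several lines) closed by a "Length = N" line, and that the tabular run uses the same subject IDs. The function names and the sample description are made up for illustration.

```python
import re


def parse_descriptions(report_text):
    """Map subject ID -> description from a default (pairwise) BLAST report."""
    descriptions = {}
    current_id = None
    parts = []
    for line in report_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            if not fields:
                continue
            # New subject block: first token is the ID, the rest is description.
            current_id = fields[0]
            parts = [fields[1].strip()] if len(fields) > 1 else []
        elif current_id is not None:
            if re.match(r"\s*Length\s*=", line):
                # The length line ends the (possibly wrapped) definition line.
                descriptions.setdefault(current_id, " ".join(parts))
                current_id = None
            elif line.strip():
                parts.append(line.strip())
    return descriptions


def annotate_tabular(tabular_text, descriptions):
    """Append the matching description to each -m 8 / -m 9 row."""
    rows = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and -m 9 comment lines
        cols = line.split("\t")
        cols.append(descriptions.get(cols[1], ""))
        rows.append(cols)
    return rows


if __name__ == "__main__":
    # Placeholder report text; the description is NOT real annotation.
    report = (
        ">gi|66734174|gb|AAY53484.1| placeholder description\n"
        "          [placeholder organism]\n"
        "          Length = 345\n"
    )
    tabular = "query1\tgi|66734174|gb|AAY53484.1|\t34.69\t49\n"
    for row in annotate_tabular(tabular, parse_descriptions(report)):
        print("\t".join(row))
```

Both runs have to use the same database and cutoffs, otherwise some tabular rows will have no matching description and get an empty last column.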



        • #5
          maubp, colindaven, thanks for your advice!
          I tried BLAST+, but unfortunately the supported format specifiers don't include the subject description (I wonder why?!) or query coverage (it could be calculated, but again, why?!).
          I think I'll have to combine two blastx runs, as colindaven suggested
          (though I still hope there is some magic option unknown to me that produces the required format).



          • #6
            Originally posted by ElMichael
            maubp, colindaven, thanks for your advice!
            I tried BLAST+, but unfortunately the supported format specifiers don't include the subject description (I wonder why?!) or query coverage (it could be calculated, but again, why?!).
            I'd also like to have query length and subject length as output columns, which would make either percentage coverage easy to calculate.
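
To illustrate the calculation: given the start/end columns from the tabular output plus a sequence length, coverage is the fraction of positions spanned by the HSPs. A rough Python sketch follows; the function name and the query length of 450 are hypothetical, and my understanding is that the website's "Query coverage" combines all HSPs of a hit, which is why overlapping intervals are merged here.

```python
def percent_coverage(hsp_ranges, sequence_length):
    """Percentage of a sequence covered by one or more HSPs.

    hsp_ranges: iterable of (start, end) pairs, 1-based and inclusive.
    For blastx, query start can be larger than end (reverse frames),
    so each pair is normalised before merging.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    intervals = sorted((min(s, e), max(s, e)) for s, e in hsp_ranges)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Disjoint from the running interval: bank it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping: extend the running interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / sequence_length


if __name__ == "__main__":
    # One HSP with q. start = 360 and q. end = 214; the query length
    # of 450 is an assumed value for illustration only.
    print(round(percent_coverage([(360, 214)], 450), 2))
```

For blastx the query coordinates are in nucleotides and the subject coordinates in amino acids, so each length has to be in the same units as its own coordinates.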
            Originally posted by ElMichael
            I think I'll have to combine two blastx runs, as colindaven suggested
            (though I still hope there is some magic option unknown to me that produces the required format).
            You don't have to do that: run BLAST+ once with ASN.1 output, then use blast_formatter to turn it into any of the output formats (text, HTML, XML, tabular).
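
A sketch of that search-once, format-many workflow, written as a small Python helper that only builds the command lines. It assumes BLAST+ is installed and on the PATH; the file names, the helper names and the default values (taken from the blastall example earlier in the thread) are illustrative.

```python
import shlex


def search_command(query, db, archive, program="blastx",
                   evalue="1e-20", threads=4):
    """BLAST+ search that writes a BLAST archive (ASN.1, -outfmt 11)."""
    return [program, "-query", query, "-db", db,
            "-evalue", str(evalue), "-num_threads", str(threads),
            "-outfmt", "11", "-out", archive]


def format_command(archive, outfmt, out_path):
    """blast_formatter call that converts the archive to another format."""
    return ["blast_formatter", "-archive", archive,
            "-outfmt", str(outfmt), "-out", out_path]


if __name__ == "__main__":
    commands = [
        search_command("input.fa", "/blast/db/nr", "results.asn"),
        format_command("results.asn", 0, "results.txt"),  # pairwise text
        format_command("results.asn", 5, "results.xml"),  # XML
        format_command("results.asn", 7, "results.tab"),  # tabular + comments
    ]
    for cmd in commands:
        # Print a copy-pasteable shell line; to execute instead, use
        # subprocess.run(cmd, check=True).
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

The expensive search runs once; each additional format is just another quick blast_formatter call on the same archive.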



            • #7
              This sounds like a job for Bio::SearchIO. However you have to be very comfortable already with BioPerl.



              • #8
                Originally posted by maubp
                You don't have to do that: run BLAST+ once with ASN.1 output, then use blast_formatter to turn it into any of the output formats (text, HTML, XML, tabular).
                Thanks for the hint.

                kmcarr, that works terrifically! Exactly what I wanted. Thank you.



                • #9
                  Follow-up: BLAST+

                  Hello,
                  I have a similar problem with BLAST. I'm not familiar with BLAST+, though. Anyway, I tried:
                  Code:
                  blastall -p blastx -i all-EST-cleaned.fasta -d my-db -m 9 -B 3 -b 10  -o blast-output.txt
                  and got this result:
                  Code:
                  # Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q6IBW4|CNDH2_HUMAN	34.69	49	32	0	360	214	258	306	1.0	32.7
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q5T655|CC147_HUMAN	20.73	82	65	0	260	15	39	120	1.3	32.3
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q9BRQ6|CHCH6_HUMAN	29.87	77	43	2	420	223	63	139	2.9	31.2
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q9Y3L3|3BP1_HUMAN	35.48	62	39	2	414	232	7	62	2.9	31.2
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q9LEM8|NAC2_CHLRE	38.46	39	24	0	387	271	1313	1351	3.8	30.8
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|P33424|POLN_HEVPA	39.34	61	37	2	423	241	1034	1090	5.0	30.4
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q81862|POLN_HEVCH	39.34	61	37	2	423	241	1034	1090	5.0	30.4
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q9UKP4|ATS7_HUMAN	39.47	38	23	0	423	310	1022	1059	5.0	30.4
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q2PC93|SSPO_CHICK	39.47	38	22	1	286	176	4073	4110	8.4	29.6
                  Now,
                  1) how can I append the annotation to the end of each subject entry, and
                  2) how can I reformat the subject entries as HTML links to NCBI if I don't use 1)?
                  Thanks!
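
One possible way to approach both questions, as a rough Python sketch. It assumes the annotation can be taken from the header lines of the FASTA file the database was built from, that those headers start with the same IDs shown in the Subject id column, and that NCBI Protein resolves the accession at https://www.ncbi.nlm.nih.gov/protein/<accession>. All function names and the sample header text are hypothetical.

```python
import html


def accession_of(subject_id):
    """Pull the accession out of a pipe-delimited ID.

    sp|Q6IBW4|CNDH2_HUMAN        -> Q6IBW4
    gi|66734174|gb|AAY53484.1|   -> AAY53484.1
    """
    parts = [p for p in subject_id.split("|") if p]
    if len(parts) >= 4 and parts[0] == "gi":
        return parts[3]
    if len(parts) >= 2:
        return parts[1]
    return subject_id


def read_fasta_titles(fasta_lines):
    """Map sequence ID -> rest of the header line for each FASTA record."""
    titles = {}
    for line in fasta_lines:
        if line.startswith(">"):
            seq_id, _, title = line[1:].strip().partition(" ")
            titles[seq_id] = title
    return titles


def annotate_line(line, titles,
                  base_url="https://www.ncbi.nlm.nih.gov/protein/"):
    """Return the tabular columns plus [annotation, HTML link]."""
    cols = line.rstrip("\n").split("\t")
    subject = cols[1]
    link = '<a href="{}{}">{}</a>'.format(
        base_url,
        html.escape(accession_of(subject), quote=True),
        html.escape(subject),
    )
    return cols + [titles.get(subject, ""), link]


def annotate_report(lines, titles):
    """Annotate every hit line, skipping blank lines and -m 9 comments."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield annotate_line(line, titles)


if __name__ == "__main__":
    # Placeholder header; the description text is not real annotation.
    titles = read_fasta_titles([">sp|Q6IBW4|CNDH2_HUMAN placeholder description"])
    hits = ["# Fields: Query id, Subject id, ...",
            "query1\tsp|Q6IBW4|CNDH2_HUMAN\t34.69\t49"]
    for row in annotate_report(hits, titles):
        print("\t".join(row))
```

In practice the two inputs would be the database FASTA file and blast-output.txt, each opened and passed in as an iterable of lines.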
