  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #16
    If you run "blastx -help" on the command line you will get all options for blastx. One of the sections is for output formats. Default format is "0".

    Code:
     -outfmt <String>
       alignment view options:
         0 = pairwise,
         1 = query-anchored showing identities,
         2 = query-anchored no identities,
         3 = flat query-anchored, show identities,
         4 = flat query-anchored, no identities,
         5 = XML Blast output,
         6 = tabular,
         7 = tabular with comment lines,
         8 = Text ASN.1,
         9 = Binary ASN.1,
        10 = Comma-separated values,
        11 = BLAST archive format (ASN.1) 
        12 = JSON Seqalign output 
       
       Options 6, 7, and 10 can be additionally configured to produce
       a custom format specified by space delimited format specifiers.
       The supported format specifiers are:
                qseqid means Query Seq-id
                   qgi means Query GI
                  qacc means Query accesion
               qaccver means Query accesion.version
                  qlen means Query sequence length
                sseqid means Subject Seq-id
             sallseqid means All subject Seq-id(s), separated by a ';'
                   sgi means Subject GI
                sallgi means All subject GIs
                  sacc means Subject accession
               saccver means Subject accession.version
               sallacc means All subject accessions
                  slen means Subject sequence length
                qstart means Start of alignment in query
                  qend means End of alignment in query
                sstart means Start of alignment in subject
                  send means End of alignment in subject
                  qseq means Aligned part of query sequence
                  sseq means Aligned part of subject sequence
                evalue means Expect value
              bitscore means Bit score
                 score means Raw score
                length means Alignment length
                pident means Percentage of identical matches
                nident means Number of identical matches
              mismatch means Number of mismatches
              positive means Number of positive-scoring matches
               gapopen means Number of gap openings
                  gaps means Total number of gaps
                  ppos means Percentage of positive-scoring matches
                frames means Query and subject frames separated by a '/'
                qframe means Query frame
                sframe means Subject frame
                  btop means Blast traceback operations (BTOP)
               staxids means unique Subject Taxonomy ID(s), separated by a ';'
                             (in numerical order)
             sscinames means unique Subject Scientific Name(s), separated by a ';'
             scomnames means unique Subject Common Name(s), separated by a ';'
            sblastnames means unique Subject Blast Name(s), separated by a ';'
                             (in alphabetical order)
            sskingdoms means unique Subject Super Kingdom(s), separated by a ';'
                             (in alphabetical order) 
                stitle means Subject Title
            salltitles means All Subject Title(s), separated by a '<>'
               sstrand means Subject Strand
                 qcovs means Query Coverage Per Subject
               qcovhsp means Query Coverage Per HSP
       When not provided, the default value is:
       'qseqid sseqid pident length mismatch gapopen qstart qend sstart send
       evalue bitscore', which is equivalent to the keyword 'std'
       Default = `0'
    When using a multi-FASTA input file, each sequence will produce an output block that starts with a "Query=" line and ends with the "Effective search space…" line.
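
For what it's worth, the tabular formats (6 and 7) are easy to read back with a few lines of script. Below is a minimal sketch, assuming the default 'std' column order from the help text above; the function name `read_tabular` and the sample lines are made up for illustration, not real BLAST output.

```python
import io

# Default column order for -outfmt 6/7 ('std'), copied from the help text above.
STD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_tabular(handle, columns=STD_COLUMNS):
    """Yield one dict per hit line, skipping blank lines and '#' comments."""
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # format 7 adds '#' comment lines
        values = line.split("\t")
        if len(values) != len(columns):
            raise ValueError(
                f"line {line_no}: expected {len(columns)} columns, got {len(values)}"
            )
        yield dict(zip(columns, values))


# Made-up sample data standing in for a real results file.
SAMPLE = (
    "# made-up comment line, as format 7 would add\n"
    "read1\tprotA\t98.50\t200\t3\t0\t1\t600\t10\t209\t1e-50\t250\n"
    "read2\tprotB\t45.00\t80\t44\t0\t5\t244\t30\t109\t0.002\t41.2\n"
)

hits = list(read_tabular(io.StringIO(SAMPLE)))
# Values stay strings until converted; filter on e-value as an example.
significant = [h for h in hits if float(h["evalue"]) <= 1e-5]
for h in significant:
    print(h["qseqid"], h["sseqid"], h["evalue"])
```

With a real file you would pass an open file handle instead of the `io.StringIO` wrapper. If you request a custom column list, pass the same list as `columns`.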

    • kmcarr
      Senior Member
      • May 2008
      • 1181

      #17
      Originally posted by hlyates:
      Fantastic. Many salutations and thanks. Can you please point me to the NCBI docs which talk about this? Dr. Google didn't report a hit back for me. Would like to learn more about the formatting.
      It was so long ago that I don't recall whether I ever read any formal documentation on the Pairwise (default, output format 0) format or just figured it out through trial and error.

      As for parsing these plain-text reports, I have in the past used both BioPerl's Bio::SearchIO and hand-rolled code; there are pluses and minuses to each. As a general recommendation, though, if you foresee needing to parse BLAST reports regularly, I would avoid the default "Pairwise" output format altogether. That format wasn't really designed for automated parsing, so parsing code breaks easily. If you plan to do a lot of parsing, the two better choices are tabular and XML. If your needs are simple (e.g. hit IDs, e-values, start and end locations), then tabular is the way to go: the output is small and very simple to parse. If you have more complex needs (e.g. capturing query/target alignments), have your BLAST job output XML and parse it using the available modules from BioPerl, Biopython, etc. Because XML is such a structured format, automated parsing is more robust. Also, since the XML format retains all of the information present in the Pairwise format, it is possible to convert the interesting bits of the XML output into human-readable form (again using the 'Bio' modules).
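
To illustrate the XML route, here is a rough sketch that uses only Python's standard library rather than the Bio modules recommended above, so that it runs without extra installs. It assumes the element names of the classic BLAST XML output (format 5), such as `Iteration`, `Hit` and `Hsp`; check them against your own output. The embedded document is a heavily trimmed, made-up example, and `iter_hsps` and `format_alignment` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Trimmed, made-up stand-in for a BLAST XML (-outfmt 5) report.
SAMPLE_XML = """
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>read1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>gnl|demo|protA</Hit_id>
          <Hit_def>hypothetical protein</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-50</Hsp_evalue>
              <Hsp_qseq>MKT-LV</Hsp_qseq>
              <Hsp_hseq>MKTALV</Hsp_hseq>
              <Hsp_midline>MKT LV</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def iter_hsps(root):
    """Yield one dict per HSP with the query, hit and alignment strings."""
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            for hsp in hit.iter("Hsp"):
                yield {
                    "query": query,
                    "hit_id": hit.findtext("Hit_id"),
                    "hit_def": hit.findtext("Hit_def"),
                    "evalue": float(hsp.findtext("Hsp_evalue")),
                    "qseq": hsp.findtext("Hsp_qseq"),
                    "midline": hsp.findtext("Hsp_midline"),
                    "hseq": hsp.findtext("Hsp_hseq"),
                }


def format_alignment(rec):
    """Rebuild a small human-readable alignment block from one HSP record."""
    return "\n".join([
        f"Query  {rec['qseq']}",
        f"       {rec['midline']}",
        f"Sbjct  {rec['hseq']}",
    ])


records = list(iter_hsps(ET.fromstring(SAMPLE_XML.strip())))
for rec in records:
    print(rec["query"], rec["hit_id"], rec["evalue"])
    print(format_alignment(rec))
```

For a real report, `ET.parse("results.xml").getroot()` would replace the `ET.fromstring` call (the file name is a placeholder). For very large reports, a streaming parser or the Bio modules are likely a better fit.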

      • hlyates
        Member
        • Mar 2015
        • 29

        #18
        Thank you, GenoMax and kmcarr. I think I may run my job again and use tabular/XML output, so that I can more easily apply a script to it. If I understand you both, it seems Biopython can parse tabular quite easily. I might go that route because I just need basic information such as:

        hit id
        e-value
        input sequence (the input sequence that aligned with something in the nt database)
        target sequence organism id and name

        You pros are great and if I knew you in person I would buy you both some drinks.
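
The fields listed in this post map fairly directly onto format specifiers from the `-outfmt` help text quoted earlier in the thread (`qseqid`, `sseqid`, `evalue`, `staxids`, `sscinames`). The sketch below only builds the command line and parses one line of the resulting table; it does not run BLAST. The program name, file names and sample line are placeholders. As far as I know, the taxonomy columns are only filled in when the BLAST taxonomy database files are available alongside the database, so treat that part as an assumption to verify.

```python
# Specifiers chosen from the -outfmt help text to match the requested fields.
FIELDS = ["qseqid", "sseqid", "evalue", "staxids", "sscinames"]


def outfmt_arg(fields, fmt=6):
    """Build the value for -outfmt: format number plus space-delimited specifiers."""
    return f"{fmt} " + " ".join(fields)


def build_command(program, query, db, out, fields):
    """Return an argument list suitable for subprocess.run (not executed here)."""
    return [
        program,
        "-query", query,
        "-db", db,
        "-outfmt", outfmt_arg(fields),
        "-out", out,
    ]


def parse_line(line, fields):
    """Turn one tab-separated result line into a dict keyed by specifier name."""
    return dict(zip(fields, line.rstrip("\r\n").split("\t")))


# Placeholder program and file names, for illustration only.
cmd = build_command("blastn", "reads.fasta", "nt", "hits.tsv", FIELDS)
print(" ".join(cmd))

# Made-up result line in the column order requested above.
row = parse_line("read1\tgi|0|demo\t2e-30\t7227\tDrosophila melanogaster\n", FIELDS)
print(row["qseqid"], row["staxids"], row["sscinames"])
```

Passing the argument list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with the space-delimited `-outfmt` value.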

        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #19
          Provided you have access to a cluster, and if you are going to do this over, I suggest you break up your original file into multiple smaller ones and run the BLAST jobs in parallel. It would speed things up significantly.
          Last edited by GenoMax; 05-11-2015, 02:08 PM.
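
A minimal sketch of the splitting step, assuming a plain multi-FASTA input and a round-robin assignment of sequences to chunks. The chunk count, helper names and sample sequences are made up. How each chunk is then submitted depends on your cluster's scheduler and is not shown.

```python
import io


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_round_robin(records, n_chunks):
    """Distribute records over n_chunks lists so chunk sizes differ by at most one."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


def format_fasta(records):
    """Render records back to FASTA text (one line per sequence)."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# Made-up input standing in for the original multi-FASTA file.
SAMPLE = ">s1\nACGT\nAC\n>s2\nGGCC\n>s3\nTTAA\n>s4\nCCGG\n>s5\nATAT\n"

chunks = split_round_robin(read_fasta(io.StringIO(SAMPLE)), n_chunks=2)
for number, chunk in enumerate(chunks, start=1):
    # In practice, write each chunk to its own file, e.g. chunk_001.fasta.
    print(f"chunk {number}: {[header for header, _ in chunk]}")
```

Each chunk file can then be given to its own BLAST job with the same database and options, and the tabular outputs concatenated afterwards.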

          • hlyates
            Member
            • Mar 2015
            • 29

            #20
            Originally posted by GenoMax:
            Order of operations:

            1. Download all Drosophila proteins from tax browser link (since this is what your collaborator seems to want you to do) as multi-fasta format file.
            2. Make blast database using the fasta file.
            3. Blastx with your sequences using parameters you want (e-value cutoff etc). I would just choose the tabular output format since you can grab the sequence ID's that show a hit from this table.
            4. Use faFilter utility (http://hgdownload.soe.ucsc.edu/admin...86_64/faFilter) to get a subset that contains sequences from your list that hit Drosophila proteins.
            Since I am downloading a fasta file of proteins, when I make the database should the -dbtype option be:
            1. nucl? (I think this is nucleotide)
            2. prot (this is for proteins and hence what I should choose)?


            I feel stupid for not having thought of that question in advance. Only ran into it as I actually started reading the
            Code:
            makeblastdb -help
            docs.
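
Step 4 of the quoted workflow (pulling out the input sequences that had a hit) can also be sketched in a few lines of Python. This is a stand-in for the idea, not the faFilter utility itself: it collects the query IDs from the first column of the tabular output and keeps the FASTA records whose first header word matches. The helper names and sample data are made up.

```python
def hit_query_ids(tabular_lines):
    """Collect the first column (qseqid) from tabular BLAST output lines."""
    ids = set()
    for line in tabular_lines:
        if line.strip() and not line.startswith("#"):
            ids.add(line.rstrip("\r\n").split("\t", 1)[0])
    return ids


def filter_fasta(fasta_lines, keep_ids):
    """Yield only the lines of FASTA records whose ID is in keep_ids."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            # The ID is taken to be the first whitespace-delimited header word.
            keep = bool(tokens) and tokens[0] in keep_ids
        if keep:
            yield line


# Made-up tabular hits and input sequences.
TABULAR = [
    "read1\tprotA\t98.5\t200\t3\t0\t1\t600\t10\t209\t1e-50\t250\n",
    "read3\tprotC\t77.0\t150\t34\t1\t4\t450\t20\t169\t3e-20\t120\n",
]
FASTA = [
    ">read1 some description\n", "ACGT\n",
    ">read2\n", "GGCC\n",
    ">read3\n", "TTAA\n", "CCGG\n",
]

kept = list(filter_fasta(FASTA, hit_query_ids(TABULAR)))
print("".join(kept), end="")
```

With real files, both arguments can be open file handles, since the functions only iterate over lines.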

            • hlyates
              Member
              • Mar 2015
              • 29

              #21
              Originally posted by hlyates:
              Since I am downloading a fasta file of proteins, when I make the database should the -dbtype option be:
              1. nucl? (I think this is nucleotide)
              2. prot (this is for proteins and hence what I should choose)?


              I feel stupid for not having thought of that question in advance. Only ran into it as I actually started reading the docs
              Code:
              makeblastdb -help
              .
              Thank you so much for your patience while I learn. As I indicated, I am a one dog show, so this is the only true outlet I have to learn. I am very humbled by your assistance.
