  • blastn - qcovs field - or how to parse results based on % coverage of query sequence

    I have been playing around with blast+ (blastn), a local installation and various custom databases.

    I thought I had my workflow figured out, but some output is confusing me.

    Specifically, the qcovs field. According to the BLAST manual, 'qcovs means Query Coverage Per Subject', i.e. how much of my query is represented in an alignment. I assumed this was a percentage value (maximum 100%), and I have used it for filtering.

    But now I have run a local BLAST against a genome db, and the qcovs value goes up to 400! So clearly it is not calculated as a percentage, which means my previous filtering is probably crap...

    I basically want to do the following:

    Blast a set of sequences against database 1. Filter the BLAST results by a) % identity, b) alignment length and c) % of the query sequence covered by the alignment.

    I am not interested in alignments that cover 100% of the query, as I am doing breakpoint/insertion mapping. So I want to filter these out and re-blast against database 2.
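The three filters described above can be sketched in Python as follows. This is a minimal sketch that assumes the BLAST run used tabular output with exactly the columns in the hypothetical FIELDS list (for example `-outfmt "6 qseqid sseqid pident length qstart qend qcovhsp"`); the threshold values are placeholders, not recommendations. It uses the per-HSP coverage column qcovhsp rather than qcovs, so each alignment is judged on its own.

```python
from typing import Dict, Iterable, List

# Hypothetical column order: it must match the -outfmt string used for the BLAST run.
FIELDS = ["qseqid", "sseqid", "pident", "length", "qstart", "qend", "qcovhsp"]


def filter_partial_hits(lines: Iterable[str], min_pident: float = 90.0,
                        min_length: int = 50, max_cov: float = 100.0) -> List[Dict[str, str]]:
    """Keep HSPs that pass the identity and length cut-offs but cover less than max_cov % of the query."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the comment lines written by -outfmt 7
        hit = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        if (float(hit["pident"]) >= min_pident            # a) % identity
                and int(hit["length"]) >= min_length      # b) alignment length
                and float(hit["qcovhsp"]) < max_cov):     # c) drop full-length query hits
            kept.append(hit)
    return kept
```

The qseqid values of the returned hits are then the candidates to re-blast against database 2.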

    Any ideas?

  • #2
    Never mind... user error. I managed to mess up the columns while filtering the BLAST result...
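One way to avoid this kind of column mix-up is to keep a single list of field names and use it both to build the -outfmt value and to name the parsed columns. The sketch below assumes plain tab-separated output (-outfmt 6); the FIELDS list and the helper names are hypothetical.

```python
from typing import Dict, List

# Hypothetical single source of truth for the column layout.
FIELDS: List[str] = ["qseqid", "sseqid", "pident", "length", "qstart", "qend", "qcovs"]


def outfmt_spec(fields: List[str]) -> str:
    """Return the value to pass to blastn -outfmt for tabular output with these columns."""
    return "6 " + " ".join(fields)


def parse_row(line: str, fields: List[str]) -> Dict[str, str]:
    """Map one tab-separated BLAST line to {field: value}, rejecting rows with the wrong column count."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(values)}: {line!r}")
    return dict(zip(fields, values))
```

Filtering code can then refer to `row["qcovs"]` by name instead of by position, so a reordered -outfmt string cannot silently shift the columns.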



    • #3
      Which options produce the qcovs field?


      • #4
        Originally posted by okorist
        Which options produce the qcovs field?
        Manuals tend to be useful...


        • #5
          Indeed, with the -outfmt parameter you can request any of the fields listed in the manual. From the manual:

          -outfmt <String> (default: 0)

          alignment view options:
          0 = pairwise,
          1 = query-anchored showing identities,
          2 = query-anchored no identities,
          3 = flat query-anchored, show identities,
          4 = flat query-anchored, no identities,
          5 = XML Blast output,
          6 = tabular,
          7 = tabular with comment lines,
          8 = Text ASN.1,
          9 = Binary ASN.1,
          10 = Comma-separated values,
          11 = BLAST archive format (ASN.1)
          Options 6, 7, and 10 can be additionally configured to produce a custom format specified by space delimited format specifiers.
          The supported format specifiers are:
          qseqid means Query Seq-id
          qgi means Query GI
          qacc means Query accession
          sseqid means Subject Seq-id
          sallseqid means All subject Seq-id(s), separated by a ';'
          sgi means Subject GI
          sallgi means All subject GIs
          sacc means Subject accession
          sallacc means All subject accessions
          qstart means Start of alignment in query
          qend means End of alignment in query
          sstart means Start of alignment in subject
          send means End of alignment in subject
          qseq means Aligned part of query sequence
          sseq means Aligned part of subject sequence
          evalue means Expect value
          bitscore means Bit score
          score means Raw score
          length means Alignment length
          pident means Percentage of identical matches
          nident means Number of identical matches
          mismatch means Number of mismatches
          positive means Number of positive-scoring matches
          gapopen means Number of gap openings
          gaps means Total number of gaps
          ppos means Percentage of positive-scoring matches
          frames means Query and subject frames separated by a '/'
          qframe means Query frame
          sframe means Subject frame
          btop means Blast traceback operations (BTOP)
          staxids means unique Subject Taxonomy ID(s), separated by a ';'(in numerical order)
          sscinames means unique Subject Scientific Name(s), separated by a ';'
          scomnames means unique Subject Common Name(s), separated by a ';'
          sblastnames means unique Subject Blast Name(s), separated by a ';' (in alphabetical order)
          sskingdoms means unique Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
          stitle means Subject Title
          salltitles means All Subject Title(s), separated by a '<>'
          sstrand means Subject Strand
          qcovs means Query Coverage Per Subject
          qcovhsp means Query Coverage Per HSP
          When not provided, the default value is:
          'qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore', which is equivalent to the keyword 'std'
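As an illustration of options 6/7/10 with custom specifiers, here is a small Python sketch that builds a blastn command requesting qcovs and qcovhsp alongside the usual columns. It assumes BLAST+ is installed and on PATH; the file and database names in the usage line are hypothetical. Passing the arguments as a list keeps the space-delimited format string together as a single argument without any shell quoting.

```python
import subprocess
from typing import List

# Columns taken from the specifier list above; qcovs and qcovhsp give the two coverage measures.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qstart", "qend",
           "evalue", "bitscore", "qcovs", "qcovhsp"]


def blastn_command(query: str, db: str, out: str, columns: List[str] = COLUMNS) -> List[str]:
    """Build a blastn argument list requesting tabular output (-outfmt 6) with the given columns."""
    return ["blastn", "-query", query, "-db", db,
            "-outfmt", "6 " + " ".join(columns), "-out", out]


def run_blastn(query: str, db: str, out: str) -> None:
    """Run blastn; needs BLAST+ on PATH and a formatted database, raises on a non-zero exit."""
    subprocess.run(blastn_command(query, db, out), check=True)
```

For example, `run_blastn("queries.fa", "database1", "hits.tsv")` (hypothetical names) writes one tab-separated line per HSP. Recent BLAST+ versions also accept a qlen specifier that is not in the excerpt above; check `blastn -help` for your version before relying on it.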



          • #6
            I think qcovs sums up the HSP lengths and divides the total by the query length. If there are repeats in your query, something bigger than 100% can show up, because the same query region is counted again for every HSP that covers it. Is that your case?

            I have no solution for this problem; it seems complicated to program and filter the results.
            It will bias you towards bigger qcov values, but I don't mind too much about it.

            I wonder what qcovhsp does, though.
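To illustrate the difference speculated about above, the sketch below computes three numbers from a list of (qstart, qend) HSP coordinates on one query: the summed HSP length as a percentage of the query length (which can exceed 100% when HSPs overlap on the query), the percentage of query positions covered by at least one HSP (which cannot), and the coverage of each HSP on its own, in the spirit of qcovhsp. Whether BLAST+ computes qcovs exactly as the union version, and how it rounds, is an assumption here rather than something confirmed in this thread. The query length has to come from elsewhere, e.g. the FASTA file.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (qstart, qend) on the query


def summed_coverage(hsps: List[Interval], qlen: int) -> float:
    """Sum of HSP lengths on the query as % of qlen; overlapping HSPs are counted more than once."""
    total = sum(abs(end - start) + 1 for start, end in hsps)
    return 100.0 * total / qlen


def union_coverage(hsps: List[Interval], qlen: int) -> float:
    """% of query positions covered by at least one HSP (merged intervals), never above 100."""
    covered = 0
    cur_start = cur_end = None
    # Sort by start, normalising start/end order, and merge overlapping or adjacent intervals.
    for start, end in sorted((min(a, b), max(a, b)) for a, b in hsps):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / qlen


def per_hsp_coverage(hsps: List[Interval], qlen: int) -> List[float]:
    """Coverage of each HSP separately, analogous to qcovhsp."""
    return [100.0 * (abs(end - start) + 1) / qlen for start, end in hsps]
```

For a 200 bp query with HSPs at 1-150, 51-200 and 101-200, the summed version gives 200% while the union version gives 100%, so only the summed version could produce values like the 400 mentioned at the top of the thread.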
