Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BLAST filtering out 'no hits' in output

    Hi,

    I am using a local install of blastn to look for hits to a custom database from a bunch of short reads. In my final output file, I really only care about reads that found a 'hit' and so want to filter out all reads with 'no hits'. My strategy is as follows:

    Read 1
    Read 2
    .
    .
    Read X

    -> Put into one big FASTA file
    -> Run blastn against the custom database
    -> Save to file reads with hits only.

    Right now (the standard blast output) would be:
    Read 1 -> hit 1
    Read 2 -> no hit
    . -> no hit
    . -> no hit
    Read X -> hit X

    But I want my output to only show:
    Read 1 -> hit 1
    Read x -> hit X

    Since I have many reads and very few hits, my output file very rapidly gets filled with all the 'no hits' reads slowing down the whole process. Is there a way to filter my output directly via the command line, or do I need to build a script to do this?

    Thanks in advance,
    Kristian

  • #2
    I would use the tabular output, which just writes nothing at all for no hits.

    Comment


    • #3
      Hmm, tried that and I get the following error:
      BLAST query/options error: No hits are being saved

      I run the following command:
      blastn -query reads.fasta -db nt -num_descriptions 1 -num_alignments 0 -out reads_blastNT.txt -num_threads 8 -outfmt 6

      Instead of "-outfmt 6", I have also used:
      "7 sseqid ssac qstart qend sstart send qseq evalue bitscore"

      but I get the same result. Any thoughts? The output file is also empty.

      Thanks very much.

      Comment


      • #4
        Okay, so I solved the error, which was caused by my "-num_alignments 0" handle. However, the output still contains my reads with no hits - it just writes '0 hits found' under the read. This is not what I am after - I want to completely ignore '0 hits' reads so they don't even get written to the file.

        Comment


        • #5
          What exactly is the new command line you are running?

          Comment


          • #6
            Hi maubp.

            Sorry, I don't understand that question - I'm not running a new command line - I'm just trying to figure out whether I can tell blastn not to store my 'no hits' hits in the output file.

            Comment


            • #7
              I would expect you to use something like this command line:

              blastn -query reads.fasta -db nt -out reads_blastNT.txt -num_threads 8 -outfmt 6

              Rather than messing about with -num_descriptions and/or -num_alignments which only really apply to the plain text and HTML output, from memory I would use -max_target_seqs instead.

              Comment


              • #8
                I tried that one, but I still have the same problem - the '0 hits' hits are still recorded in the output file, rather than discarded.

                I guess I just have to pipe it and make a script - but it would have been easier if it could have been done directly from the command line.

                Comment


                • #9
                  I am surprised. What version of BLAST+ do you have? And could you post a bit of the example output?

                  Comment


                  • #10
                    I should have the newest version:
                    blastn: 2.2.25+
                    Package: blast 2.2.25, build Mar 21 2011 12:21:32

                    An example output looks like this (tabular format):
                    # BLASTN 2.2.25+
                    # Query: ILLUMINA:203:C07LRACXX:5:1101:1348:1952 1:N:0:TAGCTT
                    # Database: custom
                    # 0 hits found
                    # BLASTN 2.2.25+
                    # Query: ILLUMINA:203:C07LRACXX:5:1101:1366:1952 1:Y:0:TAGCTT
                    # Database: custom
                    # 0 hits found
                    # BLASTN 2.2.25+
                    # Query: ILLUMINA:203:C07LRACXX:5:1101:1472:1955 1:N:0:TAGCTT
                    # Database: custom
                    # 0 hits found
                    # BLASTN 2.2.25+
                    # Query: ILLUMINA:203:C07LRACXX:5:1101:1456:1981 1:N:0:TAGCTT
                    # Database: custom
                    # 0 hits found
                    # BLASTN 2.2.25+
                    # Query: ILLUMINA:203:C07LRACXX:5:1101:1414:1981 1:N:0:TAGCTT
                    # Database: custom
                    # 0 hits found

                    Basically, these are all the reads I don't want listed in the file as they have no hits.

                    Thanks a bunch.

                    Comment


                    • #11
                      All those lines starting with a hash (#) are comments which you asked BLAST to produce by using -outfmt 7,
                      Code:
                      $ blastn -help
                      ...
                       *** Formatting options
                       -outfmt <String>
                         alignment view options:
                           0 = pairwise,
                           1 = query-anchored showing identities,
                           2 = query-anchored no identities,
                           3 = flat query-anchored, show identities,
                           4 = flat query-anchored, no identities,
                           5 = XML Blast output,
                           6 = tabular,
                           7 = tabular with comment lines,
                           8 = Text ASN.1,
                           9 = Binary ASN.1,
                          10 = Comma-separated values,
                          11 = BLAST archive format (ASN.1)
                      Use -outfmt 6 as I suggested, and they'll go away.

                      P.S. That does mean for this example as there are no hits at all, the output file will be empty.

                      Comment


                      • #12
                        Yes! That worked - thanks so much. I skipped the -outfmt 6 handle when it initially failed due to my "-num_alignments 0".

                        Thanks for your help - that's exactly what I needed.

                        Comment


                        • #13
                          Great

                          Thanks for confirming it worked.

                          Comment


                          • #14
                            I made the same mistake just now by choosing -outfmt 7

                            However, it's a rather easy fix to remove the comments:

                            Code:
                            awk '!/^\#/' file > newfile
                            Should do the trick.
                            Last edited by jmjensen; 06-01-2013, 12:50 AM.

                            Comment


                            • #15
                              This is maybe not the right thread, but I am interested in only the queries with 'no hits'. So far I can only extract the reads with hits, and then compare the differences between the original query file and the file with hits only.

                              Any good tips on how to directly extract the queries with no hits?

                              Thanks, Jon

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Current Approaches to Protein Sequencing
                                by seqadmin


                                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                04-04-2024, 04:25 PM
                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 04-11-2024, 12:08 PM
                              0 responses
                              18 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 10:19 PM
                              0 responses
                              22 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 09:21 AM
                              0 responses
                              16 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-04-2024, 09:00 AM
                              0 responses
                              47 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X