  • Blast problem

    Hi everybody, I am a bit desperate and hope someone can help me. I need to create a subset of the nr database (for blastx) using a negative or positive gi list. As far as I know there are several ways to do this (are there more?):

    1) read the multi-FASTA nr file and remove the unwanted entries; however, the file is almost 8 GB, so this takes a long time (and the database has to be formatted again afterwards)
    2) use blast+, which has a -negative_gilist option; however, another program's parser expects the old (legacy blast) output format, which seems to differ from the blast+ format
    3) use the -L option of formatdb, together with -F naming a file with a positive gi list, to create a subset of the database
    4) use the -l option of blastall to restrict the search to a positive gi list read from a file, which should produce the same result
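Option 1) amounts to a single streaming pass over the FASTA file. Here is a minimal Python sketch, assuming old-style NCBI deflines that contain "gi|<number>|" tokens (in nr, merged entries typically carry further gi numbers separated by Ctrl-A characters). The function names are hypothetical. A record is kept in positive mode if any of its gi numbers is listed, and dropped in negative mode for the same reason. The output would still have to be formatted with formatdb afterwards.

```python
import re
from typing import Iterable, Iterator, Set

# Matches every "gi|<number>" token in a defline. Assumes old-style NCBI
# deflines such as ">gi|7290028|ref|...", where merged nr entries may carry
# several gi numbers separated by Ctrl-A (\x01).
GI_RE = re.compile(r"gi\|(\d+)")


def read_gi_list(lines: Iterable[str]) -> Set[int]:
    """Read one gi number per line, ignoring blank lines."""
    return {int(value) for value in (line.strip() for line in lines) if value}


def filter_fasta(fasta: Iterable[str], gis: Set[int], negative: bool = False) -> Iterator[str]:
    """Yield the lines of every FASTA record selected by the gi list.

    Positive mode keeps a record if any of its gi numbers is in `gis`;
    negative mode drops a record if any of its gi numbers is in `gis`.
    Lines before the first header are discarded.
    """
    keep = False
    for line in fasta:
        if line.startswith(">"):
            record_gis = {int(g) for g in GI_RE.findall(line)}
            hit = not gis.isdisjoint(record_gis)
            keep = hit != negative  # positive: keep hits; negative: keep non-hits
        if keep:
            yield line
```

Because the file is read line by line, only the gi set is held in memory. It does not avoid the full pass over the 8 GB file, though. A call might look like `sys.stdout.writelines(filter_fasta(open("nr"), gis, negative=True))`.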

    Options 3) and 4) seem to be what I need. Unfortunately, neither of them works. Here is what happens:

    my@computer:/tmp/blast/bin$ ls
    . blastclust drosoph.aa.phr fastacmd impala query.fa
    .. blastpgp drosoph.aa.pin formatdb makemat rpsblast
    bl2seq copymat drosoph.aa.psq formatdb.log megablast seedtop
    blastall drosoph.aa drosoph.gi.txt formatrpsdb .ncbirc
    my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F drosoph.gi.txt -L subset
    [formatdb] FATAL ERROR: Unable to find drosoph.gi.txt

    my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F ./drosoph.gi.txt -L subset
    [formatdb] FATAL ERROR: Unable to find ./drosoph.gi.txt

    my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F /tmp/blast/bin/drosoph.gi.txt -L subset
    [formatdb] FATAL ERROR: Unable to find /tmp/blast/bin/drosoph.gi.txt

    my@computer:/tmp/blast/bin$ ./formatdb -i drosoph.aa -F whatever -L subset
    [formatdb] FATAL ERROR: Unable to find whatever

    my@computer:/tmp/blast/bin$ ./blastall -p blastx -d drosoph.aa -l drosoph.gi.txt -i query.fa

    Searching[blastall] ERROR: query1[protein_gi:7290028]: Unable to open file drosoph.gi.txt
    [blastall] WARNING: query1[protein_gi:7290028]: Intersection of gilist and BLAST database ID's empty

    I tried the latest blast version (2.2.25) as well as some older ones, on both Fedora and Ubuntu. Can anyone reproduce this behavior?

  • #2
    Hi,

    Both formatdb and blastall complained about drosoph.gi.txt. Can you double-check the file? You could post the output of "ls -l" here.

    • #3
      my@computer:/tmp/blast/bin$ ls -l
      total 94040
      ...
      -rwxr-xr-x 1 me me 8 2011-08-07 15:12 drosoph.gi.txt
      ...
      my@computer:/tmp/blast/bin$ head drosoph.gi.txt
      7290028

      The file contains a single line, which is a gi number. Setting the file's permissions to 777 didn't help either.
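Here is a minimal Python sketch that does this kind of double-check (the function name is hypothetical). It confirms that the file exists and is readable from the current directory, and that every non-blank line is a plain decimal gi number. It also flags Windows line endings and a byte-order mark, which are common in hand-made list files. Nothing in this thread suggests they caused the error shown above, so treat this as a generic sanity check.

```python
import os
from typing import List


def check_gi_list(path: str) -> List[str]:
    """Return human-readable problems found in a gi-list file.

    An empty list means the file exists, is readable, and every non-blank
    line is a plain decimal gi number.
    """
    if not os.path.isfile(path):
        return [f"{path}: no such file (current directory: {os.getcwd()})"]
    if not os.access(path, os.R_OK):
        return [f"{path}: not readable by the current user"]
    with open(path, "rb") as handle:
        data = handle.read()
    problems: List[str] = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 byte-order mark")
    if b"\r" in data:
        problems.append("file contains Windows (CR) line endings")
    # "utf-8-sig" drops a leading byte-order mark; splitlines() handles CR/LF.
    lines = data.decode("utf-8-sig", errors="replace").splitlines()
    if not any(line.strip() for line in lines):
        problems.append("file contains no gi numbers")
    for number, line in enumerate(lines, start=1):
        value = line.strip()
        if value and not (value.isascii() and value.isdigit()):
            problems.append(f"line {number}: {value!r} is not a gi number")
    return problems
```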

      • #4
        1) This is a strange problem. The file belongs to me/me, but the login name is "my". Do you know why? It should not contribute to your problem; I am just curious.
        2) Can you run blast+ successfully in this case? As for the output format, blast+ lets you customize the output fields, so you could pursue that as an alternative.

        • #5
          1) That's because I replaced my real name in the post and did so inconsistently.
          2) Is it possible to change the blast+ output so that a parser written for the plain-text (legacy) blast output can read it? I would be really happy if that were the case; otherwise I can't use it. (I haven't tried a blast+ run with a negative gi list yet; I will do so tomorrow.)

          • #6
            2) Yes. You can choose which fields are reported in the tab-delimited output format (the -outfmt option). Check the blast+ manual.
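If a downstream tool can be adapted to tabular output, reading it is straightforward. Here is a minimal Python sketch for blast+ `-outfmt 6`. It assumes the default 12 columns (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore). If custom fields are requested with `-outfmt "6 ..."`, the `Hit` fields and conversions would have to match that list. `Hit` and `parse_outfmt6` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse the default 12-column tabular output (-outfmt 6).

    Blank lines and "#" comment lines (as written by -outfmt 7) are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) != 12:
            raise ValueError(f"expected 12 columns, got {len(f)}: {line!r}")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
                  int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                  float(f[10]), float(f[11]))
```

This does not make the output readable by a parser written for the plain-text report. It only shows that the tabular format needs very little parsing code.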

            • #7
              Dear DZhang, thank you very much for your replies! But as far as I can see, the program I use expects plain-text blast output, not the tab-delimited format, and it cannot parse the plain-text blast+ output.

              So I would like to use the old blast version, since according to the documentation it offers the option I need (the -l parameter of blastall or the -L parameter of formatdb). Can anyone reproduce my problem or suggest a fix?

              • #8
                OK, we solved it. It almost drove me crazy. In the end my colleague figured it out using "strace" (as of today my very favorite command).

                Here is the solution: the .ncbirc file has to contain the following lines:

                [BLAST]
                BLASTDB=/path/to/db
                Then the environment variable is set properly, and

                formatdb -i drosoph.aa -F drosoph.gil -L subset
                works like a charm.
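Here is a small Python sketch for checking what an .ncbirc file actually defines, such as the one in the working directory in the listing above. It uses the standard configparser module to read the [BLAST] section and return BLASTDB. The function name is hypothetical, and it is an assumption that configparser reads the file the same way the legacy BLAST tools do. It only inspects the file and changes nothing.

```python
import configparser
from typing import Optional


def blastdb_from_ncbirc(path: str) -> Optional[str]:
    """Return the BLASTDB value from the [BLAST] section of an .ncbirc file.

    Returns None if the file is missing, cannot be parsed, or has no such entry.
    """
    # interpolation=None so that a literal "%" in a path is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        if not parser.read(path):  # read() returns the files it managed to parse
            return None
    except configparser.Error:
        return None
    # Option lookups are case-insensitive by default, so "BLASTDB" also matches
    # "blastdb"; section names, however, are case-sensitive.
    return parser.get("BLAST", "BLASTDB", fallback=None)
```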
                Last edited by sammy07; 08-09-2011, 05:39 AM.

                • #9
                  sammy07, thank you for sharing the solution.
