  • makeblastdb protein

    $ ./makeblastdb -in ../../Phosphosite_seq.fasta -input_type fasta -dbtype prot -title Phosphosite_seq_db -out Phosphosite_seq


    Building a new DB, current time: 03/19/2015 16:03:22
    New DB name: Phosphosite_seq
    New DB title: Phosphosite_seq_db
    Sequence type: Protein
    Keep Linkouts: T
    Keep MBits: T
    Maximum file size: 1000000000B

    volume: Phosphosite_seq

    file: Phosphosite_seq.pin
    file: Phosphosite_seq.phr
    file: Phosphosite_seq.psq

    BLAST Database creation error: FASTA-Reader: No residues given


    Any ideas?

  • #2
    Additionally, here are the first lines of the input file:

    $ head ../../Phosphosite_seq.fasta
    >CBLN1|mouse|Q9R171
    MLGVVELLLLGTAWLAGPARGQNETEPIVLEGKCLVVCDSNPTSDPTGTALGISVRSGSA
    KVAFSAIRSTNHEPSEMSNRTMIIYFDQVLVNIGNNFDSERSTFIAPRKGIYSFNFHVVK
    VYNRQTIQVSLMLNGWPVISAFAGDQDVTREAASNGVLIQMEKGDRAYLKLERGNLMGGW
    KYSTFSGFLVFPL
    >COX7A2|mouse|P48771
    MLRNLLALRQIAQRTISTTSRRHFENKVPEKQKLFQEDNGMPVHLKGGASDALLYRATMA
    LTLGGTAYAIYLLAMAAFPKKQN
    >FAM219A|mouse|Q9D772
    MMEEIDRFQDPAAASISDRDCDAREEKQRELARKGSLKNGSMGSPVNQQPKKNNVMARTR



    • #3
      I would expect that the problem is that one of your sequences has no residues in it, but I can't reproduce the problem with your test data:

      Code:
      lpritc@Totoro:~$ more test.fas
      >CBLN1|mouse|Q9R171
      MLGVVELLLLGTAWLAGPARGQNETEPIVLEGKCLVVCDSNPTSDPTGTALGISVRSGSA
      KVAFSAIRSTNHEPSEMSNRTMIIYFDQVLVNIGNNFDSERSTFIAPRKGIYSFNFHVVK
      VYNRQTIQVSLMLNGWPVISAFAGDQDVTREAASNGVLIQMEKGDRAYLKLERGNLMGGW
      KYSTFSGFLVFPL
      >COX7A2|mouse|P48771
      MLRNLLALRQIAQRTISTTSRRHFENKVPEKQKLFQEDNGMPVHLKGGASDALLYRATMA
      LTLGGTAYAIYLLAMAAFPKKQN
      >FAM219A|mouse|Q9D772
      MMEEIDRFQDPAAASISDRDCDAREEKQRELARKGSLKNGSMGSPVNQQPKKNNVMARTR
      lpritc@Totoro:~$ makeblastdb -in test.fas -input_type fasta -dbtype prot -title test_db -out test
      
      
      Building a new DB, current time: 03/19/2015 22:01:53
      New DB name:   test
      New DB title:  test_db
      Sequence type: Protein
      Deleted existing BLAST database with identical name.
      Keep Linkouts: T
      Keep MBits: T
      Maximum file size: 1000000000B
      Adding sequences from FASTA; added 3 sequences in 0.000775099 seconds.

      If I were you, I would inspect the complete input file for sequences with no residues, e.g.:

      Code:
      >some_sequence_id_1
      
      >some_sequence_id_2
      ACGHITNKLLSMNER
      This can happen if you've been using a tool that masks repeats.
      Last edited by LeightonP; 03-19-2015, 02:06 PM.
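
A minimal sketch of the inspection suggested above, assuming a plain-text FASTA file and a POSIX awk. The function name `find_empty_records` is made up for illustration. It prints the header of every record that has no residue lines, treating whitespace-only lines as empty.

```shell
# Print the header of every FASTA record that has no residue lines.
# Lines containing only whitespace do not count as residues.
find_empty_records() {
    awk '
        /^>/ {
            # A new header: report the previous record if it had no residues.
            if (header != "" && !has_seq) print header
            header = $0
            has_seq = 0
            next
        }
        /[^[:space:]]/ { has_seq = 1 }   # any non-blank line counts as sequence
        END { if (header != "" && !has_seq) print header }
    ' "$1"
}
```

Running `find_empty_records ../../Phosphosite_seq.fasta` should print nothing if every record has residues; any header it does print is a candidate cause of the "No residues given" error.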



      • #4
        Originally posted by LeightonP View Post
        If I were you, I would inspect the complete input file for sequences with no residues, e.g.:

        Code:
        >some_sequence_id_1
        
        >some_sequence_id_2
        ACGHITNKLLSMNER
        This can happen if you've been using a tool that masks repeats.
        I haven't used any program to mask repeats. It is a seq database listing the complete sequence of proteins used in the Phosphosite database. There should be no empty sequences.



        • #5
          Originally posted by ctstackh View Post
          I haven't used any program to mask repeats. It is a seq database listing the complete sequence of proteins used in the Phosphosite database. There should be no empty sequences.
          One of the frustrations of bioinformatics is that datasets don't always contain exactly what you expect. Did you check that there are - as you expect - no empty sequences?

          I can reproduce your error with a fake dataset containing an empty sequence, and I still think that this is possibly a cause of your error:

          Code:
          lpritc@Totoro:~$ cat > test.fas
          >seq1
          ATGCTGTCAGCTAGCTGATCGATCGGC
          >seq2
          
          >seq3
          GHILKPNMACDEFGH
          lpritc@Totoro:~$ makeblastdb -in test.fas -input_type fasta -dbtype prot -title test_db -out test
          
          
          Building a new DB, current time: 03/20/2015 11:55:19
          New DB name:   test
          New DB title:  test_db
          Sequence type: Protein
          Keep Linkouts: T
          Keep MBits: T
          Maximum file size: 1000000000B
          
          volume: test
          
          file: test.pin
          file: test.phr
          file: test.psq
          
          BLAST Database creation error: FASTA-Reader: No residues given
          Last edited by LeightonP; 03-20-2015, 03:56 AM.
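
If an empty record like `seq2` above does turn out to be in the file, one option is to drop such records before building the database. The following is only a sketch, assuming a POSIX awk; the function name `drop_empty_records` and the output file name in the usage note are hypothetical.

```shell
# Write a copy of a FASTA file to stdout, omitting records with no residues.
# Blank and whitespace-only lines are discarded along the way.
drop_empty_records() {
    awk '
        # Print the buffered record only if it has at least one residue line.
        function emit() {
            if (header != "" && seq != "") printf "%s\n%s", header, seq
        }
        /^>/ { emit(); header = $0; seq = ""; next }
        /[^[:space:]]/ { seq = seq $0 "\n" }
        END { emit() }
    ' "$1"
}
```

For example, `drop_empty_records test.fas > test.cleaned.fas`, then point `makeblastdb -in` at the cleaned file. Whether silently dropping records is acceptable depends on the dataset, so it is worth listing what was removed first.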



          • #6
            You can count the number of blank lines in a file (filename) with:

            Code:
            grep -c "^$" filename
            or, if you consider whitespace to be "blank":

            Code:
            grep -c "^\s*$" filename
            If you have any in your FASTA file, from which you're trying to build your database, that may be the problem. You can see the surrounding context of blank lines with:

            Code:
            grep -C 2 "^$" filename
            (note: capital 'C' this time, followed by the number of context lines to show on each side) - this should help you find any blank line in your file and edit it.
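
As a variation on the grep commands above, here is a small sketch that reports the line number of each blank line together with the most recent FASTA header, which can make the offending record easier to find in a large file. It assumes a POSIX awk; the function name `report_blank_lines` is made up for illustration.

```shell
# Report each blank (or whitespace-only) line with its line number
# and the last FASTA header seen before it.
report_blank_lines() {
    awk '
        /^>/ { header = $0 }
        /^[[:space:]]*$/ {
            printf "line %d: blank, after %s\n", NR,
                   (header == "" ? "(no header)" : header)
        }
    ' "$1"
}
```

Usage would be `report_blank_lines filename`. It uses `[[:space:]]` rather than `\s`, since `\s` is not supported by every grep or awk.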



            • #7
              Do this (adjust file names accordingly):

              1. Using BBMap's reformat.sh, remove the line wrapping from Phosphosite_seq.txt and make the sequence names unique.

              Code:
              $ reformat.sh in=Phosphosite_seq.txt out=reform.fa uniquenames=t fastawrap=80000
              2. Build the database with makeblastdb

              Code:
              $ makeblastdb -in reform.fa -dbtype prot -out Phosphosite_seq -title Phosphosite_seq_db
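
If BBMap is not available, the two clean-up steps (unwrapping and unique names) can be roughly approximated with awk. This is a sketch under stated assumptions, not a reimplementation of reformat.sh: the function name `unwrap_and_uniquify` and the `_2`, `_3` suffix scheme are my own choices, and the suffix is added to the first word of the header.

```shell
# Put each sequence on a single line and make repeated sequence IDs unique.
# Assumption: the ID is the first whitespace-delimited word of the header.
# The _N suffix scheme is illustrative and is not taken from reformat.sh.
unwrap_and_uniquify() {
    awk '
        { sub(/\r$/, "") }                    # tolerate Windows line endings
        /^>/ {
            if (have_seq) printf "\n"         # terminate the previous sequence
            id = $1
            rest = substr($0, length($1) + 1) # description after the ID, if any
            count[id]++
            if (count[id] > 1) id = id "_" count[id]
            print id rest
            have_seq = 0
            next
        }
        /[^[:space:]]/ {
            gsub(/[[:space:]]/, "")           # strip stray whitespace
            printf "%s", $0                   # append without a newline
            have_seq = 1
        }
        END { if (have_seq) printf "\n" }
    ' "$1"
}
```

For example, `unwrap_and_uniquify Phosphosite_seq.txt > reform.fa`, followed by the makeblastdb command above. Note that this sketch keeps records with no residues, and a generated name such as `>a_2` could still collide with a name already in the file.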



              • #8
                Originally posted by GenoMax View Post
                Do this (adjust file names accordingly):

                1. Using BBMap's reformat.sh remove the line wrapping from the Phosphosite_seq.txt and make the sequence names unique.

                Code:
                $ reformat.sh in=Phosphosite_seq.txt out=reform.fa uniquenames=t fastawrap=80000
                2. Build the database with makeblastdb

                Code:
                $ makeblastdb -in reform.fa -dbtype prot -out Phosphosite_seq -title Phosphosite_seq_db
                Sweet! That worked. Thank you!
