  • blast makeblastdb problem

    Dear all,

    I have recently run into a problem with BLAST's makeblastdb.

    $ ./makeblastdb -in transcript.fa -dbtype nucl -hash_index -parse_seqids -out transcript

    makeblastdb protein
    Building a new DB, current time: 07/16/2015 22:06:50
    New DB name: transcript
    New DB title: transcript.fa
    Sequence type: Nucleotide
    Keep Linkouts: T
    Keep MBits: T
    Maximum file size: 1000000000B
    Segmentation fault (core dumped)

    I found that some of the IDs in transcript.fa are longer than 80 characters.
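    A quick way to confirm which IDs exceed 80 characters is to scan the headers. The sketch below is a hypothetical helper (the name `long_ids` and the 80-character default are illustrative). It assumes the ID is the first whitespace-separated word after `>`; these headers contain no spaces, so here that is the whole header line.

```python
import sys


def long_ids(lines, max_len=80):
    """Yield (line_number, seq_id) for headers whose ID is longer than max_len.

    Assumes the ID is the first whitespace-separated word after '>'.
    """
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        words = line[1:].split()
        seq_id = words[0] if words else ""
        if len(seq_id) > max_len:
            yield lineno, seq_id


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python long_ids.py transcript.fa
    with open(sys.argv[1]) as handle:
        for lineno, seq_id in long_ids(handle):
            print(f"line {lineno}: {len(seq_id)} chars: {seq_id}")
```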

    Is there any solution for this?

    Thanks!!!

  • #2
    What is the size of the transcript.fa file, and how much RAM do you have on this machine? Do you get the seg fault right away or after some time?



    • #3
      transcript.fa is about 40 MB, and my machine has 300 GB of RAM.
      I got the seg fault right away.



      • #4
        The problem is likely something other than the 80-character length. Can you post an example of your FASTA sequence IDs?

        Just noticed that you have

        "makeblastdb protein"

        in your first post. Is this nucleotide or protein sequence?
        Last edited by GenoMax; 07-16-2015, 10:02 AM.



        • #5
          Here are some of the IDs:

          >1017.g16854.t1_RecName|_Full=Lethal(3)malignant_brain_tumor-like_protein_1|_Short=H-l(3)mbt|_Short=H-l(3)mbt_protein|_Short=L(3)mbt-like|_AltName|_Full=L(3)mbt_protein_homolog
          >1056.g17143.t1_RecName|_Full=Lethal(3)malignant_brain_tumor-like_protein_1|_Short=H-l(3)mbt|_Short=H-l(3)mbt_protein|_Short=L(3)mbt-like|_AltName|_Full=L(3)mbt_protein_homolog
          >1017.g16884.t1_RecName|_Full=PRELI_domain-containing_protein_1|_mitochondrial|_AltName|_Full=Px19-like_protein|_Flags|_Precursor_&gt|gi|969170|gb|AAC60046.1|_px19

          It is a nucleotide sequence (only A/T/C/G/N).
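          To double-check that claim, here is a minimal sketch (the helper name `non_acgtn_chars` is hypothetical) that collects every character in the sequence lines other than A/C/G/T/N, ignoring case and skipping header lines. An empty result means the file really is plain nucleotide sequence.

```python
def non_acgtn_chars(lines):
    """Return the set of characters in sequence lines that are not A/C/G/T/N.

    Header lines (starting with '>') are skipped; the check is case-insensitive.
    """
    allowed = set("ACGTN")
    found = set()
    for line in lines:
        if line.startswith(">"):
            continue  # header line, not sequence
        found.update(ch for ch in line.strip().upper() if ch not in allowed)
    return found
```

          Usage: `non_acgtn_chars(open("transcript.fa"))`.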



          • #6
            The problem is with the format of your IDs. I am able to make a nucleotide database with your IDs (using BLAST v2.2.31), but if I try to retrieve the accession numbers I get these errors:
            Code:
            $ blastdbcmd -entry all -db ./transcript -outfmt '%a'
            Error: [blastdbcmd] FASTA-style ID LCL|1017.G16854.T1_RECNAME|_FULL=LETHAL(3)MALIGNANT_BRAIN_TUMOR-LIKE_PROTEIN_1|_SHORT=H-L(3)MBT|_SHORT=H-L(3)MBT_PROTEIN|_SHORT=L(3)MBT-LIKE|_ALTNAME|_FULL=L(3)MBT_PROTEIN_HOMOLOG has too many parts.
            Error: [blastdbcmd] FASTA-style ID LCL|1056.G17143.T1_RECNAME|_FULL=LETHAL(3)MALIGNANT_BRAIN_TUMOR-LIKE_PROTEIN_1|_SHORT=H-L(3)MBT|_SHORT=H-L(3)MBT_PROTEIN|_SHORT=L(3)MBT-LIKE|_ALTNAME|_FULL=L(3)MBT_PROTEIN_HOMOLOG has too many parts.
            Error: [blastdbcmd] FASTA-style ID LCL|1017.G16884.T1_RECNAME|_FULL=PRELI_DOMAIN-CONTAINING_PROTEIN_1|_MITOCHONDRIAL|_ALTNAME|_FULL=PX19-LIKE_PROTEIN|_FLAGS|_PRECURSOR_&GT has too many parts.
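            The errors show blastdbcmd reading each whole header as one FASTA-style ID with an `lcl|` prefix and complaining about the number of `|`-separated parts. As a rough diagnostic (it only counts fields and does not reproduce BLAST's actual ID parser), the hypothetical helper below splits an ID on `|`. The first example header above yields seven fields.

```python
def pipe_fields(header):
    """Split a FASTA header's ID (first word after '>') on '|'.

    Rough diagnostic only: it counts fields, it does not reproduce the
    way BLAST actually parses FASTA-style IDs.
    """
    words = header.lstrip(">").split()
    seq_id = words[0] if words else ""
    return seq_id.split("|")
```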
            If you can live with shortened header IDs, you can cut each header at the first "|", e.g.
            Code:
            $  awk -F "|" '{if (/^>/) print $1; else print $0;}' your_file.fa > new_file.fa
            This gives you short IDs:

            Code:
            >1017.g16854.t1_RecName
            >1056.g17143.t1_RecName
            >1017.g16884.t1_RecName
            With these IDs, makeblastdb and blastdbcmd work.
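            For anyone without awk, here is a sketch of the same transformation in Python: header lines keep only the text before the first `|` (what awk's `$1` is with `-F "|"`), and sequence lines pass through unchanged. The function name and the two-argument command line are hypothetical.

```python
import sys


def shorten_headers(lines):
    """Yield FASTA lines with each header cut at its first '|'.

    Header lines keep only the text before the first '|' (like awk's $1
    with -F "|"); sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            yield line.rstrip("\n").split("|", 1)[0] + "\n"
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python shorten_headers.py your_file.fa new_file.fa
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(shorten_headers(src))
```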



            • #7
              Thank you so much!!!!!

              So the problem is that the IDs are too long, right?

              What should I do if I want to keep the IDs untouched?
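              One way to keep the original headers without feeding them to makeblastdb (an approach not suggested in this thread) is to give each sequence a short placeholder ID and save a lookup table. The sketch below assumes sequential IDs `seq1`, `seq2`, ... (an arbitrary, hypothetical naming scheme) and writes the table as tab-separated text, so BLAST hits on the short IDs can be translated back afterwards.

```python
def rename_with_map(lines, prefix="seq"):
    """Replace every FASTA header with a short sequential ID.

    Returns (new_lines, mapping), where mapping is a list of
    (short_id, original_header_without_'>') pairs in file order.
    """
    new_lines = []
    mapping = []
    for line in lines:
        if line.startswith(">"):
            short_id = f"{prefix}{len(mapping) + 1}"
            mapping.append((short_id, line[1:].rstrip("\n")))
            new_lines.append(f">{short_id}\n")
        else:
            new_lines.append(line)  # sequence lines are kept as-is
    return new_lines, mapping


def write_mapping(mapping, path):
    """Write the (short_id, original_header) pairs as a two-column TSV file."""
    with open(path, "w") as out:
        for short_id, original in mapping:
            out.write(f"{short_id}\t{original}\n")
```

              After a search, `dict(mapping)` turns the pairs into a lookup from short ID to original header.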



              • #8
                What version of BLAST are you using? Have you tried the latest (v2.2.31)? I was able to build the database fine with that version.

                The error I saw with your IDs is similar to the one in Peter Cock's blog entry (http://blastedbio.blogspot.com/2012/...argetonly.html), though that entry concerns a different command from the one I am using. The -target_only option works fine in 2.2.31.

                Perhaps the leading "_" in the name fields (e.g. _Short=H-l(3)mbt) is causing the problem. Let me see if I can find an easy way to remove them.
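                A minimal sketch of that cleanup, useful only for testing the hypothesis: it removes the `_` directly after each `|` in header lines and leaves sequence lines alone. The function name is hypothetical.

```python
def strip_pipe_underscores(line):
    """Remove the '_' that directly follows each '|' in a FASTA header line.

    Sequence lines (not starting with '>') are returned unchanged.
    """
    if line.startswith(">"):
        return line.replace("|_", "|")
    return line
```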

                Update: That does not seem to be a problem. It must be something else.

                Peter also participates on this forum and he may come along with a suggestion later today.
                Last edited by GenoMax; 07-17-2015, 09:18 AM.



                • #9
                  Sorry to dig up an old thread.

                  I am having the same problem with the makeblastdb command. I get a segmentation fault even when I type makeblastdb -help. It's as if the command won't run at all.

                  Was there any eventual solution to this problem?

                  Cheers.
