  • Kasfen
    Junior Member
    • Sep 2014
    • 4

    blast makeblastdb problem

    Dear all,

    I recently ran into a problem with BLAST's makeblastdb.

    $ ./makeblastdb -in transcript.fa -dbtype nucl -hash_index -parse_seqids -out transcript

    makeblastdb protein
    Building a new DB, current time: 07/16/2015 22:06:50
    New DB name: transcript
    New DB title: transcript.fa
    Sequence type: Nucleotide
    Keep Linkouts: T
    Keep MBits: T
    Maximum file size: 1000000000B
    Segmentation fault (core dumped)

    I found that some of the IDs in the transcript.fa file are longer than 80 characters.

    Is there any solution for this?

    Thanks!!!
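
To see which headers are affected, a small shell helper along these lines could list the IDs above a chosen length. This is only a sketch: the function name `long_fasta_ids` is made up for illustration, and it assumes the ID is everything between ">" and the first whitespace on the header line.

```shell
# List FASTA header IDs longer than a given length (default 80).
# Reads FASTA on stdin and prints "<length><TAB><id>" for each long ID.
# Assumption: the ID is the text between ">" and the first whitespace.
long_fasta_ids() {
    max_len="${1:-80}"
    awk -v max="$max_len" '
        /^>/ {
            id = substr($1, 2)                 # drop the leading ">"
            if (length(id) > max) {
                print length(id) "\t" id
            }
        }'
}
```

Running `long_fasta_ids 80 < transcript.fa` would then print one line per over-long ID, which makes it easier to post representative examples.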
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    What is the size of the transcript.fa file, and how much RAM do you have on this machine? Do you get the segfault right away or after some time?


    • Kasfen
      Junior Member
      • Sep 2014
      • 4

      #3
      The transcript.fa file is about 40 MB, and my machine has 300 GB of RAM.
      I got the segfault right away.


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        The problem is likely something other than the 80-character length. Can you post some examples of your FASTA sequence IDs?

        Just noticed that you have

        "makeblastdb protein"

        in your first post. Is this nucleotide or protein sequence?
        Last edited by GenoMax; 07-16-2015, 10:02 AM.


        • Kasfen
          Junior Member
          • Sep 2014
          • 4

          #5
          Some of the IDs are listed below.

          >1017.g16854.t1_RecName|_Full=Lethal(3)malignant_brain_tumor-like_protein_1|_Short=H-l(3)mbt|_Short=H-l(3)mbt_protein|_Short=L(3)mbt-like|_AltName|_Full=L(3)mbt_protein_homolog
          >1056.g17143.t1_RecName|_Full=Lethal(3)malignant_brain_tumor-like_protein_1|_Short=H-l(3)mbt|_Short=H-l(3)mbt_protein|_Short=L(3)mbt-like|_AltName|_Full=L(3)mbt_protein_homolog
          >1017.g16884.t1_RecName|_Full=PRELI_domain-containing_protein_1|_mitochondrial|_AltName|_Full=Px19-like_protein|_Flags|_Precursor_&gt|gi|969170|gb|AAC60046.1|_px19

          They are nucleotide sequences (only a/t/c/g/n).


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            The problem is with the format of your IDs. I am able to make a nucleotide database with your IDs (using BLAST v2.2.31), but if I try to retrieve the accession numbers, I get the following error:
            Code:
            $ blastdbcmd -entry all -db ./transcript -outfmt '%a'
            Error: [blastdbcmd] FASTA-style ID LCL|1017.G16854.T1_RECNAME|_FULL=LETHAL(3)MALIGNANT_BRAIN_TUMOR-LIKE_PROTEIN_1|_SHORT=H-L(3)MBT|_SHORT=H-L(3)MBT_PROTEIN|_SHORT=L(3)MBT-LIKE|_ALTNAME|_FULL=L(3)MBT_PROTEIN_HOMOLOG has too many parts.
            Error: [blastdbcmd] FASTA-style ID LCL|1056.G17143.T1_RECNAME|_FULL=LETHAL(3)MALIGNANT_BRAIN_TUMOR-LIKE_PROTEIN_1|_SHORT=H-L(3)MBT|_SHORT=H-L(3)MBT_PROTEIN|_SHORT=L(3)MBT-LIKE|_ALTNAME|_FULL=L(3)MBT_PROTEIN_HOMOLOG has too many parts.
            Error: [blastdbcmd] FASTA-style ID LCL|1017.G16884.T1_RECNAME|_FULL=PRELI_DOMAIN-CONTAINING_PROTEIN_1|_MITOCHONDRIAL|_ALTNAME|_FULL=PX19-LIKE_PROTEIN|_FLAGS|_PRECURSOR_&GT has too many parts.
            If you can live with shortened header IDs, you can cut each header at the first "|", e.g.:
            Code:
            $ awk -F "|" '{if (/^>/) print $1; else print $0;}' your_file.fa > new_file.fa
            This gives you short IDs:

            Code:
            >1017.g16854.t1_RecName
            >1056.g17143.t1_RecName
            >1017.g16884.t1_RecName
            With these, makeblastdb/blastdbcmd will work.
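
Since truncating the headers throws away the annotation after the first "|", it may be worth writing a lookup table at the same time so the original headers can be recovered later. The sketch below does the same truncation as the awk command above and additionally records each short ID next to its full header. The function name `shorten_fasta_ids` and the mapping file name are hypothetical, and the approach assumes the short IDs stay unique (which holds for the three examples posted).

```shell
# Truncate each FASTA header at the first "|" and record a mapping of
# "<short_id><TAB><original_header>" (both without the leading ">").
# Usage: shorten_fasta_ids MAP_FILE < in.fa > out.fa
# Assumption: the text before the first "|" is unique per sequence.
shorten_fasta_ids() {
    map_file="$1"
    : > "$map_file"                            # start with an empty mapping file
    awk -F '|' -v map="$map_file" '
        /^>/ {
            # short ID (field 1 minus ">"), then the full original header
            print substr($1, 2) "\t" substr($0, 2) >> map
            print $1                           # shortened header line
            next
        }
        { print }                              # sequence lines pass through unchanged
    '
}
```

For example, `shorten_fasta_ids id_map.tsv < your_file.fa > new_file.fa` would produce the shortened FASTA plus a two-column table that can be joined back onto BLAST results by the short ID.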


            • Kasfen
              Junior Member
              • Sep 2014
              • 4

              #7
              Thank you so much!!!!!

              So the problem is that the IDs are too long, right?

              What should I do if I want to keep the IDs untouched?


              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                #8
                What version of BLAST are you using? Have you tried the latest (v2.2.31)? I was able to build the database fine with that version.

                The error I saw with your IDs is similar to the one in Peter Cock's blog entry (http://blastedbio.blogspot.com/2012/...argetonly.html), though it is not for the command I am using. The -target_only option is working fine in 2.2.31.

                Perhaps it is the leading "_" that you have in the names that is causing the problem (e.g. _Short=H-l(3)mbt). Let me see if I can find a way to remove those easily.

                Update: That does not seem to be a problem. It must be something else.

                Peter also participates on this forum and he may come along with a suggestion later today.
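
If the full header text needs to stay in the ID, one untested idea is to keep everything but swap the "|" characters for something else, since the "has too many parts" error suggests the bars are being read as field separators. The sketch below is only an illustration: the function name `replace_header_bars` is made up, the default substitute ":" is an assumption (it does not occur in the example IDs), and whether makeblastdb accepts the resulting IDs with -parse_seqids has not been checked here.

```shell
# Replace every "|" in FASTA header lines with a substitute character,
# leaving sequence lines untouched.
# Usage: replace_header_bars [CHAR] < in.fa > out.fa
# Assumption: CHAR (default ":") does not already occur in the IDs, so the
# change can be reversed later. Avoid "&" and "\", which awk treats specially
# in replacement strings.
replace_header_bars() {
    repl="$1"
    [ -n "$repl" ] || repl=":"
    awk -v r="$repl" '
        /^>/ { gsub(/[|]/, r) }                # only header lines are modified
        { print }'
}
```

Something like `replace_header_bars < your_file.fa > new_file.fa` followed by the original makeblastdb command would be the way to try it out.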
                Last edited by GenoMax; 07-17-2015, 09:18 AM.


                • GSviral
                  Member
                  • Dec 2014
                  • 38

                  #9
                  Sorry to dig up an old thread.

                  I am having the same problem with the makeblastdb command. I get a segmentation fault even when I run makeblastdb -help; the command does not seem to run at all.

                  Was there any eventual solution to this problem?

                  Cheers.
