  • Tsuyoshi
    Member
    • Sep 2012
    • 24

    Protein ID that blast could not identify

    Hi,
    I downloaded a proteome in FASTA format, which contains hundreds of proteins (http://labs.umassmed.edu/chlamyfp/in...p?content=help). I want to BLAST my data against these proteins using BLAST+. However, when I ran makeblastdb on the proteome dataset, an error occurred:
    *******************************************************************
    Error: NCBI C++ Exception:
    "/am/ncbiapdata/release/blast/src/2.2.26/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/objects/seq/../seqloc/Seq_id.cpp", line 1679: Error: ncbi::objects::CSeq_id::x_Init() - Unsupported ID type C_1150005
    *******************************************************************
    I think there must be something wrong with the proteome data, because BLAST+ worked well when I used data downloaded directly from NCBI.

    Therefore, I opened the proteome data with TextEdit. For example, the header of each sequence looked like this:
    *****************************************************************
    >C_680011|168600 FAP45, Flagellar Associated Protein Weakly Similar to Nasopharyngeal Epithelium Specific Protein 1
    MPQTPPRSGGYRSGKQSYVDESLFGGSKRTGAAQVETLDSLKLTAPTRTISPKDRDVVTLTKGDLTRMLKASPIMTAEDVAAAKREAEAKREQLQAVSKA
    RKEKMLKLEEEAKKQAPPTETEILQRQLNDATRSRATHMMLEQKDPVKHMNQMMLYSKCVTIRDAQIEEKKQMLAEEEEEQRRLDLMMEIERVKALEQYE
    ARERQRVEERRKGAAVLSEQIKERERERIRQEELRDQERLQMLREIERLKEEEMQAQIEKKIQAKQLMEEVAAANSEQIKRKEGMKVREKEEDLRIADYI
    LQKEMREQ
    *****************************************************************

    Here "C_680011|168600" should be the protein ID, I think, but nothing was found when I searched for it in NCBI. I just wonder what kind of ID it is and what I should do to make BLAST+ recognise it.

    Thanks!
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    Are you using the -parse_seqids option? If so, try it without this. I only ever use this if my FASTA file identifiers follow the NCBI naming conventions.

    It would be useful to show the command you used to run makeblastdb as that might help us to understand what you are doing.
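
One plausible reading of the first error (an assumption, not confirmed in this thread) is that -parse_seqids tries to interpret a header such as "C_680011|168600" as an NCBI-style bar-separated identifier and does not recognise "C_680011" as an ID type. A minimal sketch for inspecting which identifiers a FASTA file actually contains; `list_fasta_ids` and `ids_with_bars` are hypothetical helper names, not part of BLAST+:

```shell
# Print the identifier of each FASTA record: the text between ">" and the
# first whitespace on the header line. (Hypothetical helper, not a BLAST+ tool.)
list_fasta_ids() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# Show only identifiers containing "|", i.e. the ones that could be mistaken
# for NCBI-style bar-separated IDs when -parse_seqids is used.
ids_with_bars() {
    list_fasta_ids "$1" | grep -F '|'
}
```

For example, `ids_with_bars proteome.fasta | head` (with your own file name) gives a quick look at the identifiers before deciding whether -parse_seqids is appropriate.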


    • Tsuyoshi
      Member
      • Sep 2012
      • 24

      #3
      Dear Maubp,
      Thanks for your reply.
      Yes, I used -parse_seqids. Following your suggestion, I ran it without -parse_seqids, and another error showed up:
      *******************************************************************
      Error: (CArgException::eNoArg) Argument "dbtype". Mandatory value is missing: `String, `nucl', `prot''
      Error: (CArgException::eNoArg) Application's initialization failed
      *****************************************************************

      The command I used was
      makeblastdb -in CrFP.fasta -out CrFP

      Thanks


      • maubp
        Peter (Biopython etc)
        • Jul 2009
        • 1544

        #4
        That error is clear isn't it? You have to tell makeblastdb if your FASTA file is protein or nucleotides. i.e. either:

        Code:
        makeblastdb -in CrFP.fasta -out CrFP -dbtype nucl
        or,

        Code:
        makeblastdb -in CrFP.fasta -out CrFP -dbtype prot
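
If it is not obvious whether a FASTA file holds protein or nucleotide sequences, a rough heuristic is to check how many residues are letters other than A, C, G, T, U and N. The sketch below is only that, a heuristic: `guess_dbtype` is a hypothetical helper, and the 10% threshold is an arbitrary assumption.

```shell
# Guess the makeblastdb -dbtype value for a FASTA file.
# Prints "prot", "nucl", or "unknown" (when no sequence letters are found).
# Hypothetical helper; the 0.1 threshold is an assumption, not a BLAST+ rule.
guess_dbtype() {
    awk '!/^>/ {
             seq = toupper($0)
             gsub(/[^A-Z]/, "", seq)     # keep letters only
             total += length(seq)
             gsub(/[ACGTUN]/, "", seq)   # drop typical nucleotide letters
             other += length(seq)        # what is left looks like protein
         }
         END {
             if (total == 0)               print "unknown"
             else if (other / total > 0.1) print "prot"
             else                          print "nucl"
         }' "$1"
}
```

Check the printed value yourself before passing it on, for example with `guess_dbtype CrFP.fasta`, and then supply the result to `-dbtype` as in the commands above.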


        • Tsuyoshi
          Member
          • Sep 2012
          • 24

          #5
          YES!
          What a stupid mistake I made. It succeeded now!

          Thank you!


          • maubp
            Peter (Biopython etc)
            • Jul 2009
            • 1544

            #6
            Oh good. Understanding the NCBI BLAST+ error messages gets easier with practice.


            • Tsuyoshi
              Member
              • Sep 2012
              • 24

              #7
              Yep!

              I couldn't agree with you more. Many thanks!


              • Tsuyoshi
                Member
                • Sep 2012
                • 24

                #8
                Hi Maubp,
                But I still have a question about the protein IDs. It seems that no database names proteins in this way. Take several proteins as examples:

                C_1620015|156900
                C_10830001|152917
                C_2020008|159281
                C_510029|166481
                C_510029|166481
                C_510029|166481
                C_510029|166481

                I do not think they are accession numbers for Chlamydomonas in NCBI, but I want to find their real NCBI accession numbers. Do you have any idea how to do that?


                • maubp
                  Peter (Biopython etc)
                  • Jul 2009
                  • 1544

                  #9
                  That's a different question. The only way your sequences would have real NCBI accession numbers is if they have already been submitted to one of the databases. I would explore the NCBI databases using the Entrez search term "chlamydomonas[orgn]" and see if anything matches your dataset:

                  http://www.ncbi.nlm.nih.gov/sites/gq...=chlamydomonas[orgn]

                  (square brackets in the URL confuse the forum software)

                  Or you could try BLAST'ing some of your sequences against the NR database to see if any give perfect matches?
                  Last edited by maubp; 09-10-2012, 03:10 AM. Reason: Trying to fix link
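
The "perfect matches" check suggested above can be sketched on BLAST+ tabular output. This assumes the search was run with `-outfmt "6 qseqid sseqid pident length qlen slen"` and the result saved to a file (called `hits.tsv` here for illustration); `perfect_matches` is a hypothetical helper name.

```shell
# Keep only hits where the whole query aligns to the whole subject at
# 100% identity, and print "query ID <TAB> subject ID" for each.
# Expected tab-separated columns: qseqid sseqid pident length qlen slen
# (Hypothetical helper; the column layout is an assumption about how
# the search was run.)
perfect_matches() {
    awk -F '\t' '$3 == 100 && $4 == $5 && $4 == $6 { print $1 "\t" $2 }' "$1"
}
```

A possible way to produce the input is `blastp -query CrFP.fasta -db some_ncbi_db -outfmt "6 qseqid sseqid pident length qlen slen" -out hits.tsv`, where `some_ncbi_db` stands for whatever NCBI-derived protein database you have available. `perfect_matches hits.tsv` then gives a candidate table linking the private identifiers to database identifiers, which should still be checked by hand.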


                  • Tsuyoshi
                    Member
                    • Sep 2012
                    • 24

                    #10
                    Originally posted by maubp:
                    That's a different question - the only way your sequences would have real NCBI accession numbers would be if they have already been submitted to one of the databases. I would explore the NCBI databases for this using Entrez search term "chlamydomonas[orgn]" and see if anything matches your dataset:

                    http://www.ncbi.nlm.nih.gov/sites/gq...=chlamydomonas[orgn]

                    Or you could try BLAST'ing some of your sequences against the NR database to see if any give perfect matches?

                    The sequences themselves perfectly match the submitted Chlamydomonas data. I just have no idea what kind of IDs the authors used.


                    • maubp
                      Peter (Biopython etc)
                      • Jul 2009
                      • 1544

                      #11
                      If you can work out how to get the data from the NCBI with their accessions, that might be simpler than working with the original author's private identifiers.


                      • Tsuyoshi
                        Member
                        • Sep 2012
                        • 24

                        #12
                        That's right.
                        Anyway, I will try to extract the accession numbers from NCBI.
                        Thank you very much, Maubp!
