SEQanswers

Old 09-10-2012, 01:29 AM   #1
Tsuyoshi
Member
 
Location: japan

Join Date: Sep 2012
Posts: 24
Protein ID that BLAST could not identify

Hi,
I downloaded a proteome in FASTA format, which contains hundreds of proteins (http://labs.umassmed.edu/chlamyfp/in...p?content=help). I want to BLAST my data against these proteins using BLAST+. However, when I ran makeblastdb on the proteome dataset, the following error occurred:
*******************************************************************
Error: NCBI C++ Exception:
"/am/ncbiapdata/release/blast/src/2.2.26/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/objects/seq/../seqloc/Seq_id.cpp", line 1679: Error: ncbi::objects::CSeq_id::x_Init() - Unsupported ID type C_1150005
*******************************************************************
I think there must be something wrong with the proteome data, because BLAST+ worked fine when I used data downloaded directly from NCBI.

So I opened the proteome data in TextEdit. The header of each sequence looks like this, for example:
*****************************************************************
>C_680011|168600 FAP45, Flagellar Associated Protein Weakly Similar to Nasopharyngeal Epithelium Specific Protein 1
MPQTPPRSGGYRSGKQSYVDESLFGGSKRTGAAQVETLDSLKLTAPTRTISPKDRDVVTLTKGDLTRMLKASPIMTAEDVAAAKREAEAKREQLQAVSKA
RKEKMLKLEEEAKKQAPPTETEILQRQLNDATRSRATHMMLEQKDPVKHMNQMMLYSKCVTIRDAQIEEKKQMLAEEEEEQRRLDLMMEIERVKALEQYE
ARERQRVEERRKGAAVLSEQIKERERERIRQEELRDQERLQMLREIERLKEEEMQAQIEKKIQAKQLMEEVAAANSEQIKRKEGMKVREKEEDLRIADYI
LQKEMREQ
*****************************************************************

I think "C_680011|168600" is the protein ID, but searching for it at NCBI finds nothing. What kind of ID is it, and what should I do to make BLAST+ recognise it?

Thanks!
Old 09-10-2012, 02:11 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542

Are you using the -parse_seqids option? If so, try it without this. I only ever use this if my FASTA file identifiers follow the NCBI naming conventions.

It would be useful to show the command you used to run makeblastdb as that might help us to understand what you are doing.
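
If -parse_seqids is genuinely needed (for example, to retrieve sequences from the database by ID later), one possible workaround is to rewrite the headers so that the identifier no longer contains a "|", which BLAST+ appears to read as the NCBI-style separator between an ID type and an accession. The following is a minimal Python sketch under that assumption; the function names `sanitize_header` and `sanitize_fasta` are made up for illustration and are not part of BLAST+.

```python
def sanitize_header(line: str) -> str:
    """Replace '|' with '_' in the identifier (first word) of a FASTA header.

    Sequence lines (anything not starting with '>') are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    # Separate the identifier from the free-text description.
    parts = line[1:].rstrip("\n").split(None, 1)
    if not parts:
        return line  # a bare '>' with no identifier: leave it alone
    seq_id = parts[0].replace("|", "_")
    description = parts[1] if len(parts) > 1 else ""
    if description:
        return ">" + seq_id + " " + description + "\n"
    return ">" + seq_id + "\n"


def sanitize_fasta(src_path: str, dst_path: str) -> None:
    """Write a copy of the FASTA file with bar-free identifiers."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(sanitize_header(line))
```

Calling `sanitize_fasta("CrFP.fasta", "CrFP.clean.fasta")` would write a copy in which an ID such as C_680011|168600 becomes C_680011_168600; the output file name is only an example. Keep the original file so the new IDs can be mapped back to the authors' ones.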
Old 09-10-2012, 02:22 AM   #3
Tsuyoshi
Member
 
Location: japan

Join Date: Sep 2012
Posts: 24

Quote:
Originally Posted by maubp View Post
Are you using the -parse_seqids option? If so, try it without this. I only ever use this if my FASTA file identifiers follow the NCBI naming conventions.

It would be useful to show the command you used to run makeblastdb as that might help us to understand what you are doing.
Dear Maubp,
Thanks for your reply.
Yes, I used -parse_seqids. Following your suggestion, I ran it without -parse_seqids, and another error showed up:
*******************************************************************
Error: (CArgException::eNoArg) Argument "dbtype". Mandatory value is missing: `String, `nucl', `prot''
Error: (CArgException::eNoArg) Application's initialization failed
*****************************************************************

The command I used was
makeblastdb -in CrFP.fasta -out CrFP

Thanks
Old 09-10-2012, 02:30 AM   #4
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542

That error is clear, isn't it? You have to tell makeblastdb whether your FASTA file is protein or nucleotide, i.e. either:

Code:
makeblastdb -in CrFP.fasta -out CrFP -dbtype nucl
or,

Code:
makeblastdb -in CrFP.fasta -out CrFP -dbtype prot
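
When it is not obvious which -dbtype a file needs, the residues themselves usually give it away. Here is a rough Python sketch that guesses the type from the letter composition and assembles the matching command line. The 90% threshold and the helper names `guess_dbtype` and `makeblastdb_command` are assumptions chosen for illustration; only the makeblastdb options shown above are used.

```python
def guess_dbtype(lines, sample_size=10000, threshold=0.9):
    """Guess 'nucl' or 'prot' from an iterable of FASTA lines.

    Counts how many residues are nucleotide letters (A, C, G, T, U, N).
    The threshold is a heuristic, not a BLAST+ rule.
    """
    nucleotide_letters = set("ACGTUN")
    counted = 0
    nucleotide_like = 0
    for line in lines:
        if line.startswith(">"):
            continue  # skip header lines
        for ch in line.strip().upper():
            if ch.isalpha():
                counted += 1
                if ch in nucleotide_letters:
                    nucleotide_like += 1
        if counted >= sample_size:
            break  # enough residues sampled
    if counted == 0:
        raise ValueError("no sequence residues found")
    return "nucl" if nucleotide_like / counted > threshold else "prot"


def makeblastdb_command(fasta_path, out_name, dbtype):
    """Build the argument list for makeblastdb (suitable for subprocess.run)."""
    return ["makeblastdb", "-in", fasta_path, "-out", out_name, "-dbtype", dbtype]
```

For the proteome in this thread, one could open CrFP.fasta, pass the file handle to `guess_dbtype`, and hand the resulting list to `subprocess.run`. Short or unusual sequences can fool the heuristic, so the guess is worth checking by eye.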
Old 09-10-2012, 02:38 AM   #5
Tsuyoshi
Member
 
Location: japan

Join Date: Sep 2012
Posts: 24

Quote:
Originally Posted by maubp View Post
That error is clear, isn't it? You have to tell makeblastdb whether your FASTA file is protein or nucleotide, i.e. either:

Code:
makeblastdb -in CrFP.fasta -out CrFP -dbtype nucl
or,

Code:
makeblastdb -in CrFP.fasta -out CrFP -dbtype prot
YES!
What a stupid mistake I made. It succeeded now!

Thank you!
Old 09-10-2012, 02:41 AM   #6
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542

Quote:
Originally Posted by Tsuyoshi View Post
It succeeded now!
Oh good. Understanding the NCBI BLAST+ error messages gets easier with practice
Old 09-10-2012, 02:45 AM   #7
Tsuyoshi
Member
 
Location: japan

Join Date: Sep 2012
Posts: 24

Quote:
Originally Posted by maubp View Post
Oh good. Understanding the NCBI BLAST+ error messages gets easier with practice
Yep!

I couldn't agree with you more. Many thanks!
Old 09-10-2012, 03:02 AM   #8
Tsuyoshi
Member
 
Location: japan

Join Date: Sep 2012
Posts: 24

Quote:
Originally Posted by maubp View Post
Oh good. Understanding the NCBI BLAST+ error messages gets easier with practice
Hi Maubp,
But I still have a question about the protein IDs. No database seems to name proteins in this way. Taking several proteins as examples:

C_1620015|156900
C_10830001|152917
C_2020008|159281
C_510029|166481
C_510029|166481
C_510029|166481
C_510029|166481

I do not think these are NCBI accession numbers for Chlamydomonas, but I want to find their real NCBI accession numbers. Do you have any ideas?
Old 09-10-2012, 03:09 AM   #9
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542

That's a different question - the only way your sequences would have real NCBI accession numbers would be if they have already been submitted to one of the databases. I would explore the NCBI databases for this using Entrez search term "chlamydomonas[orgn]" and see if anything matches your dataset:

http://www.ncbi.nlm.nih.gov/sites/gq...[orgn\
(square brackets in the URL confuse the forum software)

Or you could try BLAST'ing some of your sequences against the NR database to see if any give perfect matches?
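
If the BLAST search is run with tabular output (-outfmt 6), picking out the perfect matches can be scripted. A small Python sketch follows, assuming the default column order for that format (qseqid, sseqid, pident, length, mismatch, gapopen, ...) and a dictionary of query lengths built separately from the FASTA file; the function name `perfect_hits` is hypothetical.

```python
def perfect_hits(tabular_lines, query_lengths):
    """Map each query ID to the subject IDs that match it perfectly.

    A hit counts as perfect when it is 100% identical, has no mismatches
    or gap openings, and the alignment is as long as the whole query.
    """
    hits = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        aln_len = int(fields[3])
        mismatch = int(fields[4])
        gapopen = int(fields[5])
        full_length = aln_len == query_lengths.get(qseqid)
        if pident == 100.0 and mismatch == 0 and gapopen == 0 and full_length:
            hits.setdefault(qseqid, []).append(sseqid)
    return hits
```

This only checks that the whole query is covered; the database sequence may still be longer than the query, so a hit is a candidate for the matching NCBI record rather than proof of it.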

Last edited by maubp; 09-10-2012 at 03:10 AM. Reason: Trying to fix link
Old 09-10-2012, 03:12 AM   #10
Tsuyoshi
Member
 
Location: japan

Join Date: Sep 2012
Posts: 24

Quote:
Originally Posted by maubp View Post
That's a different question - the only way your sequences would have real NCBI accession numbers would be if they have already been submitted to one of the databases. I would explore the NCBI databases for this using Entrez search term "chlamydomonas[orgn]" and see if anything matches your dataset:

http://www.ncbi.nlm.nih.gov/sites/gq...=chlamydomonas[orgn]

Or you could try BLAST'ing some of your sequences against the NR database to see if any give perfect matches?
The sequences themselves perfectly match the submitted Chlamydomonas data. I just have no idea what kind of IDs the authors used.
Old 09-10-2012, 03:14 AM   #11
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542

If you can work out how to get the data from the NCBI with their accessions, that might be simpler than working with the original author's private identifiers.
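
One scriptable route to the NCBI records is the Entrez E-utilities web service. The Python sketch below only builds the request URLs for an ESearch query and an EFetch download in FASTA format; it does not contact NCBI. The helper names `esearch_url` and `efetch_fasta_url` and the default `retmax` of 20 are illustrative choices, and any IDs passed in are placeholders.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="protein", retmax=20):
    """URL for an ESearch query, e.g. term='chlamydomonas[orgn]'."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "/esearch.fcgi?" + urlencode(params)


def efetch_fasta_url(ids, db="protein"):
    """URL that fetches the given record IDs as FASTA text."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return EUTILS_BASE + "/efetch.fcgi?" + urlencode(params)
```

`urlencode` percent-encodes the square brackets in the search term, which avoids the kind of link mangling seen earlier in this thread. A whole-genus query returns a very large number of records, so in practice it would need paging or a narrower search term.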
Old 09-10-2012, 03:22 AM   #12
Tsuyoshi
Member
 
Location: japan

Join Date: Sep 2012
Posts: 24

Quote:
Originally Posted by maubp View Post
If you can work out how to get the data from the NCBI with their accessions, that might be simpler than working with the original author's private identifiers.
That's right.
Anyway, I will try to extract the accession numbers from NCBI.
Thank you very much Maubp !

Tags
protein id
