SEQanswers

Old 01-15-2013, 03:00 AM   #1
hugh_hang
Member
 
Location: Hangzhou, China

Join Date: Jan 2013
Posts: 28
Is there any way to BLAST against other databases in the Blast2GO program? Help!

I have about 300 sequences to BLAST and analyze with Blast2GO.
However, in Blast2GO I can only BLAST against the few databases in the option list. Can I add another database (such as TAIR) and run Blast2GO with it? Or is there another application, online or offline, that can create a BLAST result which can be imported into Blast2GO?

Thanks a lot!
Old 01-15-2013, 05:24 AM   #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177
Default

The default configuration for online BLAST through the Blast2GO GUI application uses the NCBI QBlast service, which provides only the databases listed; you cannot add custom databases to this method.

Your alternatives to use different databases are:

1) Set up your own WWW-BLAST service (or get access to someone else's) that has, or can be customized with, the databases you want. Edit the blast2go.properties file on your local computer to designate this WWW-BLAST server as the default source for running your online BLAST searches through the Blast2GO GUI.

2) Run your BLAST search using a standalone (command line) BLAST installation against your custom database. Be sure to configure your BLAST search to output the results in XML format. Launch Blast2GO and load your FASTA sequence file as normal. From the File menu select "Import->Import Blast Results". Select your XML file (or files) for import. Once the BLAST results have been imported proceed with Mapping and Annotation as usual.

I recommend option #2 because it is easier and more scalable.
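
Option #2 can be sketched roughly as follows. This is only an illustration, assuming the NCBI BLAST+ command-line tools (makeblastdb and blastx) are installed and that the custom database is a protein FASTA file. The file names, the database name and the e-value are hypothetical placeholders; the one essential setting is "-outfmt 5", which asks BLAST+ for XML output. The script only builds and prints the two command lines, it does not run them.

```python
import shlex


def build_blast_commands(db_fasta, query_fasta, db_name="custom_prot_db",
                         out_xml="blast_results.xml", evalue="1e-6"):
    """Return the two BLAST+ command lines as argument lists (nothing is run)."""
    # Step 1: format the custom protein FASTA file as a BLAST database.
    make_db = ["makeblastdb", "-in", db_fasta, "-dbtype", "prot", "-out", db_name]
    # Step 2: search nucleotide queries against it; -outfmt 5 selects XML output.
    search = ["blastx", "-query", query_fasta, "-db", db_name,
              "-evalue", evalue, "-outfmt", "5", "-out", out_xml]
    return make_db, search


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in build_blast_commands("my_proteins.fasta", "my_queries.fasta"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed commands can be pasted into a terminal, or passed to subprocess.run() once the paths are correct for your machine.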
Old 01-15-2013, 06:12 PM   #3
hugh_hang
Member
 
Location: Hangzhou, China

Join Date: Jan 2013
Posts: 28
Smile

Quote:
Originally Posted by kmcarr View Post
The default configuration for online BLAST through the Blast2GO GUI application uses the NCBI QBlast service, which provides only the databases listed; you cannot add custom databases to this method.

Your alternatives to use different databases are:

1) Set up your own WWW-BLAST service (or get access to someone else's) that has, or can be customized with, the databases you want. Edit the blast2go.properties file on your local computer to designate this WWW-BLAST server as the default source for running your online BLAST searches through the Blast2GO GUI.

2) Run your BLAST search using a standalone (command line) BLAST installation against your custom database. Be sure to configure your BLAST search to output the results in XML format. Launch Blast2GO and load your FASTA sequence file as normal. From the File menu select "Import->Import Blast Results". Select your XML file (or files) for import. Once the BLAST results have been imported proceed with Mapping and Annotation as usual.

I recommend option #2 because it is easier and more scalable.
Thank you. I had thought of option #2, but how can I download the TAIR database in FASTA format with accession numbers like AT4G22340?
Old 01-15-2013, 06:23 PM   #4
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

Here is the ftp site for the TAIR10 blast sets. You probably want one of the cds or cDNA files:

ftp://ftp.arabidopsis.org/home/tair/...R10_blastsets/
Old 01-15-2013, 06:59 PM   #5
hugh_hang
Member
 
Location: Hangzhou, China

Join Date: Jan 2013
Posts: 28
Smile

Quote:
Originally Posted by chadn737 View Post
Here is the ftp site for the TAIR10 blast sets. You probably want one of the cds or cDNA files:

ftp://ftp.arabidopsis.org/home/tair/...R10_blastsets/
Thank you. I tried clicking "download" on the TAIR website, but I'm really confused by its tree-like file structure.
Old 01-16-2013, 05:23 AM   #6
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177
Default

Quote:
Originally Posted by hugh_hang View Post
Quote:
Originally Posted by chadn737
Here is the ftp site for the TAIR10 blast sets. You probably want one of the cds or cDNA files:

ftp://ftp.arabidopsis.org/home/tair/...R10_blastsets/
Thank you. I tried clicking "download" on the TAIR website, but I'm really confused by its tree-like file structure.
Just click on the link chadn provided in his reply. It will take you directly to the correct FTP directory with various FASTA files. Read the Readme_blastdatasets_TAIR10.txt file for a description of what each one is.

I myself would choose the "TAIR10_pep_20110103_representative_gene_model_updated" (or "TAIR10_pep_20101214_updated") file and, assuming you are BLASTing with nucleotide queries, run BLASTX against this protein dataset.
Old 01-17-2013, 06:09 PM   #7
hugh_hang
Member
 
Location: Hangzhou, China

Join Date: Jan 2013
Posts: 28
Smile

Quote:
Originally Posted by kmcarr View Post
Just click on the link chadn provided in his reply. It will take you directly to the correct FTP directory with various FASTA files. Read the Readme_blastdatasets_TAIR10.txt file for a description of what each one is.

I myself would choose the "TAIR10_pep_20110103_representative_gene_model_updated" (or "TAIR10_pep_20101214_updated") file and, assuming you are BLASTing with nucleotide queries, run BLASTX against this protein dataset.
My sequences are cDNA-AFLP results; does that mean "TAIR10_cdna_..." is a better option? By the way, what does the "pep" in "TAIR10_pep_..." mean?
Old 01-17-2013, 08:09 PM   #8
hugh_hang
Member
 
Location: Hangzhou, China

Join Date: Jan 2013
Posts: 28
Question

Quote:
Originally Posted by kmcarr View Post
The default configuration for online BLAST through the Blast2GO GUI application uses the NCBI QBlast service, which provides only the databases listed; you cannot add custom databases to this method.

Your alternatives to use different databases are:

1) Set up your own WWW-BLAST service (or get access to someone else's) that has, or can be customized with, the databases you want. Edit the blast2go.properties file on your local computer to designate this WWW-BLAST server as the default source for running your online BLAST searches through the Blast2GO GUI.

2) Run your BLAST search using a standalone (command line) BLAST installation against your custom database. Be sure to configure your BLAST search to output the results in XML format. Launch Blast2GO and load your FASTA sequence file as normal. From the File menu select "Import->Import Blast Results". Select your XML file (or files) for import. Once the BLAST results have been imported proceed with Mapping and Annotation as usual.

I recommend option #2 because it is easier and more scalable.
Sir, I've done what you recommended. But after I imported the XML file, the mapping and annotation steps do not seem to recognise the BLAST results and report 0 sequences mapped or annotated. My database is from TAIR10_blast_set, which are nucleotide FASTA files.
Old 01-18-2013, 05:36 AM   #9
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177
Default

Quote:
Originally Posted by hugh_hang View Post
My sequences are cDNA-AFLP results; does that mean "TAIR10_cdna_..." is a better option? By the way, what does the "pep" in "TAIR10_pep_..." mean?
cDNA-AFLP you say? O.K.

"pep" = peptide, meaning that these FASTA files contain the protein (amino acid) sequence translated from the predicted CDS.

Given the method used to isolate your material for sequencing, it can be assumed that the sequences are derived from protein-coding genes. Amino acid sequence is more conserved than the underlying nucleic acid sequence, so comparing across species using amino acid sequences is more sensitive than comparisons based on DNA sequence. This is why I suggested using the Arabidopsis protein (TAIR10_pep_*) database as a target in a BLASTX search with your cDNA query sequences. BLASTX will translate your query sequences in all 6 possible reading frames and compare those amino acid sequences against the protein database for similarity.
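
To make the "6 possible reading frames" concrete, here is a minimal sketch of six-frame translation. It is not how BLASTX is implemented internally; it only illustrates the idea, assuming the standard genetic code and plain A/C/G/T input (any other codon is shown as "X"). The function names are made up for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to the translated peptide."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frame_translations("ATGGCC").items():
        print(frame, peptide)
```

Each of the six peptides is what gets compared against the protein database, which is why a short or partial cDNA fragment can still find its protein even if its reading frame and strand are unknown.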
Old 01-18-2013, 05:37 AM   #10
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177
Default

Quote:
Originally Posted by hugh_hang View Post
Sir, I've done what you recommended. But after I imported the XML file, the mapping and annotation steps do not seem to recognise the BLAST results and report 0 sequences mapped or annotated. My database is from TAIR10_blast_set, which are nucleotide FASTA files.
Are you sure the output from the BLAST search is properly formatted and contains valid hits?
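
One quick way to check this outside Blast2GO is to parse the XML and count the hits per query. The sketch below assumes the usual BLAST XML layout (Iteration elements containing Iteration_query-def and Iteration_hits/Hit); the function name is made up for this example.

```python
import xml.etree.ElementTree as ET


def hits_per_query(xml_text):
    """Return {query description: number of hits} for a BLAST XML document.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(xml_text)
    counts = {}
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="(unnamed query)")
        counts[query] = len(iteration.findall("Iteration_hits/Hit"))
    return counts


if __name__ == "__main__":
    # Tiny hand-written sample, not real BLAST output.
    sample = (
        "<BlastOutput><BlastOutput_iterations>"
        "<Iteration><Iteration_query-def>seq1</Iteration_query-def>"
        "<Iteration_hits><Hit><Hit_id>a</Hit_id></Hit></Iteration_hits></Iteration>"
        "<Iteration><Iteration_query-def>seq2</Iteration_query-def>"
        "<Iteration_hits></Iteration_hits></Iteration>"
        "</BlastOutput_iterations></BlastOutput>"
    )
    print(hits_per_query(sample))
```

If the file fails to parse, or every query reports 0 hits, the problem is in the BLAST step rather than in the Blast2GO import.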
Old 01-18-2013, 07:10 PM   #11
hugh_hang
Member
 
Location: Hangzhou, China

Join Date: Jan 2013
Posts: 28
Thumbs up

Quote:
Originally Posted by kmcarr View Post
cDNA-AFLP you say? O.K.

"pep" = peptide, meaning that these FASTA files contain the protein (amino acid) sequence translated from the predicted CDS.

Given the method used to isolate your material for sequencing, it can be assumed that the sequences are derived from protein-coding genes. Amino acid sequence is more conserved than the underlying nucleic acid sequence, so comparing across species using amino acid sequences is more sensitive than comparisons based on DNA sequence. This is why I suggested using the Arabidopsis protein (TAIR10_pep_*) database as a target in a BLASTX search with your cDNA query sequences. BLASTX will translate your query sequences in all 6 possible reading frames and compare those amino acid sequences against the protein database for similarity.
I see. Thank you.
Old 01-18-2013, 07:16 PM   #12
hugh_hang
Member
 
Location: Hangzhou, China

Join Date: Jan 2013
Posts: 28
Red face

Quote:
Originally Posted by kmcarr View Post
Are you sure the output from the BLAST search is properly formatted and contains valid hits?
I removed the improper tabs (\t) and line breaks (\r\n), and it still doesn't work. I wonder whether it's because too few sequences found hits, or whether I should use protein DBs instead of nucleotide ones.
Old 01-19-2013, 06:00 AM   #13
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177
Default

Quote:
Originally Posted by hugh_hang View Post
I removed the improper tabs (\t) and line breaks (\r\n), and it still doesn't work. I wonder whether it's because too few sequences found hits, or whether I should use protein DBs instead of nucleotide ones.
What made you think that there were improper tabs or returns that needed removing?

Please post an example of the BLAST output before you edited it (just a couple of dozen lines is enough).
Old 01-19-2013, 07:54 AM   #14
hugh_hang
Member
 
Location: Hangzhou, China

Join Date: Jan 2013
Posts: 28
Default

Quote:
Originally Posted by kmcarr View Post
What made you think that there were improper tabs or returns that needed removing?

Please post an example of the BLAST output before you edited it (just a couple of dozen lines is enough).
********************
<BlastOutput_param>
<Parameters>
<Parameters_matrix>BLOSUM62</Parameters_matrix>
<Parameters_expect>1e-06</Parameters_expect>
<Parameters_gap-open>11</Parameters_gap-open>
<Parameters_gap-extend>1</Parameters_gap-extend>
<Parameters_filter>L;</Parameters_filter>
</Parameters>
</BlastOutput_param>
********************
This is what Blast2GO produced, and below is what my local BLAST produced.
********************
<BlastOutput_param>
____<Parameters>
________<Parameters_expect>1e-06</Parameters_expect>
________<Parameters_gap-open>11</Parameters_gap-open>
________<Parameters_gap-extend>1</Parameters_gap-extend>
________<Parameters_filter>L;</Parameters_filter>
____</Parameters>
</BlastOutput_param>
********************
("_" stands for space)
Additionally, Blast2GO uses a newline (\n) to end each line, while my local BLAST program on Windows uses a carriage return plus newline (\r\n), which I suspect has an effect.
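
For what it is worth, a standard XML parser treats indentation and \r\n versus \n line endings as insignificant between elements. The sketch below shows this with Python's built-in parser on a shortened version of the two snippets above; whether Blast2GO's own importer behaves the same way is an assumption, not something confirmed here.

```python
import xml.etree.ElementTree as ET

# Shortened versions of the two snippets: LF without indentation,
# and CRLF with space indentation.
UNIX_STYLE = (
    "<BlastOutput_param>\n<Parameters>\n"
    "<Parameters_expect>1e-06</Parameters_expect>\n"
    "<Parameters_gap-open>11</Parameters_gap-open>\n"
    "</Parameters>\n</BlastOutput_param>\n"
)
WINDOWS_STYLE = (
    "<BlastOutput_param>\r\n    <Parameters>\r\n"
    "        <Parameters_expect>1e-06</Parameters_expect>\r\n"
    "        <Parameters_gap-open>11</Parameters_gap-open>\r\n"
    "    </Parameters>\r\n</BlastOutput_param>\r\n"
)


def element_values(xml_text):
    """Return (tag, text) pairs for every leaf element, ignoring layout."""
    root = ET.fromstring(xml_text)
    return [(el.tag, (el.text or "").strip()) for el in root.iter() if len(el) == 0]


if __name__ == "__main__":
    print(element_values(UNIX_STYLE) == element_values(WINDOWS_STYLE))
```

In the two snippets as posted, the only difference in content (rather than layout) is that the local output has no Parameters_matrix element.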

Last edited by hugh_hang; 01-20-2013 at 05:39 AM. Reason: format problem
Old 01-24-2013, 04:17 PM   #15
upendra_35
Senior Member
 
Location: USA

Join Date: Apr 2010
Posts: 102
Default

I have the same problem. When I tried to import the XML file from a BLAST against the Arabidopsis TAIR10 cDNA reference, the mapping and annotation steps failed in Blast2GO. So I guess it has something to do with a difference in format between the XML files generated using the "nr" database and the "TAIR10" database.
Does anybody know a way to modify this XML file so that I can import it into Blast2GO?

Thanks
Upendra
Old 01-24-2013, 10:08 PM   #16
hugh_hang
Member
 
Location: Hangzhou, China

Join Date: Jan 2013
Posts: 28
Default

Quote:
Originally Posted by upendra_35 View Post
I have the same problem. When I tried to import the XML file from a BLAST against the Arabidopsis TAIR10 cDNA reference, the mapping and annotation steps failed in Blast2GO. So I guess it has something to do with a difference in format between the XML files generated using the "nr" database and the "TAIR10" database.
Does anybody know a way to modify this XML file so that I can import it into Blast2GO?

Thanks
Upendra
I've tried changing the format, but it doesn't work. I guess we should use the TAIR10_pep... files, which are protein files, for the BLAST DBs.
Old 01-24-2013, 11:13 PM   #17
upendra_35
Senior Member
 
Location: USA

Join Date: Apr 2010
Posts: 102
Default

Quote:
Originally Posted by hugh_hang View Post
I've tried changing the format, but it doesn't work. I guess we should use the TAIR10_pep... files, which are protein files, for the BLAST DBs.
Sorry, my bad... I used the pep file, not the cDNA file.
Old 01-25-2013, 10:08 AM   #18
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177
Default

Quote:
Originally Posted by hugh_hang View Post
I've tried changing the format, but it doesn't work. I guess we should use the TAIR10_pep... files, which are protein files, for the BLAST DBs.
Blast2GO is very particular about the format of the sequence ID used in FASTA files when you are creating custom BLAST databases. See this webpage for details about formatting your own BLAST DB for use with Blast2GO.

A second thing to consider is whether the IDs in your custom BLAST DB ('myid' in the example on the webpage) match IDs in the Blast2GO database. If the IDs don't match, Blast2GO won't be able to map them. Even if you reformatted the TAIR FASTA like the example in the tutorial and put the AT number in the proper position, I'm not sure that TAIR AT numbers are in the Blast2GO database.

A workaround is to use this file from the TAIR site: At_GB_refseq_prot.gz. It is the same protein set but uses the NCBI RefSeq IDs and GI numbers for the IDs. For example:
Code:
>gi|240256448|ref|NP_200529.4| PSD2 (phosphatidylserine decarboxylase 2); phosphatidylserine decarboxylase [Arabidopsis thaliana]
Format this file according to the tutorial. You are sacrificing the AT numbers but it's pretty much a guarantee that the NCBI GIs are in the Blast2GO database.
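
If you want to inspect which IDs a FASTA file actually carries before formatting it, a header like the one above can be split into its parts. This is only a sketch for NCBI-style "gi|...|ref|...|" headers; the function name is made up, and the exact ID layout Blast2GO expects is the one described in the tutorial, which is not reproduced here.

```python
def parse_ncbi_defline(defline):
    """Split '>gi|123|ref|NP_x.1| description' into ({'gi': ..., 'ref': ...}, description)."""
    identifier, _, description = defline.lstrip(">").partition(" ")
    # 'gi|240256448|ref|NP_200529.4|' -> ['gi', '240256448', 'ref', 'NP_200529.4']
    fields = identifier.strip("|").split("|")
    ids = dict(zip(fields[0::2], fields[1::2]))
    return ids, description


if __name__ == "__main__":
    header = (">gi|240256448|ref|NP_200529.4| PSD2 (phosphatidylserine "
              "decarboxylase 2); phosphatidylserine decarboxylase [Arabidopsis thaliana]")
    ids, description = parse_ncbi_defline(header)
    print(ids["gi"], ids["ref"])
    print(description)
```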
Old 01-25-2013, 04:45 PM   #19
upendra_35
Senior Member
 
Location: USA

Join Date: Apr 2010
Posts: 102
Default

Quote:
Originally Posted by kmcarr View Post
Blast2GO is very particular about the format of the sequence ID used in FASTA files when you are creating custom BLAST databases. See this webpage for details about formatting your own BLAST DB for use with Blast2GO.

A second thing to consider is whether the IDs in your custom BLAST DB ('myid' in the example on the webpage) match IDs in the Blast2GO database. If the IDs don't match, Blast2GO won't be able to map them. Even if you reformatted the TAIR FASTA like the example in the tutorial and put the AT number in the proper position, I'm not sure that TAIR AT numbers are in the Blast2GO database.

A workaround is to use this file from the TAIR site: At_GB_refseq_prot.gz. It is the same protein set but uses the NCBI RefSeq IDs and GI numbers for the IDs. For example:
Code:
>gi|240256448|ref|NP_200529.4| PSD2 (phosphatidylserine decarboxylase 2); phosphatidylserine decarboxylase [Arabidopsis thaliana]
Format this file according to the tutorial. You are sacrificing the AT numbers but it's pretty much a guarantee that the NCBI GIs are in the Blast2GO database.
Thank you very much, kmcarr, very useful tips. Can't wait to start my Blast2GO annotation now. Thanks again.
Tags
blast2go, databases

All times are GMT -8.