
**maubp** · 11-18-2010, 06:18 AM

Just a minor point: you can indeed run NCBI "legacy" standalone BLAST like this:

Code:

blastall -p blastn ...

If you want to use the "new" standalone BLAST+ it would be:

Code:

blastn ...

As for the almost empty result file, this is probably due to using different settings from the web BLAST. Check things like the gap parameters, the E-value threshold, and so on.
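For comparison, here are both invocations with the main options written out, so the settings can be checked against the web form. This is a minimal sketch: `legacy_blastn` and `plus_blastn` are hypothetical wrapper names, and the query, database and output names are whatever you pass in. One setting worth knowing about is that BLAST+ `blastn` uses `-task megablast` by default, whereas legacy `blastall -p blastn` runs a classic blastn search, so the sketch sets `-task blastn` explicitly.

```shell
# Hypothetical wrappers; arguments: query FASTA, database name, output file.

# Legacy BLAST: program chosen with -p, query with -i, database with -d,
# output file with -o, E-value cutoff with -e (10 is the default).
legacy_blastn() {
    blastall -p blastn -i "$1" -d "$2" -o "$3" -e 10
}

# BLAST+: one executable per program. -task blastn overrides the default
# megablast task so the search is comparable to a legacy blastn run.
plus_blastn() {
    blastn -task blastn -query "$1" -db "$2" -out "$3" -evalue 10
}
```

Usage would be e.g. `plus_blastn contigs.fna mydb result.txt`, where `contigs.fna` and `mydb` are placeholder names.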

**Anders Myrvold Dahl** · 11-18-2010, 08:35 AM

To be more precise: the short output file that is produced contains only

BLASTN 2.2.24 [Aug-08-2010]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= contig00001 length=3139 numreads=1128
(3139 letters)

So it seems only the first contig of the FASTA file, which contains more than 100 contigs, is used as a query. I need all the contigs to be processed.

So basically there's zero output, and the computational time is very brief.
Obviously, I'm doing something incorrectly.
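One way to check whether every contig was actually searched is to compare the number of records in the input FASTA file with the number of `Query=` lines in the report. A minimal sketch, assuming the default plain-text (pairwise) report format; the function names are hypothetical.

```shell
# Count FASTA records: each record starts with a '>' header line.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count query sections in a plain-text BLAST report:
# each one starts with a "Query=" line.
count_report_queries() {
    grep -c '^Query=' "$1"
}
```

For the truncated report above, `count_report_queries` would give 1, while `count_fasta_records` on the input would report every contig; a mismatch means the run stopped early.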

I don't know if adjusting the E-value or gap scores would make any difference here.

Also, should I go with BLAST+ instead of legacy BLAST?

Sorry if I'm asking obvious questions, but I've googled my butt off lately, and there seems to be little information out there.

Also, am I using the right BLAST program?
I'm supposed to run the nucleotide data against the nr database.
Since nr is a protein database, should I be running blastx?
The thing is, when I did the search with web BLAST and got plenty of results, I was using nucleotide BLAST (i.e. blastn)...

**maubp** · 11-18-2010, 08:46 AM

Originally posted by Anders Myrvold Dahl:

To be more precise; the short output file that is produced only contains

BLASTN 2.2.24 [Aug-08-2010]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= contig00001 length=3139 numreads=1128
(3139 letters)

That looks truncated. Normally you would then get some matches (or a "no hits" message), then the next queries, and a footer at the end.

There were no error messages? This is odd, but see below.

Originally posted by Anders Myrvold Dahl:

Also should I go with blast+ instead of legacy?

I would certainly recommend you try it. The NCBI are (I think) currently still supporting legacy BLAST, but only in the short term. You'll have to switch to BLAST+ at some point, so it would be sensible to do it now.

Originally posted by Anders Myrvold Dahl:

Also, am I using the right blast program?
I'm supposed to run the nucleotide data against the nr database.
Seeing the nr database is a protein database I should be running blastx?
Only when I did the search using webblast getting ample results, I was using nucleotide blast (i.e. blastn)...

Yes, use blastx; blastn is for a nucleotide query against a nucleotide database. There is a nice summary of the different BLAST programs here:

http://blast.ncbi.nlm.nih.gov/Blast.cgi
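The choice of program follows from the query and database types alone. A minimal sketch of that mapping (the function name `pick_blast_program` is hypothetical); since nr is a protein database, nucleotide contigs call for `blastx`.

```shell
# Map query type and database type ("nucl" or "prot") to a BLAST program.
pick_blast_program() {
    case "$1/$2" in
        nucl/nucl) echo blastn ;;   # nucleotide query vs nucleotide database
        nucl/prot) echo blastx ;;   # translated nucleotide query vs protein database
        prot/prot) echo blastp ;;   # protein query vs protein database
        prot/nucl) echo tblastn ;;  # protein query vs translated nucleotide database
        *) echo "unknown combination: $1/$2" >&2; return 1 ;;
    esac
}
```

For the contigs against nr, `pick_blast_program nucl prot` prints `blastx`.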

**Anders Myrvold Dahl** · 11-18-2010, 12:27 PM

I've now tried running both blastn and blastx from the BLAST+ package against the nr database.

It seems blastn gives me an indexing error (because the database is protein?).

blastx starts, but then nothing happens: I have to press Ctrl+C to stop the process. There is no output, either in the command window or in the output file.

And yes, I have tried letting the process run for a while...

**cascoamarillo** · 11-18-2010, 01:37 PM

Hi,

Don't you get the database you're using listed in the output after the query, like this:

Database: genome.fa
139,530 sequences; 107,332,603 total letters

Maybe it's the path to the database...

**Anders Myrvold Dahl** · 11-19-2010, 01:30 AM

I'm pretty confident the database path is correct.

The database should also be BLAST-formatted: I downloaded the nr.00.tar.gz, etc. archives from the ftp://ftp.ncbi.nlm.nih.gov/blast/db/ site.

I've run blastdbcheck and get the following output:

Writing messages to file (test.txt) at verbosity (Summary)
ISAM testing is ENABLED.
Legacy testing is DISABLED.
By default, testing 200 randomly sampled OIDs.

Testing 5 volume(s).
/home/andersmy/Blast/db/nr.00 / MetaData: [ERROR] caught exception.
/home/andersmy/Blast/db/nr.01 / MetaData: [ERROR] caught exception.
/home/andersmy/Blast/db/nr.02 / MetaData: [ERROR] caught exception.
/home/andersmy/Blast/db/nr.03 / MetaData: [ERROR] caught exception.
/home/andersmy/Blast/db/nr.04 / MetaData: [ERROR] caught exception.
Result=FAILURE. 5 errors reported in 5 volume(s).
Testing 1 alias(es).
Result=SUCCESS. No errors reported for 1 alias(es).

Total errors: 5

Is there something wrong with the database that makes blastx crash?

blastx -query Oppgave/trh52.fna -db Blast/db/nr -out result.txt

This writes result.txt to disk, but there is no output in the command window, and the command window freezes.

**maubp** · 11-19-2010, 02:02 AM

Five errors from the five chunks of the NR database: something is messed up.

Can you also download the nr.*.md5 files and use the md5sum command-line tool to verify that the nr.*.tar.gz files downloaded correctly? They are just tiny text files containing a list of md5 checksums and filenames. For example, "md5sum --check nr.00.tar.gz.md5" should calculate the md5 checksum for nr.00.tar.gz, and thus spot whether it was corrupted during download.
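Checking and unpacking all the volumes can be done in one loop. A minimal sketch, assuming the `nr.NN.tar.gz` archives and their `.md5` files sit in the current directory (`verify_and_extract` is a hypothetical name); it stops at the first archive whose checksum does not match.

```shell
# Verify each downloaded volume against its .md5 file, then unpack it.
# Argument: database prefix, e.g. "nr" for nr.00.tar.gz, nr.01.tar.gz, ...
verify_and_extract() {
    for md5file in "$1".*.tar.gz.md5; do
        archive="${md5file%.md5}"   # nr.00.tar.gz.md5 -> nr.00.tar.gz
        # md5sum --check reads "checksum  filename" lines from the .md5 file
        if md5sum --check "$md5file"; then
            tar -xzf "$archive"
        else
            echo "checksum mismatch: $archive" >&2
            return 1
        fi
    done
}
```

Run it as `verify_and_extract nr` from the database directory.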

**Anders Myrvold Dahl** · 11-19-2010, 09:41 AM

I've downloaded the nr.0*.tar.gz files once more as well as the md5 files, and reinstalled the database files.

I've performed the md5sum --check on all files and they're all ok.

Still I get the same error message from blastdbcheck after extracting these archives to my database directory.

And when I run blastx with the nr database, again the command interface just freezes.

I've also tried another nucleotide FASTA file downloaded from NCBI, and blastx still freezes, so the input should not be to blame here. Somehow there's something funky with the database...

**maubp** · 11-19-2010, 09:44 AM

Hmm. Have you tried another database? e.g. the NCBI vector nucleotide database is very small.

**Anders Myrvold Dahl** · 11-21-2010, 05:19 AM

I've run blastn successfully with my FASTA files using the vector database.

blastn checks all the contigs in my fasta file against the vector database and produces a smooth output file!

I've been told to use the non-redundant (nr) database, though, and more importantly, I have to assess which of the hits are probable contamination rather than horizontal gene transfer.

I'm pretty blank as to how to tell these two apart, but I was told that any eukaryotic matches would very likely be contamination of the E. coli strains.

Perhaps I should start a new thread regarding the contamination issue?

Or are there any good sources on the web I should check out?

Also, since there are more than 100 contigs in each file, is there an easy way to make a shortened list with only the best hit for each contig, based on some condition, say only eukaryotic genomes?
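If the search is run with tabular output (`-outfmt 6` in BLAST+), which gives one line per hit, the best hit per contig can be pulled out with `awk`. A minimal sketch, assuming the default 12 tab-separated columns, where column 1 is the query ID and column 12 the bit score (`best_hits` is a hypothetical name). Restricting the list to eukaryotic hits would additionally need taxonomy information for each subject, which this sketch does not attempt.

```shell
# Keep the highest-bit-score hit for each query in BLAST tabular output.
# Column 1 = query ID, column 12 = bit score (default -outfmt 6 layout).
best_hits() {
    awk -F'\t' '
        !($1 in best) || $12 + 0 > best[$1] + 0 {
            best[$1] = $12   # best bit score seen so far for this query
            line[$1] = $0    # the full hit line that holds it
        }
        END { for (q in line) print line[q] }
    ' "$1" | sort -t "$(printf '\t')" -k1,1   # awk order is arbitrary; sort by query ID
}
```

Comparing bit scores explicitly avoids relying on the order in which hits appear in the file.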

**maubp** · 11-21-2010, 06:39 AM

It is good that blastn worked with the small NCBI provided vector database. That seems to confirm your installation of BLAST+ is OK.

My guess is that your machine does not have enough RAM to do a search against a large database like NR.
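A rough way to test this guess is to compare the size of the database files with the machine's physical memory. A minimal sketch, assuming Linux (`/proc/meminfo`) and GNU `du` (for `-b`); `db_bytes` and `ram_bytes` are hypothetical names, and the pattern `nr.*.p*` is meant to pick up the protein volume files (`.phr`, `.pin`, `.psq`, ...) while skipping the `.tar.gz` archives and the `nr.pal` alias file. This is only a heuristic: a database larger than RAM does not necessarily fail, but searches against it can become very slow.

```shell
# Total apparent size in bytes of the protein volume files of a database,
# e.g. db_bytes /home/andersmy/Blast/db/nr  (matches nr.00.phr, nr.00.pin, ...).
db_bytes() {
    du -cb "$1".*.p* | tail -n 1 | cut -f 1
}

# Physical memory in bytes (Linux only; MemTotal is reported in kB).
ram_bytes() {
    awk '/^MemTotal:/ { printf "%.0f\n", $2 * 1024 }' /proc/meminfo
}
```

Usage: `echo "db: $(db_bytes /home/andersmy/Blast/db/nr) bytes, RAM: $(ram_bytes) bytes"`.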

BLAST contamination search help

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News