SEQanswers

Old 09-27-2012, 10:05 AM   #1
angeloulivieri
Member
 
Location: Italy

Join Date: Jul 2012
Posts: 30
Problems with blastx and thousands of sequences

Hi all, I'm running blastx on my server. It has worked every time so far, but now, with a large file of sequences, it is doing something strange.

The command is the following:

nohup ../ncbi-blast-2.2.27+/bin/blastx -db ../bin/data/uniprot_kb_2012_06.fasta -query 42000seq.fa -evalue 0.05 -max_target_seqs 5 -outfmt 5 -num_threads 10 -out ouBlastxXML &

where uniprot_kb_2012... is a dataset containing all the proteins taken from NCBI.
42000seq.fa is a file containing 42 thousand sequences in FASTA format.

The output I want is in XML...

I ran it one week ago and got a completely empty file!

Now with nohup it's writing this:

Selenocysteine (U) at position 73 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 52 replaced by X
Selenocysteine (U) at position 48 replaced by X
Selenocysteine (U) at position 37 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X

...and other similar lines...

and the xml file is still empty...

What's happening?

The same command on a query of ten sequences works well.

Does anyone know where I might have gone wrong?

Bye and thanks
Angelo
Old 09-27-2012, 11:03 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

I don't know what is wrong, but your best bet is to run BLAST on small groups of sequences and then put all of the XML files together.
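
A minimal sketch of the "put the XML files together" step, assuming each batch was run with -outfmt 5 and produced one well-formed BLAST XML file. The function name merge_blast_xml and the file names in the usage note are hypothetical. It keeps the header of the first file and appends the Iteration elements of the others; the DOCTYPE line is not preserved by ElementTree, so check that your downstream parser accepts the result.

```python
import xml.etree.ElementTree as ET


def merge_blast_xml(xml_paths, merged_path):
    """Merge several BLAST XML (-outfmt 5) files into one file.

    The first file supplies the header; the <Iteration> elements of the
    remaining files are appended to its <BlastOutput_iterations> element.
    Returns the total number of iterations written.
    """
    if not xml_paths:
        raise ValueError("no XML files given")
    base_tree = ET.parse(xml_paths[0])
    iterations = base_tree.getroot().find("BlastOutput_iterations")
    if iterations is None:
        raise ValueError(f"{xml_paths[0]} has no BlastOutput_iterations element")
    for path in xml_paths[1:]:
        other = ET.parse(path).getroot().find("BlastOutput_iterations")
        if other is not None:
            iterations.extend(list(other))
    # Renumber so that iteration numbers are unique in the merged file.
    for number, iteration in enumerate(iterations.findall("Iteration"), start=1):
        iter_num = iteration.find("Iteration_iter-num")
        if iter_num is not None:
            iter_num.text = str(number)
    # Note: the original DOCTYPE declaration is dropped on write.
    base_tree.write(merged_path, encoding="utf-8", xml_declaration=True)
    return len(iterations.findall("Iteration"))
```

For example, merge_blast_xml(["batch_0000.xml", "batch_0001.xml"], "all_batches.xml") would write one combined file. Parsing each batch file separately is a simpler alternative if a single file is not required.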
Old 09-27-2012, 11:15 AM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

It seems BLAST+ is a bit silly with the XML output and doesn't write it out incrementally.

I personally split the input into batches of 1000 queries (works well for spreading the work over a cluster).
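
A minimal sketch of splitting a FASTA file into batches of 1000 queries, using only the Python standard library. The function name split_fasta, the output directory and the batch file naming are assumptions made for illustration, not the script used in this thread.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir, batch_size=1000, prefix="batch"):
    """Split a FASTA file into files holding at most batch_size records each.

    Returns the list of batch file paths, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    batch_paths = []
    handle = None
    n_records = 0
    try:
        with open(fasta_path) as fasta:
            for line in fasta:
                if line.startswith(">"):
                    # A header line starts a new record; open a new batch
                    # file whenever the current one is full.
                    if n_records % batch_size == 0:
                        if handle is not None:
                            handle.close()
                        path = out_dir / f"{prefix}_{len(batch_paths):04d}.fa"
                        handle = open(path, "w")
                        batch_paths.append(path)
                    n_records += 1
                if handle is not None:  # skip anything before the first header
                    handle.write(line)
    finally:
        if handle is not None:
            handle.close()
    return batch_paths
```

Calling split_fasta("42000seq.fa", "batches") would write files such as batches/batch_0000.fa, each of which can then be passed to blastx with -query as a separate job.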
Old 09-27-2012, 11:18 PM   #4
angeloulivieri
Member
 
Location: Italy

Join Date: Jul 2012
Posts: 30

Could the reason be that I use an XML file to extract information?

Last edited by angeloulivieri; 09-28-2012 at 12:24 AM.
Old 09-28-2012, 12:59 AM   #5
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Quote:
Originally Posted by angeloulivieri
Could the reason be that I use an XML file to extract information?

If you use the text or tabular output, you should see results written to the file while BLAST+ is running.
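
A minimal sketch of reading the tabular output, assuming the run used -outfmt 6 or 7 with the default twelve columns. The function name best_hits and the file name in the usage note are hypothetical; rows with a custom column layout would need a different COLUMNS list.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_path):
    """Return {query id: row dict} keeping the highest-bitscore hit per query."""
    best = {}
    with open(tabular_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for fields in reader:
            # Skip blank lines and the '#' comment lines written by -outfmt 7.
            if not fields or fields[0].startswith("#"):
                continue
            row = dict(zip(COLUMNS, fields))
            row["evalue"] = float(row["evalue"])
            row["bitscore"] = float(row["bitscore"])
            current = best.get(row["qseqid"])
            if current is None or row["bitscore"] > current["bitscore"]:
                best[row["qseqid"]] = row
    return best
```

For example, best_hits("ouBlastxTab") returns one row per query seen so far. If the file is read while BLAST+ is still running, the last line may be incomplete, so it is safer to parse once the job has finished.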
Old 09-28-2012, 02:40 AM   #6
angeloulivieri
Member
 
Location: Italy

Join Date: Jul 2012
Posts: 30

So only with these options, 6, 7 and 8?
With the other output types, will my output be filled in only at the end of the computation?
Old 09-28-2012, 02:51 AM   #7
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

I've only noticed a problem of delayed output with the XML output format.
Old 09-28-2012, 04:07 AM   #8
angeloulivieri
Member
 
Location: Italy

Join Date: Jul 2012
Posts: 30

OK. The first time I used blastx with these 40 thousand sequences it gave me nothing, which seemed very strange to me. Now I'm trying a non-XML output format, because I only need to use some BioPerl functions to look at the results.

Thanks
Old 09-28-2012, 04:50 AM   #9
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 311

Quote:
Originally Posted by angeloulivieri

Selenocysteine (U) at position 73 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 52 replaced by X
Selenocysteine (U) at position 48 replaced by X
Selenocysteine (U) at position 37 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X

Hi,
Just for information: according to http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml this is due to the following:
Quote:
For protein code, U is replaced by X first before the search since it is not specified in any scoring matrices.

So there is nothing wrong here. I don't know about the rest...

Best
Dario
Old 09-28-2012, 06:26 AM   #10
angeloulivieri
Member
 
Location: Italy

Join Date: Jul 2012
Posts: 30

Thank you Dario