Old 04-05-2015, 01:45 PM   #6
tomc
Member
 
Location: Oregon

Join Date: Feb 2011
Posts: 29

quokka, we can't know your cluster's file system or network setup, but in general you want the data and the search to be as close together as possible, and you want to reuse the data already on hand as much as possible.

Note that your five 1k query sequences are insignificant compared with the 22 blast database shards of 1GB each,
so blasting all 5 sequences against whichever shard of nt_fa is already in memory is preferable.
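
One way to get all five sequences searched while a shard is loaded is to put them in a single multi-FASTA query file, so each shard needs only one search run. A minimal sketch, assuming each query sits in its own FASTA file; the function name and file names are made up for illustration:

```python
from pathlib import Path


def combine_queries(query_paths, combined_path):
    """Concatenate several FASTA files into one multi-FASTA query file.

    Returns the number of FASTA records (header lines) written.
    """
    records = 0
    with open(combined_path, "w") as out:
        for path in query_paths:
            text = Path(path).read_text()
            records += sum(1 for line in text.splitlines() if line.startswith(">"))
            out.write(text)
            # A missing trailing newline would glue the next header onto this sequence.
            if text and not text.endswith("\n"):
                out.write("\n")
    return records
```

For example, combine_queries(["q1.fa", "q2.fa"], "queries.fa") would leave both records in queries.fa, which can then be passed as the single query file for every shard.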

Blasting your sequences against each of the 22 shards can happen concurrently.
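
A rough sketch of what "concurrently" could look like, assuming the shards are named nt.00 through nt.21 (a guess, check your own files) and that blastn from BLAST+ is on the PATH. On a real cluster the scheduler would normally launch the 22 jobs; the thread pool here only stands in for that on a single machine.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def shard_names(prefix="nt", count=22):
    """Hypothetical shard naming scheme: nt.00, nt.01, ... nt.21."""
    return [f"{prefix}.{i:02d}" for i in range(count)]


def blastn_command(query_fasta, shard_db, out_path):
    """Build one blastn invocation; -outfmt 6 requests tab-separated output."""
    return ["blastn", "-query", query_fasta, "-db", shard_db,
            "-outfmt", "6", "-out", out_path]


def run_concurrently(commands, max_workers=4):
    """Run each command as its own process; return exit codes in input order."""
    def run(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))
```

Usage would be something like building one command per name from shard_names() and handing the list to run_concurrently(), then checking that every exit code is 0 before trusting the output files.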

Being able to access the nt_fa database from a node is not the same as that data being local to the node. You may see a speedup by first copying the blast database to a disk on the node itself, perhaps a "scratch" disk (check your cluster documentation or ask your sysadmin about this).
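
The copy-to-scratch step might look roughly like this. It assumes a shard is a set of files sharing a prefix (for example nt.00.nhr, nt.00.nin, nt.00.nsq) in some shared directory; the directory layout and the function name are assumptions, not something known about your cluster.

```python
import shutil
from pathlib import Path


def stage_shard(shared_dir, shard_name, scratch_dir):
    """Copy one shard's files to node-local storage; return the local db prefix."""
    scratch = Path(scratch_dir)
    scratch.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(shared_dir).glob(shard_name + ".*")):
        dst = scratch / src.name
        # Skip files already staged with the same size, so a later run reuses them.
        if dst.exists() and dst.stat().st_size == src.stat().st_size:
            continue
        shutil.copy2(src, dst)
    return str(scratch / shard_name)
```

The returned prefix is what you would pass to the search as the database path, so the reads come from the local disk rather than over the network.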

In summary

launch 22 jobs, each of which pulls one 1GB database shard
(into local storage if possible, to avoid refetching it for the next query)
run a search for each of your query sequences against the local shard
combine the results
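
For the last step, here is a minimal sketch of merging the per-shard outputs, assuming every job wrote default tabular output (blastn -outfmt 6, where column 1 is the query id and column 12 is the bit score). It orders hits by bit score rather than e-value, because an e-value depends on the size of the database that was searched and each job only saw one shard. The function name is made up for illustration.

```python
import csv


def merge_hits(result_paths, merged_path):
    """Merge per-shard tabular hit files, best bit score first within each query."""
    rows = []
    for path in result_paths:
        with open(path, newline="") as handle:
            rows.extend(r for r in csv.reader(handle, delimiter="\t") if r)
    # Index 0 is the query id and index 11 the bit score in default outfmt 6.
    rows.sort(key=lambda r: (r[0], -float(r[11])))
    with open(merged_path, "w", newline="") as out:
        csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return len(rows)
```

Calling merge_hits() on the 22 output files gives one table per run, grouped by query sequence, with the strongest hits from any shard at the top of each group.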

Last edited by tomc; 04-05-2015 at 01:56 PM.