For our use case we intend to load Novoalign with the reference genomes for ALL Bacteria, Human, Viral, Fungi, Mouse, Honey Bee, Tasmanian Devil, Gorilla, Microbat, Megabat, Chicken and Swine.
After playing with Novoalign for a couple of hours, I was left with the impression that every species needs its own index file. With one index per virus I currently have 2419 index files, so after adding all the bacterial/archaeal and remaining species I will probably have around 10k index files or even more. Does that mean that to align an arbitrary sequence (without knowing what might be in it) I would have to invoke Novoalign N times, where N is the number of index files created?
If so, is it possible to multiplex all my source sequences into a single index file while retaining the ability to identify which species/organism a sequence has mapped to? Or can this only be done by keeping separate species/organisms in separate index files?
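To make the single-index idea concrete, here is a minimal sketch of what I have in mind. It assumes that the reference name taken from each FASTA header is carried through to the alignment output, which I have not confirmed for Novoalign. The helper names (`tag_fasta`, `merge_tagged`, `species_of`) and the `|` separator are hypothetical choices for illustration, not part of any Novoalign tooling.

```python
def tag_fasta(species: str, fasta_text: str, sep: str = "|") -> str:
    """Prefix every FASTA header with a species label (hypothetical helper)."""
    # Whitespace in the label is replaced so the tag stays in the first
    # header token, which is what tools usually treat as the sequence name.
    label = "_".join(species.split())
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(f">{label}{sep}{line[1:].strip()}")
        elif line.strip():
            out.append(line.strip())
    return "\n".join(out) + "\n"


def merge_tagged(sources: dict[str, str], sep: str = "|") -> str:
    """Concatenate several FASTA texts into one, tagging each by species."""
    return "".join(tag_fasta(sp, txt, sep) for sp, txt in sources.items())


def species_of(reference_name: str, sep: str = "|") -> str:
    """Recover the species label from a tagged reference name."""
    return reference_name.lstrip(">").split(sep, 1)[0]


if __name__ == "__main__":
    merged = merge_tagged({
        "Homo sapiens": ">chr1 test\nACGT\n",
        "HIV-1": ">NC_001802.1\nGGCC\n",
    })
    print(merged, end="")
    print(species_of("HIV-1|NC_001802.1"))
```

The merged FASTA would then be indexed once, and the species of each hit would be read back from the reference name with `species_of`. Whether a single index of this size is practical in terms of memory is a separate question.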