07-15-2008, 04:31 AM   #9
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126

Hi Li Heng,
Thanks for your kind comments.
Performance slows on larger genomes because more candidate alignment locations are evaluated for each read. Additional memory helps here because it makes the index more specific: novoalign can run against the full human genome on a server with 8GB of RAM, but a 16GB or 32GB server is going to be 4 or 5 times faster.
With regard to multithreading, the index is memory mapped, so it's quite possible to run multiple copies of novoalign (same target genome) without any increase in memory. That said, multithreading wouldn't be too difficult, as the search classes are all designed to handle it. I need to see if there is real demand.
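To illustrate why a memory-mapped index can be shared between processes, here is a minimal Python sketch. It is not novoalign's code: the helper name open_shared_index and the file contents are made up for the example. The point is only that read-only mappings of the same file are backed by the same pages in the operating system's cache, so a second process does not need a second copy.

```python
import mmap
import os
import tempfile


def open_shared_index(path):
    """Map a file read-only (hypothetical helper, not novoalign's API)."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. The mapping stays usable after
        # the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


# Stand-in for an index file; a real index would be many gigabytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ACGT" * 1024)
    demo_path = tmp.name

# Two independent mappings, as two aligner processes would create.
view_a = open_shared_index(demo_path)
view_b = open_shared_index(demo_path)
same_bytes = view_a[:8] == view_b[:8]

view_a.close()
view_b.close()
os.unlink(demo_path)
```

Pages are only loaded when touched, so each extra process adds its own working memory but not another copy of the index.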
The quality calculation is similar in principle to maq's: it is the Bayesian posterior probability that the alignment is wrong. Some factors are estimated, and one possible problem is that I rate the reference genome at 2 bits of entropy per base; this may be the cause of the high qualities.
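As a rough sketch of the general idea (not novoalign's actual formula), the posterior probability that the best alignment is wrong can be computed from the likelihood of the read at each candidate location and then Phred-scaled. The function name, the p_unseen term for locations the search never evaluated, and the cap of 70 are all assumptions made for this example.

```python
import math


def mapping_quality(likelihoods, p_unseen=0.0, max_q=70):
    """Phred-scaled posterior probability that the best hit is wrong.

    likelihoods: P(read | location) for each candidate location found.
    p_unseen:    assumed total likelihood of locations not evaluated.
    max_q:       assumed cap, used when the posterior error is ~0.
    """
    if not likelihoods:
        raise ValueError("need at least one candidate alignment")
    best = max(likelihoods)
    total = sum(likelihoods) + p_unseen
    # Posterior that the read really came from somewhere other than 'best'.
    p_wrong = (total - best) / total
    if p_wrong <= 0.0:
        return max_q
    return min(max_q, round(-10.0 * math.log10(p_wrong)))


unique_hit = mapping_quality([1e-2])        # no competing location
near_repeat = mapping_quality([1e-2, 1e-2])  # two equally good locations
clear_best = mapping_quality([1e-2, 1e-5])   # one much better location
```

In a scheme like this, anything that underestimates the likelihood of competing or unseen locations pushes the quality up, which is one way an optimistic entropy assumption for the reference could inflate the scores.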

I deliberately haven't done SOLiD, as I'd like to do it properly or not at all. That said, if someone wants to try, I suggest converting the reference genome to colour space rather than the reads to nucleotide space.
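For anyone who wants to try the reference conversion, here is a minimal Python sketch. It assumes the standard SOLiD di-base encoding, where each colour is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The function name and the use of "." for transitions involving an ambiguous base are choices made for this example.

```python
# 2-bit base codes; the colour for a pair of bases is the XOR of their codes.
_BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colour_space(seq):
    """Convert a nucleotide sequence to a string of colours (0-3).

    A sequence of n bases yields n - 1 colours. Any transition that
    involves a base other than A, C, G or T is written as '.'.
    """
    seq = seq.upper()
    colours = []
    for first, second in zip(seq, seq[1:]):
        if first in _BASE_CODE and second in _BASE_CODE:
            colours.append(str(_BASE_CODE[first] ^ _BASE_CODE[second]))
        else:
            colours.append(".")
    return "".join(colours)


example = to_colour_space("ATGGC")
```

Going the other way is riskier: decoding a colour read to bases is cumulative, so a single miscalled colour changes every base after it, whereas in colour space it stays a single mismatch.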