hello,
i do chip-seq on illumina and very much appreciate the information in this forum. this is my first post, and i would like to ask about possible ways to speed up bwa.
currently, i use a workstation with two xeon e5420 cpus (4 cores each) and 24 gb of memory. it takes about 3 hours to align a single lane of reads from the GA to the human genome, during which cpu usage stays at 100% on all cores.
i have seen an attempt by a cambridge group to speed up bwa with cuda (called barracuda). interestingly, they show that using 4 or 8 cpu cores does not make much difference in run time, and the performance of their cuda aligner was not much different from that of bwa on 4-8 cpu cores.
how much difference do you predict it will make if i
1. use fast ssd drives (sata3, raid arrays) instead of hdd, because access to huge sequence data might become a bottleneck
2. run barracuda on an nvidia tesla S2050, which has 1792 cuda cores, or even on a massively parallel supercomputer (with the necessary optimization of the algorithm), if sequence alignment tasks can be effectively broken up into thousands of parallel processes
3. optimize the bwa algorithm to make use of 64GB or more of memory and of 48 or more cpu cores/threads. it is clear that multi-threading does not speed things up in the current bwa. however, i guess it must be possible to assign individual chromosomes, or their long/short arms, to each of e.g. 24 cores/threads.
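to make the "break it up into parallel pieces" idea in points 2 and 3 a bit more concrete, here is a rough python sketch. note that it splits the work by reads rather than by chromosome as i suggested above, on the assumption that each read can be aligned independently of the others. it is not part of bwa or barracuda, the function names (`read_fastq`, `split_round_robin`) are made up for illustration, and the tiny fastq records are invented demo data.

```python
from itertools import islice


def read_fastq(lines):
    """Yield one FASTQ record (header, sequence, plus line, qualities) at a time."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            # End of input (or a truncated trailing record): stop.
            return
        yield tuple(line.rstrip("\n") for line in record)


def split_round_robin(records, n_chunks):
    """Deal records into n_chunks lists so every chunk gets a similar share."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


# Invented demo data: five short reads in FASTQ layout.
demo_lines = []
for n in range(5):
    demo_lines += [f"@read{n}\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]

chunks = split_round_robin(read_fastq(demo_lines), n_chunks=2)
for idx, chunk in enumerate(chunks):
    print(idx, [record[0] for record in chunk])
```

each chunk could then be written to its own file and handed to a separate aligner process or machine, with the outputs merged afterwards. whether this actually helps would depend on where the real bottleneck is (cpu, memory bandwidth, or disk), which is exactly what i am unsure about.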
any suggestions would be much appreciated.