Hello bfast experts,
I have split the output of AB SOLiD reads into separate "reads.j.fastq" files for faster parallel processing. Each fastq file is ~100 MB.
I would really like your help to resolve an ambiguity in the analysis time of the independent bfast jobs. This analysis refers to PART-B of my previous post.
Some jobs have converged with final outputs (called *.sam files) in < 5hrs (one of them as little as 1.5 hrs).
Some jobs seem to be progressing much more slowly: walltime is nearing 24 hrs and they are stuck in the "bfast postprocess" step. The "bfast match" and "bfast localalign" steps have completed. The output *.sam file size is indeed growing slowly. I am concerned about the 5-20 fold variation in the time it takes for results to converge. The jobs are all running on single cores (I have no choice there - it is a matter of principle), housed at a central facility hosting hundreds of uniform cores, so the hardware is uniform across the compute nodes.
Is the variation in computation time a cause for concern, indicating poor read library preparation, or is this the norm? Sometimes results converge after many more iterations than they would otherwise! It could be stochastic. Can one use a flag in bfast postprocess that speeds up the computation AND still uses the color space information? I prefer not to compromise on the accuracy of aligning the reads.
Hope you can please help,
Thanks very much,
a bfast analyzer.