Has anyone else experienced a server shutdown when running bwa aln on more than 20 cores?
-
jasonbcold,
This is a common thing I hear from people doing analysis work on multi-CPU/multi-core machines. In my view, if the software is parallelized and optimized for a shared- or distributed-memory architecture in the right way, it should not break when you scale up. How are you achieving parallelism across many cores? I think I can help if you can share information on the setup (hardware and software).
-
Hi drio and geschickten,
the following bwa command works fine on our server:
bwa aln -e 5 -t 15 [ref.fa] [reads.fastq] > [alignment.sai]
but when we increase the processor number to 25 (of 32 cores):
bwa aln -e 5 -t 25 [ref.fa] [reads.fastq] > [alignment.sai]
the server shuts down reproducibly.
And here are the hardware software specifications:
sba@solexa:~$ uname -a
Linux solexa 2.6.30-2-amd64 #1 SMP Fri Sep 25 22:16:56 UTC 2009 x86_64 GNU/Linux
Distribution is Debian.
Machine consists of 8 Quad-Core AMD Opteron Processors 8380 (thus, a total of 32 cores).
Many hard discs attached to
RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)
RAID bus controller: 3ware Inc 9690SA SAS/SATA-II RAID PCIe (rev 01)
that are configured as JBOD. The filesystem is ext4.
-
Hi jasonbcold,
I guess it has nothing to do with the hardware; if you look at the pthreads part of the BWA code you will understand. I still have to dive deep into the code, but at a high level my guess is that the code is not guaranteed to scale across many CPUs/cores; that is why paradigms like OpenMP and MPI exist. Anyway, if you need professional help, we can customize the software for your needs. Let me know if you are interested.
-
Our system is fairly similar:
Linux pipeline 2.6.18-128.el5_BITS_XFS #1 SMP Thu Sep 3 17:05:45 BST 2009 x86_64 x86_64 x86_64 GNU/Linux
16 cores Intel(R) Xeon(R) CPU X7350 @ 2.93GHz
32GB memory.
The crash also occurred during the bwa aln step, when we were using 12 of the 16 cores.
-
Here's what's going on ...
BWA is built for speed. A good thing.
To make it fast, BWA skips error checking. Not good but if you *know* this, then you can deal with the problems.
I can grep for "malloc" in the source and the 2 lines after "malloc" (-A2) ...
___________________________________________________
-bash-3.00$ grep -A2 malloc *.c
bwape.c: z->a = (bwtint_t*)malloc(sizeof(bwtint_t) * z->n);
bwape.c- for (l = r->k; l <= r->l; ++l)
bwape.c- z->a[l - r->k] = r->a? bwt_sa(bwt[0], l) : bwt[1]->seq_len - (bwt_sa(bwt[1], l) + p[j]->len);
--
cs2nt.c: ta = (uint8_t*)malloc(len * 7);
cs2nt.c- nt_ref = ta;
cs2nt.c- cs_read = nt_ref + len;
--
is.c: } else if ((C = B = (int *) malloc(k * sizeof(int))) == NULL) return -2;
is.c- getCounts(T, C, n, k, cs);
is.c- getBuckets(C, B, k, 1); /* find ends of buckets */
--
is.c: } else if ((C = B = (int *) malloc(k * sizeof(int))) == NULL) return -2;
is.c- /* put all left-most S characters into their buckets */
is.c- getCounts(T, C, n, k, cs);
--
simple_dp.c: p->s = (unsigned char*)malloc(p->l + 1);
simple_dp.c- memcpy(p->s, seq->seq.s, p->l);
simple_dp.c- p->s[p->l] = 0;
--
stdaln.c: aa = (AlnAln*)malloc(sizeof(AlnAln));
stdaln.c- aa->path = 0;
stdaln.c- aa->out1 = aa->out2 = aa->outm = 0;
--
stdaln.c: dpcell = (dpcell_t**)malloc(sizeof(dpcell_t*) * (len2 + 1));
stdaln.c- for (j = 0; j <= len2; ++j)
stdaln.c: dpcell[j] = (dpcell_t*)malloc(sizeof(dpcell_t) * end);
stdaln.c- for (j = b2 + 1; j <= len2; ++j)
stdaln.c- dpcell[j] -= j - b2;
stdaln.c: curr = (dpscore_t*)malloc(sizeof(dpscore_t) * (len1 + 1));
stdaln.c: last = (dpscore_t*)malloc(sizeof(dpscore_t) * (len1 + 1));
stdaln.c-
stdaln.c- /* set first row */
--
stdaln.c: suba = (int*)malloc(sizeof(int) * (len2 + 1));
stdaln.c: eh = (NT_LOCAL_SCORE*)malloc(sizeof(NT_LOCAL_SCORE) * (len1 + 1));
stdaln.c: s_array = (int**)malloc(sizeof(int*) * N_MATRIX_ROW);
stdaln.c- for (i = 0; i != N_MATRIX_ROW; ++i)
stdaln.c: s_array[i] = (int*)malloc(sizeof(int) * len1);
stdaln.c- /* initialization */
stdaln.c- aln_init_score_array(seq1, len1, N_MATRIX_ROW, score_matrix, s_array);
--
stdaln.c: seq11 = (unsigned char*)malloc(sizeof(unsigned char) * len1);
stdaln.c: seq22 = (unsigned char*)malloc(sizeof(unsigned char) * len2);
stdaln.c: aa->path = (path_t*)malloc(sizeof(path_t) * (len1 + len2 + 1));
stdaln.c-
stdaln.c- if (ap->row < 10) { /* 4-nucleotide alignment */
--
stdaln.c: out1 = aa->out1 = (char*)malloc(sizeof(char) * (aa->path_len + 1));
stdaln.c: out2 = aa->out2 = (char*)malloc(sizeof(char) * (aa->path_len + 1));
stdaln.c: outm = aa->outm = (char*)malloc(sizeof(char) * (aa->path_len + 1));
stdaln.c-
stdaln.c- --seq1; --seq2;
--
stdaln.c: cigar = (uint32_t*)malloc(*n_cigar * 4);
stdaln.c-
stdaln.c- cigar[0] = 1u << 4 | path[path_len-1].ctype;
__________________________________________________________
Notice how the return value from malloc is not checked? If there's plenty of memory, no problem. But if you're running 8 bwas, other users are doing other stuff, and one of the input files has weird content, memory usage spikes and suddenly there's no more memory: malloc fails and returns NULL, and what happens when the code uses that pointer is undefined.
Sad truth is, your 4 core, 8GB system can't always handle it.
When BWA locks up your system, just dial it back a little (fewer threads or fewer concurrent jobs) or try a box with more memory.
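For comparison, here is a minimal sketch of what a checked allocation could look like. This is not BWA code: `checked_alloc` and `make_buffer_demo` are hypothetical names made up for illustration, and the `len * 7` size only mirrors the cs2nt.c line in the grep output above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of BWA): allocate count * size bytes.
 * Returns NULL, after printing a message, if the multiplication would
 * overflow or if malloc fails. */
static void *checked_alloc(size_t count, size_t size, const char *what)
{
    assert(what != NULL);
    if (size != 0 && count > SIZE_MAX / size) {
        fprintf(stderr, "[checked_alloc] %s: request too large\n", what);
        return NULL;
    }
    size_t bytes = count * size;
    /* malloc(0) is implementation-defined, so ask for at least 1 byte. */
    void *p = malloc(bytes ? bytes : 1);
    if (p == NULL)
        fprintf(stderr, "[checked_alloc] %s: out of memory (%zu bytes)\n",
                what, bytes);
    return p;
}

/* Hypothetical caller: test the pointer before touching the buffer.
 * Returns 0 on success, -1 if the allocation failed. */
static int make_buffer_demo(size_t len)
{
    unsigned char *ta = (unsigned char *)checked_alloc(len, 7, "cs2nt buffer");
    if (ta == NULL)
        return -1;              /* report failure instead of crashing later */
    memset(ta, 0, len * 7);     /* safe: ta is known to be valid here */
    free(ta);
    return 0;
}
```

A check like this turns a failed allocation into a clean error return. On Linux with memory overcommit enabled, malloc can succeed and the failure can show up later when the pages are actually touched, so this catches some out-of-memory conditions but not all of them.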
Similarly we can look at fread() function ...
_____
-bash-3.00$ grep -A3 fread *.c
bwape.c: fread(&n_aln, 4, 1, fp_sa[j]);
bwape.c- if (n_aln > kv_max(d->aln[j]))
bwape.c- kv_resize(bwt_aln1_t, d->aln[j], n_aln);
bwape.c- d->aln[j].n = n_aln;
bwape.c: fread(d->aln[j].a, sizeof(bwt_aln1_t), n_aln, fp_sa[j]);
bwape.c- kv_copy(bwt_aln1_t, buf[j][i].aln, d->aln[j]); // backup d->aln[j]
bwape.c- // generate SE alignment and mapping quality
bwape.c- bwa_aln2seq(n_aln, d->aln[j].a, p[j]);
--
bwape.c: fread(pacseq, 1, bns->l_pac/4+1, bns->fp_pac);
bwape.c- } else pacseq = (ubyte_t*)_pacseq;
bwape.c- if (!popt->is_sw || ii->avg < 0.0) return pacseq;
bwape.c-
--
bwape.c: fread(&opt, sizeof(gap_opt_t), 1, fp_sa[0]);
bwape.c- ks[0] = bwa_open_reads(opt.mode, fn_fa[0]);
bwape.c- opt0 = opt;
bwape.c: fread(&opt, sizeof(gap_opt_t), 1, fp_sa[1]); // overwritten!
bwape.c- ks[1] = bwa_open_reads(opt.mode, fn_fa[1]);
bwape.c- if (!(opt.mode & BWA_MODE_COMPREAD)) {
bwape.c- popt->type = BWA_PET_SOLID;
--
bwape.c: fread(pac, 1, bns->l_pac/4+1, bns->fp_pac);
bwape.c- }
bwape.c- }
bwape.c-
--
bwase.c: fread(ntpac, 1, ntbns->l_pac/4 + 1, ntbns->fp_pac);
bwase.c- }
bwase.c-
bwase.c- if (!_pacseq) {
--
bwase.c: fread(pacseq, 1, bns->l_pac/4+1, bns->fp_pac);
bwase.c- } else pacseq = _pacseq;
bwase.c- for (i = 0; i != n_seqs; ++i) {
bwase.c- bwa_seq_t *s = seqs + i;
--
bwase.c: fread(&opt, sizeof(gap_opt_t), 1, fp_sa);
bwase.c- if (!(opt.mode & BWA_MODE_COMPREAD)) // in color space; initialize ntpac
bwase.c- ntbns = bwa_open_nt(prefix);
bwase.c- bwa_print_sam_SQ(bns);
--
bwase.c: fread(&n_aln, 4, 1, fp_sa);
bwase.c- if (n_aln > m_aln) {
bwase.c- m_aln = n_aln;
bwase.c- aln = (bwt_aln1_t*)realloc(aln, sizeof(bwt_aln1_t) * m_aln);
--
bwase.c: fread(aln, sizeof(bwt_aln1_t), n_aln, fp_sa);
bwase.c- bwa_aln2seq_core(n_aln, aln, p, 1, n_occ);
bwase.c- }
bwase.c-
--
bwtio.c: fread(&primary, sizeof(bwtint_t), 1, fp);
bwtio.c- xassert(primary == bwt->primary, "SA-BWT inconsistency: primary is not the same.");
bwtio.c: fread(skipped, sizeof(bwtint_t), 4, fp); // skip
bwtio.c: fread(&bwt->sa_intv, sizeof(bwtint_t), 1, fp);
bwtio.c: fread(&primary, sizeof(bwtint_t), 1, fp);
bwtio.c- xassert(primary == bwt->seq_len, "SA-BWT inconsistency: seq_len is not the same.");
bwtio.c-
bwtio.c- bwt->n_sa = (bwt->seq_len + bwt->sa_intv) / bwt->sa_intv;
--
bwtio.c: fread(bwt->sa + 1, sizeof(bwtint_t), bwt->n_sa - 1, fp);
bwtio.c- fclose(fp);
bwtio.c-}
bwtio.c-
--
bwtio.c: fread(&bwt->primary, sizeof(bwtint_t), 1, fp);
bwtio.c: fread(bwt->L2+1, sizeof(bwtint_t), 4, fp);
bwtio.c: fread(bwt->bwt, 4, bwt->bwt_size, fp);
bwtio.c- bwt->seq_len = bwt->L2[4];
bwtio.c- fclose(fp);
bwtio.c- bwt_gen_cnt_table(bwt);
--
bwtmisc.c: fread(&c, 1, 1, fp);
bwtmisc.c- fclose(fp);
bwtmisc.c- return (pac_len - 1) * 4 + (int)c;
bwtmisc.c-}
--
bwtmisc.c: fread(buf2, 1, pac_size, fp);
bwtmisc.c- fclose(fp);
bwtmisc.c- memset(bwt->L2, 0, 5 * 4);
bwtmisc.c- buf = (ubyte_t*)calloc(bwt->seq_len + 1, 1);
--
bwtmisc.c: fread(bufin, 1, pac_len, fp);
bwtmisc.c- fclose(fp);
bwtmisc.c- for (i = seq_len - 1, j = 0; i >= 0; --i) {
bwtmisc.c- int c = bufin[i>>2] >> ((~i&3)<<1) & 3;
--
bwtmisc.c: fread(pac, 1, bns->l_pac/4+1, bns->fp_pac);
bwtmisc.c- rewind(bns->fp_pac);
bwtmisc.c- c1 = pac[0]>>6; cspac[0] = c1<<6;
bwtmisc.c- for (i = 1; i < bns->l_pac; ++i) {
--
bwtsw2_aux.c: fread(pac, 1, bns->l_pac/4+1, bns->fp_pac);
bwtsw2_aux.c- fp = xzopen(fn, "r");
bwtsw2_aux.c- ks = kseq_init(fp);
bwtsw2_aux.c- _seq = calloc(1, sizeof(bsw2seq_t));
___________
Note that the return value from fread() is not checked. Did it succeed? Not sure, but the code assumes it did. Many, many strange errors in BWA happen because the user feeds bad input into it.
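As a rough illustration of the alternative, here is a sketch of a read that checks how many items fread() actually returned. The names `read_exact` and `read_count` are hypothetical and not from the BWA source; `read_count` is only modeled on the `fread(&n_aln, 4, 1, ...)` pattern in the grep output above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of BWA): read exactly `count` items of
 * `size` bytes each. Returns 0 on success, -1 on a short read or I/O error. */
static int read_exact(void *dst, size_t size, size_t count, FILE *fp,
                      const char *what)
{
    assert(dst != NULL && fp != NULL && what != NULL);
    size_t got = fread(dst, size, count, fp);
    if (got == count)
        return 0;
    if (ferror(fp))
        fprintf(stderr, "[read_exact] %s: I/O error after %zu of %zu items\n",
                what, got, count);
    else
        fprintf(stderr, "[read_exact] %s: file ended after %zu of %zu items\n",
                what, got, count);
    return -1;
}

/* Hypothetical caller: read one 4-byte count, as the .sai readers do.
 * Returns 0 and stores the value in *n_aln, or -1 if the file is truncated. */
static int read_count(FILE *fp, int32_t *n_aln)
{
    return read_exact(n_aln, sizeof *n_aln, 1, fp, "alignment count");
}
```

With a check like this, a truncated or corrupt input file produces an error message at the point of the failed read rather than garbage values further down the pipeline.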
________ bottom line is this
1) if it locks, run on a bigger box
2) check your inputs
Last edited by Richard Finney; 03-06-2012, 02:48 PM.