08-12-2010, 03:36 AM #5
krobison

One more possible cookie crumb: the log files for hung runs generally end with one of these two patterns:

[infer_isize] (25, 50, 75) percentile: (21322, 49144, 77320)
[infer_isize] low and high boundaries: 36 and 189316 for estimating avg and std
[infer_isize] inferred external isize from 21 pairs: 46521.000 +/- 26509.447
[infer_isize] skewness: 0.214; kurtosis: -0.983; ap_prior: 1.00e-05
[infer_isize] inferred maximum insert size: 207433 (6.07 sigma)
[bwa_sai2sam_pe_core] time elapses: 72.34 sec
[bwa_sai2sam_pe_core] changing coordinates of 3124 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...


OR

[infer_isize] fail to infer insert size: too few good pairs
[bwa_sai2sam_pe_core] time elapses: 77.51 sec
[bwa_sai2sam_pe_core] changing coordinates of 3054 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...


This suggests that, for some reason, I have a large patch of sequences that don't align and are confusing the insert size calculator. However, it must be said that even the failed runs have other spots like this that they do get through (though perhaps very slowly; I am letting one run go for several extra days over the weekend just to see if it ever exits).
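For anyone following along, here is a rough sketch of quartile-based insert size inference. It is not bwa's code, just a guess that happens to reproduce the boundaries in the first excerpt: 77320 + 2 × (77320 − 21322) = 189316, with the low end clamped at what I'm assuming is a 36 bp read length. The names `OUTLIER_BOUND`, `MIN_PAIRS`, `percentile`, `boundaries` and `infer_isize` are hypothetical, as are the 2.0 multiplier and the 20-pair minimum.

```python
"""Hypothetical sketch of quartile-based insert size inference (not bwa's source)."""
import math
from typing import List, Optional, Tuple

OUTLIER_BOUND = 2.0  # assumed: how many IQRs beyond the quartiles count as inliers
MIN_PAIRS = 20       # assumed: fewer pairs than this means "too few good pairs"


def percentile(sorted_vals: List[int], q: float) -> int:
    """Simple nearest-rank percentile on an already-sorted list."""
    idx = min(int(len(sorted_vals) * q + 0.5), len(sorted_vals) - 1)
    return sorted_vals[idx]


def boundaries(p25: int, p75: int, read_len: int) -> Tuple[int, int]:
    """Inlier window: quartiles widened by OUTLIER_BOUND * IQR.

    The low end is clamped so it is never below the read length."""
    iqr = p75 - p25
    low = max(int(p25 - OUTLIER_BOUND * iqr + 0.499), read_len)
    high = int(p75 + OUTLIER_BOUND * iqr + 0.499)
    return low, high


def infer_isize(isizes: List[int], read_len: int) -> Optional[Tuple[float, float]]:
    """Return (mean, std) of the inlier insert sizes, or None if inference fails."""
    if len(isizes) < MIN_PAIRS:
        return None  # the "fail to infer insert size" case
    vals = sorted(isizes)
    p25, p75 = percentile(vals, 0.25), percentile(vals, 0.75)
    low, high = boundaries(p25, p75, read_len)
    inliers = [x for x in vals if low <= x <= high]
    if len(inliers) < 2:
        return None
    mean = sum(inliers) / len(inliers)
    var = sum((x - mean) ** 2 for x in inliers) / len(inliers)  # population variance
    return mean, math.sqrt(var)


# Quartiles from the first log excerpt, assuming 36 bp reads:
print(boundaries(21322, 77320, 36))  # (36, 189316)
```

With only about 20 usable pairs per batch, a single batch dominated by non-aligning sequence can push the count under the threshold, which would match the second excerpt.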

Oddly, every time I've split a file into two parts, both halves complete in a reasonable time, even when the breakpoint is near where the full run fails.

I am curious why bwa sampe recomputes the insert size distribution so many times; it would be surprising if the distribution varied over the course of a run (but then again, I'm surprised to find such a big stretch of fragments that don't imply one). Perhaps a batch where infer_isize fails should reuse the estimate from a previously computed batch?
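The reuse idea could look something like the sketch below. It illustrates the suggestion only and is not how bwa sampe behaves today. `estimates_with_fallback` and `toy_infer` are hypothetical names, and the toy inference rule (fail on fewer than 3 pairs) is a stand-in for the real one.

```python
"""Hypothetical sketch: reuse the last good insert size estimate when a batch fails."""
from typing import Callable, Iterable, List, Optional, Tuple

Estimate = Tuple[float, float]  # (mean, std) of the insert size


def estimates_with_fallback(
    batches: Iterable[List[int]],
    infer: Callable[[List[int]], Optional[Estimate]],
) -> List[Optional[Estimate]]:
    """Use each batch's own estimate when inference succeeds.

    Otherwise fall back to the most recent successful estimate
    (None if no batch has succeeded yet)."""
    last_good: Optional[Estimate] = None
    result: List[Optional[Estimate]] = []
    for batch in batches:
        est = infer(batch)
        if est is not None:
            last_good = est
        result.append(est if est is not None else last_good)
    return result


def toy_infer(batch: List[int]) -> Optional[Estimate]:
    """Stand-in inference: fail on tiny batches, else return (mean, 0.0)."""
    if len(batch) < 3:
        return None
    return (sum(batch) / len(batch), 0.0)


print(estimates_with_fallback([[300, 300, 300], [5], [400, 400, 400]], toy_infer))
# [(300.0, 0.0), (300.0, 0.0), (400.0, 0.0)]
```

Carrying the last good estimate forward keeps each batch's own estimate whenever one is available. Pooling insert sizes across batches would be another option, at the cost of masking any genuine drift in the library.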

I think I'll gin up some courage soon to look at the source code & perhaps even try the above suggestion.