
**raela** · 08-10-2010, 04:49 AM

I've had sampe hang before when the pairs were not lined up correctly in the two files. Since splitting fixes your issue, this is probably not what's going on, but it doesn't hurt to check.

**PeteH** · 08-10-2010, 04:09 PM

I'm having similar problems, but with samse. The aln step works fine for me, but samse hangs for tens of hours and writes only the line "[bwa_read_seq] 0.0% bases are trimmed." to my *.sam file.

Prior to running BWA, my pipeline includes a python script to convert the Illumina quality scores to phred33, and a perl script to trim reads that contain adapter sequence. I have also tried it without running my perl trimming script, but samse still hangs.
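For anyone wanting to rule out the quality conversion step, here is a minimal sketch of such a conversion. It is not PeteH's script: it assumes the input is Phred+64 (Illumina 1.3 to 1.7 style) and that records are strict 4-line FASTQ, and the function names are made up for illustration.

```python
def phred64_to_phred33(qual: str) -> str:
    """Re-encode one FASTQ quality string from Phred+64 to Phred+33."""
    out = []
    for ch in qual:
        score = ord(ch) - 64  # assumption: input offset is 64
        if score < 0:
            # A character below '@' means the input was not Phred+64.
            raise ValueError(f"character {ch!r} is below the Phred+64 range")
        out.append(chr(score + 33))
    return "".join(out)


def convert_fastq_lines(lines):
    """Yield FASTQ lines with every fourth line (the qualities) re-encoded.

    Assumes strict 4-line records with no wrapped sequence lines.
    """
    for i, line in enumerate(lines):
        text = line.rstrip("\n")
        yield phred64_to_phred33(text) if i % 4 == 3 else text
```

The ValueError is deliberate: if the file is already Phred+33, a blind second conversion would produce invalid quality characters, so it seems safer to fail loudly.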

Any advice on what else I can try to identify the problem is much appreciated.

**krobison** · 08-10-2010, 08:11 PM

Thanks for the feedback -- that does give me the idea of trying bwa samse on each original file to see whether none, one or both of the paired-end files cause trouble.

I'll also re-check that the two files have the same ids in the same order.
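A rough sketch of that ID check, assuming strict 4-line FASTQ records and read names that differ between the two files only by a trailing /1 or /2 (both assumptions; the function names are mine):

```python
from itertools import zip_longest


def read_ids(lines):
    """Yield the read name from each header line of a 4-line FASTQ stream."""
    non_blank = (line for line in lines if line.strip())
    for i, line in enumerate(non_blank):
        if i % 4 == 0:
            name = line.split()[0].lstrip("@")
            # Drop a trailing /1 or /2 mate suffix, if present.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(lines1, lines2):
    """Return (record_index, id1, id2) for the first out-of-sync pair, else None.

    A missing mate (one file shorter than the other) shows up as None.
    """
    pairs = zip_longest(read_ids(lines1), read_ids(lines2))
    for n, (a, b) in enumerate(pairs):
        if a != b:
            return n, a, b
    return None
```

Passing two open file handles (for example for hypothetical files reads_1.fastq and reads_2.fastq) streams through both files without loading them into memory.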

THANKS!!

**krobison** · 08-12-2010, 03:36 AM

One more possible cookie crumb: the log files for hung runs generally end with either of the following (highlighting mine)

[infer_isize] (25, 50, 75) percentile: (21322, 49144, 77320)
[infer_isize] low and high boundaries: 36 and 189316 for estimating avg and std
[infer_isize] inferred external isize from 21 pairs: 46521.000 +/- 26509.447
[infer_isize] skewness: 0.214; kurtosis: -0.983; ap_prior: 1.00e-05
[infer_isize] inferred maximum insert size: 207433 (6.07 sigma)
[bwa_sai2sam_pe_core] time elapses: 72.34 sec
[bwa_sai2sam_pe_core] changing coordinates of 3124 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...

OR

[infer_isize] fail to infer insert size: too few good pairs
[bwa_sai2sam_pe_core] time elapses: 77.51 sec
[bwa_sai2sam_pe_core] changing coordinates of 3054 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...

Which suggests that for some reason I have a large patch of sequences which don't align & are confusing the insert size calculator. However, it must be said that in failed runs there are spots like this that it gets through (but perhaps very slowly; I am letting a run go for several days extra over the weekend just to see if it ever exits).

Oddly, every time I've tried to split a file into two parts they both complete in reasonable time, even when the breakpoint is near where the full run fails.
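For reference, a minimal sketch of that kind of split, assuming strict 4-line FASTQ records (the function name and signature are illustrative, not a description of the tool actually used). Running it with the same record count on both mate files keeps the pairs lined up.

```python
from itertools import islice


def split_fastq(src, out_a, out_b, n_records):
    """Copy the first n_records 4-line records of src to out_a, the rest to out_b.

    src, out_a and out_b are open text file handles. Assumes strict 4-line
    FASTQ records with no wrapped sequence lines.
    """
    out_a.writelines(islice(src, 4 * n_records))
    out_b.writelines(src)  # src continues where islice stopped
```

Splitting on a record count rather than a byte offset avoids cutting a record in half, which would itself put the two files out of sync.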

I am curious why bwa sampe is recomputing the insert size distribution so many times -- it would be surprising if that varied through a run (but then again, I'm surprised to find such a big stretch of fragments that don't imply one). Perhaps failed infer_isize batches should cause the reuse of a previously computed batch?
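To make the suggestion concrete, here is a toy sketch of the fallback idea. It is not BWA's code: the MIN_PAIRS threshold and the plain mean/standard deviation estimate are stand-ins chosen only for illustration.

```python
from statistics import mean, stdev

MIN_PAIRS = 20  # hypothetical threshold, not BWA's actual cutoff


def infer_isize(sizes):
    """Return (mean, std) of a batch's insert sizes, or None if too few pairs."""
    if len(sizes) < MIN_PAIRS:
        return None
    return mean(sizes), stdev(sizes)


def estimates_with_fallback(batches):
    """Yield one estimate per batch, reusing the last good one on failure."""
    last_good = None
    for sizes in batches:
        est = infer_isize(sizes)
        if est is None:
            est = last_good  # stays None until some batch has succeeded
        else:
            last_good = est
        yield est
```

Whether reusing an earlier estimate is appropriate depends on the library; it only makes sense if the insert size distribution is expected to be stable across the run.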

I think I'll gin up some courage soon to look at the source code & perhaps even try the above suggestion.

**modi2020** · 02-12-2013, 04:17 PM

Hi Krobison,

I am having exactly the same problem you were having.
Did you find out how to solve it?

Thank you

**mediator** · 02-13-2013, 01:57 PM

OP, bwa sampe is a very slow step. It took my cluster more than two days to convert the two sai files into one sam file. My dataset has 100 million reads.

bwa sampe hanging