SEQanswers

08-09-2010, 09:48 AM   #1
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
bwa sampe hanging

I apologize for asking what amounts to a pre-question; I know I don't have a complete description of the problem, but I'm a little stumped about how to get more useful diagnostic information.

I am running bwa (0.5.8a) on 4 GAIIx lanes of paired-end human sequence. The host is a 64-bit x86 machine with 32 GB of RAM running Oracle Enterprise Linux (aka Red Hat Enterprise Linux 5, a sore subject, but the state of things).

Generating the .sai files with "bwa aln" works fine, but for three of the four lanes the program hangs during the "bwa sampe" stage. As far as I can tell, it keeps running for hours with no further output.

If I split the FASTQ files at roughly the last sequence that appears in the output, the program completes on each part and I can merge the resulting SAM files with samtools, but that is an extra step I'd prefer to avoid. It does suggest, though, that a single aberrant FASTQ entry is not the trigger.
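For anyone reproducing the split-and-merge workaround, here is a minimal sketch of splitting a pair of FASTQ files at the same record index. It assumes plain, uncompressed 4-line FASTQ records; `fastq_records`, `split_fastq_pair` and the output naming scheme are hypothetical helpers, not part of bwa or samtools.

```python
# Hypothetical helper for the split-and-merge workaround; not part of bwa.
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records (header, sequence, '+', qualities)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield record


def split_fastq_pair(r1_path: str, r2_path: str, n: int, prefix: str) -> None:
    """Write the first n records of each mate file to part1 and the rest to part2."""
    for mate, path in (("R1", r1_path), ("R2", r2_path)):
        with open(path) as src, \
                open(f"{prefix}.part1_{mate}.fq", "w") as out1, \
                open(f"{prefix}.part2_{mate}.fq", "w") as out2:
            for i, record in enumerate(fastq_records(src)):
                # Cutting both mates at the same record index keeps pairs in step,
                # provided the input files were in step to begin with.
                (out1 if i < n else out2).writelines(record)
```

Each part would then go through bwa aln and bwa sampe as usual before the SAM outputs are merged with samtools.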

Any suggestions for further info I should spelunk that would help troubleshoot this? Is there a good way to determine whether the .sai files are somehow corrupt? Has anyone seen something in a FASTQ file (an odd character?) that can sometimes cause trouble?
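On the "odd character" question, one way to look is a quick structural scan of each FASTQ file. This is a minimal sketch with generic sanity checks chosen as assumptions (allowed bases ACGTN, quality characters in printable ASCII 33-126, no DOS line endings); it is not a list of inputs bwa is known to choke on, and it says nothing about whether the .sai files are intact. `check_fastq` is a hypothetical name.

```python
# Generic FASTQ sanity checks; these are assumptions about what might be
# "odd", not a list of inputs that bwa is known to choke on.
from itertools import islice
from typing import Iterable, List, Tuple

ALLOWED_BASES = set("ACGTNacgtn")


def check_fastq(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (record_index, problem) pairs for suspicious 4-line FASTQ records."""
    problems: List[Tuple[int, str]] = []
    it = iter(lines)
    index = 0
    while True:
        record = list(islice(it, 4))
        if not record:
            break
        if len(record) != 4:
            problems.append((index, "truncated record"))
            break
        if any("\r" in line for line in record):
            problems.append((index, "carriage return (DOS line ending)"))
        header, seq, plus, qual = (line.rstrip("\r\n") for line in record)
        if not header.startswith("@"):
            problems.append((index, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((index, "separator line does not start with '+'"))
        if len(seq) != len(qual):
            problems.append((index, "sequence/quality length mismatch"))
        bad = set(seq) - ALLOWED_BASES
        if bad:
            problems.append((index, f"unexpected base characters {sorted(bad)}"))
        if any(not 33 <= ord(c) <= 126 for c in qual):
            problems.append((index, "quality character outside ASCII 33-126"))
        index += 1
    return problems
```

Passing an open file handle works directly, since the function only iterates over lines; the reported record index can then be compared with the point where sampe stops producing output.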

thanks in advance
08-10-2010, 04:49 AM   #2
raela
Member
 
Location: Ithaca, NY

Join Date: Apr 2010
Posts: 39

I've had sampe hang before when the pairs were not lined up correctly in the two files. Since splitting fixes your issue, this is probably not what's going on, but it doesn't hurt to check.
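Checking that the pairs line up can be scripted. A minimal sketch, assuming 4-line FASTQ records and read names that differ between mates only by a trailing /1 or /2 or by a whitespace-separated comment; `read_name` and `first_mismatch` are hypothetical helpers.

```python
from itertools import islice, zip_longest
from typing import Iterable, Optional, Tuple


def read_name(header: str) -> str:
    """Strip '@', any trailing comment, and a /1 or /2 mate suffix."""
    name = header.rstrip("\r\n")[1:].split(" ")[0].split("\t")[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def first_mismatch(r1_lines: Iterable[str],
                   r2_lines: Iterable[str]) -> Optional[Tuple[int, str, str]]:
    """Return (record_index, name1, name2) for the first out-of-step pair, or None."""
    headers1 = islice(r1_lines, 0, None, 4)  # lines 0, 4, 8, ... are headers
    headers2 = islice(r2_lines, 0, None, 4)
    for i, (h1, h2) in enumerate(zip_longest(headers1, headers2)):
        # An empty name means that file ran out of records first.
        name1 = "" if h1 is None else read_name(h1)
        name2 = "" if h2 is None else read_name(h2)
        if h1 is None or h2 is None or name1 != name2:
            return i, name1, name2
    return None
```

Opening both mate files and passing the two handles returns `None` when every pair matches; otherwise it reports the first record where the files diverge, including the case where one file is shorter.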
08-10-2010, 04:09 PM   #3
PeteH
Member
 
Location: Melbourne

Join Date: Jun 2010
Posts: 64

I'm having similar problems, but with samse. The aln step works fine for me, but samse hangs for tens of hours and writes only "[bwa_read_seq] 0.0% bases are trimmed." to my *.sam file.

Before running BWA, my pipeline runs a Python script that converts the Illumina quality scores to Phred+33 and a Perl script that trims reads containing adapter sequence. I have also tried it without the Perl trimming script, but samse still hangs.
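For reference, a quality conversion like the one described usually amounts to subtracting 31 from each quality character's ASCII code. This is a minimal sketch, assuming the input is Phred+64 (Illumina 1.3+) in 4-line FASTQ records; it is not PeteH's script, and the function names are hypothetical. It raises on any character below '@', which would mean the input is not Phred+64 after all.

```python
from typing import Iterable, Iterator


def phred64_to_phred33(qual: str) -> str:
    """Convert a Phred+64 quality string (Illumina 1.3+) to Phred+33."""
    out = []
    for c in qual:
        q = ord(c) - 64
        if q < 0:
            # Characters below '@' mean the input is not Phred+64
            # (e.g. older Solexa scores, or data already in Phred+33).
            raise ValueError(f"character {c!r} is not valid Phred+64")
        out.append(chr(q + 33))
    return "".join(out)


def convert_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Rewrite the quality line (every 4th line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield phred64_to_phred33(line.rstrip("\r\n")) + "\n"
        else:
            yield line
```

Failing loudly on out-of-range characters, rather than clamping them, makes an accidental double conversion or a wrong encoding assumption visible before the data reaches bwa.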

Any advice on what else I can try to identify the problem is much appreciated.
08-10-2010, 08:11 PM   #4
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

Thanks for the feedback; that gives me the idea of running bwa samse on each original file separately to see whether neither, one, or both of the paired-end files cause trouble.

I'll also re-check that the two files have the same IDs in the same order.

THANKS!!
08-12-2010, 03:36 AM   #5
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

One more possible cookie crumb: the log files for hung runs generally end with one of the following:

[infer_isize] (25, 50, 75) percentile: (21322, 49144, 77320)
[infer_isize] low and high boundaries: 36 and 189316 for estimating avg and std
[infer_isize] inferred external isize from 21 pairs: 46521.000 +/- 26509.447
[infer_isize] skewness: 0.214; kurtosis: -0.983; ap_prior: 1.00e-05
[infer_isize] inferred maximum insert size: 207433 (6.07 sigma)
[bwa_sai2sam_pe_core] time elapses: 72.34 sec
[bwa_sai2sam_pe_core] changing coordinates of 3124 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...


OR

[infer_isize] fail to infer insert size: too few good pairs
[bwa_sai2sam_pe_core] time elapses: 77.51 sec
[bwa_sai2sam_pe_core] changing coordinates of 3054 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...


This suggests that for some reason I have a large patch of sequences that don't align and are confusing the insert-size calculator. However, in failed runs there are also spots like this that it gets through (though perhaps very slowly; I am letting a run go for several extra days over the weekend just to see if it ever exits).
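The numbers in the first log excerpt are internally consistent with an interquartile-range fence: 77320 + 2 x (77320 - 21322) = 189316, which is exactly the reported high boundary, and (207433 - 46521) / 26509.447 is about 6.07, the reported sigma. The low fence, 21322 - 2 x 55998, is negative, so the printed 36 must come from some floor; that this floor is the read length is an assumption. The sketch below only reproduces that arithmetic; it is not a claim about how bwa's `infer_isize` is written, and `outlier_bounds`, `sigma_distance`, `k` and `floor` are hypothetical names. It also makes plain how little data stands behind a 21-pair estimate.

```python
from typing import Tuple


def outlier_bounds(p25: float, p75: float, k: float = 2.0,
                   floor: float = 1.0) -> Tuple[float, float]:
    """Interquartile fences (p25 - k*IQR, p75 + k*IQR), low end clamped to floor."""
    iqr = p75 - p25
    low = max(p25 - k * iqr, floor)
    high = p75 + k * iqr
    return low, high


def sigma_distance(x: float, mean: float, sd: float) -> float:
    """How many standard deviations x lies above the mean."""
    return (x - mean) / sd


# Numbers copied from the first log excerpt; floor=36 is an assumption.
low, high = outlier_bounds(21322, 77320, k=2.0, floor=36)
print(low, high)  # 36 189316.0
print(round(sigma_distance(207433, 46521.0, 26509.447), 2))  # 6.07
```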

Oddly, every time I've split a file into two parts, both parts complete in reasonable time, even when the breakpoint is near where the full run fails.

I am curious why bwa sampe recomputes the insert size distribution so many times; it would be surprising if it varied over a run (but then again, I'm surprised to find such a big stretch of fragments that don't imply one). Perhaps a batch where infer_isize fails should reuse a previously computed estimate?
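The reuse idea could look roughly like the following. This is a hypothetical sketch of the suggestion, not how bwa 0.5.8a behaves: it uses a plain mean and standard deviation per batch instead of bwa's percentile-based outlier trimming, and the `min_pairs` threshold (defaulting to 100 here) is an assumed value. A threshold of that size would also reject thin batches like the 21-pair one in the log instead of trusting them.

```python
from statistics import mean, stdev
from typing import List, Optional, Sequence, Tuple

Estimate = Tuple[float, float]  # (mean insert size, standard deviation)


def estimate_isize(isizes: Sequence[int], min_pairs: int = 100) -> Optional[Estimate]:
    """Plain mean/stdev of one batch, or None if the batch has too few pairs."""
    if len(isizes) < max(min_pairs, 2):  # stdev needs at least two values
        return None
    return float(mean(isizes)), stdev(isizes)


def estimates_with_fallback(batches: Sequence[Sequence[int]],
                            min_pairs: int = 100) -> List[Optional[Estimate]]:
    """Per-batch estimates; a failed batch reuses the last successful one."""
    previous: Optional[Estimate] = None
    results: List[Optional[Estimate]] = []
    for batch in batches:
        est = estimate_isize(batch, min_pairs)
        if est is None:
            est = previous  # still None if no batch has succeeded yet
        else:
            previous = est
        results.append(est)
    return results
```

The first batch has nothing to fall back on, so a caller would still need a default (for example a user-supplied maximum insert size) for that case.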

I think I'll gin up some courage soon to look at the source code & perhaps even try the above suggestion.
02-12-2013, 03:17 PM   #6
modi2020
Member
 
Location: New York

Join Date: May 2012
Posts: 22

Hi krobison,

I am having exactly the same problem you were having.
Did you find out how to solve it?

Thank you
02-13-2013, 12:57 PM   #7
mediator
Member
 
Location: New England

Join Date: Nov 2010
Posts: 27

OP, bwa sampe is a very slow step. It took my cluster more than two days to convert the two .sai files into one SAM file, for a dataset of 100 million reads.
All times are GMT -8.