SEQanswers

Old 01-10-2011, 04:13 AM   #1
agc
Member
 
Location: Jerusalem

Join Date: May 2010
Posts: 26
How long should paired-end alignment run?

I'm aligning Illumina paired-end reads using bwa for the first time. With short reads from the same sample (yeast) alignment took 2-3 hours, but now I've been waiting for results for 4 days. It seems to be progressing, but is this normal? Should it take this long? It keeps repeatedly printing a bunch of calculations and then hanging on "align unmapped mate...".
Old 01-10-2011, 01:11 PM   #2
Hena
Member
 
Location: Finland

Join Date: Nov 2009
Posts: 19

Is it the bwa sampe step that is taking so long? I would suggest switching on the -P option, which loads the index into memory. That speeds up execution quite a lot. It does require 4-5GB of memory to run, though (on human, if I remember correctly).
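As a rough sanity check on that memory figure, the sketch below assumes the rule of thumb given in the bwa manual page for -P (at least about 1.25 bytes of memory per reference base). The function name and the genome sizes are illustrative assumptions, not measurements.

```python
def preload_memory_gb(genome_length_bp, bytes_per_base=1.25):
    """Estimate the RAM (in GB, 10**9 bytes) needed to preload the index.

    bytes_per_base=1.25 is an assumed rule of thumb, not a measured value.
    """
    return genome_length_bp * bytes_per_base / 1e9


# Approximate, illustrative genome sizes in base pairs.
human_bp = 3_100_000_000
yeast_bp = 12_000_000

print(f"human: ~{preload_memory_gb(human_bp):.1f} GB")
print(f"yeast: ~{preload_memory_gb(yeast_bp):.3f} GB")
```

Under that assumption, a yeast-sized reference needs only a few tens of megabytes, so preloading the index should not be a memory problem for the original poster.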
Old 01-10-2011, 03:31 PM   #3
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

The bwa C code is notorious for not checking the return values of function calls. It will happily ignore a failure and keep going, which makes it a nightmare to figure out what went wrong. Not a big deal though: the upside is that it runs really fast.

Most likely one of the following happened:

1) You've provided invalid input. Make sure stderr from a previous run did not sneak into your .sai file: the earlier step should have been run as "bwa > out.sai 2> out.err", not as "bwa > out.sai 2>&1".

2) You've run out of memory.

3) Your command-line arguments are wrong.


Check your input, and try running on a bigger machine with at least 8GB of free memory.
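The first point can also be enforced from a driver script. The following is a minimal sketch, not part of bwa: the helper name run_to_files is hypothetical, and a small Python one-liner stands in for the aligner so the example runs anywhere. It keeps stdout and stderr in separate files and raises an error on a non-zero exit status instead of silently carrying on.

```python
import os
import subprocess
import sys
import tempfile


def run_to_files(cmd, out_path, err_path):
    """Run cmd with stdout and stderr written to two separate files.

    check=True raises subprocess.CalledProcessError on a non-zero exit
    status, so a failed step is not silently fed into the next one.
    """
    with open(out_path, "wb") as out, open(err_path, "wb") as err:
        subprocess.run(cmd, stdout=out, stderr=err, check=True)


# Stand-in for an aligner call: writes one line to each stream.
demo_cmd = [sys.executable, "-c",
            "import sys; print('data'); print('log', file=sys.stderr)"]

with tempfile.TemporaryDirectory() as workdir:
    out_path = os.path.join(workdir, "demo.out")
    err_path = os.path.join(workdir, "demo.err")
    run_to_files(demo_cmd, out_path, err_path)
    with open(out_path) as fh:
        print("stdout file:", fh.read().strip())
    with open(err_path) as fh:
        print("stderr file:", fh.read().strip())
```

For a real run, the command list would be something like ["bwa", "aln", "ref.fa", "reads_1.fq"] with out_path set to the .sai file; those file names are placeholders.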
Old 03-29-2011, 06:39 AM   #4
dp05yk
Member
 
Location: Brock University

Join Date: Dec 2010
Posts: 66

I've found that bwa can hang indefinitely when it infers an incorrect insert size estimate. Unfortunately, if you specify a maximum insert size with the -a parameter, BWA will ignore it whenever its own isize inference succeeds (in its own mind). I once specified a maximum insert size of 5000, only to have BWA turn around and tell me it was going with its own calculated value of 1483238... as a result, it is likely hanging during the "align unmapped mate" stage.

To get around this, you can use the -A parameter, which ignores BWA's isize estimates. However, this also disables SW alignment for unmapped mates.
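One way to spot this situation early is to scan the sampe stderr log for implausible estimates. The sketch below is an illustration only: it assumes the log contains lines of the form "[infer_isize] inferred maximum insert size: N", and the function name, the threshold and the two sample lines are made up for the example.

```python
import re

# Assumed log format; adjust the pattern if your bwa version prints it differently.
MAX_ISIZE_RE = re.compile(r"\[infer_isize\] inferred maximum insert size: (\d+)")


def suspicious_isizes(log_lines, plausible_max=5000):
    """Return every inferred maximum insert size that exceeds plausible_max."""
    found = []
    for line in log_lines:
        match = MAX_ISIZE_RE.search(line)
        if match and int(match.group(1)) > plausible_max:
            found.append(int(match.group(1)))
    return found


# Hand-written example lines in the assumed format (not real bwa output).
example_log = [
    "[infer_isize] inferred maximum insert size: 412 (6.51 sigma)",
    "[infer_isize] inferred maximum insert size: 1483238 (6.51 sigma)",
]
print(suspicious_isizes(example_log))
```

If any batch reports a value far beyond what the library preparation allows, that batch is a likely candidate for the long "align unmapped mate" stage.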
Old 03-31-2011, 10:20 AM   #5
elisadouzi
Member
 
Location: US

Join Date: Mar 2011
Posts: 20

I also have this problem: the insert size is inferred incorrectly and sampe hangs on "align unmapped mate". How can I fix it?
Old 03-31-2011, 10:43 AM   #6
dp05yk
Member
 
Location: Brock University

Join Date: Dec 2010
Posts: 66

Quote:
Originally Posted by elisadouzi View Post
I also have this problem of infer the insert size and the sampe hang on the align unmapped mate. How can I fix it ?
I do believe this was already mentioned... see quoted below:

Quote:
To get around this, you can use the -A parameter, which ignores BWA's isize estimates. However, this as a result disables SW alignment for unmapped mates.
Old 03-31-2011, 11:33 AM   #7
elisadouzi
Member
 
Location: US

Join Date: Mar 2011
Posts: 20

Thanks. I already set -a 500, but it still did not work.
Old 03-31-2011, 11:54 AM   #8
dp05yk
Member
 
Location: Brock University

Join Date: Dec 2010
Posts: 66

Hi elisadouzi,

-a is not the same as -A.

As mentioned in my post, if you specify an isize with -a, BWA will ignore it if it is able to estimate (in its own mind) a different value. This is why -a 500 isn't working for you: BWA is calculating a value much higher than 500 and is overriding your specification with it.

Use the -A (not a, capitalized A) parameter, and it will ignore its isize estimate (and unmapped mate alignment altogether, unfortunately).
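To make the case difference harder to get wrong, the command line can be assembled by a small helper. This is a sketch under assumptions: the function and its keyword names are hypothetical, only the flags discussed in this thread (-a, -A, -s, -P) are covered, and the positional order (index prefix, two .sai files, two FASTQ files) is taken from the sampe usage message.

```python
def sampe_command(prefix, sai1, sai2, fq1, fq2,
                  max_insert=None, skip_isize_estimate=False,
                  skip_mate_sw=False, preload_index=False):
    """Build an argument list for 'bwa sampe' (hypothetical helper)."""
    cmd = ["bwa", "sampe"]
    if max_insert is not None:
        cmd += ["-a", str(max_insert)]   # lowercase a: maximum insert size
    if skip_isize_estimate:
        cmd.append("-A")                 # capital A: ignore the isize estimate
    if skip_mate_sw:
        cmd.append("-s")                 # no Smith-Waterman for unmapped mates
    if preload_index:
        cmd.append("-P")                 # load the index into memory
    cmd += [prefix, sai1, sai2, fq1, fq2]
    return cmd


# Placeholder file names, for illustration only.
print(" ".join(sampe_command("ref.fa", "r1.sai", "r2.sai", "r1.fq", "r2.fq",
                             max_insert=500, skip_isize_estimate=True)))
```

Passing the returned list to subprocess.run avoids shell quoting issues. Whether -A and -s are accepted depends on the installed bwa version.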
Old 04-10-2011, 07:05 PM   #9
kbushley
Member
 
Location: Oregon

Join Date: Jan 2010
Posts: 22

Hello,

I am also running into this problem and getting some curious results. I am running resequencing assemblies (mapping with bwa, assembly with the Velvet Columbus module) against two different 454 assemblies of the same genome, partly to test the quality of the assemblies. On one assembly, sampe completes in 2-3 hours; on the other, supplied with exactly the same parameters, it has been running for 4 days and is hanging on the "align unmapped mate" stage. It is giving me an insert size estimate of around 1Mb (&$&*!%!)... I doubt it's memory, since I'm running on 16GB, and I just checked the input files, which are OK. So I am beginning to think that something about the complexity of the problem, or the compatibility between the two datasets, is leading it to produce crazy insert sizes and hang.

I just tried the -A option but bwa complains that this is not a valid option... Am I missing something? Which version are you using, and how do you supply this option? Does anyone else have any thoughts or experiences that would explain this UPO (unidentified phenomena output)?


best,

Kate
Old 04-10-2011, 07:08 PM   #10
dp05yk
Member
 
Location: Brock University

Join Date: Dec 2010
Posts: 66

Hi,


I think -s might help... it disables Smith-Waterman mapping for unmapped mates, which is likely what is taking so long. If -s doesn't help or doesn't exist as an option, try getting the latest version of BWA (0.5.9) from its SourceForge page.
Old 04-10-2011, 08:03 PM   #11
kbushley
Member
 
Location: Oregon

Join Date: Jan 2010
Posts: 22

Hi,

Thanks...the -a option seems to have solved the problem this time.
Old 09-07-2011, 01:31 AM   #12
crocea
Junior Member
 
Location: Los Angeles

Join Date: Feb 2011
Posts: 1

"-a" is only used when bwa doesn't have enough pairs of reads (<20) to infer the insert size. The bug above happens when bwa has enough pairs (>=20) but makes an unrealistic estimate of the insert size (like the 1 million+ above, or ~210K in my case). The alignment of unmapped mates then takes forever because it has to try SW in a very long window. So "-a" is irrelevant to this bug.

What I did was: 1. increase the read batch size from 0x40000 (262,144 reads) to 0x60000 (or more); 2. increase that crucial 20 to 80 (or more). Both changes aim to make sure bwa has enough pairs to make a good inference. Here's the code diff:

Quote:
...:~/script/bio-bwa$ svn diff
Index: trunk/bwa/bwape.c
===================================================================
--- trunk/bwa/bwape.c (revision 51)
+++ trunk/bwa/bwape.c (working copy)
@@ -107,7 +107,9 @@
         if (p[0]->len > max_len) max_len = p[0]->len;
         if (p[1]->len > max_len) max_len = p[1]->len;
     }
-    if (tot < 20) {
+    // 2011-9-6 yh increase the number below from 20 to 80 to make it less likely that super-big insert sizes would be inferred.
+    int minNoOfValidPairs = 80;
+    if (tot < minNoOfValidPairs) {
         fprintf(stderr, "[infer_isize] fail to infer insert size: too few good pairs\n");
         free(isizes);
         return -1;
@@ -691,12 +693,14 @@
     // core loop
     bwa_print_sam_SQ(bns);
     bwa_print_sam_PG();
-    while ((seqs[0] = bwa_read_seq(ks[0], 0x40000, &n_seqs, opt.mode & BWA_MODE_COMPREAD, opt.trim_qual)) != 0) {
+    // 2011-9-6 increase the number below from 0x40000 to 0x60000 to make it less likely that super-big insert sizes would be inferred.
+    int n_needed = 0x60000;
+    while ((seqs[0] = bwa_read_seq(ks[0], n_needed, &n_seqs, opt.mode & BWA_MODE_COMPREAD, opt.trim_qual)) != 0) {
         int cnt_chg;
         isize_info_t ii;
         ubyte_t *pacseq;
 
-        seqs[1] = bwa_read_seq(ks[1], 0x40000, &n_seqs, opt.mode & BWA_MODE_COMPREAD, opt.trim_qual);
+        seqs[1] = bwa_read_seq(ks[1], n_needed, &n_seqs, opt.mode & BWA_MODE_COMPREAD, opt.trim_qual);
         tot_seqs += n_seqs;
         t = clock();
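Why more pairs per estimate helps can be shown with a toy simulation. The code below is not bwa's implementation: it is a simplified quartile-based estimator written for this illustration, and the junk fraction, the insert size distribution and the "mean above 1000" criterion are all assumptions. It draws batches in which a minority of pairs have absurd distances and counts how often the estimate gets polluted.

```python
import random
import statistics


def infer_insert_size(isizes, min_pairs=20, outlier_bound=2.0):
    """Simplified quartile-based estimate; None when there are too few pairs."""
    if len(isizes) < min_pairs:
        return None
    data = sorted(isizes)
    n = len(data)
    p25, p75 = data[int(n * 0.25)], data[int(n * 0.75)]
    spread = p75 - p25
    low, high = p25 - outlier_bound * spread, p75 + outlier_bound * spread
    kept = [x for x in data if low <= x <= high]
    return statistics.mean(kept), statistics.pstdev(kept)


def bad_rate(n_pairs, trials=2000, junk_fraction=0.15, seed=0):
    """Fraction of simulated batches whose mean estimate exceeds 1000."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        # Most pairs sit near 300 bp; a minority are far-apart junk pairs.
        batch = [rng.randint(100_000, 2_000_000) if rng.random() < junk_fraction
                 else int(rng.gauss(300, 30)) for _ in range(n_pairs)]
        mean, _sd = infer_insert_size(batch, min_pairs=n_pairs)
        if mean > 1000:
            bad += 1
    return bad / trials


for n_pairs in (20, 80, 320):
    print(n_pairs, "pairs per estimate:", bad_rate(n_pairs))
```

In this toy model the estimate goes wrong whenever junk pairs reach the upper quartile, which becomes much less likely as the number of pairs grows. That is the intuition behind raising both the minimum pair count and the batch size in the patch.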