SEQanswers

Old 04-06-2012, 06:50 PM   #1
caswater
Member
 
Location: Canada

Join Date: Jun 2011
Posts: 47
Help: bwa sampe extremely slow

Hi, I just ran into this situation and am not sure if it is normal.

We sequenced 100 bp paired-end reads on a HiSeq, which generated ~90 million reads. However, mapping the reads to the human reference genome with bwa has already taken one whole day, and only ~9 million reads have been processed by the bwa sampe command, whose output is piped to samtools view to convert SAM to BAM.

I checked the log files and everything seemed normal: it keeps reporting the progress and the inferred insert size (isize). But it seems way too slow, and I have no idea why. Is this normal?

Any advice will be highly appreciated. Thanks!
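
As a back-of-the-envelope check of the figures above, here is a minimal sketch of the projected run time. It assumes the processing rate stays constant and that both figures count the same unit (reads or pairs, which the post does not specify).

```python
# Rough ETA from the figures in the post; assumes a constant processing rate.
total_reads = 90_000_000   # ~90 million reads from the HiSeq run
processed = 9_000_000      # ~9 million handled by sampe so far
elapsed_days = 1.0         # "one whole day"

rate_per_day = processed / elapsed_days
total_days = total_reads / rate_per_day
remaining_days = total_days - elapsed_days

print(f"estimated total: {total_days:.1f} days, remaining: {remaining_days:.1f} days")
```

At that rate the whole data set would need about 10 days, which is why the run looks suspicious.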
Old 04-06-2012, 11:00 PM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

What insert sizes does sampe report that it sees?

You could try running samse on each end separately and eyeballing the SAM files to see if the pairs look right.
Old 04-07-2012, 12:00 AM   #3
caswater
Member
 
Location: Canada

Join Date: Jun 2011
Posts: 47

The insert size seems alright, around 300-400 bp, but it is really slow. Do you know if there is anything wrong with it?
Thanks so much!

Here is an example of the output:
[infer_isize] (25, 50, 75) percentile: (313, 337, 363)
[infer_isize] low and high boundaries: 213 and 463 for estimating avg and std
[infer_isize] inferred external isize from 214684 pairs: 337.152 +/- 40.108
[infer_isize] skewness: -0.131; kurtosis: 0.117; ap_prior: 2.61e-05
[infer_isize] inferred maximum insert size: 616 (6.94 sigma)
[bwa_sai2sam_pe_core] time elapses: 19.58 sec
[bwa_sai2sam_pe_core] changing coordinates of 7276 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_paired_sw] 7091 out of 7430 Q17 singletons are mated.
[bwa_paired_sw] 1969 out of 3915 Q17 discordant pairs are fixed.
[bwa_sai2sam_pe_core] time elapses: 5.56 sec
[bwa_sai2sam_pe_core] refine gapped alignments... 0.82 sec
[bwa_sai2sam_pe_core] print alignments... 2.48 sec
[bwa_sai2sam_pe_core] 8912896 sequences have been processed.
[bwa_read_seq] 1.0% bases are trimmed.
[bwa_read_seq] 1.6% bases are trimmed.
[bwa_sai2sam_pe_core] convert to sequence coordinate...
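
To see how much time the log itself accounts for, the timer lines can be added up. The following is a minimal sketch that parses an excerpt of the output above. Whether these timers measure CPU time or wall-clock time is not stated in the thread, so treat the total as a lower bound on what one batch costs.

```python
import re

# Excerpt of the sampe log posted above (one batch).
LOG = """\
[bwa_sai2sam_pe_core] time elapses: 19.58 sec
[bwa_sai2sam_pe_core] changing coordinates of 7276 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_paired_sw] 7091 out of 7430 Q17 singletons are mated.
[bwa_sai2sam_pe_core] time elapses: 5.56 sec
[bwa_sai2sam_pe_core] refine gapped alignments... 0.82 sec
[bwa_sai2sam_pe_core] print alignments... 2.48 sec
[bwa_sai2sam_pe_core] 8912896 sequences have been processed.
"""

SECONDS = re.compile(r"(\d+(?:\.\d+)?) sec\b")
PROCESSED = re.compile(r"(\d+) sequences have been processed")


def summarize(log_text):
    """Return (seconds reported in the log, last 'sequences processed' count)."""
    seconds = sum(float(s) for s in SECONDS.findall(log_text))
    counts = PROCESSED.findall(log_text)
    return seconds, (int(counts[-1]) if counts else None)


batch_seconds, processed = summarize(LOG)
print(f"{batch_seconds:.2f} s reported for this batch; {processed} sequences so far")
```

If the reported time per batch is much smaller than the wall-clock time between two progress lines, the difference is being spent somewhere the timers do not cover, for example waiting on disk.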

Last edited by caswater; 04-07-2012 at 12:09 AM.
Old 04-07-2012, 04:14 PM   #4
caswater
Member
 
Location: Canada

Join Date: Jun 2011
Posts: 47

Can anyone give any advice on this? Thanks a lot!
Old 05-03-2012, 02:30 AM   #5
chkuo
Member
 
Location: Taiwan

Join Date: May 2010
Posts: 11

Not sure if this helps but sampe has always been a slow step for us. We are working on a bacterial genome (only ~1.3Mb in size) and have ~80M*2 PE reads (insert size=~365bp). It takes ~4,000 sec CPU time to run the aln step for each end but we can cut this down to a couple of minutes with multi-threading. The sampe step takes ~5,100 sec and there's not much we can do to reduce this.

One possibility is to lower the '-o' option (the maximum number of occurrences of a read for pairing), so that reads involved in repeats are treated as single-end reads instead of being paired. I guess this would probably help for human/plant genomes. Bacterial genomes have too few repeats in comparison, so we didn't bother to change the default.

Last edited by chkuo; 05-03-2012 at 02:36 AM.
Old 05-03-2012, 08:57 AM   #6
xied75
Senior Member
 
Location: Oxford

Join Date: Feb 2012
Posts: 129

Dear all,

As you may have observed, sampe has two bottlenecks: (1) it is single-threaded; (2) it is I/O bound.

If your I/O subsystem (i.e. your disks) is not very fast, please use the -P switch so that it stops loading the index files again and again. (Did you know that for each batch of 214684 reads, it reads the .BWT and .SA files into memory, uses them, dumps them, then loads the .PAC file, uses it, dumps it, and does the same again for the next batch?)

With -P, these files stay in memory and you will see a roughly constant memory footprint over the whole run. (Note: it does use more memory than without it; you had better have 8 GB for the human reference.)
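
A minimal sketch of the pipeline described in this thread (sampe piped into samtools view) with -P switched on. All file names are hypothetical placeholders, and `-bS -` is the samtools view form commonly used at the time for reading SAM from stdin and writing BAM.

```python
import shlex
import subprocess

# Hypothetical file names; replace with your own reference, .sai and FASTQ files.
REF, SAI1, SAI2 = "ref.fa", "reads_1.sai", "reads_2.sai"
FQ1, FQ2, OUT_BAM = "reads_1.fq", "reads_2.fq", "aligned.bam"


def sampe_cmd(preload_index=True):
    """Build the bwa sampe argument list; -P keeps the index in memory."""
    cmd = ["bwa", "sampe"]
    if preload_index:
        cmd.append("-P")
    return cmd + [REF, SAI1, SAI2, FQ1, FQ2]


view_cmd = ["samtools", "view", "-bS", "-"]  # SAM on stdin -> BAM on stdout


def run_pipeline():
    """Run `bwa sampe -P ... | samtools view -bS - > aligned.bam`."""
    with open(OUT_BAM, "wb") as out:
        sampe = subprocess.Popen(sampe_cmd(), stdout=subprocess.PIPE)
        view = subprocess.Popen(view_cmd, stdin=sampe.stdout, stdout=out)
        sampe.stdout.close()  # lets sampe get SIGPIPE if samtools exits early
        view.wait()
        sampe.wait()
    return sampe.returncode, view.returncode


# Only print the equivalent shell command; call run_pipeline() once the files exist.
print(shlex.join(sampe_cmd()), "|", shlex.join(view_cmd), ">", OUT_BAM)
```

Calling `sampe_cmd(preload_index=False)` gives the same command without -P, which makes it easy to time both variants on a small subset of reads.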

In my recent port of BWA to Windows, I added a -t switch to sampe so that you can do multithreading, but I guess you guys don't use Windows.

Best,

dong
Old 05-03-2012, 09:19 AM   #7
chkuo
Member
 
Location: Taiwan

Join Date: May 2010
Posts: 11

Forgot to mention that we do use the '-P' option. Any chance of a multi-threaded sampe in the Linux version soon?
Old 05-03-2012, 10:01 AM   #8
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

Quote:
Originally Posted by caswater
The insert size seems alright, around 300-400 bp, but it is really slow. Do you know if there is anything wrong with it?
Thanks so much!

Here is an example of the output:
[infer_isize] (25, 50, 75) percentile: (313, 337, 363)
[infer_isize] low and high boundaries: 213 and 463 for estimating avg and std
[infer_isize] inferred external isize from 214684 pairs: 337.152 +/- 40.108
[infer_isize] skewness: -0.131; kurtosis: 0.117; ap_prior: 2.61e-05
[infer_isize] inferred maximum insert size: 616 (6.94 sigma)
[bwa_sai2sam_pe_core] time elapses: 19.58 sec
[bwa_sai2sam_pe_core] changing coordinates of 7276 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_paired_sw] 7091 out of 7430 Q17 singletons are mated.
[bwa_paired_sw] 1969 out of 3915 Q17 discordant pairs are fixed.
[bwa_sai2sam_pe_core] time elapses: 5.56 sec
[bwa_sai2sam_pe_core] refine gapped alignments... 0.82 sec
[bwa_sai2sam_pe_core] print alignments... 2.48 sec
[bwa_sai2sam_pe_core] 8912896 sequences have been processed.
[bwa_read_seq] 1.0% bases are trimmed.
[bwa_read_seq] 1.6% bases are trimmed.
[bwa_sai2sam_pe_core] convert to sequence coordinate...
Your elapsed times are similar to mine but maybe a little slower. What type of computer are you using?
Old 05-03-2012, 10:27 AM   #9
xied75
Senior Member
 
Location: Oxford

Join Date: Feb 2012
Posts: 129

How about you go to the 1000 Genomes site, download a BAM, run on that, and give us some numbers? Then we can do the same so that we can compare.
Old 05-03-2012, 03:48 PM   #10
caswater
Member
 
Location: Canada

Join Date: Jun 2011
Posts: 47

Thanks a lot. Using -P does indeed substantially reduce the computation time. Thanks for all your suggestions!
Old 05-04-2012, 01:54 AM   #11
bryand
Junior Member
 
Location: Goettingen, Germany

Join Date: Aug 2010
Posts: 9

You could also try running sampe with the -s switch to disable Smith-Waterman alignment for the unmapped mate. Obviously, it depends on the sensitivity you want and on your genome of interest, but that should speed it up as well.
Old 05-04-2012, 06:03 AM   #12
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

You'll need to rule out the obvious problems first.

Is your data on a slow mounted drive?
Are other people running many jobs on your machine?
What machine type are you running? How many CPUs?

Run this from the command line:

grep bogomips /proc/cpuinfo

"bogomips" is a rough measure of CPU speed that the Linux kernel computes at boot. It is only a crude indicator, not a real benchmark.
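
The same check can be scripted. This is a minimal sketch that counts the bogomips entries (usually one per logical CPU, though that depends on the architecture) and collects their values. The sample text is made up for illustration and only mimics the /proc/cpuinfo layout.

```python
import re


def cpu_summary(cpuinfo_text):
    """Return (number of bogomips entries, list of bogomips values)."""
    values = [
        float(v)
        for v in re.findall(
            r"^bogomips\s*:\s*([\d.]+)",
            cpuinfo_text,
            flags=re.MULTILINE | re.IGNORECASE,
        )
    ]
    return len(values), values


# Made-up sample in the /proc/cpuinfo layout, for illustration only.
SAMPLE = "processor\t: 0\nbogomips\t: 5000.00\n\nprocessor\t: 1\nbogomips\t: 5000.00\n"
print(cpu_summary(SAMPLE))

# On a Linux machine, read the real file instead:
# with open("/proc/cpuinfo") as fh:
#     print(cpu_summary(fh.read()))
```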
All times are GMT -8. The time now is 07:09 AM.