Thread | Thread Starter | Forum | Replies | Last Post |
bwa sampe very slow | natpokah | Bioinformatics | 25 | 08-13-2013 11:18 AM |
bwa sampe hanging | krobison | Bioinformatics | 6 | 02-13-2013 01:57 PM |
BWA sampe shows extremely large insert size | oiiio | Bioinformatics | 7 | 12-26-2011 02:22 PM |
samtools sort running extremely slow | tsucheta | Bioinformatics | 2 | 06-11-2010 07:30 AM |
bwa sampe 0.5.7 error? | rcorbett | Bioinformatics | 2 | 04-22-2010 08:13 AM |
#1
Member
Location: Canada
Join Date: Jun 2011
Posts: 47
Hi, I've just run into this situation and I'm not sure whether it is normal.
We have 100 bp paired-end HiSeq data, ~90 million reads. Mapping the reads to the human reference genome with bwa has taken a whole day, and only ~9 million reads have been processed by the bwa sampe command, which is piped into samtools view to convert SAM to BAM. I checked the log files and everything seemed normal: it kept reporting its progress and the inferred insert size (isize). But it seems far too slow and I have no idea why. Is this normal? Any advice will be highly appreciated. Thanks!
#2
Senior Member
Location: San Diego
Join Date: May 2008
Posts: 912
What insert sizes does sampe report?
You could try running samse on each end separately and eyeballing the SAM files to see whether the pairs look right.
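One way to make the "eyeball the pairs" check a little more systematic is to compare the two single-end SAM files read by read. The sketch below is a hypothetical helper, not part of bwa or samtools. It assumes that read names match between the two files apart from an optional /1 or /2 suffix. For each read mapped in both files (FLAG bit 0x4 unset, secondary records skipped) to the same reference sequence, it reports the distance between the POS fields. For a normal forward/reverse pair that distance should be roughly the insert size minus one read length.

```python
import re
from statistics import median

_MATE_SUFFIX = re.compile(r"/[12]$")  # assumed read-name convention: name/1, name/2


def mapped_positions(sam_text):
    """Map read name -> (RNAME, POS) for mapped, non-secondary SAM records."""
    out = {}
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4 or flag & 0x100:
            continue  # skip unmapped (0x4) and secondary (0x100) records
        out[_MATE_SUFFIX.sub("", qname)] = (rname, pos)
    return out


def pos_distances(sam_end1, sam_end2):
    """|POS2 - POS1| for reads mapped in both files to the same reference sequence."""
    a, b = mapped_positions(sam_end1), mapped_positions(sam_end2)
    return [abs(b[name][1] - pos)
            for name, (rname, pos) in a.items()
            if name in b and b[name][0] == rname]


def summarize(dists):
    """Return (count, median distance), or (0, None) if there are no usable pairs."""
    return (len(dists), median(dists)) if dists else (0, None)
```

With 100 bp reads and the 300-400 bp inserts reported later in this thread, most distances would be expected to fall around 200-300. A wide spread or many pairs on different sequences would suggest the pairing itself is the problem.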
#3
Member
Location: Canada
Join Date: Jun 2011
Posts: 47
The insert size seems fine, around 300-400 bp, but it is really slow. Do you know if there is anything wrong?
Thanks so much! Here is an example of the output:
[infer_isize] (25, 50, 75) percentile: (313, 337, 363)
[infer_isize] low and high boundaries: 213 and 463 for estimating avg and std
[infer_isize] inferred external isize from 214684 pairs: 337.152 +/- 40.108
[infer_isize] skewness: -0.131; kurtosis: 0.117; ap_prior: 2.61e-05
[infer_isize] inferred maximum insert size: 616 (6.94 sigma)
[bwa_sai2sam_pe_core] time elapses: 19.58 sec
[bwa_sai2sam_pe_core] changing coordinates of 7276 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_paired_sw] 7091 out of 7430 Q17 singletons are mated.
[bwa_paired_sw] 1969 out of 3915 Q17 discordant pairs are fixed.
[bwa_sai2sam_pe_core] time elapses: 5.56 sec
[bwa_sai2sam_pe_core] refine gapped alignments... 0.82 sec
[bwa_sai2sam_pe_core] print alignments... 2.48 sec
[bwa_sai2sam_pe_core] 8912896 sequences have been processed.
[bwa_read_seq] 1.0% bases are trimmed.
[bwa_read_seq] 1.6% bases are trimmed.
[bwa_sai2sam_pe_core] convert to sequence coordinate...
Last edited by caswater; 04-07-2012 at 12:09 AM.
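The insert-size lines in this log are consistent with one another. The reported boundaries equal the quartiles widened by two interquartile ranges on each side: 313 - 2 x 50 = 213 and 363 + 2 x 50 = 463. The maximum insert size equals the mean plus 6.94 standard deviations. This supports the view that the insert-size estimation is healthy and the slowness lies elsewhere.

The sketch below reproduces that arithmetic. The factor of 2 is inferred from these numbers rather than taken from bwa's documentation, and the helper names are hypothetical.

```python
def isize_bounds(p25, p75, k=2.0):
    """Quartiles widened by k interquartile ranges (k=2 is inferred from this log)."""
    iqr = p75 - p25
    return p25 - k * iqr, p75 + k * iqr


def max_isize(avg, std, n_sigma):
    """Insert size n_sigma standard deviations above the mean, rounded to an int."""
    return round(avg + n_sigma * std)


low, high = isize_bounds(313, 363)        # 25th/75th percentiles from the log
print(low, high)                          # 213.0 463.0
print(max_isize(337.152, 40.108, 6.94))   # 616
```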
#4
Member
Location: Canada
Join Date: Jun 2011
Posts: 47
Can anyone give any advice on this? Thanks a lot!
#5
Member
Location: Taiwan
Join Date: May 2010
Posts: 11
Not sure if this helps, but sampe has always been a slow step for us. We are working on a bacterial genome (only ~1.3 Mb) with ~80M x 2 paired-end reads (insert size ~365 bp). The aln step takes ~4,000 s of CPU time for each end, but multithreading cuts that down to a couple of minutes. The sampe step takes ~5,100 s and there's not much we can do to reduce it.
One possibility is to lower the -o option (maximum occurrences of a read for pairing) so that reads involved in repeats are not paired. I guess this would probably help for human or plant genomes. Bacterial genomes simply have too few repeats by comparison, so we didn't bother changing the default.
Last edited by chkuo; 05-03-2012 at 02:36 AM.
#6
Senior Member
Location: Oxford
Join Date: Feb 2012
Posts: 129
Dear all,
As you may have observed, SAMPE has two bottlenecks: 1) it's single-threaded; 2) it's I/O bound. If your I/O subsystem (i.e. your disks) is not very fast, use the -P switch so that it stops loading the index files again and again.

Without -P, for each batch of reads (for example, the batch in which 214684 pairs were used to infer the insert size in the log above), it does the following:
1) reads the .bwt and .sa files into memory, uses them and frees them;
2) then loads the .pac file, uses it and frees it;
3) repeats all of this for the next batch.

With -P these files stay in memory, and you'll see a roughly constant memory footprint over the whole run. Note that it does use more memory than without -P; you'd better have 8 GB for the human reference.

In my recent port of BWA to Windows, I added a -t switch to SAMPE so that you can run it multithreaded, but I guess you guys don't use Windows.
Best,
dong
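A rough check with the numbers from posts #1 and #3 is consistent with this explanation. The timed steps of one batch in post #3's log add up to under 30 seconds. The counter of 8,912,896 sequences corresponds to 34 batches if sampe reads 262,144 (0x40000) reads per batch. That batch size is an assumption here, although 8,912,896 = 34 x 262,144 exactly. Over the roughly 24 hours reported in post #1, this works out to about 2,500 seconds of wall-clock time per batch, so most of each batch's time does not appear in the logged timings. Waiting on a slow disk while the index is reloaded is one plausible place for it to go.

The sketch below only does that arithmetic; the names are hypothetical.

```python
BATCH_READS = 0x40000  # assumed sampe batch size (262,144); 8,912,896 = 34 x 262,144


def batch_time_gap(seqs_processed, wall_seconds, logged_step_seconds):
    """Return (batches, wall-clock seconds per batch, sum of logged step timings)."""
    n_batches = seqs_processed // BATCH_READS
    per_batch_wall = wall_seconds / n_batches
    per_batch_logged = sum(logged_step_seconds)
    return n_batches, per_batch_wall, per_batch_logged


n, wall, logged = batch_time_gap(
    seqs_processed=8_912_896,                       # counter from post #3's log
    wall_seconds=24 * 3600,                         # "one whole day" from post #1
    logged_step_seconds=[19.58, 5.56, 0.82, 2.48],  # per-step timings from the log
)
print(n, round(wall), round(logged, 2))  # 34 2541 28.44
```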
#7
Member
Location: Taiwan
Join Date: May 2010
Posts: 11
Forgot to mention that we do use the -P option. Any chance of a multithreaded sampe in the Linux version soon?
#8
I like code
Location: San Diego, CA, USA
Join Date: Sep 2009
Posts: 438
Quote:
#9
Senior Member
Location: Oxford
Join Date: Feb 2012
Posts: 129
How about you go to the 1000 Genomes site, download a BAM, run it and give us some numbers? Then we'll do the same so that we can compare.
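If you do benchmark on a shared 1000 Genomes dataset, reporting both wall-clock and CPU time makes the numbers easier to compare across machines. For a single-threaded step like sampe, a CPU time far below the wall time points toward I/O waits rather than computation.

Below is a minimal sketch using only the Python standard library (subprocess, time and resource; Unix only). The helper name is hypothetical, and the command you pass in is up to you.

```python
import resource
import subprocess
import time


def time_command(argv, stdout=None):
    """Run argv to completion; return (wall_seconds, child_cpu_seconds, returncode)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    proc = subprocess.run(argv, stdout=stdout)  # waits for the child to exit
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # user + system CPU time of children that terminated during this call
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu, proc.returncode
```

To keep sampe's SAM output off the terminal, pass an open file as stdout. Run the same command with and without -P to see how much of the gap between wall and CPU time the preloading removes on your hardware.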
#10
Member
Location: Canada
Join Date: Jun 2011
Posts: 47
Thanks a lot! Using -P does indeed substantially reduce the computation time. Thanks for all your suggestions!
#11
Junior Member
Location: Goettingen, Germany
Join Date: Aug 2010
Posts: 9
You could also try running sampe with the -s switch to disable Smith-Waterman alignment of the unmapped mate. Obviously it depends on the sensitivity you want and on your genome of interest, but that should speed it up as well.
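The sampe options mentioned in this thread can be combined on one command line: -P from post #6, -o from post #5 and -s from post #11. The sketch below builds the argument list in Python and streams the output into samtools view, as in post #1. The function names, file names and the -o value are only examples. Check the usage message of your bwa version before relying on any flag.

```python
import subprocess


def sampe_argv(prefix, sai1, sai2, fq1, fq2, preload=True, max_occ=None, no_sw=False):
    """Build a bwa sampe command line from the options discussed in the thread."""
    argv = ["bwa", "sampe"]
    if preload:
        argv.append("-P")  # keep the index in memory instead of reloading it per batch
    if max_occ is not None:
        argv += ["-o", str(max_occ)]  # reads with more occurrences are not paired
    if no_sw:
        argv.append("-s")  # skip Smith-Waterman alignment of the unmapped mate
    return argv + [prefix, sai1, sai2, fq1, fq2]


def run_sampe_to_bam(argv, out_bam):
    """Pipe sampe's SAM output through 'samtools view -bS -' into out_bam."""
    with open(out_bam, "wb") as out:
        sampe = subprocess.Popen(argv, stdout=subprocess.PIPE)
        view = subprocess.Popen(["samtools", "view", "-bS", "-"],
                                stdin=sampe.stdout, stdout=out)
        sampe.stdout.close()  # let sampe get SIGPIPE if samtools exits early
        return sampe.wait(), view.wait()
```

For example:

run_sampe_to_bam(sampe_argv("ref.fa", "r1.sai", "r2.sai", "r1.fq", "r2.fq", max_occ=1000), "out.bam")

Here "ref.fa" stands for your index prefix. Disabling -s rescue or lowering -o trades sensitivity for speed, so compare the results on a subset first.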
#12
Senior Member
Location: bethesda
Join Date: Feb 2009
Posts: 700
You'll need to rule out the obvious problems first.
Is your data on a slow mounted drive? Are other people running many jobs on your machine? What machine type are you running? How many CPUs? Run this from the command line:
grep bogomips /proc/cpuinfo
"bogomips" is a rough measure of CPU speed.
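The same checks can be scripted. The sketch below is for Linux only and uses the Python standard library; the function names are hypothetical.

- It collects the bogomips lines from /proc/cpuinfo, which on typical x86 Linux systems gives one entry per logical CPU.
- It reads the load average to see whether the machine is already busy.

A load average at or above the CPU count suggests other jobs are competing with yours. Bogomips remains only a rough indicator of speed.

```python
import os


def bogomips(cpuinfo_text):
    """Return one bogomips value per matching line of /proc/cpuinfo text."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "bogomips":  # also matches ARM's "BogoMIPS"
            values.append(float(value))
    return values


def machine_summary(path="/proc/cpuinfo"):
    """Logical CPU count, mean bogomips and 1/5/15-minute load averages."""
    with open(path) as f:
        mips = bogomips(f.read())
    mean = sum(mips) / len(mips) if mips else None
    return {"cpus": os.cpu_count(), "mean_bogomips": mean, "loadavg": os.getloadavg()}
```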