**swbarnes2** · 04-06-2012, 10:00 PM

What insert sizes does sampe report that it sees?

You could try running samse on each half separately and eyeballing the SAM files to see if the pairs look right.
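A minimal sketch of that check, assuming an index prefix `ref.fa` and read files named `reads_1.fq`/`reads_2.fq` with matching `.sai` files (all placeholder names, as are the helper functions `samse_each_end` and `peek_sam`):

```shell
# Align each end on its own as single-end reads.
# ref.fa and reads_N.* are placeholder names; substitute your own files.
samse_each_end() {
    for end in 1 2; do
        bwa samse ref.fa "reads_${end}.sai" "reads_${end}.fq" > "reads_${end}.sam"
    done
}

# Show read name, flag, reference name and position for the first few
# alignments of one SAM file, skipping the '@' header lines.
# $1 = SAM file, $2 = number of records to show (default 10)
peek_sam() {
    grep -v '^@' "$1" | cut -f1-4 | head -n "${2:-10}"
}
```

Running `peek_sam reads_1.sam` and `peek_sam reads_2.sam` next to each other lets you compare where the two mates of the same read name landed.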

**caswater** · 04-06-2012, 11:00 PM

The insert size seems all right, just around 300-400 bp... but it is really slow. Do you know if there is anything wrong with it?
Thanks so much~

Here is an example of the output:
[infer_isize] (25, 50, 75) percentile: (313, 337, 363)
[infer_isize] low and high boundaries: 213 and 463 for estimating avg and std
[infer_isize] inferred external isize from 214684 pairs: 337.152 +/- 40.108
[infer_isize] skewness: -0.131; kurtosis: 0.117; ap_prior: 2.61e-05
[infer_isize] inferred maximum insert size: 616 (6.94 sigma)
[bwa_sai2sam_pe_core] time elapses: 19.58 sec
[bwa_sai2sam_pe_core] changing coordinates of 7276 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_paired_sw] 7091 out of 7430 Q17 singletons are mated.
[bwa_paired_sw] 1969 out of 3915 Q17 discordant pairs are fixed.
[bwa_sai2sam_pe_core] time elapses: 5.56 sec
[bwa_sai2sam_pe_core] refine gapped alignments... 0.82 sec
[bwa_sai2sam_pe_core] print alignments... 2.48 sec
[bwa_sai2sam_pe_core] 8912896 sequences have been processed.
[bwa_read_seq] 1.0% bases are trimmed.
[bwa_read_seq] 1.6% bases are trimmed.
[bwa_sai2sam_pe_core] convert to sequence coordinate...
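
For a long run it can help to total the per-step timings in a log like this one. A rough sketch, assuming the stderr output of sampe was saved to a file (the name `sampe.log` and the function `total_sampe_seconds` are hypothetical):

```shell
# Sum every "<number> sec" value that bwa sampe reports in its log.
# $1 = file holding the stderr output of sampe (e.g. sampe.log)
total_sampe_seconds() {
    awk '$NF == "sec" { total += $(NF - 1) } END { printf "%.2f\n", total }' "$1"
}
```

For the single batch shown above this adds 19.58, 5.56, 0.82 and 2.48 seconds.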

**caswater** · 04-07-2012, 03:14 PM

Can anyone give any advice on this? Thanks a lot!

**chkuo** · 05-03-2012, 01:30 AM

Not sure if this helps but sampe has always been a slow step for us. We are working on a bacterial genome (only ~1.3Mb in size) and have ~80M*2 PE reads (insert size=~365bp). It takes ~4,000 sec CPU time to run the aln step for each end but we can cut this down to a couple of minutes with multi-threading. The sampe step takes ~5,100 sec and there's not much we can do to reduce this.

One possibility is to change the '-o' option to discard the reads that are involved in repeats. I guess this probably would help in the cases of human/plant genomes. There are simply too few repeats in bacterial genomes in comparison so we didn't bother to change the default.

**xied75** · 05-03-2012, 07:57 AM

Dear all,

As you may have observed, sampe has two bottlenecks: (1) it is single-threaded, and (2) it is I/O bound.

If your I/O subsystem (i.e. your disks) is not very fast, please use the -P switch so that sampe stops loading the same files again and again. (Did you know that for each batch of 214684 reads it reads the .bwt and .sa files into memory, uses them, dumps them, then loads the .pac file, uses it, dumps it, and does all of that again for the next batch?)

With -P these files stay in memory, and you will see a roughly constant memory footprint over the whole run. Note that it uses more memory than running without it; you had better have 8 GB for a human reference.
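
As a sketch, a sampe call with -P could be wrapped like this; the function name `bwa_sampe_preload` and the file names in the usage line are placeholders, not anything from this thread:

```shell
# Run bwa sampe with the index preloaded into memory (-P).
# $1 = index prefix, $2 and $3 = .sai files, $4 and $5 = FASTQ files
bwa_sampe_preload() {
    bwa sampe -P "$1" "$2" "$3" "$4" "$5"
}

# Usage (SAM goes to stdout):
#   bwa_sampe_preload ref.fa reads_1.sai reads_2.sai reads_1.fq reads_2.fq > aln.sam
```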

In my recent port of BWA to Windows, I added a -t switch to SAMPE, so that you could do multithreading, but I guess you guys don't use windows.

Best,

dong

**chkuo** · 05-03-2012, 08:19 AM

Forgot to mention that we do use the '-P' option. Any chance of multi-threading sampe in the linux version soon?

**sdriscoll** · 05-03-2012, 09:01 AM

Your elapsed times are similar to mine but maybe a little slower. What type of computer are you using?

**xied75** · 05-03-2012, 09:27 AM

How about you go to the 1000 Genomes site, download a BAM, run on that and give us some numbers? Then we can do the same and compare.

**caswater** · 05-03-2012, 02:48 PM

Thanks a lot... using -P does indeed substantially reduce the run time. Thanks for all your suggestions!

**bryand** · 05-04-2012, 12:54 AM

You could also try running sampe with the -s switch to disable Smith-Waterman for an unmapped mate. Obviously, it depends on the sensitivity you want and on your genome of interest, but that should speed it up as well...
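
One way to judge whether -s is worth it is to time sampe with and without it on a small subset first. This is only a sketch: `ref.fa`, the `reads_*.fq` names, the subset size of 100000 reads and both function names are assumptions, and the FASTQ files are assumed to use exactly four lines per read.

```shell
# Keep the first N reads of a FASTQ file (4 lines per read).
# $1 = FASTQ file, $2 = number of reads
first_reads() {
    head -n "$(( $2 * 4 ))" "$1"
}

# Re-align a subset of both ends, then time sampe with and without -s.
bench_sw() {
    for end in 1 2; do
        first_reads "reads_${end}.fq" 100000 > "sub_${end}.fq"
        bwa aln ref.fa "sub_${end}.fq" > "sub_${end}.sai"
    done
    time bwa sampe    ref.fa sub_1.sai sub_2.sai sub_1.fq sub_2.fq > with_sw.sam
    time bwa sampe -s ref.fa sub_1.sai sub_2.sai sub_1.fq sub_2.fq > no_sw.sam
}
```

Comparing `with_sw.sam` and `no_sw.sam` afterwards shows how many mates are lost in exchange for the time saved.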

**Richard Finney** · 05-04-2012, 05:03 AM

You'll need to rule out the obvious problems first.

Is your data on a slow mounted drive?
Are other people running many jobs on your machine?
What machine type are you running? How many CPUs?

Run this from the command line:

grep bogomips /proc/cpuinfo

"bogomips" is a rough indicator of CPU speed.
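
The other questions can be checked from the shell as well. A small sketch for a Linux machine (the function name `machine_report` is made up for illustration):

```shell
# Print CPU count, current load and the filesystem that holds the data.
# $1 = directory containing the reads and index (default: current directory)
machine_report() {
    echo "cpus: $(nproc)"
    echo "load: $(uptime)"
    echo "filesystem holding the data:"
    df -h "${1:-.}"
}
```

A network mount usually shows up in the first column of the `df` output as `host:/path`, and a load average well above the CPU count suggests other jobs are competing for the machine.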

help~~bwa sampe extremely slow~!!

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News