SEQanswers

12-17-2010, 04:40 AM   #1
Fabien Campagne
Member
 
Location: New York City

Join Date: Feb 2010
Posts: 39
Support for parallelization of paired-end alignments with BWA

Hello,

bwa aln is multi-threaded, but the sampe step needed to combine results for paired-end reads is not. We have released a version of BWA modified to natively support Goby file formats. This version of BWA provides a strategy to parallelize the bwa sampe steps on a grid of computers for very large paired-end alignments. Here are three new use cases possible with this new version:

1. With FASTQ input to generate Goby alignment files (thread parallelization of aln step only):

bwa aln -t 10 -f alignment_0.sai BWA_INDEXED_REFERENCE paired-input_0.fastq
bwa aln -t 10 -f alignment_1.sai BWA_INDEXED_REFERENCE paired-input_1.fastq
bwa sampe -F goby -f alignment BWA_INDEXED_REFERENCE alignment_0.sai alignment_1.sai paired-input_0.fastq paired-input_1.fastq

This produces an alignment in the Goby format and eliminates the SAM to Goby conversion step we initially provided in the Goby align mode. Please note that -t 10 runs the alignment step on 10 threads, but there is no convenient way to split the input files in order to align only chunks of the input.

2. With Goby compact-reads input to generate Goby alignments (thread parallelization of aln step only):

bwa aln -t 10 -w 0 -f alignment_0.sai BWA_INDEXED_REFERENCE paired-input.compact-reads
bwa aln -t 10 -w 1 -f alignment_1.sai BWA_INDEXED_REFERENCE paired-input.compact-reads
bwa sampe -F goby -f alignment BWA_INDEXED_REFERENCE alignment_0.sai alignment_1.sai paired-input.compact-reads paired-input.compact-reads

This will generate the same result as use case 1, but uses reads in the Goby compact-reads format. The compact-reads format stores both reads of each pair in the same file, so we provide the -w option to specify which read of the pair should be aligned (e.g., -w 1 aligns the second read of each pair). In use case 3, we show how to parallelize this on a grid of computers:

3. With Goby compact-reads input to generate Goby alignments (full grid parallelization support, including the bwa sampe step):
We illustrate the strategy by splitting the input into two chunks that can be processed independently:

1. bwa aln -t 10 -w 0 -f alignment_0.sai BWA_INDEXED_REFERENCE paired-input.compact-reads -x 0 -y 10000000
2. bwa aln -t 10 -w 1 -f alignment_1.sai BWA_INDEXED_REFERENCE paired-input.compact-reads -x 0 -y 10000000
3. bwa sampe -F goby BWA_INDEXED_REFERENCE -f chunk-1-alignment alignment_0.sai alignment_1.sai paired-input.compact-reads paired-input.compact-reads -x 0 -y 10000000
4. goby 3g sort chunk-1-alignment -o chunk-1-alignment-sorted

5. bwa aln -t 10 -w 0 -f alignment_0.sai BWA_INDEXED_REFERENCE paired-input.compact-reads -x 10000000 -y 20000000
6. bwa aln -t 10 -w 1 -f alignment_1.sai BWA_INDEXED_REFERENCE paired-input.compact-reads -x 10000000 -y 20000000
7. bwa sampe -F goby BWA_INDEXED_REFERENCE -f chunk-2-alignment alignment_0.sai alignment_1.sai paired-input.compact-reads paired-input.compact-reads -x 10000000 -y 20000000
8. goby 3g sort chunk-2-alignment -o chunk-2-alignment-sorted

9. goby 3g concatenate-alignments chunk-1-alignment-sorted chunk-2-alignment-sorted -o combined-alignment-sorted

Steps 1-3 run bwa on the first chunk of input, that is, on reads found between byte offsets 0 and 10,000,000 in the input file. The input is restricted with the -x and -y options introduced in this version of BWA.
Step 4 sorts the alignment for the first chunk by reference position.
Steps 5-7 run bwa on the second chunk (from byte offset 10,000,000 up to byte offset 20,000,000). Please note that these steps can start right away, without waiting for steps 1-4, since Goby supports semi-random access to byte offsets in compact-reads files. This is not possible with the FASTQ format (as in use case 1).
Step 8 sorts the alignment for the second chunk by reference position.
Step 9 concatenates the alignments produced by the parallel steps 1-4 and 5-8. In Goby, concatenating sorted alignments preserves sort order, so the resulting alignment produced in step 9 is sorted and indexed.
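
For more than two chunks, the same nine-step pattern can be generated rather than typed by hand. The Python sketch below is only an illustration and not part of BWA or Goby: it prints the command lines for each chunk and the final concatenation, using only the options shown above. The function names are hypothetical. It assumes that consecutive chunks use contiguous byte windows (the -y value of one chunk is the -x value of the next, as in the two-chunk example), and it gives each chunk its own .sai file names, which is my addition so that chunks sharing a working directory do not overwrite each other's files.

```python
def chunk_ranges(total_bytes, chunk_bytes):
    """Yield (start, end) byte windows covering the compact-reads file."""
    start = 0
    while start < total_bytes:
        end = min(start + chunk_bytes, total_bytes)
        yield start, end
        start = end


def chunk_commands(index, start, end, reads, reference, threads=10):
    """Return the four commands (aln, aln, sampe, sort) for one chunk."""
    prefix = f"chunk-{index}"
    # Assumption: per-chunk .sai names, to avoid collisions between chunks.
    sai = [f"{prefix}-alignment_{w}.sai" for w in (0, 1)]
    window = ["-x", str(start), "-y", str(end)]
    commands = []
    for w in (0, 1):  # -w selects the first or second read of each pair
        commands.append(["bwa", "aln", "-t", str(threads), "-w", str(w),
                         "-f", sai[w], reference, reads] + window)
    commands.append(["bwa", "sampe", "-F", "goby", reference,
                     "-f", f"{prefix}-alignment", sai[0], sai[1],
                     reads, reads] + window)
    commands.append(["goby", "3g", "sort", f"{prefix}-alignment",
                     "-o", f"{prefix}-alignment-sorted"])
    return commands


def build_plan(total_bytes, chunk_bytes, reads, reference):
    """Return (per-chunk command lists, final concatenate command)."""
    jobs, sorted_outputs = [], []
    ranges = chunk_ranges(total_bytes, chunk_bytes)
    for index, (start, end) in enumerate(ranges, start=1):
        jobs.append(chunk_commands(index, start, end, reads, reference))
        sorted_outputs.append(f"chunk-{index}-alignment-sorted")
    merge = (["goby", "3g", "concatenate-alignments"] + sorted_outputs
             + ["-o", "combined-alignment-sorted"])
    return jobs, merge


if __name__ == "__main__":
    jobs, merge = build_plan(20_000_000, 10_000_000,
                             "paired-input.compact-reads",
                             "BWA_INDEXED_REFERENCE")
    for job in jobs:
        print("# independent grid job")
        for command in job:
            print(" ".join(command))
    print("# run once all chunk jobs have finished")
    print(" ".join(merge))
```

Each "independent grid job" block can be submitted to a different node; only the last command has to wait for all of them. The total size passed to build_plan would be the size in bytes of the compact-reads file.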

The strategy effectively allows one to run bwa sampe in parallel. The resulting alignment can be viewed with IGV (latest development version) or analyzed with the Goby tools or one of the Goby APIs (in Java, Python, C++ and C). Enjoy.