  • Support for parallelization of paired-end alignments with BWA

    Hello,

    bwa aln is multi-threaded, but the sampe step needed to combine the results for paired-end reads is not. We have released a version of BWA modified to natively support Goby file formats. This version of BWA makes it possible to parallelize the bwa sampe step on a grid of computers for very large paired-end alignments. Here are three use cases that this new version makes possible:

    1. With FASTQ input to generate Goby alignment files (thread parallelization of the aln step only):

    bwa aln -t 10 -f alignment_0.sai BWA_INDEXED_REFERENCE paired-input_0.fastq
    bwa aln -t 10 -f alignment_1.sai BWA_INDEXED_REFERENCE paired-input_1.fastq
    bwa sampe -F goby -f alignment BWA_INDEXED_REFERENCE alignment_0.sai alignment_1.sai paired-input_0.fastq paired-input_1.fastq

    This produces an alignment in the Goby format directly, eliminating the SAM-to-Goby conversion step we initially provided in the Goby align mode. Note that -t 10 runs the alignment step on 10 threads, but FASTQ input offers no convenient way to split the input files and align only chunks of them.
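    The two bwa aln runs above are independent (each writes its own .sai file, and only bwa sampe needs both), so on a single machine they can also be started at the same time. A minimal POSIX shell sketch, assuming the Goby-enabled bwa from this post is on the PATH; the function name align_fastq_pair and the THREADS value are illustrative, not part of the release:

```shell
#!/bin/sh
# Sketch: use case 1 with the two independent "bwa aln" runs started
# concurrently. Only "bwa sampe" needs both .sai files, so it waits for them.
# align_fastq_pair and THREADS are hypothetical names for illustration.

THREADS=5   # threads per aln job; two jobs share the machine

# align_fastq_pair REF READS_1 READS_2 OUT
align_fastq_pair() {
  ref=$1; r1=$2; r2=$3; out=$4
  # Start both aln jobs in the background and remember their PIDs.
  bwa aln -t "$THREADS" -f "${out}_0.sai" "$ref" "$r1" & p0=$!
  bwa aln -t "$THREADS" -f "${out}_1.sai" "$ref" "$r2" & p1=$!
  # Wait for each job and stop if either alignment failed.
  wait "$p0" || return 1
  wait "$p1" || return 1
  # Combine the two .sai files into a Goby alignment.
  bwa sampe -F goby -f "$out" "$ref" "${out}_0.sai" "${out}_1.sai" "$r1" "$r2"
}
```

    For example, align_fastq_pair BWA_INDEXED_REFERENCE paired-input_0.fastq paired-input_1.fastq alignment runs the same three steps as above. THREADS is halved relative to -t 10 because both aln jobs share the machine; this still leaves bwa sampe serial, which is what use case 3 addresses.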

    2. With Goby compact-reads input to generate Goby alignments (thread parallelization of the aln step only):

    bwa aln -t 10 -w 0 -f alignment_0.sai BWA_INDEXED_REFERENCE paired-input.compact-reads
    bwa aln -t 10 -w 1 -f alignment_1.sai BWA_INDEXED_REFERENCE paired-input.compact-reads
    bwa sampe -F goby -f alignment BWA_INDEXED_REFERENCE alignment_0.sai alignment_1.sai paired-input.compact-reads paired-input.compact-reads

    This generates the same result as use case 1, but reads the input from the Goby compact-reads format. The Goby reads format stores both reads of a pair in the same file, so we provide the option -w to specify which read of each pair should be aligned (-w 0 aligns the first read, -w 1 the second). In use case 3, we show how to parallelize this on a grid of computers:

    3. With Goby compact-reads input to generate Goby alignments (full grid parallelization support, including the bwa sampe step):
    We illustrate the strategy by splitting the input into two chunks that can be processed independently:

    1. bwa aln -t 10 -w 0 -f chunk-1_0.sai BWA_INDEXED_REFERENCE paired-input.compact-reads -x 0 -y 10000000
    2. bwa aln -t 10 -w 1 -f chunk-1_1.sai BWA_INDEXED_REFERENCE paired-input.compact-reads -x 0 -y 10000000
    3. bwa sampe -F goby BWA_INDEXED_REFERENCE -f chunk-1-alignment chunk-1_0.sai chunk-1_1.sai paired-input.compact-reads paired-input.compact-reads -x 0 -y 10000000
    4. goby 3g sort chunk-1-alignment -o chunk-1-alignment-sorted

    5. bwa aln -t 10 -w 0 -f chunk-2_0.sai BWA_INDEXED_REFERENCE paired-input.compact-reads -x 10000000 -y 20000000
    6. bwa aln -t 10 -w 1 -f chunk-2_1.sai BWA_INDEXED_REFERENCE paired-input.compact-reads -x 10000000 -y 20000000
    7. bwa sampe -F goby BWA_INDEXED_REFERENCE -f chunk-2-alignment chunk-2_0.sai chunk-2_1.sai paired-input.compact-reads paired-input.compact-reads -x 10000000 -y 20000000
    8. goby 3g sort chunk-2-alignment -o chunk-2-alignment-sorted

    9. goby 3g concatenate-alignments chunk-1-alignment-sorted chunk-2-alignment-sorted -o combined-alignment-sorted

    Steps 1-3 run bwa on the first chunk of input, that is, on the reads found between byte offsets 0 and 10,000,000 in the input file. The input is restricted with the -x (start offset) and -y (end offset) options introduced in this version of BWA.
    Step 4 sorts the alignment for the first chunk by reference position.
    Steps 5-7 run bwa on the second chunk (byte offsets 10,000,000 to 20,000,000). These steps can start right away, in parallel with steps 1-4, since Goby supports semi-random access by byte offset in compact-reads files. This is not possible with FASTQ input (as in use case 1).
    Step 8 sorts the alignment for the second chunk by reference position.
    Step 9 concatenates the alignments produced by the parallel steps 1-4 and 5-8. In Goby, concatenating sorted alignments preserves sort order, so the alignment produced in step 9 is sorted and indexed.
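    For inputs larger than two chunks, the same steps can be generated mechanically. The sketch below is a POSIX shell illustration, not part of the release: given the size of the compact-reads file in bytes, it prints one block of four commands (aln -w 0, aln -w 1, sampe, sort) per byte range, clamping the last range to the end of the file, followed by the final concatenation. It assumes the -w, -x and -y options described above; CHUNK_SIZE, the function names and the chunk-N file names are hypothetical choices. Each chunk gets its own .sai names so that jobs running in one shared directory do not overwrite each other.

```shell
#!/bin/sh
# Sketch: print the use-case-3 commands for a compact-reads file of a given
# size, one block of four commands per byte-offset chunk.
# Assumes the -w/-x/-y options of the Goby-enabled BWA; CHUNK_SIZE, the
# function names and the chunk-N file names are illustrative.

REF=BWA_INDEXED_REFERENCE
READS=paired-input.compact-reads
CHUNK_SIZE=10000000

# print_chunk_commands N START END: aln x2, sampe and sort for one chunk.
print_chunk_commands() {
  n=$1; from=$2; to=$3
  # Per-chunk .sai names so concurrent jobs in one directory do not collide.
  echo "bwa aln -t 10 -w 0 -f chunk-${n}_0.sai $REF $READS -x $from -y $to"
  echo "bwa aln -t 10 -w 1 -f chunk-${n}_1.sai $REF $READS -x $from -y $to"
  echo "bwa sampe -F goby -f chunk-$n-alignment $REF chunk-${n}_0.sai chunk-${n}_1.sai $READS $READS -x $from -y $to"
  echo "goby 3g sort chunk-$n-alignment -o chunk-$n-alignment-sorted"
}

# plan_chunks FILE_SIZE: cover [0, FILE_SIZE) in CHUNK_SIZE steps, then
# print the concatenation of all sorted chunk alignments.
plan_chunks() {
  size=$1; start=0; i=1; sorted=""
  while [ "$start" -lt "$size" ]; do
    end=$((start + CHUNK_SIZE))
    # Clamp the last chunk to the end of the file.
    if [ "$end" -gt "$size" ]; then end=$size; fi
    print_chunk_commands "$i" "$start" "$end"
    sorted="$sorted chunk-$i-alignment-sorted"
    start=$end
    i=$((i + 1))
  done
  echo "goby 3g concatenate-alignments$sorted -o combined-alignment-sorted"
}
```

    For example, plan_chunks "$(( $(wc -c < "$READS") ))" prints the plan for the actual input file. The script prints commands rather than running them, so each chunk's four lines can be wrapped in whatever grid submission mechanism is available; only the last line has to wait until every chunk has been sorted.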

    The strategy effectively allows one to run bwa sampe in parallel. The resulting alignment can be viewed with IGV (latest development version) or analyzed with the Goby tools or one of the Goby APIs (in Java, Python, C++ and C). Enjoy.
