  • Arioc: GPU-accelerated short-read alignment for BS-seq data

    For folks who have a lot of BS-seq read data to align to a large reference genome, we have published an update to Arioc that aligns short bisulfite-treated sequencer reads at least 10 times faster than existing CPU-based BS-seq aligners, without sacrificing sensitivity or accuracy.

    We have carefully validated Arioc with synthetic reads (generated from the human reference genome) as well as with Illumina sequencer reads. In particular, we imported alignment results into a relational database and carried out a read-by-read comparison between Arioc's output and the output from several other BS-seq aligners, so we are confident that Arioc is generating accurate mappings and good methylation-context calls.

    What Arioc does
    Arioc's BS-seq implementation supports both unpaired and paired-end reads. It generates SAM output that is compatible with that of Bismark (which we consider to be the "gold standard" in terms of reliable BS-seq alignment functionality). In particular, each alignment in Arioc's SAM output includes a methylation-context map (XM field) that is compatible with Bismark's bismark_methylation_extractor application, so Arioc can replace Bismark in a read-alignment processing toolchain.
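
    Because the XM field follows Bismark's convention, it can also be inspected with ordinary text processing. The sketch below assumes Bismark's documented XM alphabet (z/Z for CpG, x/X for CHG, h/H for CHH, u/U for unknown context, lowercase meaning unmethylated, uppercase methylated, and '.' for positions that are not cytosines). The function names are hypothetical helpers, not part of Arioc or Bismark; for real analyses you would normally just run bismark_methylation_extractor on the SAM file.

```python
from __future__ import annotations

# Bismark XM-tag call characters; lowercase = unmethylated, uppercase = methylated.
CONTEXTS = {"z": "CpG", "x": "CHG", "h": "CHH", "u": "unknown"}


def count_methylation_calls(xm: str) -> dict[str, dict[str, int]]:
    """Tally methylated/unmethylated cytosine calls per context in one XM string."""
    counts = {ctx: {"methylated": 0, "unmethylated": 0} for ctx in CONTEXTS.values()}
    for ch in xm:
        ctx = CONTEXTS.get(ch.lower())
        if ctx is None:
            continue  # '.' marks a non-cytosine position; skip it
        state = "methylated" if ch.isupper() else "unmethylated"
        counts[ctx][state] += 1
    return counts


def xm_from_sam_line(line: str) -> str | None:
    """Return the XM:Z: value from a tab-separated SAM record, or None if absent."""
    # Optional tags start at the 12th column (index 11) of a SAM record.
    for field in line.rstrip("\n").split("\t")[11:]:
        if field.startswith("XM:Z:"):
            return field[len("XM:Z:"):]
    return None
```

    For example, `count_methylation_calls("..z..Z..x.h.H")` reports one methylated and one unmethylated CpG, one unmethylated CHG, and one methylated plus one unmethylated CHH call.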

    Like Bismark, Arioc uses Smith-Waterman (not edit distance) to compute gapped alignments, so Arioc's mappings, and hence its determination of methylation context, are nearly identical to those generated by Bismark. Unlike Bismark, however, Arioc does not rely on a "unique best mapping" heuristic to exclude low-MAPQ mappings from alignment results, so Arioc can report high-scoring, high-quality mappings that would be excluded by that heuristic. (Of course, if you are accustomed to Bismark's behavior, Arioc computes valid MAPQ values that you can use for downstream filtering of alignments.)
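
    Downstream MAPQ filtering needs nothing more than the standard SAM columns (FLAG is column 2, MAPQ is column 5). This is a minimal sketch, not part of Arioc: the function name and the example threshold of 20 are arbitrary choices, and it treats MAPQ 255 (which the SAM specification defines as "not available") as a record to drop. In practice `samtools view -q` applies the same kind of minimum-MAPQ cut.

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4          # SAM FLAG bit: segment unmapped
MAPQ_UNAVAILABLE = 255  # SAM spec: mapping quality not available


def filter_by_mapq(lines: Iterable[str], min_mapq: int = 20) -> Iterator[str]:
    """Yield SAM header lines plus mapped records whose MAPQ >= min_mapq."""
    for line in lines:
        if line.startswith("@"):
            yield line  # pass header lines through untouched
            continue
        fields = line.split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & UNMAPPED or mapq == MAPQ_UNAVAILABLE:
            continue  # skip unmapped reads and reads with no usable MAPQ
        if mapq >= min_mapq:
            yield line
```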

    Perhaps the most interesting thing about Arioc's implementation is this: rather than "wrap" an instance of a generic short-read aligner like Bowtie 2 or SOAP3-dp, Arioc computes alignment scores directly between bisulfite-treated reads and a reference genome. This eliminates several sources of small-scale inaccuracy. More importantly, this approach makes it possible to use optimized GPU code and concurrent CPU threads to implement the most compute-intensive logic in the BS-seq alignment pipeline, which leads to a significant improvement in throughput.
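
    To illustrate what "computing alignment scores directly" between a bisulfite-treated read and the reference can mean, here is a toy Smith-Waterman scorer with a bisulfite-aware match rule: on a C-to-T converted read, a read T opposite a reference C counts as a match (a possible unmethylated cytosine), and on a G-to-A converted read, a read A opposite a reference G does. This is an assumed, simplified scheme for illustration only. It uses a linear gap penalty and arbitrary example scores, and it is not Arioc's GPU implementation or its actual scoring parameters.

```python
def bs_match(read_base: str, ref_base: str, strand: str = "CT") -> bool:
    """Bisulfite-aware base comparison (illustrative assumption, not Arioc's code)."""
    if read_base == ref_base:
        return True
    if strand == "CT":
        # C->T converted read: a read T may come from an unmethylated reference C.
        return read_base == "T" and ref_base == "C"
    # G->A converted read: a read A may come from a reference G.
    return read_base == "A" and ref_base == "G"


def bs_smith_waterman(read: str, ref: str, strand: str = "CT",
                      match: int = 2, mismatch: int = -3, gap: int = -5) -> int:
    """Best local alignment score with a linear gap penalty (simplified)."""
    prev = [0] * (len(ref) + 1)  # previous DP row
    best = 0
    for i in range(1, len(read) + 1):
        cur = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            s = match if bs_match(read[i - 1], ref[j - 1], strand) else mismatch
            cur[j] = max(0,                 # local alignment: restart at zero
                         prev[j - 1] + s,   # diagonal: align read[i-1] to ref[j-1]
                         prev[j] + gap,     # gap in the reference
                         cur[j - 1] + gap)  # gap in the read
            best = max(best, cur[j])
        prev = cur
    return best
```

    With these example scores, the converted read "TTGA" scores 8 against reference "CCGA" on the C-to-T strand, but only 4 when the T/C positions are treated as mismatches, which is the kind of score distortion a bisulfite-unaware aligner would have to work around.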

    Running Arioc
    Arioc exists to handle sequencer runs that contain hundreds of millions of short reads. With this amount of data, the ability to align hundreds of thousands of reads per second is significant. But there are a couple of caveats.
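
    A quick back-of-the-envelope calculation shows why throughput matters at this scale. The read count and the two throughput figures below are hypothetical round numbers chosen for illustration, not measured Arioc or CPU-aligner benchmarks.

```python
def alignment_hours(n_reads: int, reads_per_second: float) -> float:
    """Wall-clock hours to align n_reads at a sustained throughput."""
    return n_reads / reads_per_second / 3600


# Hypothetical run: 500 million reads.
fast = alignment_hours(500_000_000, 200_000)  # 2,500 s, about 0.7 h
slow = alignment_hours(500_000_000, 20_000)   # 25,000 s, about 6.9 h
print(f"{fast:.2f} h vs {slow:.2f} h")
```

    A tenfold difference in reads per second translates directly into a tenfold difference in wall-clock time, which for runs of this size is the difference between finishing within the hour and waiting most of a working day.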

    At a minimum, you need a machine with about 100GB of RAM and an NVIDIA GPU with at least 5GB of onboard RAM. (We assume this is probably not a major problem if you have enough data to be worth aligning in hours rather than days.) Arioc is fast enough that its throughput can be tangibly improved by using SSD devices rather than spinning hard disk drives.

    Also, Arioc's performance depends on available CPU and GPU hardware as well as on the number of mismatches and indels in the read data (think "error rate"). This means that, although Arioc's default configuration parameters can serve as a reasonable starting point, you should test different parameter settings to optimize the balance between speed and sensitivity for your own hardware and data.

    (And, of course, if you have billions of non-bisulfite-treated reads to align, Arioc can also handle them at speeds that are about an order of magnitude faster than Bowtie 2 or BWA.)

    The Arioc user guide can be found at https://github.com/RWilton/Arioc.

    The current Arioc release for Linux and Windows is available at https://github.com/RWilton/Arioc/releases.

    And finally: there's a preprint available at https://doi.org/10.1101/175729. It's worth a look.
    Last edited by RWilton; 09-03-2017, 11:13 AM. Reason: Added link to preprint
