  • RWilton

    Arioc: GPU-accelerated short-read alignment for BS-seq data

    For folks who have a lot of BS-seq read data to align to a large reference genome, we have published an update to Arioc that aligns short bisulfite-treated sequencer reads at least 10 times faster than existing CPU-based BS-seq aligners without sacrificing sensitivity or accuracy.

    We have carefully validated Arioc with synthetic reads (generated from the human reference genome) as well as with Illumina sequencer reads. In particular, we imported alignment results into a relational database and carried out a read-by-read comparison between Arioc's output and the output from several other BS-seq aligners, so we are confident that Arioc is generating accurate mappings and good methylation-context calls.

    What Arioc does
    Arioc's BS-seq implementation supports both unpaired and paired-end reads. It generates SAM output that is compatible with that of Bismark (which we consider to be the "gold standard" in terms of reliable BS-seq alignment functionality). In particular, each alignment in Arioc's SAM output includes a methylation-context map (XM field) that is compatible with Bismark's bismark_methylation_extractor application, so Arioc can replace Bismark in a read-alignment processing toolchain.
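
    As a rough illustration of what the XM field carries, here is a minimal Python sketch that pulls the XM:Z tag out of a SAM record and tallies the calls by context. It assumes the usual Bismark letter codes (z/Z for CpG, x/X for CHG, h/H for CHH, u/U for unknown context; uppercase means methylated). The SAM record is made up for the example and is not real Arioc output, and the function names are hypothetical.

```python
from collections import Counter

# Bismark-style XM codes: letter -> (context, is_methylated).
# '.' marks a position that is not a cytosine call and is ignored here.
XM_CODES = {
    "z": ("CpG", False), "Z": ("CpG", True),
    "x": ("CHG", False), "X": ("CHG", True),
    "h": ("CHH", False), "H": ("CHH", True),
    "u": ("unknown", False), "U": ("unknown", True),
}


def xm_string(sam_line):
    """Return the XM:Z value of one SAM alignment line, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start after the 11 mandatory SAM columns.
    for tag in fields[11:]:
        if tag.startswith("XM:Z:"):
            return tag[len("XM:Z:"):]
    return None


def count_calls(xm):
    """Count methylation calls per (context, is_methylated) pair."""
    counts = Counter()
    for ch in xm:
        if ch in XM_CODES:
            counts[XM_CODES[ch]] += 1
    return counts


# Hypothetical 10-base record, for illustration only.
example = "read1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tTTAAGTCGAT\tIIIIIIIIII\tXM:Z:h....xZ..."

counts = count_calls(xm_string(example))
for (context, methylated), n in sorted(counts.items()):
    print(context, "methylated" if methylated else "unmethylated", n)
```

    For real analyses you would still hand the SAM file to bismark_methylation_extractor; the sketch only shows how the per-base calls are encoded.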

    Like Bismark, Arioc uses Smith-Waterman (not edit distance) to compute gapped alignments, so Arioc's mappings -- and hence its determination of methylation context -- are nearly identical to those generated by Bismark. Unlike Bismark, however, Arioc does not rely on a "unique best mapping" heuristic to exclude low-MAPQ mappings from alignment results, so Arioc can report high-scoring, high-quality mappings that would be excluded by that heuristic. (Of course, if you are accustomed to Bismark's behavior, Arioc computes valid MAPQ values that you can use for downstream filtering of alignments.)
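
    If you want that kind of downstream filtering, a minimal Python sketch could look like the following. It relies only on the SAM layout (MAPQ is the fifth tab-separated column, header lines start with "@"). The threshold of 20 and the sample records are arbitrary assumptions for the example, not recommended values.

```python
def filter_by_mapq(sam_lines, min_mapq):
    """Yield SAM header lines plus alignments whose MAPQ is >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no MAPQ; pass them through unchanged.
            yield line
            continue
        fields = line.split("\t")
        if int(fields[4]) >= min_mapq:
            yield line


# Hypothetical records, for illustration only.
records = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tTTAAGTCGAT\tIIIIIIIIII",
    "read2\t0\tchr1\t500\t3\t10M\t*\t0\t0\tTTAAGTCGAT\tIIIIIIIIII",
]

kept = list(filter_by_mapq(records, min_mapq=20))
for line in kept:
    print(line.split("\t")[0])
```

    In practice the same filter is usually applied with an existing tool such as samtools view -q, which does the equivalent on SAM/BAM files.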

    Perhaps the most interesting thing about Arioc's implementation is this: Rather than "wrap" an instance of a generic short-read aligner like Bowtie 2 or SOAP3-dp, Arioc computes alignment scores directly between bisulfite-treated reads and a reference genome. This eliminates several sources of small-scale inaccuracy. More importantly, however, this approach makes it possible to use optimized GPU code and concurrent CPU threads to implement the most compute-intensive logic in the BS-seq alignment pipeline. This leads to a significant improvement in throughput.

    Running Arioc
    Arioc exists to handle sequencer runs that contain hundreds of millions of short reads. With this amount of data, the ability to align hundreds of thousands of reads per second is significant. But there are a couple of caveats.

    At a minimum, you need a machine with about 100 GB of RAM and an NVIDIA GPU with at least 5 GB of onboard memory. (We assume that this is probably not a major problem if you have enough data to be worth aligning in hours rather than days.) Arioc is fast enough that its throughput can be tangibly improved by using SSDs rather than spinning hard disk drives.

    Also, Arioc's performance depends on available CPU and GPU hardware as well as on the number of mismatches and indels in the read data (think "error rate"). This means that, although Arioc's default configuration parameters can serve as a reasonable starting point, you should test different parameter settings to optimize the balance between speed and sensitivity for your own hardware and data.

    (And, of course, if you have billions of non-bisulfite-treated reads to align, Arioc can also handle them at speeds that are about an order of magnitude faster than Bowtie 2 or BWA.)
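
    To get a feel for what these throughput figures mean in wall-clock time, here is a back-of-the-envelope Python sketch. The read count and the reads-per-second rate are round numbers assumed purely for illustration; they are not benchmark results, and actual throughput depends on your hardware and data as noted above.

```python
def alignment_time_hours(n_reads, reads_per_second):
    """Estimate wall-clock alignment time in hours at a constant rate."""
    return n_reads / reads_per_second / 3600.0


# Assumed, illustrative numbers (not measurements).
n_reads = 500_000_000        # "hundreds of millions" of reads
fast_rate = 200_000          # "hundreds of thousands" of reads per second
slow_rate = fast_rate / 10   # an aligner that is 10 times slower

fast_hours = alignment_time_hours(n_reads, fast_rate)
slow_hours = alignment_time_hours(n_reads, slow_rate)
print(f"fast: {fast_hours:.2f} h, slow: {slow_hours:.2f} h")
```

    With these assumed inputs the faster rate finishes in under an hour, while the tenfold slower rate takes most of a working day.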

    The Arioc user guide can be found at https://github.com/RWilton/Arioc.

    The current Arioc release for Linux and Windows is available at https://github.com/RWilton/Arioc/releases.

    And finally: there's a preprint available at https://doi.org/10.1101/175729. It's worth a look.
    Last edited by RWilton; 09-03-2017, 11:13 AM. Reason: Added link to preprint
