SEQanswers

08-13-2017, 07:17 PM #1
RWilton
Location: Maryland

Arioc: GPU-accelerated short-read alignment for BS-seq data

For folks who have a lot of BS-seq read data to align to a large reference genome, we have published an update to Arioc that aligns short bisulfite-treated sequencer reads at least 10 times faster than existing CPU-based BS-seq aligners, without sacrificing sensitivity or accuracy.

We have carefully validated Arioc with synthetic reads (generated from the human reference genome) as well as with Illumina sequencer reads. In particular, we imported alignment results into a relational database and carried out a read-by-read comparison between Arioc's output and the output from several other BS-seq aligners, so we are confident that Arioc is generating accurate mappings and good methylation-context calls.

What Arioc does
Arioc's BS-seq implementation supports both unpaired and paired-end reads. It generates SAM output that is compatible with that of Bismark (which we consider to be the "gold standard" in terms of reliable BS-seq alignment functionality). In particular, each alignment in Arioc's SAM output includes a methylation-context map (XM field) that is compatible with Bismark's bismark_methylation_extractor application, so Arioc can replace Bismark in a read-alignment processing toolchain.
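For readers who want to see what the XM map encodes, here is a minimal Python sketch that tallies methylation calls from one SAM record's XM tag. It assumes Bismark's XM alphabet (z/Z for CpG, x/X for CHG, h/H for CHH, u/U for unknown context, '.' for a position that is not a C, with uppercase meaning methylated). It is not part of Arioc or of bismark_methylation_extractor, and the helper names tally_xm and xm_from_sam_line are hypothetical.

```python
from collections import Counter
from typing import Optional

# Assumed Bismark-style XM codes: lowercase = unmethylated C, uppercase = methylated C.
CONTEXTS = {"z": "CpG", "x": "CHG", "h": "CHH", "u": "unknown"}

def tally_xm(xm: str) -> Counter:
    """Count (context, state) methylation calls in a single XM string."""
    counts = Counter()
    for code in xm:
        if code == ".":  # not a cytosine, so no call
            continue
        context = CONTEXTS.get(code.lower())
        if context is None:
            raise ValueError(f"unexpected XM code {code!r}")
        state = "methylated" if code.isupper() else "unmethylated"
        counts[(context, state)] += 1
    return counts

def xm_from_sam_line(line: str) -> Optional[str]:
    """Return the XM:Z: value from a SAM record, or None if the tag is absent."""
    # Optional SAM tags start at column 12 (index 11).
    for field in line.rstrip("\n").split("\t")[11:]:
        if field.startswith("XM:Z:"):
            return field[5:]
    return None
```

Summing these counters over all records gives per-context totals, which is roughly the kind of information that bismark_methylation_extractor reports in much more detail.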

Like Bismark, Arioc uses Smith-Waterman (not edit distance) to compute gapped alignments, so Arioc's mappings (and hence its determination of methylation context) are nearly identical to those generated by Bismark. Unlike Bismark, however, Arioc does not rely on a "unique best mapping" heuristic to exclude low-MAPQ mappings from alignment results, so Arioc can report high-scoring, high-quality mappings that would be excluded by that heuristic. (If you are accustomed to Bismark's behavior, Arioc computes valid MAPQ values that you can use to filter alignments downstream.)
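Downstream MAPQ filtering can be as simple as the Python sketch below, which reads SAM text (MAPQ is the fifth column) and keeps header lines plus records at or above a threshold. The threshold of 20 is an arbitrary illustrative default, not an Arioc or Bismark recommendation, and the function name filter_by_mapq is hypothetical; `samtools view -q` offers the same kind of filter without custom code.

```python
from typing import Iterable, Iterator

def filter_by_mapq(lines: Iterable[str], min_mapq: int = 20) -> Iterator[str]:
    """Yield SAM header lines unchanged and alignment records with MAPQ >= min_mapq."""
    for line in lines:
        if not line.strip():  # skip blank lines
            continue
        if line.startswith("@"):  # header lines pass through
            yield line
            continue
        mapq = int(line.split("\t")[4])
        # Per the SAM spec, 255 means "mapping quality not available"; drop those.
        if mapq != 255 and mapq >= min_mapq:
            yield line
```

For example, `sys.stdout.writelines(filter_by_mapq(sys.stdin, 20))` streams a SAM file through the filter.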

Perhaps the most interesting thing about Arioc's implementation is this: Rather than "wrap" an instance of a generic short-read aligner like Bowtie 2 or SOAP3-dp, Arioc computes alignment scores directly between bisulfite-treated reads and a reference genome. This eliminates several sources of small-scale inaccuracy. More importantly, however, this approach makes it possible to use optimized GPU code and concurrent CPU threads to implement the most compute-intensive logic in the BS-seq alignment pipeline. This leads to a significant improvement in throughput.
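To make "computes alignment scores directly between bisulfite-treated reads and a reference genome" concrete, here is a minimal Python sketch of a bisulfite-aware Smith-Waterman score. It assumes a directional library and a read aligned to the forward reference strand, so a T in the read may match a C in the reference (an unmethylated C that was converted), but not the other way around. The scoring values (match +2, mismatch -3, linear gap -4) and the function names are illustrative assumptions; this is not Arioc's GPU implementation, and it omits the G-to-A case for the opposite strand, affine gap penalties, and traceback.

```python
def bs_match(read_base: str, ref_base: str) -> bool:
    """C-to-T converted read vs. forward reference: a read T may stand for a ref C."""
    return read_base == ref_base or (read_base == "T" and ref_base == "C")

def sw_score(read: str, ref: str, match: int = 2, mismatch: int = -3, gap: int = -4) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty, score only)."""
    cols = len(ref) + 1
    prev = [0] * cols  # DP row for the previous read position
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            sub = match if bs_match(read[i - 1], ref[j - 1]) else mismatch
            diag = prev[j - 1] + sub
            up = prev[j] + gap         # gap in the reference
            left = curr[j - 1] + gap   # gap in the read
            curr[j] = max(0, diag, up, left)  # local alignment floors at zero
            best = max(best, curr[j])
        prev = curr
    return best
```

For instance, reference CCGACT with a methylated CpG and the other Cs unmethylated becomes read TCGATT after conversion; with the bisulfite-aware comparison every position matches, whereas exact matching would penalize the two converted positions.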

Running Arioc
Arioc exists to handle sequencer runs that contain hundreds of millions of short reads. With this amount of data, the ability to align hundreds of thousands of reads per second is significant. But there are a couple of caveats.
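A back-of-the-envelope calculation shows why throughput matters at this scale. The Python sketch below converts a read count and a sustained alignment rate into wall-clock time; the figures used (500 million reads, 200,000 versus 20,000 reads per second) are hypothetical round numbers chosen to illustrate a 10x difference, not measured Arioc benchmarks.

```python
def alignment_hours(n_reads: int, reads_per_second: float) -> float:
    """Wall-clock hours needed to align n_reads at a constant rate."""
    if reads_per_second <= 0:
        raise ValueError("reads_per_second must be positive")
    return n_reads / reads_per_second / 3600.0

# Hypothetical illustration: 500 million reads at two different rates.
fast = alignment_hours(500_000_000, 200_000)  # 2,500 s, about 0.69 h
slow = alignment_hours(500_000_000, 20_000)   # 25,000 s, about 6.9 h
print(f"{fast:.2f} h vs {slow:.2f} h")
```

Real runs also spend time on I/O and setup, so a sustained rate like this is an upper bound on what the whole job achieves.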

At a minimum, you need a machine with about 100 GB of RAM and an NVIDIA GPU with at least 5 GB of onboard RAM. (We assume this is not a major obstacle if you have enough data that aligning it in hours rather than days is worthwhile.) Arioc is fast enough that disk I/O becomes a factor: its throughput improves noticeably when it uses SSD devices rather than spinning hard disk drives.

Also, Arioc's performance depends on available CPU and GPU hardware as well as on the number of mismatches and indels in the read data (think "error rate"). This means that, although Arioc's default configuration parameters can serve as a reasonable starting point, you should test different parameter settings to optimize the balance between speed and sensitivity for your own hardware and data.

(And, of course, if you have billions of non-bisulfite-treated reads to align, Arioc can also handle them at speeds that are about an order of magnitude faster than Bowtie 2 or BWA.)

The Arioc user guide can be found at https://github.com/RWilton/Arioc.

The current Arioc release for Linux and Windows is available at https://github.com/RWilton/Arioc/releases.

And finally: there's a preprint available at https://doi.org/10.1101/175729. It's worth a look.
Attached Files
Figure_3.pdf (330.8 KB)

Last edited by RWilton; 09-03-2017 at 11:13 AM. Reason: Added link to preprint

Tags
alignment, bisulfite, bisulphite, methylation
