Contrail - a Hadoop-based de novo sequence assembler

samanta


09-08-2011, 11:16 AM

Hello all,

Many bioinformatics researchers get stuck on the question of how to buy a computer with enough RAM to process their NGS data, because RAM is very expensive. It is not easy to get managers to approve a $100K computer when they think everything can be done with a $1K laptop and Microsoft Excel.

Internet companies like Google developed algorithms to process terabytes and petabytes of data very rapidly and return search results to users. They use clusters of commodity computers with inexpensive disks (hard drives are cheap) and an approach called MapReduce. An open-source MapReduce implementation is freely available in Hadoop, which is distributed by the Apache Software Foundation.
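To give a feel for the MapReduce idea, here is a minimal sketch in plain Python that counts k-mers in a handful of reads. It is only an illustration of the map, shuffle, and reduce steps running on one machine: it does not use Hadoop, it is not contrail's code, and the function names (map_read, shuffle, reduce_counts, count_kmers) and the sample reads are made up for this example.

```python
from collections import defaultdict


def map_read(read, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs):
    """Shuffle step: group the emitted values by key.

    In a real cluster the framework does this between map and reduce;
    here it is a single in-memory dictionary (an assumption for the sketch).
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: add up all the values collected for one k-mer."""
    return key, sum(values)


def count_kmers(reads, k):
    """Run map, shuffle, and reduce in sequence over a list of reads."""
    pairs = (pair for read in reads for pair in map_read(read, k))
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical toy reads, not real sequencing data.
    reads = ["ACGTAC", "CGTACG"]
    print(count_kmers(reads, 3))
```

The point of the pattern is that each read is mapped independently and each k-mer is reduced independently, so in principle the work can be split across many inexpensive machines instead of being held in the RAM of one large one.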

A few months back, I came across a genome assembly program called 'contrail' that uses Hadoop to assemble large quantities of NGS data, and it is scalable. When I talk to bioinformaticians about trying Hadoop instead of buying a large and expensive high-RAM machine, I usually hit a hard wall, because words like Hadoop and MapReduce are foreign to them. So today I wrote a post explaining how to set up and run contrail on your own machine using Hadoop. It is written so that even if you have never used Hadoop, you can mechanically execute the steps and assemble the reads in the test library in a short time on your own Windows or Unix box. I am hoping that once researchers start to feel that the Hadoop approach is easy and scalable for large data sets, they will develop their own programs and the whole community will benefit.

This post discusses how to use the contrail assembler:

http://www.homolog.us/blogs/2011/09/08/contrail-a-de-bruijn-genome-assembler-that-uses-hadoop/

This post discusses how to set up and run Hadoop for a simple sequence analysis example:

http://www.homolog.us/blogs/2011/08/31/using-hadoop-for-transcriptomics-an-example-to-get-started/

Please note that I am not associated with the researchers who wrote contrail, and I have never spoken to or met them. It is the only example I found for de Bruijn assemblers, so I decided to try it out.

http://homolog.us