SEQanswers

Old 01-14-2015, 10:23 PM   #1
smylalwys
Junior Member
 
Location: indiana

Join Date: Jan 2015
Posts: 4
Hadoop for human genome data

Hello Everyone,

How do we store human genome data in Hadoop at the chromosome level so that we can process it (bio-algorithm computing) on Hadoop clusters?
Old 01-14-2015, 11:18 PM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

How best to store the data depends entirely on how the cluster is constructed and on the nature of the algorithm. If the cluster is essentially a cloud with slow IO, then you'll approach this differently than with an HPC cluster with a faster local storage array. Also, if you just need to load the genome into memory for long computations, then it doesn't really matter how you store it; that's not going to be the bottleneck.
Old 01-14-2015, 11:24 PM   #3
smylalwys
Junior Member
 
Location: indiana

Join Date: Jan 2015
Posts: 4

Hi Ryan,
Thanks for your reply. We have a cluster of 30 machines running Hadoop. We are planning to process the Human Genome Project data with it, and the data is in the form of BAM files. I know that if I load the data into HDFS, it will automatically split it into blocks and store them on the data nodes. That is the problem: I can't have the data split like that. I need to split it chromosome-wise so that we can perform bio-algorithm computing on it.

Can someone please give some insights on this?
Old 01-14-2015, 11:26 PM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Without knowing more detail it's impossible to give any guidance. Hadoop is a general tool to facilitate processing. How you should split things depends entirely on what you want to do with the results (and "bio algorithm computing" has absolutely no meaning).
Old 01-14-2015, 11:32 PM   #5
smylalwys
Junior Member
 
Location: indiana

Join Date: Jan 2015
Posts: 4

Bio-algorithm computing: for instance, bisulfite methylation extraction.
Old 01-14-2015, 11:33 PM   #6
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Yes, that's one of many possible but completely unrelated tasks. I've already responded to this on one of your Biostars threads. Please don't cross-post.
Old 01-14-2015, 11:34 PM   #7
smylalwys
Junior Member
 
Location: indiana

Join Date: Jan 2015
Posts: 4

Currently we use BSMAP (a Python tool). Is there a way to store the data chromosome-wise on Hadoop and run the BSMAP command as MapReduce jobs?
Old 01-14-2015, 11:38 PM   #8
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

How this would be done depends entirely on the cluster, but there's generally no single command (or simple series thereof) that would allow it. The traditional way is to tell BSMAP's methylation extractor to process a single chromosome, and then run it simultaneously for different chromosomes on different cores. You could do that in a fraction of the time it would take to implement a full Hadoop-based solution.
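
For illustration only, the per-chromosome approach could be sketched roughly like this in Python. Everything here is an assumption rather than a tested recipe: the extractor is taken to be BSMAP's methratio.py, the -c (chromosome), -d (reference FASTA) and -o (output) flags are as I recall them from its help text, and the chromosome names, file names and worker count are placeholders. Check `methratio.py -h` and your BAM header before relying on any of it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder chromosome names; use the names from your own BAM header.
CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def build_command(chrom, bam, ref, out_dir):
    """Build one extractor call restricted to a single chromosome.

    The -c/-d/-o flags are assumed from BSMAP's methratio.py help text;
    verify them against the version you have installed.
    """
    return [
        "python", "methratio.py",
        "-c", chrom,
        "-d", ref,
        "-o", f"{out_dir}/{chrom}.methratio.txt",
        bam,
    ]


def run_per_chromosome(bam, ref, out_dir, chroms=CHROMOSOMES, workers=4,
                       runner=subprocess.run):
    """Run one extractor process per chromosome, `workers` at a time.

    Threads are enough here because each job is an external process;
    the Python side only waits for it to finish.
    """
    commands = [build_command(c, bam, ref, out_dir) for c in chroms]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with chroms.
        results = list(pool.map(lambda cmd: runner(cmd, check=True), commands))
    return dict(zip(chroms, results))
```

Calling something like `run_per_chromosome("sample.bam", "hg19.fa", "out", workers=8)` (hypothetical file names) would then keep eight cores busy on one machine, and the same command list could just as well be handed to whatever scheduler the cluster already has.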
Tags
bam, genome alignment, hadoop

All times are GMT -8.