#1
Junior Member
Location: indiana Join Date: Jan 2015
Posts: 4
Hello Everyone,
How do we store human genome data in Hadoop at the chromosome level so that we can run processing (bio-algorithm computing) on it using Hadoop clusters?
#2
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
How best to store the data depends entirely on how the cluster is built and on the nature of the algorithm. If the cluster is essentially a cloud with slow IO, you'll approach this differently than with an HPC cluster with a faster local storage array. Also, if you just need to load the genome into memory for long computations, then it doesn't really matter how you store it, since that's not going to be the bottleneck.
#3
Junior Member
Location: indiana Join Date: Jan 2015
Posts: 4
Hi Ryan,
Thanks for your reply. We have a cluster of 30 machines running Hadoop. We are planning to process human genome project data with Hadoop, and the data is in the form of BAM files. I know that if I load the data into HDFS, it will automatically be split into chunks and stored on the data nodes. That is the problem: I can't have the data split like that. I need to split it chromosome-wise so that we can perform bio algorithm computing on each chromosome. Can someone please give some insights on this?
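For illustration only, here is a minimal sketch of one way to get one BAM file per chromosome before anything is copied into HDFS. It only builds the command lines; it does not run them. It assumes `samtools` and the `hdfs` client are installed, that the input BAM is coordinate-sorted, and that `out_dir` already exists. The function name, file names and HDFS directory are hypothetical. Note that HDFS will still divide each uploaded file into blocks; the split only makes each chromosome a separate file.

```python
def split_commands(bam_path, chromosomes, out_dir="by_chrom", hdfs_dir="/genome/by_chrom"):
    """Build the commands that write one BAM per chromosome and upload each to HDFS.

    All paths are hypothetical placeholders. Nothing is executed here.
    """
    # Region queries with `samtools view` need an index for the input BAM.
    commands = [["samtools", "index", bam_path]]
    for chrom in chromosomes:
        out_path = f"{out_dir}/{chrom}.bam"
        # Extract only the reads aligned to this chromosome, in BAM format.
        commands.append(["samtools", "view", "-b", "-o", out_path, bam_path, chrom])
        # Copy the per-chromosome file into HDFS as its own file.
        commands.append(["hdfs", "dfs", "-put", out_path, hdfs_dir])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the paths have been adjusted to the actual data.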
#4
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
Without knowing more detail it's impossible to give any guidance. Hadoop is a general tool to facilitate processing. How you should split things depends entirely on what you want to do with the results (and "bio algorithm computing" has absolutely no meaning).
#5
Junior Member
Location: indiana Join Date: Jan 2015
Posts: 4
Bio algorithm computing: for instance, bisulfite methylation extraction.
#6
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
Yes, that's one of many possible but completely unrelated tasks. I've already responded to this on one of your Biostars threads. Please don't cross-post.
#7
Junior Member
Location: indiana Join Date: Jan 2015
Posts: 4
Currently we use BSMap (a Python tool). Is there a way to store the data chromosome-wise on Hadoop and run the BSMap tool command as MapReduce jobs?
#8
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
How this would be done depends entirely on the cluster, but there's generally no single command (or simple series thereof) that would allow that. The traditional way to do this would be to simply tell BSMap's methylation extractor to process a single chromosome (and then run that simultaneously with different chromosomes on different cores). You could do that in a fraction of the time it would take to implement a full Hadoop-based solution.
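As a rough sketch of the per-chromosome approach described above, the following starts one extractor process per chromosome and keeps a fixed number running at once. The helper names and file names are hypothetical, and the `-c`, `-d` and `-o` flags in `methratio_command` are assumptions about BSMap's `methratio.py` that should be checked against the installed version's help output.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def methratio_command(chrom, bam="sample.bam", ref="genome.fa"):
    """Build a per-chromosome extraction command (flags are assumptions)."""
    return [sys.executable, "methratio.py", "-c", chrom, "-d", ref,
            "-o", f"{chrom}.methratio.txt", bam]


def run_per_chromosome(chromosomes, build_command, max_workers=4):
    """Run one external process per chromosome, at most max_workers at a time.

    Returns a dict mapping each chromosome name to its process exit code.
    """
    def run_one(chrom):
        # Threads are enough here: each one only waits on a child process.
        result = subprocess.run(build_command(chrom), capture_output=True)
        return chrom, result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, chromosomes))
```

A call such as `run_per_chromosome(["chr1", "chr2", "chrX"], methratio_command, max_workers=8)` would then process up to eight chromosomes at the same time on one machine; any chromosome with a non-zero exit code in the returned dict needs a closer look.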
Tags: bam, genome alignment, hadoop