  • Hadoop for human genome data

    Hello Everyone,

    How do we store the human genome data using Hadoop (chromosome level) so that we can perform processing (bio-algorithm computing) on the data using Hadoop clusters?

  • #2
    How best to store the data depends entirely on how the cluster is built and on the nature of the algorithm. If the cluster is essentially a cloud with slow IO, you'll approach this differently than with an HPC cluster that has a faster local storage array. Also, if you just need to load the genome into memory for long computations, then it doesn't really matter how you store it; that's not going to be the bottleneck.


    • #3
      Hi Ryan,
      Thanks for your reply. We have a cluster of 30 machines running Hadoop, and we are planning to process human genome project data with it. The data is in the form of BAM files. I know that if I load the data into HDFS, it will automatically be split into blocks and stored across the data nodes. That is the problem: I can't have the data split like that. I need it split chromosome-wise so that we can perform bio-algorithm computing on it.

      Can someone please give some insight on this?
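For reference, a per-chromosome split of a BAM file can be done before anything is loaded into HDFS. The following is a minimal sketch, assuming `samtools` is installed and the BAM is coordinate-sorted and indexed; the chromosome names, file paths, and the helper names `split_command` and `split_by_chromosome` are illustrative assumptions, not something given in this thread.

```python
import shutil
import subprocess

# Assumed chromosome names; they must match the @SQ names in the BAM header.
CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]


def split_command(bam_path, chrom, out_dir="."):
    """Build the samtools command that extracts one chromosome into its own BAM."""
    out_path = f"{out_dir}/{chrom}.bam"
    # `samtools view -b -o OUT IN REGION` writes the reads in REGION as BAM.
    return ["samtools", "view", "-b", "-o", out_path, bam_path, chrom]


def split_by_chromosome(bam_path, out_dir=".", chromosomes=CHROMOSOMES):
    """Run one extraction per chromosome (requires an indexed BAM)."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    for chrom in chromosomes:
        subprocess.run(split_command(bam_path, chrom, out_dir), check=True)
```

Each resulting file (for example `chr1.bam`) can then be stored or processed as a unit, which avoids depending on where HDFS happens to cut its blocks.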


      • #4
        Without knowing more detail it's impossible to give any guidance. Hadoop is a general tool to facilitate processing. How you should split things depends entirely on what you want to do with the results (and "bio algorithm computing" has absolutely no meaning).


        • #5
          Bio-algorithm computing: for instance, bisulfite methylation extraction.


          • #6
            Yes, that's one of many possible but completely unrelated tasks. I've already responded to this on one of your Biostars threads. Please don't cross-post.


            • #7
              Currently we use BSMAP (Python tool). Is there a way to store the data chromosome-wise on Hadoop and run the BSMAP commands as MapReduce jobs?


              • #8
                How this would be done depends entirely on the cluster, but there's generally no single command (or simple series thereof) that would allow that. The traditional way to do this is to tell BSMap's methylation extractor to process a single chromosome, and then run that simultaneously for different chromosomes on different cores. You could do that in a fraction of the time it would take to implement a full Hadoop-based solution.
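The "one chromosome per core" approach described above can be sketched in a few lines of Python. This is a rough illustration only: the thread does not give the extractor's actual command line, so the caller supplies it through `build_command`, a hypothetical callback that returns the argument list for one chromosome. The function name `run_per_chromosome` and the `max_workers` default are likewise assumptions.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_per_chromosome(chromosomes, build_command, max_workers=4):
    """Run one external command per chromosome, several at a time.

    build_command(chrom) must return the argument list to execute for
    that chromosome. Results come back in the same order as the input.
    """
    def run_one(chrom):
        # Threads are enough here: the real work happens in the child process.
        result = subprocess.run(build_command(chrom), capture_output=True, text=True)
        return chrom, result.returncode, result.stdout

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, chromosomes))


if __name__ == "__main__":
    # Stand-in command that only prints the chromosome name; replace it
    # with the real per-chromosome extractor invocation.
    def demo(chrom):
        return [sys.executable, "-c", f"print('processing {chrom}')"]

    for chrom, code, out in run_per_chromosome(["chr1", "chr2", "chrX"], demo):
        print(chrom, code, out.strip())
```

Setting `max_workers` to the number of available cores keeps each chromosome's job on its own core; a nonzero return code in the results marks a chromosome whose job failed and needs to be rerun.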
