  • 1000 Genomes in the Amazon cloud?

    These slides (p. 12) claim that the 1000 Genomes BAM files were to be made available in the Amazon cloud in February. That would be hugely useful to me, if so, but I can't seem to find them mentioned in the AWS Public Data Catalog. The claimed sizes for the Ensembl data sets listed there only come to about 250 GB combined.

    Anyone know whether the BAM files actually were made publicly available, and if so, how I can access them?

  • #2
    Have you checked http://1000genomes.org/page.php?



    • #3
      Yes. None of the information on that page or the data access page seems pertinent to Amazon storage. Searching for "amazon" or "aws" only turns up a reference to the Ensembl dataset, and doesn't make it any clearer how to access the BAM files.



      • #4
        FYI - having the data in the AWS Public Data Catalog/S3 would be neat for people analyzing data on AWS EC2 (their cloud computing infrastructure) because transferring data within a region is free and very fast.

        There is no Data Transfer charge for data transferred between Amazon EC2 and Amazon S3 within the same Region or for data transferred between the Amazon EC2 Northern Virginia Region and the Amazon S3 US Standard Region.
        Last edited by spenthil; 04-27-2010, 01:38 PM.
        --
        Senthil Palanisami



        • #5
          Location of 1000 genomes data on s3

          s3://1000genomes



          • #6
            How does one decrypt this s3 link to actually view/download the data?
            --
            bioinfosm



            • #7
              I would recommend installing S3fox or a similar S3 browser. Since the bucket is public, just type /1000genomes into the location window (every bucket ID in S3 is unique).

              Screenshot: http://img.skitch.com/20100622-geb3s...ngw3rrecc1.jpg


              Each individual BAM file is addressable, e.g.



              (added later)

              Also if you use curl or a browser and point to http://1000genomes.s3.amazonaws.com/ you'll get the XML response
              Last edited by mndoci; 06-22-2010, 10:34 PM. Reason: added XML response
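For anyone who would rather script this than use a browser plugin, here is a minimal Python sketch of the two ideas above: turning the `s3://` location into the HTTP form quoted in the post, and reading the XML listing. It assumes the response is a standard S3 `ListBucketResult` document; the sample listing, its keys and its sizes are made up for illustration and are not real 1000 Genomes file names. The function names are my own, not part of any library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# XML namespace used by S3 bucket-listing (ListBucketResult) responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def s3_uri_to_http(uri):
    """Turn s3://bucket/key into the http://bucket.s3.amazonaws.com/key form."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return f"http://{parsed.netloc}.s3.amazonaws.com/{parsed.path.lstrip('/')}"


def parse_listing(xml_text):
    """Return (key, size_in_bytes) pairs from a ListBucketResult document."""
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


# Hypothetical listing: these keys and sizes are placeholders, not real data.
SAMPLE = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>1000genomes</Name>
  <IsTruncated>false</IsTruncated>
  <Contents><Key>example/sample_a.bam</Key><Size>1024</Size></Contents>
  <Contents><Key>example/sample_b.bam</Key><Size>2048</Size></Contents>
</ListBucketResult>"""

if __name__ == "__main__":
    print(s3_uri_to_http("s3://1000genomes"))
    for key, size in parse_listing(SAMPLE):
        print(key, size)
```

To run it against the live bucket you would fetch the URL (for example with `urllib.request.urlopen`) and pass the response text to `parse_listing`. A single response may be truncated for large buckets, so a real script would also need to check `IsTruncated` and page through the results.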



              • #8
                S3fox is great. I also like Bucket Explorer (commercial, but there's a 30-day trial). If the analysis tools you are using expect a filesystem, you could create an AMI and try using s3fs or subcloud. Because the current dataset is so large, EBS is just not an option the way it is for other public datasets, which are smaller than 1 TB.

                One thing to be aware of is that because S3 does not natively understand directories, it is up to the clients to infer the directory structure. Unfortunately, clients differ in how they do this, so when you mount the bucket using something like s3fs, the directory structure may not appear correctly.
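To make the "clients infer the directories" point concrete, here is a rough Python sketch of how a client might turn a flat list of keys into a folder view by splitting on "/". This mirrors the prefix/delimiter convention that S3 listings support, but it is only an illustration: `list_dir` is a hypothetical helper, and the keys are invented, not real bucket contents.

```python
def list_dir(keys, prefix="", delimiter="/"):
    """Split flat object keys under `prefix` into (subdirectories, files)."""
    subdirs, files = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows a delimiter, so `head` acts like a folder name.
            subdirs.add(prefix + head + delimiter)
        elif rest:
            # No delimiter left: this key is a "file" directly under prefix.
            files.append(key)
        # An empty `rest` means the key equals the prefix itself (a folder
        # marker object); this sketch ignores it.
    return sorted(subdirs), sorted(files)


# Invented keys for illustration only.
KEYS = [
    "README.txt",
    "data/",                # zero-byte "folder marker" that some tools create
    "data/sample_a/a.bam",
    "data/sample_b/b.bam",
    "data/index.txt",
]

if __name__ == "__main__":
    print(list_dir(KEYS))
    print(list_dir(KEYS, "data/"))
```

Note the `data/` marker key: whether a tool creates, expects, or ignores objects like that is one plausible way two clients can end up showing different directory trees for the same bucket.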



                • #9
                  Also, the AWS console now allows access to S3: http://aws.amazon.com/console/#s3

                  Haven't used it myself, but it looks nice enough.
                  --
                  Senthil Palanisami



                  • #10
                    any update on this issue?

                    thanks!



                    • #11
                      Not directly on-topic, but you can get past the 1 TB limit for EBS volumes if you use RAID 0 and stripe data across multiple volumes. There's a nice article by Eric Hammond: http://alestic.com/2009/06/ec2-ebs-raid
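As a back-of-the-envelope illustration of the striping idea (not the setup from the linked article, and not how any particular RAID tool is implemented), the Python sketch below shows the arithmetic: how many volumes a dataset would need under a per-volume size limit, and where a given byte lands in a plain RAID 0 layout. The dataset size, chunk size and function names are all assumptions chosen for the example.

```python
def volumes_needed(dataset_bytes, volume_limit_bytes):
    """Smallest number of volumes whose combined capacity holds the dataset."""
    return -(-dataset_bytes // volume_limit_bytes)  # integer ceiling division


def stripe_location(offset, chunk_size, n_volumes):
    """Map a logical byte offset to (volume index, offset on that volume).

    RAID 0 hands out fixed-size chunks to the volumes in round-robin order.
    """
    stripe, within = divmod(offset, chunk_size)
    volume = stripe % n_volumes
    volume_offset = (stripe // n_volumes) * chunk_size + within
    return volume, volume_offset


if __name__ == "__main__":
    TB = 10**12
    # Hypothetical 2.5 TB dataset against a 1 TB per-volume limit.
    print(volumes_needed(25 * 10**11, TB))
    # Tiny chunk size so the round-robin pattern is easy to follow.
    for offset in (0, 5, 13):
        print(offset, stripe_location(offset, chunk_size=4, n_volumes=3))
```

Keep in mind that RAID 0 has no redundancy, so losing any one volume loses the whole array.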
