  • 1000 Genomes in the Amazon cloud?

    These slides (p. 12) claim that the 1000 Genomes BAM files were to be made available in the Amazon cloud in February. That would be hugely useful to me, if so, but I can't find them mentioned in the AWS Public Data Catalog. The claimed sizes for the Ensembl data sets listed there only come to about 250 GB combined.

    Does anyone know whether the BAM files actually were made publicly available, and if so, how I can access them?

  • #2
    Originally posted by throwaway
    These slides (p. 12) claim that the 1000 Genomes BAM files were to be made available in the Amazon cloud in February. That would be hugely useful to me, if so, but I can't find them mentioned in the AWS Public Data Catalog. The claimed sizes for the Ensembl data sets listed there only come to about 250 GB combined.

    Does anyone know whether the BAM files actually were made publicly available, and if so, how I can access them?

    Have you checked http://1000genomes.org/page.php?



    • #3
      Yes. None of the information on that page or the data access page seems pertinent to Amazon storage. Searching for "amazon" or "aws" only turns up a reference to the Ensembl dataset, and doesn't make it any clearer how to access the BAM files.



      • #4
        FYI - having the data in the AWS Public Data Catalog/S3 would be neat for people analyzing data on AWS EC2 (their cloud computing infrastructure) because transferring data within a region is free and very fast.

        There is no Data Transfer charge for data transferred between Amazon EC2 and Amazon S3 within the same Region or for data transferred between the Amazon EC2 Northern Virginia Region and the Amazon S3 US Standard Region.
        Last edited by spenthil; 04-27-2010, 01:38 PM.
        --
        Senthil Palanisami



        • #5
          Location of 1000 genomes data on s3

          s3://1000genomes



          • #6
            How does one decrypt this s3 link to actually view/download the data?
            --
            bioinfosm



            • #7
              I would recommend installing S3fox or a similar S3 browser. Since the bucket is public, just type /1000genomes into the location window (every bucket ID in S3 is unique).
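For anyone puzzled by the `s3://` form: it is just a bucket name plus an optional object key, and for a public bucket the same object can be reached over plain HTTP. Below is a minimal sketch of that mapping. The helper name `s3_to_http` and the second example key are made up for illustration (they are not real paths in the bucket), and the sketch assumes the virtual-hosted URL style `http://<bucket>.s3.amazonaws.com/<key>`.

```python
from urllib.parse import urlsplit, quote


def s3_to_http(s3_uri: str) -> str:
    """Turn s3://bucket/key into a virtual-hosted-style HTTP URL."""
    parts = urlsplit(s3_uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URI: {s3_uri!r}")
    bucket = parts.netloc
    key = parts.path.lstrip("/")
    # Percent-encode the key, but keep "/" so the path stays readable.
    return f"http://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


print(s3_to_http("s3://1000genomes"))
# Hypothetical key, only to show the shape of an object URL:
print(s3_to_http("s3://1000genomes/some/dir/sample.bam"))
```

This only works without credentials because the bucket is public; a private bucket would need signed requests.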

              Screenshot: http://img.skitch.com/20100622-geb3s...ngw3rrecc1.jpg



              Each individual BAM file is addressable, e.g.



              (added later)

              Also, if you point curl or a browser at http://1000genomes.s3.amazonaws.com/ you'll get the bucket listing back as an XML response.
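To make that XML response a bit more usable, here is a rough sketch that pulls object keys and sizes out of a listing. It runs against a small hand-written sample rather than the live bucket; the sample keys and sizes are invented, and `list_keys` is a hypothetical helper name. It assumes the usual listing layout of one `Contents` element per object with `Key` and `Size` children.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of an S3 bucket listing.
# The bucket name, keys and sizes are invented for illustration.
SAMPLE = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <IsTruncated>false</IsTruncated>
  <Contents><Key>data/a.bam</Key><Size>1024</Size></Contents>
  <Contents><Key>data/a.bam.bai</Key><Size>16</Size></Contents>
</ListBucketResult>"""


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def list_keys(xml_text):
    """Return a list of (key, size_in_bytes) from a bucket listing."""
    root = ET.fromstring(xml_text)
    entries = []
    for node in root:
        if local(node.tag) != "Contents":
            continue
        fields = {local(child.tag): child.text for child in node}
        entries.append((fields["Key"], int(fields["Size"])))
    return entries


for key, size in list_keys(SAMPLE):
    print(f"{size:>8}  {key}")
```

To try it on the real bucket, the body of the HTTP response could be passed to `list_keys` instead of `SAMPLE`. A single response only covers part of a large bucket, so the `IsTruncated` field is worth checking before assuming the listing is complete.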
              Last edited by mndoci; 06-22-2010, 10:34 PM. Reason: added XML response



              • #8
                S3fox is great. I also like Bucket Explorer (commercial, but there's a 30-day trial). If the analysis tools you are using expect a filesystem, you could create an AMI and try using s3fs or subcloud. Because the current dataset is so large, EBS is just not an option, as it is for other public datasets that are smaller than 1 TB.

                One thing to be aware of is that because S3 does not natively understand directories, it is up to the clients to infer the directory structure. Unfortunately, clients differ in how they do this, so when you mount the bucket using something like s3fs, the directory structure may not appear correctly.
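To illustrate the point about directories: a bucket is a flat set of keys, and a "directory" is whatever the client decides to infer from the `/` characters in them. The sketch below shows one common way of doing that inference. The function name `list_dir` and the example keys are invented, and this is not how s3fs or any particular client is actually implemented.

```python
def list_dir(keys, prefix="", delimiter="/"):
    """Return (subdirs, files) directly under `prefix`, inferred from flat keys."""
    subdirs, files = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            # The key equals the prefix itself, e.g. an empty "folder marker" object.
            continue
        head, sep, _ = rest.partition(delimiter)
        if sep:
            subdirs.add(head + delimiter)  # something deeper: report the subdirectory
        else:
            files.append(rest)             # no further delimiter: a plain file
    return sorted(subdirs), sorted(files)


# Invented keys, including a "data/" marker object that some clients create.
keys = ["data/", "data/a.bam", "data/pilot/b.bam", "README"]
print(list_dir(keys))           # (['data/'], ['README'])
print(list_dir(keys, "data/"))  # (['pilot/'], ['a.bam'])
```

A client that expects a marker object for every directory would not see `data/pilot/` here, since no such key exists, while a client that infers from prefixes alone would. That kind of disagreement is one way the structure can end up looking wrong.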



                • #9
                  Also, the AWS console now allows access to S3: http://aws.amazon.com/console/#s3

                  Haven't used it myself, but it looks nice enough.
                  --
                  Senthil Palanisami



                  • #10
                    any update on this issue?

                    thanks!



                    • #11
                      Not directly on-topic, but you can get past the 1 TB limit for EBS volumes if you use RAID 0 and stripe data across multiple volumes. There's a nice article by Eric Hammond: http://alestic.com/2009/06/ec2-ebs-raid
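For anyone unfamiliar with striping, here is a toy sketch of how RAID 0 spreads one logical address space across several volumes, which is why the usable capacity is roughly the sum of the volumes. The function name `raid0_locate` and the 64 KiB chunk size are assumptions for illustration; in practice the operating system's RAID layer does this for you.

```python
def raid0_locate(offset, n_volumes, chunk_size=64 * 1024):
    """Map a logical byte offset to (volume_index, offset_within_volume)."""
    if offset < 0 or n_volumes < 1 or chunk_size < 1:
        raise ValueError("offset must be >= 0; n_volumes and chunk_size must be >= 1")
    # Which chunk the byte falls in, and where inside that chunk.
    stripe, within = divmod(offset, chunk_size)
    # Chunks are dealt out to the volumes round-robin.
    volume = stripe % n_volumes
    # Each full round adds one chunk to every volume.
    volume_offset = (stripe // n_volumes) * chunk_size + within
    return volume, volume_offset


# Four volumes: consecutive 64 KiB chunks land on volumes 0, 1, 2, 3, 0, ...
for chunk in range(5):
    print(chunk, raid0_locate(chunk * 64 * 1024, n_volumes=4))
```

Keep in mind that RAID 0 has no redundancy, so losing any one volume loses the whole array.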
