  • How to make big datasets available for cooperation partners

    A few FASTQ and BAM files quickly add up to a few hundred GB of data.

    How do you make this amount of data available for others (outside the LAN) to work with?
    Do you run a webserver/FTP server so users can download the files (wget etc.), or do you actively copy the data via scp to external servers (-l xxx?)?

    Do you limit network speed (and if so, to what speed) to avoid clogging the network?

    Do you even use external hard disks on a regular basis?

    best,
    Sven
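
For the scp route asked about above, it may help to know that OpenSSH's `scp -l` takes its limit in Kbit/s, so a cap of, say, 10 Mbit/s is written as `-l 10000`. The following is only a minimal sketch of building such a command; the function name, file name, and host are hypothetical placeholders and not anything used by the posters.

```python
import shlex


def scp_command(source, destination, limit_mbit_per_s):
    """Build an scp argument list with a bandwidth cap.

    OpenSSH's scp expects the -l limit in Kbit/s, so the Mbit/s
    value is multiplied by 1000 before being passed on.
    """
    limit_kbit = int(limit_mbit_per_s * 1000)
    return ["scp", "-l", str(limit_kbit), source, destination]


# Hypothetical file and host, for illustration only.
cmd = scp_command("sample.bam", "partner@example.org:/incoming/", 10)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the placeholders are replaced with real paths.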

  • #2
    Putting the data up on a webserver is relatively easy and secure (if you add some authentication). If bandwidth is not an issue at your (and your collaborators') location, then you should be all set. If people only need to view the data, this mechanism will also work with IGV etc.

    You could (as long as your institutional policy allows) look into putting the data up on Amazon/Box/Google storage. This would not be a cheap option, but I mention it since you are asking about possibilities. There are some limits on individual file sizes etc., so you may need to split the files up if you go this route.

    Some commercial entities favor hard disks for data transfer. So yes, that is still an option.

    Finally, you could use SRA. Submit the data there and let them keep a copy. I know you can ask for an embargo on public release (for a year, I think). I am not sure whether you are able to share the data with a collaborator during that period.
    Last edited by GenoMax; 09-05-2014, 03:32 AM.

    • #3
      Commercial storage is not an option.

      Bandwidth is an issue; we currently limit each individual copy job to 10 Mbit/s. That works if only one or two people copy a few (huge) files.

      Just curious how much effort others "invest" to optimize this procedure :-)
      (or what has proven practical in a day-by-day routine in other labs)
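
To put the 10 Mbit/s limit in perspective, here is a rough back-of-the-envelope estimate of how long a single copy job would take. It is a sketch under simplifying assumptions: decimal gigabytes, the full rate sustained for the whole transfer, and no protocol overhead. The dataset sizes are illustrative, not figures from this thread.

```python
def transfer_hours(size_gb, rate_mbit_per_s):
    """Hours needed to move size_gb (decimal gigabytes) at rate_mbit_per_s."""
    bits = size_gb * 1e9 * 8                  # gigabytes -> bits
    seconds = bits / (rate_mbit_per_s * 1e6)  # Mbit/s -> bit/s
    return seconds / 3600


# Illustrative dataset sizes at the 10 Mbit/s per-job limit.
for size_gb in (50, 100, 300):
    print(f"{size_gb} GB at 10 Mbit/s: {transfer_hours(size_gb, 10):.1f} h")
```

Under these assumptions, 100 GB takes roughly 22 hours and 300 GB close to three days per copy job.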

      • #4
        We are moving towards using Globus (www.globus.org) as our transfer method. Globus does not speed up transfers per se; you are still limited by bandwidth. What Globus offers is a 'fire and forget' method of data transfer. I don't pay for the service and work for a university in any case, so I am uncertain whether this is financially feasible for you.

        • #5
          Originally posted by westerman:
          We are moving towards using Globus (www.globus.org) as our transfer method. Globus does not speed up transfers per se; you are still limited by bandwidth. What Globus offers is a 'fire and forget' method of data transfer. I don't pay for the service and work for a university in any case, so I am uncertain whether this is financially feasible for you.
          Interesting approach. I'll have a look.

          • #6
            There's always the external-hard-drive-in-the-mail option if people are in the same country. That approach has been used routinely by, for example, Complete Genomics.

            • #7
              Originally posted by Brian Bushnell:
              There's always the external-hard-drive-in-the-mail option if people are in the same country. That approach has been used routinely by, for example, Complete Genomics.
              Sure, we usually use this option when we want to provide a complete Illumina run folder. But for routine (FASTQ/BAM) data distribution it is not very feasible.

              • #8
                Originally posted by westerman:
                We are moving towards using Globus (www.globus.org) as our transfer method. Globus does not speed up transfers per se; you are still limited by bandwidth. What Globus offers is a 'fire and forget' method of data transfer. I don't pay for the service and work for a university in any case, so I am uncertain whether this is financially feasible for you.
                Same here. Globus has proven to be faster than scp (barring bandwidth limitations, of course), very reliable, and easy to set up and use. I'd recommend giving it a go!

                • #9
                  Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

                  —Tanenbaum, Andrew S. (1989). Computer Networks. New Jersey: Prentice-Hall. p. 57. ISBN 0-13-166836-6.
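
The quote can be made concrete with a small calculation of the effective bandwidth of a mailed hard disk. This is only a sketch: the 4 TB capacity and 48-hour courier time are assumed example values, and the time needed to copy data onto and off the disk is ignored.

```python
def shipping_mbit_per_s(capacity_tb, transit_hours):
    """Effective throughput in Mbit/s of shipping a disk of capacity_tb (decimal TB)."""
    bits = capacity_tb * 1e12 * 8  # terabytes -> bits
    return bits / (transit_hours * 3600) / 1e6


# Assumed example: one 4 TB disk, 48 hours door to door.
print(f"{shipping_mbit_per_s(4, 48):.0f} Mbit/s")
```

With these assumed numbers the disk works out to roughly 185 Mbit/s, well above the 10 Mbit/s per-job limit mentioned earlier in the thread.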
