  • RNAseq analysis on a budget

    hi all,
    I am running RNAseq on the Illumina HiSeq platform with ~50bp paired-end reads, so the FASTQ files are estimated to be about 35-40Gb each. It is a big project. I have been told I will need at least 2Tb of disk space, 32Gb RAM and at least 8 cores to run the analysis.
    However, my budget for a computer to run the analysis on is $2000. Any suggestions as to how to make this happen would be greatly appreciated.

    thanks much.
    groetjes Joyce

  • #2
    You could always use a cloud service, e.g., Amazon. The pricing there for a large memory "instance" is $1.00/hour or approximately $0.50/hour if you are willing to settle for 'spot' time. There is some charge for transferring of data but, still, I suspect that you could get your work done with your $2000.
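As a rough check on the figures above, here is a minimal sketch of how many instance-hours the $2000 budget would buy at the two quoted rates. It assumes a flat hourly price and ignores the data-transfer and storage charges mentioned above; the function name is hypothetical.

```python
def affordable_hours(budget_usd: float, rate_usd_per_hour: float) -> float:
    """Instance-hours a budget buys at a flat hourly rate.

    Assumption: no data-transfer or storage fees are included.
    """
    if rate_usd_per_hour <= 0:
        raise ValueError("rate must be positive")
    return budget_usd / rate_usd_per_hour


BUDGET_USD = 2000.0   # budget stated in the original post
ON_DEMAND_RATE = 1.00  # quoted price for a large-memory instance, USD/hour
SPOT_RATE = 0.50       # approximate quoted 'spot' price, USD/hour

for label, rate in (("on-demand", ON_DEMAND_RATE), ("spot", SPOT_RATE)):
    hours = affordable_hours(BUDGET_USD, rate)
    print(f"{label}: {hours:.0f} instance-hours")
```

At these rates the budget covers about 2000 on-demand hours or about 4000 spot hours, before any transfer or storage charges.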

    However, if you want to have your own computer, then ignore the 8-core requirement -- you can put up with the slow run time. While you can get an 8GB memory, 2 TB disk machine for under $2000, the 32GB requirement does exceed your budget. There is no way around this. But honestly, if you have so many data sets that you need that much memory ... well, you need to think about the balance between your biological budget and your computational budget.


    • #3
      If your genome is well annotated and you're only interested in gene counts (as opposed to alternative splicing or transcript assembly), then paired-end reads provide little additional information. Why not use 50bp single-end reads and, with the money you save, buy a better computer? Westerman is right - you're going to spend a lot more time analyzing the data than generating it, so having sufficient computer power is critical.


      • #4
        Since the first two respondents did not address this, I will. This may or may not apply.

        Are you planning to put a computer together from parts to do this? It is generally a good learning experience, if you have not done this sort of thing before, but can be challenging if you are not very tech savvy. You will also need to be comfortable using some variant of *nix since most of the open source software for data analysis is *nix-based.

        Have you considered asking someone for help? Do you have access to a local core facility that can perhaps do some of the analysis for you? There is also the option of using an online tool like Galaxy (http://usegalaxy.org).

        How many samples are you going to have?


        • #5
          Originally posted by pippi
          hi all,
          I am running RNAseq on the Illumina HiSeq platform with ~50bp paired-end reads, so the FASTQ files are estimated to be about 35-40Gb each. It is a big project. I have been told I will need at least 2Tb of disk space, 32Gb RAM and at least 8 cores to run the analysis.
          However, my budget for a computer to run the analysis on is $2000. Any suggestions as to how to make this happen would be greatly appreciated.

          thanks much.
          groetjes Joyce

          First question is whether you have a reference genome or not. That makes a big difference in how you run the analysis, as I explained here (http://www.homolog.us/blogs/2011/07/...ence-analysis/).

          Assuming that you have a reference genome (although it is not clear from your description), the second question is the size of the reference genome. If you are working on a mid-sized genome, I do not see why you would need 32Gb RAM and 8 cores. Bowtie needs a lot less firepower.

          If you do not have a reference genome, we are talking about a different ball game, because you are essentially trying to assemble the genes from HiSeq data. De novo assemblers typically need a lot more RAM, but the size of the reference genome still matters.

          People tend to assume that they will need more RAM, if they have more sequencing reads. What really changes the RAM limit is the number of distinct k-mers in the sequence. If you have only one gene sampled 500 million times to create 500 million sequencing reads, the RAM requirement is minimal. Please try these three comments (http://www.homolog.us/blogs/2011/08/...bruijn-graphs/, http://www.homolog.us/blogs/2011/07/...ijn-graphs-ii/, http://www.homolog.us/blogs/2011/07/...uijn-graphs-i/) for more explanation.
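The point about distinct k-mers can be illustrated with a toy sketch. It is not how any particular assembler is implemented; the sequence, the value of k, and the function name are made up for illustration, and real reads contain sequencing errors that add extra k-mers.

```python
def distinct_kmers(reads, k):
    """Return the set of distinct k-mers found across all reads."""
    kmers = set()
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


# Hypothetical toy "gene"; any short DNA string would do.
gene = "ATGGCGTACGTTAGCCTAGGCTAACGTTAGC"
k = 5

one_read = [gene]
many_reads = [gene] * 10_000  # the same gene sampled 10,000 times

# Both counts are identical: extra copies of the same sequence
# add reads but no new k-mers.
print(len(distinct_kmers(one_read, k)), len(distinct_kmers(many_reads, k)))
```

The set of k-mers grows only when new sequence is seen, which is why the size of the genome or transcriptome matters more than the raw read count.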

          So, it all essentially boils down to the real or tentative size of genome you are working on.
          http://homolog.us


          • #6
            One thing I forgot to add is that if you are planning to purchase a server, look into the second-hand market and then add RAM yourself. I bought two very good servers (one IBM and one HP ProLiant, both quad core) last year for less than $600 total. When small companies shut down, they often give away computer servers for the price of hauling.
            Last edited by samanta; 08-24-2011, 01:16 PM.
            http://homolog.us


            • #7
              thanks for your replies and suggestions.
              to answer some questions.
              I do not want to use a core for the analysis, I really want to get to play with the data myself. unfortunately, I have no influence over how the budget is allocated (although am trying to convince the people in charge of the budget to free up some more for the analysis).

              as for the project set up; we will use paired end reads, samples are of human origin, we plan to cover the entire transcriptome (>30 million reads per sample), I expect to have 100+ samples (= 30+ different samples in triplicate, at least).

              I hope this clarifies things a bit.
              -> does this mean I could get away with less RAM/ processors/ memory?
              -> where best to look for a computer (brand) if not building one myself?

              thanks much.


              • #8
                I agree with GenoMAx: Galaxy would be a good option if you do not want to run your own infrastructure. You could set up an account with one of the places hosting Galaxy for NGS data analysis.
                --
                bioinfosm


                • #9
                  I have been using the Galaxy AMI on the Amazon cloud to offload some of our computational load, and you can really get a lot of analysis for a small monetary cost. The real cost is in storage on S3, so it is helpful to remove raw files once they have been processed.

                  You may find it useful to use the cloud for mapping and then run the analysis on a local machine once you have files in a much more manageable format.

                  Also, when you build a machine, make sure you add some redundant drives for backup.


                  • #10
                    This is idiotic! You're spending >$50K on sequencing but can't spend more than $2K on the analysis hardware, you're really shooting yourself in the foot.

                    The sequencing is the easy bit and doing the analysis is the hard bit. You must get yourself a machine with a decent amount of RAM (at least 2Gb per core) and lots of disk, and as many cores as you can afford. However, the most important is reliable back-up. $2K is woefully inadequate for any serious bioinformatics work let alone NGS.

                    I must say this sounds like a poorly managed project!


                    • #11
                      Originally posted by chris
                      This is idiotic! You're spending >$50K on sequencing but can't spend more than $2K on the analysis hardware, you're really shooting yourself in the foot.
                      Why call him names? He already said - "unfortunately, I have no influence over how the budget is allocated (although am trying to convince the people in charge of the budget to free up some more for the analysis)".

                      Are you going to tell his boss that he is managing the project poorly? I need similar help with some of my projects !!
                      http://homolog.us


                      • #12
                        Originally posted by chris
                        This is idiotic! You're spending >$50K on sequencing but can't spend more than $2K on the analysis hardware, you're really shooting yourself in the foot.

                        The sequencing is the easy bit and doing the analysis is the hard bit. You must get yourself a machine with a decent amount of RAM (at least 2Gb per core) and lots of disk, and as many cores as you can afford. However, the most important is reliable back-up. $2K is woefully inadequate for any serious bioinformatics work let alone NGS.

                        I must say this sounds like a poorly managed project!
                        It is typically the case that the analysis side of the project is completely underestimated, be it manpower, machine power, time requirements, etc. Even with the best equipment, there can also be software limitations in unlocking the equipment's true capabilities.

                        I look forward to when the analysis end catches back up to the capabilities of the new NGS machines.


                        • #13
                          Hi,

                          You can easily build a computer with six cores and 16GB memory in the $1000 range nowadays. If any of you needs a shopping list, I'd be glad to share.


                          • #14
                            Originally posted by pippi
                            thanks for your replies and suggestions.
                            to answer some questions.
                            I do not want to use a core for the analysis, I really want to get to play with the data myself. unfortunately, I have no influence over how the budget is allocated (although am trying to convince the people in charge of the budget to free up some more for the analysis).

                            as for the project set up; we will use paired end reads, samples are of human origin, we plan to cover the entire transcriptome (>30 million reads per sample), I expect to have 100+ samples (= 30+ different samples in triplicate, at least).

                            I hope this clarifies things a bit.
                            -> does this mean I could get away with less RAM/ processors/ memory?
                            -> where best to look for a computer (brand) if not building one myself?

                            thanks much.

                            Yes. You can get away with less RAM, because you have a reference genome and all you will be doing is mapping (Bowtie), not de novo assembly (Velvet/Oases). You definitely need not look for hundreds of gigs of RAM, which is one of the biggest contributors to computer prices for bioinformatics.

                            Points to think about -

                            i) Is your computer the only place where the data will be stored, or does this project have another place for saving data long term? I would use some redundancy in storage of at least the raw data files, because if they are lost due to disk failure, all $50K or $500K of experimental data are gone. That is a catastrophic loss.

                            ii) Are you flexible with your time? Is it fine with you, if a mapping step runs for 8 hours instead of 1 hour? Or does this project need to be finished by next week?

                            iii) Can you farm out some of the time or hardware-intensive calculations to Amazon cloud?
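The storage and run-time questions above can be turned into a back-of-the-envelope estimate. This is only a sketch under stated assumptions: roughly one 35-40Gb FASTQ file per sample (the thread gives the size per file, not per sample), 100 samples, 1000 GB per TB, samples mapped one after another, and the 1-hour and 8-hour mapping times taken purely as the illustrative figures from point ii). The function name and the `raw_copies` parameter are hypothetical.

```python
def project_estimate(n_samples, fastq_gb_per_sample, map_hours_per_sample, raw_copies=2):
    """Estimate raw-data storage (TB) and total serial mapping time (hours).

    raw_copies counts how many copies of the raw FASTQ data are kept,
    e.g. 2 for one working copy plus one redundant backup (assumption).
    """
    raw_tb = n_samples * fastq_gb_per_sample * raw_copies / 1000
    total_hours = n_samples * map_hours_per_sample
    return raw_tb, total_hours


N_SAMPLES = 100  # "100+ samples" from the thread

for gb in (35, 40):            # estimated FASTQ size range
    for hours in (1, 8):       # illustrative mapping times per sample
        tb, total = project_estimate(N_SAMPLES, gb, hours)
        print(f"{gb} GB/sample, {hours} h/sample: "
              f"{tb:.1f} TB raw (2 copies), {total} h mapping (~{total / 24:.0f} days)")
```

Under these assumptions a single copy of the raw data alone is 3.5-4 TB, so a 2Tb disk would not hold everything at once, and the slower mapping scenario adds up to about 800 hours of serial run time.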

                            I bought one of my servers (an IBM with a quad core, 10Gb RAM and a 1TB hard drive) for ~$500. $2000 goes a long way these days, if you look around.
                            Last edited by samanta; 09-06-2011, 02:54 PM.
                            http://homolog.us


                            • #15
                              Originally posted by samanta
                              Why call him names? He already said - "unfortunately, I have no influence over how the budget is allocated (although am trying to convince the people in charge of the budget to free up some more for the analysis)".

                              Are you going to tell his boss that he is managing the project poorly? I need similar help with some of my projects !!
                              I'm not calling him names, just the project planners.

                              Are many projects really this short-sighted?? I know I'm lucky to be working in a dept which values Bioinformatics and backs that up with a 500 core cluster, fast network, robust back-ups and a team of IT support. But seriously, do many projects expect top-class informatics to be done on homebuilt desktops?!

                              If I were doing this myself from scratch, I'd have a small rack with 2-4 compute nodes (each quad core, 8-16Gb RAM, 1Tb disk) for doing all the heavy lifting and then a basic desktop workstation for doing the analysis and visualisation. I doubt this would set you back much more than $10K, and it would be very flexible for any future tasks and easy to expand when more money is available.
