  • QIIME constraints and time to run for 16S Illumina

    Hi, I am looking to help analyse a dataset that consists of a number of 16S (V3-V4) amplicons sequenced on a MiSeq (in triplicate) from a number of data points, with a view to getting the relative abundances of OTUs in the samples.
    I was wondering a few things:
    1) Are there any memory and / or processor-speed constraints when running QIIME, i.e. can you run every step in the pipeline on a reasonably good desktop (i7) with reasonable memory (8GB) in a reasonable time, or do you need to start using clusters or high-memory computers?
    2) Any estimates on how long (for example) it would take to run a simple comparison between sample A and sample B (triplicated) to get relative abundances of OTUs and the appropriate pretty graphs?
    3) Are there any common stumbling blocks that people tend to encounter, either in the setup or the running? I have already set it up natively on Ubuntu 12.04 LTS with only a few problems, and I understand the VDIs are good too.
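Once an OTU table exists, the relative-abundance part of question 2 is simple arithmetic. You divide each OTU's read count by its library total and then average across the triplicates. Below is a minimal Python sketch. It assumes the OTU table has already been exported to a plain dict. The sample names, OTU ids and counts are hypothetical, and this is not QIIME's own code.

```python
from statistics import mean

# Hypothetical OTU table: sample -> {OTU id -> read count}. Names and counts are
# invented for illustration; real tables come out of the OTU-picking step.
OTU_TABLE = {
    "A_rep1": {"OTU_1": 600, "OTU_2": 300, "OTU_3": 100},
    "A_rep2": {"OTU_1": 1200, "OTU_2": 600, "OTU_3": 200},
    "A_rep3": {"OTU_1": 300, "OTU_2": 150, "OTU_3": 50},
    "B_rep1": {"OTU_1": 200, "OTU_2": 700, "OTU_4": 100},
    "B_rep2": {"OTU_1": 400, "OTU_2": 1400, "OTU_4": 200},
    "B_rep3": {"OTU_1": 100, "OTU_2": 350, "OTU_4": 50},
}


def relative_abundance(counts):
    """Convert one sample's raw read counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()}


def mean_relative_abundance(table, replicates):
    """Average each OTU's relative abundance over replicate samples.

    An OTU missing from a replicate contributes 0 for that replicate.
    """
    per_rep = [relative_abundance(table[name]) for name in replicates]
    otus = sorted(set().union(*per_rep))
    return {otu: mean(rep.get(otu, 0.0) for rep in per_rep) for otu in otus}


mean_a = mean_relative_abundance(OTU_TABLE, ["A_rep1", "A_rep2", "A_rep3"])
mean_b = mean_relative_abundance(OTU_TABLE, ["B_rep1", "B_rep2", "B_rep3"])

# Side-by-side view of the two groups; OTUs absent from a group show as 0.
for otu in sorted(set(mean_a) | set(mean_b)):
    print(f"{otu}\tA={mean_a.get(otu, 0.0):.3f}\tB={mean_b.get(otu, 0.0):.3f}")
```

A real comparison would also need rarefaction and a statistical test between the groups. This sketch leaves both out.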

    Any help or feedback appreciated. Cross-posted in "Bioinformatics" since I'm not sure which is the most appropriate forum.

  • #2
    1) If this is a single MiSeq run and the entire run was devoted to 16S, expect to need 32 GB of RAM or more. Our collaborators attempted this on the QIIME VirtualBox image, and it kept crashing until they gave it more than 20 GB of memory. If you're open to it, try the AMI that QIIME provides on Amazon EC2, or your local cluster (YMMV on that one; it's been hell getting it set up on ours).
    2) The longest and most memory-intensive steps are the initial demultiplexing and quality filtering (split_libraries_fastq.py) and OTU picking. On the QIIME website they suggest subsampled OTU picking for these types of datasets. I usually run it against the SILVA database without subsampling, but I have lots of memory. Everything else should take you 45 minutes to an hour. That's the fun bit with QIIME.
    3) Stumbling blocks? I'd say the lack of UNIX command line experience, lack of understanding of some of the metrics, and not enough patience when an error gets kicked up.

    As an aside, I'd question why you only ran 6 samples. We typically run more than 96 on each run, with the amplicon receiving "only" 40-50 percent of the run, which still gives us 50,000-70,000 reads per library. Illumina has given us the ability to overkill our samples, so we try to knock it down to something reasonable (we notice rarefaction curves leveling off pretty fast even at this depth). The "rare biosphere" turned out to be mostly sequencing artefacts, so from my end I'd be careful about chasing ghosts if you try to call anything from OTUs that represent < 5% of your libraries.
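The 5% cut-off above is a rule of thumb for where sequencing artefacts start to dominate. Below is a minimal Python sketch that flags OTUs under that cut-off in a single library. It assumes a plain dict of read counts, and the OTU names and counts are hypothetical. Flagged OTUs are not necessarily artefacts. The split only marks where to be cautious.

```python
# Hypothetical read counts for one library (~50,000 reads, in line with the
# depth described above). OTU ids and counts are invented for illustration.
LIBRARY = {"OTU_1": 30000, "OTU_2": 15000, "OTU_3": 4000, "OTU_4": 800, "OTU_5": 200}


def split_by_abundance(counts, threshold=0.05):
    """Split OTUs into (common, rare) dicts of relative abundance.

    threshold=0.05 mirrors the "< 5% of your libraries" rule of thumb from the
    post; it is a heuristic, not a QIIME parameter or default.
    """
    total = sum(counts.values())
    common, rare = {}, {}
    for otu, n in counts.items():
        frac = n / total
        if frac >= threshold:
            common[otu] = frac
        else:
            rare[otu] = frac
    return common, rare


common, rare = split_by_abundance(LIBRARY)
print("interpret with confidence:", sorted(common))
print("treat with caution:", sorted(rare))
```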



    • #3
      Thanks, a very, very useful answer.

      I'm not running 6 samples; that was just a simple example. I'm advising others on a run that hasn't been done yet. It will be a single MiSeq run with ~90 samples multiplexed and a PhiX spike-in of at least 50%, probably more like 60%. I did rough calculations of the expected number of sequences we'd get, and they match your numbers (which is good to know). We will also have to use subsampled OTU picking (or at least de novo OTU picking of some kind), since it's not a well-studied biome.
      Sounds like running it on a desktop is going to be a problem due to memory. Our cluster might just handle it, but only just. Good to catch this problem now. Could you send me a private message with the email address of the collaborator you mentioned, so that I could contact them?
      Last edited by danwiththeplan; 03-26-2013, 07:48 PM. Reason: clarity
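The rough calculation mentioned above divides the clusters left after the PhiX spike-in evenly among the multiplexed samples. Below is a minimal Python sketch. It assumes a run yields 12 million clusters passing filter. That is a hypothetical figure chosen for illustration, not a number from this thread. It also assumes perfectly even pooling.

```python
def reads_per_sample(total_clusters, n_samples, phix_fraction):
    """Rough expected reads per sample after removing the PhiX share.

    Assumes the remaining clusters are split evenly across samples, which
    real pooled libraries rarely achieve.
    """
    if not 0 <= phix_fraction < 1:
        raise ValueError("phix_fraction must be in [0, 1)")
    return total_clusters * (1 - phix_fraction) / n_samples


# Hypothetical run size: 12 million clusters passing filter (an assumption,
# not a figure from this thread; check what your own chemistry delivers).
TOTAL_CLUSTERS = 12_000_000

for phix in (0.5, 0.6):
    estimate = reads_per_sample(TOTAL_CLUSTERS, 90, phix)
    print(f"{phix:.0%} PhiX: ~{estimate:,.0f} reads per sample")
```

Under those assumptions, 50% and 60% PhiX give roughly 66,667 and 53,333 reads per sample. That falls in the same range as the 50,000-70,000 reads per library reported in the reply above. Uneven pooling will push real libraries above and below this average.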



      • #4
        Just an fyi, with the new MiSeq software out this week they claim you can get away with only a 5% PhiX spike-in for low diversity libraries.



        • #5
          Originally posted by RCJK
          Just an fyi, with the new MiSeq software out this week they claim you can get away with only a 5% PhiX spike-in for low diversity libraries.
          Yes, the reps have been making that claim: new software that picks the clusters more accurately. It sounds good, but I'll let someone else test it.
