SEQanswers: Metagenomics

danwiththeplan 03-26-2013 02:45 PM

QIIME constraints and time to run for 16S Illumina
Hi, I am looking to help analyse a dataset that consists of a number of 16S (V3-V4) amplicons sequenced on a MiSeq (in triplicate) from a number of data points, with a view to getting the relative abundances of OTUs in the samples.
I was wondering a few things:
1) Are there any memory and / or processor-speed constraints when running QIIME, i.e. can you run every step in the pipeline on a reasonably good desktop (i7) with reasonable memory (8GB) in a reasonable time, or do you need to start using clusters or high-memory computers?
2) Any estimates of how long it would take to run, for example, a simple comparison between sample A and sample B (each in triplicate) to get relative abundances of OTUs and the appropriate pretty graphs?
3) Are there any common stumbling blocks that people tend to encounter, either in the setup or the running? I have already set it up natively on Ubuntu 12.04 LTS with only a few problems, and I understand the VDIs are good too.

Any help or feedback appreciated. Cross-posted in "Bioinformatics" since I'm not sure which is the most appropriate forum.

bstamps 03-26-2013 08:36 PM

1) If this is a single MiSeq run, and the entire run was devoted to 16S, expect to need 32GB of RAM or more. Our collaborators attempted to do this on the QIIME VirtualBox image, and it kept crashing until they gave it more than 20GB of memory. If you don't mind it, try using the AMI that QIIME has on Amazon EC2, or your local cluster (YMMV on that one, it's been hell getting it set up on ours).
2) The longest and most memory-intensive steps are the initial clustering and OTU picking. For these types of datasets the QIIME website suggests subsampled OTU picking. I usually do it against the SILVA dataset without subsampling, but I have lots of memory. Everything else will take you 45 minutes to an hour to complete. That's the fun bit with QIIME.
3) Stumbling blocks? I'd say lack of UNIX command line experience, lack of understanding of some of the metrics, and not enough patience when an error gets kicked up.

As an aside, I'd question why you only ran 6 samples. We typically run >96 on each run, with the amplicons receiving "only" 40-50 percent of the run, which still gives us 50,000-70,000 reads per library. Illumina has given us the ability to overkill our samples, so we try to knock it down to something reasonable (we notice rarefaction curves leveling off pretty fast even at this depth). The "rare biosphere" turned out to mostly be sequencing artefact, so from my end I'd say be careful about chasing ghosts if you are trying to call something from the QIIME data beyond OTUs that represent < 5% of your libraries.
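
For anyone wanting to apply that kind of cutoff by hand, here is a minimal sketch in plain Python (not a QIIME API). It reads the caution above as "treat OTUs below some fraction of a library as low-confidence", which is only one interpretation. The function names, the example counts and the 5% default are illustrative assumptions, not values from any real dataset.

```python
def relative_abundance(counts):
    """Convert raw per-OTU read counts for one library into fractions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {otu: n / total for otu, n in counts.items()}


def split_by_abundance(counts, min_fraction=0.05):
    """Separate OTUs at or above min_fraction from those below it.

    ASSUMPTION: min_fraction=0.05 mirrors the "< 5%" figure mentioned in
    the post; pick whatever cutoff suits your own data.
    """
    rel = relative_abundance(counts)
    kept = {otu: f for otu, f in rel.items() if f >= min_fraction}
    low = {otu: f for otu, f in rel.items() if f < min_fraction}
    return kept, low


# Hypothetical read counts for a single library (made-up numbers).
example_counts = {
    "OTU_1": 30000,
    "OTU_2": 15000,
    "OTU_3": 4000,
    "OTU_4": 900,
    "OTU_5": 100,
}

kept, low = split_by_abundance(example_counts)
for otu, frac in sorted(kept.items(), key=lambda kv: -kv[1]):
    print(f"{otu}\t{frac:.3f}")
print("low-abundance OTUs:", sorted(low))
```

In a real analysis the counts would come from the OTU table rather than a hand-typed dictionary.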

danwiththeplan 03-26-2013 08:45 PM

Thanks, very very useful answer :)

I'm not running 6 samples; that was just a simple example. I'm advising others on a run that hasn't been done yet. It will be a single MiSeq run multiplexed to ~90 samples, with at least a 50% PhiX spike-in, probably more like 60%. I did rough calculations of the expected number of sequences we'd get, and they agree with your results (which is good to know). We will also have to use subsampled OTU picking (or at least de novo OTU picking of some kind) since it's not a well-studied biome.
Sounds like running it on a desktop is going to be a problem due to memory. Our cluster might handle it, but only just. Good to catch this problem now. Could you send me a private message with the email address of the collaborator you mentioned, so that I can contact them?
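
The rough read-depth calculation mentioned above can be sketched like this. The total read count is a placeholder assumption (a real run will differ), chosen only to be roughly in line with the per-library figures quoted earlier in the thread. The function name is made up for illustration.

```python
def reads_per_sample(total_reads, phix_fraction, n_samples):
    """Estimate the mean number of amplicon reads per multiplexed sample.

    total_reads   -- reads passing filter for the whole run
    phix_fraction -- share of the run taken up by the PhiX spike-in (0 to <1)
    n_samples     -- number of multiplexed libraries
    """
    if not 0.0 <= phix_fraction < 1.0:
        raise ValueError("phix_fraction must be in [0, 1)")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    amplicon_reads = total_reads * (1.0 - phix_fraction)
    return amplicon_reads / n_samples


# ASSUMPTION: 12 million reads passing filter is a placeholder, not a
# measured or guaranteed yield for any particular MiSeq run.
ASSUMED_TOTAL_READS = 12_000_000

for phix in (0.5, 0.6):
    depth = reads_per_sample(ASSUMED_TOTAL_READS, phix, n_samples=90)
    print(f"PhiX {phix:.0%}: about {depth:,.0f} reads per sample")
```

This assumes reads are spread evenly across samples, which real multiplexed runs rarely achieve, so treat the result as an average rather than a floor.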

RCJK 03-26-2013 11:45 PM

Just an fyi, with the new MiSeq software out this week they claim you can get away with only a 5% PhiX spike-in for low diversity libraries.

danwiththeplan 03-27-2013 02:24 PM


Originally Posted by RCJK (Post 100052)
Just an fyi, with the new MiSeq software out this week they claim you can get away with only a 5% PhiX spike-in for low diversity libraries.

Yes, the reps have been making that claim: new software that picks the clusters more accurately. It sounds good, but I'll let someone else test it :)
