03-26-2013, 08:36 PM   #2
bstamps
Member
 
Location: University of Oklahoma

Join Date: Oct 2012
Posts: 40

1) If this is a single MiSeq run and the entire run was devoted to 16S, expect to need 32GB of RAM or more. Our collaborators attempted this on the QIIME VirtualBox image, and it kept crashing until they fed it more than 20GB of memory. If you don't mind it, try using the AMI that QIIME has on Amazon EC2, or your local cluster (YMMV on that one, it's been hell getting it set up on ours).
2) The longest and most memory-intensive steps are the initial demultiplexing and quality filtering (split_libraries_fastq.py) and OTU picking. On the QIIME website they suggest subsampled OTU picking for these types of datasets. I usually run it non-subsampled against the SILVA database, but I have lots of memory. Everything else will take you 45 minutes to an hour to complete. That's the fun bit with QIIME.
3) Stumbling blocks? I'd say a lack of UNIX command-line experience, a lack of understanding of some of the metrics, and not enough patience when an error gets kicked up.

As an aside, I'd question why you only ran 6 samples. We typically run >96 on each run, with the amplicons receiving "only" 40-50 percent of the run, which still gives us 50,000-70,000 reads per library. Illumina has given us the ability to overkill our samples, so we try to knock it down to something reasonable (we notice rarefaction curves leveling off pretty fast even at this depth). The "rare biosphere" turned out to mostly be sequencing artefact, so from my end I'd say be careful about chasing ghosts if you are trying to call anything from the QIIME data based on OTUs that represent < 5% of your libraries.
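
If you want to sanity-check the depth numbers for your own run, here is a rough back-of-the-envelope sketch in Python. The total read count is a made-up placeholder, not a figure from our runs; plug in whatever your run actually yielded. The function name is just for illustration.

```python
def reads_per_library(total_reads, amplicon_fraction, n_libraries):
    """Estimate mean reads per library when amplicons get only part of a run.

    total_reads:       reads passing filter for the whole run (your own number)
    amplicon_fraction: share of the run given to the amplicon pool (0-1)
    n_libraries:       number of barcoded libraries in the amplicon pool
    """
    if not 0 < amplicon_fraction <= 1:
        raise ValueError("amplicon_fraction must be in (0, 1]")
    if n_libraries < 1:
        raise ValueError("n_libraries must be at least 1")
    # Assumes reads are spread evenly across libraries, which is optimistic.
    return total_reads * amplicon_fraction / n_libraries


# ASSUMPTION: hypothetical run yield, chosen only to illustrate the arithmetic.
ASSUMED_RUN_READS = 12_000_000

for fraction in (0.40, 0.50):
    depth = reads_per_library(ASSUMED_RUN_READS, fraction, 96)
    print(f"{fraction:.0%} of the run across 96 libraries: ~{depth:,.0f} reads each")
```

With that assumed yield, 96 libraries sharing 40-50 percent of the run land at roughly 50,000 to 62,500 reads each, which is the same ballpark as what we see. Real libraries never split perfectly evenly, so treat the result as an average, not a guarantee.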