#1
Member
Location: Auckland Join Date: Sep 2011
Posts: 72
Hi, I am looking to help analyse a dataset of 16S (V3-V4) amplicons sequenced on a MiSeq (in triplicate) from a number of data points, with a view to getting the relative abundances of OTUs in the samples.
I was wondering a few things:
1) Are there any memory and/or processor-speed constraints when running QIIME? That is, can you run every step in the pipeline on a reasonably good desktop (i7) with reasonable memory (8 GB) in a reasonable time, or do you need to start using clusters or high-memory computers?
2) Any estimates of how long it would take to run, for example, a simple comparison between sample A and sample B (each in triplicate) to get relative abundances of OTUs and the appropriate pretty graphs?
3) Are there any common stumbling blocks that people tend to encounter, either in the setup or the running?
I have already set it up natively on Ubuntu 12.04 LTS with only a few problems, and I understand the VDIs are good too. Any help or feedback appreciated. Cross-posted in "Bioinformatics" since I'm not sure which is the most appropriate forum.
#2
Member
Location: University of Oklahoma Join Date: Oct 2012
Posts: 40
1) If this is a single MiSeq run and the entire run was devoted to 16S, expect to need 32 GB of RAM or more. Our collaborators attempted this on the QIIME VirtualBox image, and it kept crashing until they gave it more than 20 GB of memory. If you don't mind it, try the AMI that QIIME provides on Amazon EC2, or your local cluster (YMMV on that one; it's been hell getting it set up on ours).
2) The longest and most memory-intensive steps are demultiplexing and quality filtering (split_libraries_fastq.py) and OTU picking. On the QIIME website they suggest subsampled OTU picking for these types of datasets. I usually do it against the SILVA dataset without subsampling, but I have lots of memory. Everything else will take you 45 minutes to an hour to complete. That's the fun bit with QIIME.
3) Stumbling blocks? I'd say lack of UNIX command-line experience, lack of understanding of some of the metrics, and not enough patience when an error gets kicked up.
As an aside, I'd question why you only ran 6 samples. We typically run >96 on each run, with the amplicon receiving "only" 40-50 percent of the run, which still gives us 50,000-70,000 reads per library. Illumina has given us the ability to overkill our samples, so we try to knock it down to something reasonable (we notice rarefaction curves leveling off pretty fast even at this depth). The "rare biosphere" turned out to be mostly sequencing artefact, so from my end I'd say be careful about chasing ghosts if you try to call anything from the QIIME data for OTUs that represent < 5% of your libraries.
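The caution about low-abundance OTUs can be illustrated with a minimal Python sketch. This is not QIIME code: the function names, the OTU labels, and the counts are hypothetical placeholders, and treating 5% as a hard cutoff is only one reading of the advice above.

```python
from typing import Dict, Tuple


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw per-OTU read counts for one library into fractions of that library."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("library has no reads")
    return {otu: n / total for otu, n in counts.items()}


def split_by_threshold(
    abundances: Dict[str, float], threshold: float = 0.05
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Separate OTUs at or above the threshold from those below it.

    The 0.05 default mirrors the "< 5% of your libraries" caution; it is an
    assumption made for illustration, not a QIIME setting.
    """
    confident = {otu: f for otu, f in abundances.items() if f >= threshold}
    low = {otu: f for otu, f in abundances.items() if f < threshold}
    return confident, low


# Hypothetical counts for a single library (illustration only, not real data).
example_counts = {"OTU_1": 6000, "OTU_2": 3000, "OTU_3": 900, "OTU_4": 100}

abund = relative_abundance(example_counts)
confident, low = split_by_threshold(abund)

print("at or above threshold:", sorted(confident))
print("below threshold (treat with caution):", sorted(low))
```

OTUs that land in the second group are the ones most likely to be affected by sequencing artefact, so they deserve extra scrutiny before any biological claim is made.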
#3
Member
Location: Auckland Join Date: Sep 2011
Posts: 72
Thanks, very, very useful answer.
I'm not running 6 samples; that was just a simple example. I'm advising others on a run that hasn't been done yet. It will be a single MiSeq run multiplexed to ~90 samples, with at least 50% PhiX spike-in, probably more like 60%. I did rough calculations of the expected number of sequences we'd get, and they correspond with your results (which is good to know). We will also have to use subsampled OTU picking (or at least de novo OTU picking of some kind), since it's not a well-studied biome. Sounds like running it on a desktop is going to be a problem due to memory. Our cluster might handle it, but only just. Good to catch this problem now. Could you send me a private message with the email of the collaborator you mentioned, so that I can contact them?
Last edited by danwiththeplan; 03-26-2013 at 08:48 PM. Reason: clarity
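The rough calculation mentioned above can be sketched in Python as follows. It is a back-of-envelope estimate only: it assumes reads are split evenly across samples and that every non-PhiX read is usable amplicon. `TOTAL_READS` is a hypothetical placeholder, not a figure from this thread, so substitute the yield of your own run.

```python
def reads_per_sample(total_reads: float, phix_fraction: float, n_samples: int) -> float:
    """Estimate amplicon reads per sample after removing the PhiX share of the run.

    Assumes an even split across samples, which real multiplexed runs
    only approximate.
    """
    if not 0.0 <= phix_fraction < 1.0:
        raise ValueError("phix_fraction must be in [0, 1)")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return total_reads * (1.0 - phix_fraction) / n_samples


# Hypothetical run yield (placeholder value, not taken from the thread).
TOTAL_READS = 12_000_000
N_SAMPLES = 90  # "~90 samples" from the post above

# Compare the two PhiX levels mentioned in the post.
for phix in (0.5, 0.6):
    estimate = reads_per_sample(TOTAL_READS, phix, N_SAMPLES)
    print(f"PhiX {phix:.0%}: about {estimate:,.0f} reads per sample")
```

Changing `phix_fraction` shows how strongly the spike-in level drives per-sample depth for a fixed number of samples.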
#4
Senior Member
Location: Australia Join Date: May 2009
Posts: 155
Just an FYI: with the new MiSeq software out this week, they claim you can get away with only a 5% PhiX spike-in for low-diversity libraries.
#5
Member
Location: Auckland Join Date: Sep 2011
Posts: 72
Quote:
Tags: 16s, metagenomics, miseq, qiime