SEQanswers

03-26-2013, 02:45 PM   #1
danwiththeplan
Member

Location: Auckland

Join Date: Sep 2011
Posts: 72
QIIME constraints and time to run for 16S Illumina

Hi, I am looking to help analyse a dataset that consists of a number of 16S (V3-V4) amplicons sequenced on a MiSeq (in triplicate) from a number of data points, with a view to getting the relative abundances of OTUs in the samples.
I was wondering a few things:
1) Are there any memory and/or processor-speed constraints when running QIIME, i.e. can you run every step in the pipeline on a reasonably good desktop (i7) with reasonable memory (8GB) in a reasonable time, or do you need to start using clusters or high-memory computers?
2) Any estimates on how long (for example) it would take to run a simple comparison between sample A and sample B (triplicated) to get relative abundances of OTUs and the appropriate pretty graphs?
3) Are there any common stumbling blocks that people tend to encounter, either in the setup or the running? I have already set it up natively on Ubuntu 12.04 LTS with only a few problems, and I understand the VDIs are good too.

Any help or feedback appreciated. Cross-posted in "Bioinformatics" since I'm not sure which is the most appropriate forum.
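
For the relative-abundance part of the question, the calculation itself is cheap once an OTU count table exists. A minimal Python sketch of the arithmetic follows. It is not QIIME code, and the OTU names and counts are made up purely for illustration; it assumes each replicate is a simple mapping from OTU ID to read count.

```python
from collections import defaultdict


def relative_abundance(counts):
    """Convert raw OTU read counts for one library into fractions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {otu: n / total for otu, n in counts.items()}


def mean_relative_abundance(replicates):
    """Average per-OTU relative abundance across replicate libraries.

    An OTU that is missing from a replicate counts as 0 in that replicate.
    """
    sums = defaultdict(float)
    for counts in replicates:
        for otu, frac in relative_abundance(counts).items():
            sums[otu] += frac
    return {otu: s / len(replicates) for otu, s in sums.items()}


# Hypothetical read counts for "sample A" in triplicate (illustrative only).
sample_a = [
    {"OTU_1": 600, "OTU_2": 300, "OTU_3": 100},
    {"OTU_1": 500, "OTU_2": 400, "OTU_3": 100},
    {"OTU_1": 700, "OTU_2": 200, "OTU_3": 100},
]
mean_a = mean_relative_abundance(sample_a)
for otu in sorted(mean_a):
    print(f"{otu}\t{mean_a[otu]:.3f}")
```

Normalising each replicate before averaging keeps a deeper-sequenced replicate from dominating the mean, which is one reasonable choice among several.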
03-26-2013, 08:36 PM   #2
bstamps
Member

Location: University of Oklahoma

Join Date: Oct 2012
Posts: 40

1) If this is a single MiSeq run and the entire run was devoted to 16S, expect to need 32GB of RAM or more. Our collaborators attempted this on the QIIME VirtualBox image, and it kept crashing until they gave it more than 20GB of memory. If you don't mind it, try the AMI that QIIME provides on Amazon EC2, or your local cluster (YMMV on that one; it's been hell getting it set up on ours).
2) The longest and most memory-intensive steps are the initial demultiplexing and quality filtering (split_libraries_fastq.py) and OTU picking. On the QIIME website they suggest subsampled OTU picking for these types of datasets. I usually run it non-subsampled against the SILVA dataset, but I have lots of memory. Everything else will take you 45 minutes to an hour to complete. That's the fun bit with QIIME.
3) Stumbling blocks? I'd say the lack of UNIX command line experience, lack of understanding of some of the metrics, and not enough patience when an error gets kicked up.

As an aside, I'd question why you only ran 6 samples. We typically run >96 on each run, with the amplicon receiving "only" 40-50 percent of the run, which still gives us 50,000-70,000 reads per library. Illumina has given us the ability to overkill our samples, so we try to knock it down to something reasonable (we notice rarefaction curves leveling off pretty fast even at this depth). The "rare biosphere" turned out to be mostly sequencing artefact, so from my end I'd say be careful about chasing ghosts if you are trying to call anything from the QIIME data for OTUs that represent < 5% of your libraries.
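
To illustrate what a rarefaction curve leveling off means, here is a minimal Python sketch that subsamples reads without replacement at increasing depths and counts the OTUs observed. It is not QIIME's implementation. The library composition (three abundant OTUs plus 50 singletons) is invented for the example, and the `drop_rare_otus` helper with its 5% cutoff is only one possible reading of the caution above.

```python
import random


def rarefaction_curve(counts, depths, seed=0):
    """Return (depth, observed OTUs) pairs from random subsamples of the reads.

    Reads are drawn without replacement. The count table is expanded to one
    label per read, which is fine for tens of thousands of reads per library.
    """
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        if depth > len(reads):
            break  # cannot subsample deeper than the library itself
        subsample = rng.sample(reads, depth)
        curve.append((depth, len(set(subsample))))
    return curve


def drop_rare_otus(counts, min_fraction):
    """Keep only OTUs whose share of the library is at least min_fraction."""
    total = sum(counts.values())
    return {otu: n for otu, n in counts.items() if n / total >= min_fraction}


# Hypothetical library: 3 abundant OTUs plus 50 singletons (10,000 reads).
library = {"OTU_A": 5000, "OTU_B": 3000, "OTU_C": 1950}
library.update({f"singleton_{i}": 1 for i in range(50)})

curve = rarefaction_curve(library, depths=[100, 1000, 5000, 10000])
for depth, observed in curve:
    print(depth, observed)

abundant = drop_rare_otus(library, min_fraction=0.05)
print(sorted(abundant))
```

In a library like this one, the abundant OTUs show up at very shallow depth and the later growth of the curve comes almost entirely from singletons, which are the counts most likely to be sequencing artefact.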
03-26-2013, 08:45 PM   #3
danwiththeplan
Member

Location: Auckland

Join Date: Sep 2011
Posts: 72

Thanks, very very useful answer

I'm not running 6 samples; that was just a simple example. I'm advising others on a run that hasn't been done yet. It will be a single MiSeq run multiplexed to ~90 samples, with at least a 50% PhiX spike-in, probably more like 60%. I did rough calculations of the expected number of sequences we'd get, and they correspond with your results (which is good to know). We will also have to use subsampled OTU picking (or at least de novo OTU picking of some kind), since it's not a well-studied biome.
Sounds like running it on a desktop is going to be a problem due to memory. Our cluster might handle it, but only just. Good to catch this problem now. Could you send me a private message with the email of the collaborator you mentioned, so I can contact them?
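
The rough read-depth calculation mentioned above can be sketched in a few lines of Python. The total read count is an assumption: 12 million is a placeholder picked to be roughly consistent with the figures in post #2 (about 96 libraries at 50,000-70,000 reads each from 40-50% of a run), not a measured or vendor-quoted number. Even multiplexing across samples is also assumed.

```python
def reads_per_sample(total_reads, phix_fraction, n_samples):
    """Expected amplicon reads per library, assuming even multiplexing."""
    if not 0 <= phix_fraction < 1:
        raise ValueError("phix_fraction must be in [0, 1)")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    amplicon_reads = total_reads * (1 - phix_fraction)
    return amplicon_reads / n_samples


# ASSUMPTION: placeholder run yield, not a measured value for any real run.
ASSUMED_TOTAL_READS = 12_000_000
N_SAMPLES = 90

# Compare the PhiX spike-in levels discussed in the thread.
for phix in (0.50, 0.60):
    depth = reads_per_sample(ASSUMED_TOTAL_READS, phix, N_SAMPLES)
    print(f"PhiX {phix:.0%}: about {depth:,.0f} reads per sample")
```

Swapping in the actual yield of a run and the real spike-in percentage gives the per-sample depth to expect before deciding how many samples to multiplex.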

Last edited by danwiththeplan; 03-26-2013 at 08:48 PM. Reason: clarity
03-26-2013, 11:45 PM   #4
RCJK
Senior Member

Location: Australia

Join Date: May 2009
Posts: 155

Just an fyi, with the new MiSeq software out this week they claim you can get away with only a 5% PhiX spike-in for low diversity libraries.
03-27-2013, 02:24 PM   #5
danwiththeplan
Member

Location: Auckland

Join Date: Sep 2011
Posts: 72

Quote:
Originally Posted by RCJK
Just an fyi, with the new MiSeq software out this week they claim you can get away with only a 5% PhiX spike-in for low diversity libraries.
Yes, the reps have been making that claim: new software that picks the clusters more accurately. It sounds good, but I'll let someone else test it.

Tags
16s, metagenomics, miseq, qiime

All times are GMT -8.

