SEQanswers

Old 07-28-2012, 10:45 AM   #1
vs92
Member
 
Location: Cambridge, MA

Join Date: Jul 2012
Posts: 10
Default 454 vs MiSeq - Need to optimize time/cost for 16s analysis of 300 Stool Samples/month

I am planning a large experiment on 300 human stool samples over the next month to identify the bacterial species present in each sample (using either one 454 sequencer or one MiSeq). I could very much use your insights on doing this effectively. Based on time and cost, should I go with the Illumina MiSeq or the 454 sequencer, especially if I have to do 300 samples per month for the next few months? Thanks so much for your insights.

Last edited by vs92; 08-10-2012 at 03:24 AM.
Old 07-28-2012, 12:32 PM   #2
snetmcom
Senior Member
 
Location: USA

Join Date: Oct 2008
Posts: 158
Default

I would still argue that accurate 16S sequencing requires long reads that just aren't available on other platforms. There are so many people struggling to make this work with shorter read platforms. There are a few papers that claim to have accomplished this with PE short reads, but it's not an easy path on the analysis side.
Old 07-29-2012, 09:58 AM   #3
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by snetmcom
I would still argue that accurate 16S sequencing requires long reads that just aren't available on other platforms. There are so many people struggling to make this work with shorter read platforms. There are a few papers that claim to have accomplished this with PE short reads, but it's not an easy path on the analysis side.
"Other"? Than 454, you mean?

We have discontinued use of our 454, so I guess I should argue that it can be done on a MiSeq.

With the v2 MiSeq chemistry upgrade to be released soon, 2x250 base reads should make quick work of any amplicon pool with inserts less than 400 bp. Won't using PANDA, or something similar, give you sufficient length sequences to get the job done? I think we have had a good result using v1 MiSeq (2x150 base reads), see below.
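As a rough illustration of what an overlap-based read merger does (this is a minimal sketch, not PANDAseq's actual algorithm, which scores overlaps probabilistically using base qualities), the following Python reverse-complements read 2 and looks for the longest suffix/prefix overlap with read 1. The function names, `min_overlap`, and `max_mismatch_frac` are hypothetical choices made for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a paired-end read into one sequence, or return None.

    read2 is given as it comes off the sequencer (reverse strand), so it is
    reverse-complemented first. Overlaps are tried longest first; the first
    one whose mismatch fraction is within the tolerance wins. In the
    overlap, read 1's bases are kept (a real tool would use the qualities).
    """
    r2 = reverse_complement(read2)
    longest = min(len(read1), len(r2))
    for olap in range(longest, min_overlap - 1, -1):
        tail = read1[-olap:]
        head = r2[:olap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olap:
            return read1 + r2[olap:]
    return None
```

With a 40-base toy amplicon and two 25-base reads, the true overlap is 10 bases, and `merge_pair` should reconstruct the full amplicon. Trying the longest overlap first can pick a spurious overlap in repetitive sequence, which is one reason real mergers use quality-aware scoring.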

But there are fairly complex technical considerations lab-side as well:

(1) Illumina has done a pretty clunky job of telling us how to make the equivalent version of "fusion primers" for the MiSeq. Possibly because various technical issues specific to Illumina sequencers make this methodology less than ideal for their platform.

(2) Number of bar codes available. This would be an issue for 300 samples. The 454 offers well over 100 official bar codes (MIDs). So you could possibly get by with 3 regions on a 454, reusing those 100 bar codes. Where a "region" might be a region on an 8 gasket PTP. Or you could use one of the commonly available sets of 454 bar code sequences published by non-Roche sources. Of course buying the oligos would cost quite a bit.
Illumina has a couple of "dual" index adapter sequences that could be developed. That gets you out to 96 (8x12). What you really want for this project is 384, (16x24) though. Are the TruSeqHT and NexteraXT indexes compatible? If so you would have 16x24 indexes right there.

(3) Our only attempt, thus far, to do a 16S (v3 loop) run on the MiSeq did appear to work. But this success doesn't really address snetmcom's objection, because analysis is ongoing. But I am pretty sure it will be good.

(4) Balance and diversity. Illumina's throbbing red Achilles heel comes to the foreground in amplicon work. The above mentioned 16S v3 loop amplicons would have failed to produce a usable cluster, in all likelihood had we not spiked in 50% genomic libraries as "ballast" into the same run. But with ballast, it seems to work fine. This is a little tricky, I think, because to get good demultiplexing we chose to run 2 genomic libraries with "balanced" indexes -- 6 and 12. That meant the investigator was stuck using only 22 indexes because we were camped out on two of the TruSeq 24.

(5) If you don't mind paying more, you could just buy a NexteraXT kit and create "tagmentation" fragmented amplicons. These do appear to avoid the balance and diversity issues. But downstream analysis may be an issue.

(6) Save money on oligo cost using "step-out" PCR. Whatever your "locus specific" primers are, just append the post-index part of an Illumina adapter to it. Then add the index-containing part of the adapter in a "step out" PCR.

For this you synthesize another set of oligos that overlap your locus specific primers just in the TruSeq adapter part. Then you have "factored" the fusion primer into two segments that you combine multiplicatively.

That is, say, you are interested in a single locus. You amplify with your internal locus specific primers, then reamplify your products with TruSeq adapter oligos. 24 are available from the standard TruSeq set (48 if you want to use the small-RNA set). Instead of needing to synthesize 25 80-mers that will only be usable for this one purpose, you can thus synthesize 26 60-mers. The 24 of these that carry the TruSeq external adapter part can be reused for any other experiment.

Where it really gets powerful, is if you use dual index adapters. There, if you want 96 different indexes, you only need to synthesize 20 (8 for one side, 12 for the other). Just use the amplicon dual index sequences in the Illumina Oligo letter. Then, the obvious extension is to go to 40 adapters, 16 one side, 24 the other. Then you have up to 384 index pairs available. I don't know why Illumina has not already jumped on this obvious application.
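The combinatorics in (6) can be sketched in a few lines of Python. This is only arithmetic on the counts quoted above (8x12, 16x24, 25 80-mers vs 26 60-mers); the function names are hypothetical, and index pairs are labeled by position rather than by real index sequences.

```python
from itertools import product


def dual_index_pairs(n_side_a, n_side_b):
    """All (side A, side B) index combinations, labeled 1..n on each side."""
    return list(product(range(1, n_side_a + 1), range(1, n_side_b + 1)))


def oligos_to_synthesize(n_side_a, n_side_b):
    """Dual indexing needs one oligo per index on each side, not per pair."""
    return n_side_a + n_side_b


def total_bases(n_oligos, oligo_length):
    """Total bases ordered, a crude proxy for synthesis cost."""
    return n_oligos * oligo_length


def assign_samples(sample_names, n_side_a, n_side_b):
    """Give each sample a unique index pair; fail if there are too few pairs."""
    pairs = dual_index_pairs(n_side_a, n_side_b)
    if len(sample_names) > len(pairs):
        raise ValueError("more samples than available index pairs")
    return dict(zip(sample_names, pairs))
```

For example, `len(dual_index_pairs(16, 24))` gives 384 pairs from `oligos_to_synthesize(16, 24)` = 40 oligos, and `total_bases(25, 80)` vs `total_bases(26, 60)` compares 2000 bases of single-purpose fusion primers with 1560 bases of mostly reusable step-out oligos.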

But beware, primer-dimers are your enemies here as they are for the 454 amplicons. Particularly pernicious because they can anneal to the full-length products making them impossible to completely remove, even with a gel cut.

Anyway, by the end of the year I am confident 454 amplicons will seem like a bad dream having phased into complete obsolescence. But as things stand now it is difficult to say 454 amplicons are yet out-moded.

--
Phillip
Old 07-29-2012, 10:08 AM   #4
vs92
Member
 
Location: Cambridge, MA

Join Date: Jul 2012
Posts: 10
Default Re: snetmcom

I am working towards sophisticated analysis, so I'd like to go with the shortest possible reads and make the analysis challenging - that is not an issue; especially if I can minimize cost and time. Can you please post the papers you have mentioned - that have used shorter reads?
Old 07-29-2012, 10:33 AM   #5
vs92
Member
 
Location: Cambridge, MA

Join Date: Jul 2012
Posts: 10
Default RE: pmiguel

Hi Phillip,

Thank you so much for the detailed response. I am also leaning towards procuring the MiSeq. I had some follow-up questions to the points you've mentioned, and hope you'd take the time to reply:

1. Do you know when the v2 MiSeq chemistry upgrade will be released? Also, with this, what would be the cost and time required to focus on a 300 or 400 nucleobase region (like V2 or V4 of 16S rDNA)? My goal is to get the sequencing time to 4-6 hours duration, which seems unlikely with 2x250 base reads (isn't the time for running this on MiSeq currently close to 27 hours?).

2. Could you please provide the link to PANDA (or the other similar tools) you have mentioned? I'd like to get the length of the sequences down to 30-40 nucleobases, so that the goal of 4-6 hour run is feasible on MiSeq... any advice on going about this would be very helpful.

3. How much does the v1 MiSeq chemistry reagents cost? Do you know how much more v2 MiSeq chemistry reagents are likely to cost?

4. Instead of using universal primers, is there merit in going with a set of different primers targeted to different loci? Not sure whether MiSeq permits multiplexing such primers, and if so what the practical maximum number of primer sets (max number of loci one can focus on) is.

5. Do you know how I can check for the compatibility of the TruSeqHT and NexteraXT indexes you have mentioned? I'm assuming that the bacteria in the stool samples I'll be analyzing are very similar to those published from the stool samples of healthy patients of the human microbiome project recently. If I can get 16x24 = 384 samples in 1 month, that would be fantastic! By the way, is the number 384 for 1 run or for the entire 1 month period? I thought a single run takes just ~ 1 day, so if one can do 384 different samples in a single run, shouldn't this be the throughput for one day itself? Perhaps I am missing something key here!

Also, I do have extra grant money to get the NexteraXT kit, so I should definitely be able to create "tagmentation" fragmented amplicons -- could you refer me to some literature on how to do this, and why this would add to the complexity of the downstream analysis? Thanks!


6. Very glad to note that your attempt to do the 16S (v3 loop) run on the MiSeq was positive and the analysis is ongoing - good luck on this front! What length of V3 loop did you guys go after? Was it as short as 30-40 nucleobases? Would love to know how you developed this approach.

7. "spiked in 50% genomic libraries as "ballast" into the same run" -- could you please refer me to some link or literature reference that describes why adding in Ballasts into a run would increase likelihood of producing a usable cluster? Also why would having 6 and 12 balanced indices help with demultiplexing (on 2 of the TruSeq 24)?

8. Thanks for your very helpful advice on Step Out PCR - will definitely do that! Could you give me your email address - I'd love to stay connected and potentially collaborate on my project.

Thanks!
Old 07-29-2012, 10:43 AM   #6
snetmcom
Senior Member
 
Location: USA

Join Date: Oct 2008
Posts: 158
Default

I'm with you, Phillip. 454 is pricey. I just don't see the same level of confidence in any of the Illumina data yet. It seems like it's a stretch, and it's quite the analysis headache. I'm intrigued about 2x250, but I am making zero assumptions until I actually see it. Most 16S projects require a high level of accuracy, and even the 150 MiSeq data isn't that great towards the end. If you make it work, I'll be right behind you.
Old 07-30-2012, 02:42 AM   #7
MrGuy
Member
 
Location: earth

Join Date: Mar 2009
Posts: 68
Default

We also ditched our 454 as the machine was finicky and really expensive to run.

Quote:
Originally Posted by pmiguel
(4) Balance and diversity. Illumina's throbbing red Achilles heel comes to the foreground in amplicon work. The above mentioned 16S v3 loop amplicons would have failed to produce a usable cluster, in all likelihood had we not spiked in 50% genomic libraries as "ballast" into the same run. But with ballast, it seems to work fine. This is a little tricky, I think, because to get good demultiplexing we chose to run 2 genomic libraries with "balanced" indexes -- 6 and 12. That meant the investigator was stuck using only 22 indexes because we were camped out on two of the TruSeq 24.
... and this is the big reason why we went ion. The majority of our work is amplicon and targeted (ie, pcr) applications with the occasional whole genome with reference (<100mb) so we can design pcrs. The amplicon applications are more for population diversity that is not possible to resolve with sanger (ie <15% frequency).

Other reasons were:
-run cost vs 454 was dramatically lower
-ability to multiplex outside of what "the company" says
-error profile is predictable -- unlikely substitution errors as in sbs chemistries. Homopolymers are easier to work with in this regard and occur in specific locations.
-potential for long reads. >300bp is imminent, but I have yet to see the error profile at the end of those reads... vacation, you know.
Old 07-30-2012, 03:00 AM   #8
TonyBrooks
Senior Member
 
Location: London

Join Date: Jun 2009
Posts: 298
Default

I'm currently looking into dual indexing for metagenomic runs on MiSeq with regards to the promised 250bp paired end runs by year end. Q scores will be vastly better with this approach as the poorer 3' scores will be boosted due to the fact that they will overlap (and hence be sequenced twice) on a 400bp amplicon.
Diversity is still an issue, but we may try some tricks to increase this (maybe design PCRs to both strands, use a few amplicons, custom seq primers to avoid sequencing the primer, lower cluster densities). Our Illumina rep did mention that they were trying to reduce the diversity problem, but wouldn't tell me how they are planning to do this.
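The overlap described above for 2x250 reads on a 400bp amplicon is simple arithmetic, sketched here in Python. The function names are hypothetical, and the sketch assumes both reads run their full length with no trimming.

```python
def overlap_bp(read_len, amplicon_len):
    """Bases sequenced by both reads of a pair (capped at the amplicon length)."""
    return min(amplicon_len, max(0, 2 * read_len - amplicon_len))


def doubly_covered_interval(read_len, amplicon_len):
    """0-based half-open (start, end) of amplicon positions covered twice.

    Returns None when the two reads do not reach each other.
    """
    if overlap_bp(read_len, amplicon_len) == 0:
        return None
    start = max(0, amplicon_len - read_len)
    end = min(read_len, amplicon_len)
    return (start, end)
```

`overlap_bp(250, 400)` is 100, so the last 100 bases of each read (the low-quality 3' ends) are covered twice, while `overlap_bp(150, 400)` is 0, which is why 2x150 reads cannot be merged on an amplicon of that size.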
Old 07-30-2012, 04:38 AM   #9
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by vs92

Hi Phillip,

Thank you so much for the detailed response. I am also leaning towards procuring the MiSeq. I had some follow-up questions to the points you've mentioned, and hope you'd take the time to reply:

1. Do you know when the v2 MiSeq chemistry upgrade will be released? Also, with this, what would be the cost and time required to focus on a 300 or 400 nucleobase region (like V2 or V4 of 16S rDNA)? My goal is to get the sequencing time to 4-6 hours duration, which seems unlikely with 2x250 base reads (isn't the time for running this on MiSeq currently close to 27 hours?).
Word is, that it will be any day now. But I don't know.


Quote:
Originally Posted by vs92
2. Could you please provide the link to PANDA (or the other similar tools) you have mentioned? I'd like to get the length of the sequences down to 30-40 nucleobases, so that the goal of 4-6 hour run is feasible on MiSeq... any advice on going about this would be very helpful.
Google not working for you?
Here is the link.
Quote:
Originally Posted by vs92

3. How much does the v1 MiSeq chemistry reagents cost? Do you know how much more v2 MiSeq chemistry reagents are likely to cost?
They are slated to be less -- at least initially. I think they are shooting for around $700, $900, $1000 for the 50, 300 and 500 cycle kits.

Quote:
Originally Posted by vs92
4. Instead of using universal primers, is there merit in going with a set of different primers targeted to different loci? Not sure whether MiSeq permits multiplexing such primers, and if so what the practical maximum number of primer sets (max number of loci one can focus on) is.
I have not done this yet. But according to the MiSeq instrument manual it is possible.

Obviously, if you want to save time by avoiding the sequence of your PCR locus-specific primers, then using custom primers would be desirable. But then you are stepping off the Illumina QC/QA path.

Currently you can do custom primers for the read 1, first index read and read 2 primers. There are 3 extra ports in the reagent cassettes for you to add 600 µl of your custom primer at 0.5 µM. The MiSeq runs hotter than the HiSeq, so you want your Tm's pretty high -- like 65 °C.
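For planning purposes, a quick sanity check on a candidate custom primer might look like the sketch below. The Tm estimate uses the common basic GC-content formula for oligos longer than about 13 nt (64.9 + 41 * (G + C - 16.4) / N); it ignores salt and primer concentration, so treat it as a rough screen and use a nearest-neighbor calculator before ordering. The 100 µM stock concentration and the function names are assumptions for illustration only.

```python
def basic_tm_celsius(primer):
    """Rough Tm from GC content; only meaningful for oligos longer than ~13 nt."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def stock_volume_ul(final_conc_um, final_volume_ul, stock_conc_um):
    """C1*V1 = C2*V2: volume of stock needed to reach the final concentration."""
    return final_conc_um * final_volume_ul / stock_conc_um


def meets_target_tm(primer, target_tm=65.0):
    """True if the rough Tm estimate reaches the target (65 C per the post)."""
    return basic_tm_celsius(primer) >= target_tm
```

For 600 µl at 0.5 µM from an assumed 100 µM stock, `stock_volume_ul(0.5, 600, 100)` gives 3 µl of stock, with the remainder made up in the appropriate buffer.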

Quote:
Originally Posted by vs92
5. Do you know how I can check for the compatibility of the TruSeqHT and NexteraXT indexes you have mentioned? I'm assuming that the bacteria in the stool samples I'll be analyzing are very similar to those published from the stool samples of healthy patients of the human microbiome project recently. If I can get 16x24 = 384 samples in 1 month, that would be fantastic! By the way, is the number 384 for 1 run or for the entire 1 month period? I thought a single run takes just ~ 1 day, so if one can do 384 different samples in a single run, shouldn't this be the throughput for one day itself? Perhaps I am missing something key here!
Depends on how many reads you need. Again, not something I have tried, but in principle it should work. (Unless you are very young that last clause should engender a certain amount of fear...) Also, I don't know what sort of read depth you will need for your purposes. Currently you can get about 5 million read-pairs per run. This is slated to double with v2 reagents. But if you are doing a single amplicon per run, you will need to drop 1/2 your reads on "ballast" (a couple of genomic libraries) to deal with Illumina's low-diversity/high-bias weakness.
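To make the depth question concrete, here is a back-of-the-envelope Python sketch using the figures above (about 5 million read-pairs per run, half given up to ballast). It assumes perfectly even pooling and that every read-pair demultiplexes and passes filter, which will not hold in practice, so treat the result as an upper bound. The function name is hypothetical.

```python
def read_pairs_per_sample(total_read_pairs, ballast_fraction, n_samples):
    """Upper-bound read-pairs per sample after removing the ballast share."""
    usable = int(total_read_pairs * (1.0 - ballast_fraction))
    return usable // n_samples


# 384 samples in one run with v1 output and 50% ballast
v1_depth = read_pairs_per_sample(5_000_000, 0.5, 384)

# Same run if output doubles with v2 reagents, as slated
v2_depth = read_pairs_per_sample(10_000_000, 0.5, 384)
```

Under these assumptions `v1_depth` is 6510 and `v2_depth` is 13020 read-pairs per sample; whether that is enough depends on the diversity you need to resolve.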

Quote:
Originally Posted by vs92
Also, I do have extra grant money to get the NexteraXT kit, so I should definitely be able to create "tagmentation" fragmented amplicons -- could you refer me to some literature on how to do this, and why this would add to the complexity of the downstream analysis? Thanks!
Just go to the Illumina site and read their literature. Register for an icom or myillumina login so you can access their manuals and other info.

By more complicated, I mean that instead of all amplicons starting and ending at known positions you will have fragment libraries so your software pipeline needs to be able to deal with this.


Quote:
Originally Posted by vs92

6. Very glad to note that your attempt to do the 16S (v3 loop) run on the MiSeq was positive and the analysis is ongoing - good luck on this front! What length of V3 loop did you guys go after? Was it as short as 30-40 nucleobases? Would love to know how you developed this approach.

7. "spiked in 50% genomic libraries as "ballast" into the same run" -- could you please refer me to some link or literature reference that describes why adding in Ballasts into a run would increase likelihood of producing a usable cluster?
Again, go to the Illumina web site and register to get information on this. They have had webinars on the topic and also have written information about it. The term "ballast" is one I use, so don't expect that to be in their documentation.

Basically Illumina instruments are designed to sequence randomly fragmented genomic libraries. Anything varying from that ideal of randomness causes issues for their software. However over the years they have gotten somewhat better at tolerating low diversity/higher bias. But it is always there.
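One way to see the low-diversity problem in your own data is to tabulate base composition per cycle. The Python sketch below is a hypothetical diagnostic, not anything from Illumina's software; the 0.9 threshold is an arbitrary illustration, not a vendor specification.

```python
from collections import Counter


def per_cycle_composition(reads):
    """Fraction of A, C, G, T at each cycle, up to the shortest read length."""
    if not reads:
        return []
    n_cycles = min(len(r) for r in reads)
    composition = []
    for i in range(n_cycles):
        counts = Counter(r[i] for r in reads)
        total = sum(counts.values())
        composition.append({b: counts.get(b, 0) / total for b in "ACGT"})
    return composition


def low_diversity_cycles(reads, max_fraction=0.9):
    """Cycles where a single base accounts for at least max_fraction of reads."""
    return [
        i
        for i, comp in enumerate(per_cycle_composition(reads))
        if max(comp.values()) >= max_fraction
    ]
```

On a single-amplicon library, every read starts with the same primer, so the first cycles come out as nearly one base each; a randomly fragmented genomic library gives roughly even composition at every cycle, which is what spiking in "ballast" restores.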

Quote:
Originally Posted by vs92
Also why would having 6 and 12 balanced indices help with demultiplexing (on 2 of the TruSeq 24)?

8. Thanks for your very helpful advice on Step Out PCR - will definitely do that! Could you give me your email address - I'd love to stay connected and potentially collaborate on my project.
You can email me through the site or google me. I don't try to obscure my email address. That said, I prefer to discuss sequencing on this site where it might help others and others might correct our mistakes.

Quote:
Originally Posted by vs92
Thanks!
Yeah, part of the process. If you would though, please don't post what amounts to the same message to multiple forums. Initially you did that -- it is a little abusive.

--
Phillip
Old 08-10-2012, 03:26 AM   #10
vs92
Member
 
Location: Cambridge, MA

Join Date: Jul 2012
Posts: 10
Default

Thanks, Phillip - I'll look into the PANDA paper and will also subscribe to the Illumina website you mentioned.
Old 08-10-2012, 04:53 AM   #11
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356
Default

Hi guys

We have posted a blog post which discusses some of the issues relating to low-diversity amplicons on the MiSeq, and a useful workaround for improving performance:

http://pathogenomics.bham.ac.uk/blog...llumina-miseq/

Hope it is useful
Old 10-23-2012, 10:35 PM   #12
capsicum
Member
 
Location: Earth

Join Date: Jul 2012
Posts: 13
Default

Hi Nick,

How much difference do you see when hardcoding the matrix and phasing values? Illumina is telling me not to expect anything significant.
Old 11-22-2012, 01:08 PM   #13
Jean
Member
 
Location: Canada

Join Date: Nov 2008
Posts: 37
Default

Hi Nick.
I've read your blog post from Aug 2012 and I'm just wondering about what your current strategies are for amplicon sequencing on the MiSeq? Have you found a better approach?
Old 11-30-2012, 06:24 AM   #14
LVAndrews
Member
 
Location: Flagstaff, AZ

Join Date: Sep 2012
Posts: 55
Default 16S sequencing on MiSeq

Hi All,

We are doing 16S sequencing on a MiSeq using the protocol in Caporaso et al, 2012 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3400413/). It works well and you simply add your own sequencing and indexing primers to the reagent cartridge to get your data to work. The sequencing and index primers are contained in the supplementary materials. This protocol targets the V4 region with 515F/806R primers which capture both bacterial and archaeal taxa. We are pooling 192 samples per run on our instrument and getting back on average 5000 reads per sample. The cost per sample is about $12. The same data output was costing about 10 times this when using 454 previously. We are using the 2x150 kit to cover amplicons that are about 252 bp. We hope to make use of the 2x250 reads to look at fungal data in the near future as well.
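Applied to the original question of 300 samples per month, these numbers can be turned into a rough plan. The Python sketch below uses only the figures in this post (192 samples per run, about $12 per sample); it assumes the per-sample cost scales linearly, which is a simplification since a partly filled second run still consumes a full kit. Function names are hypothetical.

```python
def runs_needed(n_samples, samples_per_run):
    """Whole runs required (ceiling division)."""
    return -(-n_samples // samples_per_run)


def naive_monthly_cost(n_samples, cost_per_sample):
    """Linear cost estimate; ignores the cost of partly empty runs."""
    return n_samples * cost_per_sample


monthly_runs = runs_needed(300, 192)
monthly_cost = naive_monthly_cost(300, 12)
```

That works out to 2 runs and roughly $3,600 per month on the MiSeq under these assumptions, against something like ten times that for equivalent 454 output according to the comparison above.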

Hope this isn't too late to help your project.

Andy
Old 12-01-2012, 12:48 AM   #15
capsicum
Member
 
Location: Earth

Join Date: Jul 2012
Posts: 13
Default

Hi Andy,

Are you using a V1 or V2 MiSeq? Which kit versions are you using? What yields do you get and what are the quality metrics like?

Cheers,

Scott.
Old 12-01-2012, 07:35 AM   #16
LVAndrews
Member
 
Location: Flagstaff, AZ

Join Date: Sep 2012
Posts: 55
Default

The last run was on a v2 MiSeq with v1 chemistry. 1,000,000 usable reads at >q20, which, if you cluster your sequences at 97% (to account for sequencing errors), should be accurate enough for most things. Next run will be v2. I'll post the metrics for that run in a week or two.
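The reasoning behind "q20 is enough at 97% clustering" can be checked with the standard Phred definition, Q = -10 * log10(p). The sketch below assumes every base sits exactly at the stated quality, which is a simplification of a real quality profile; the function names are hypothetical.

```python
def phred_to_error_prob(q):
    """Per-base error probability for a Phred quality score."""
    return 10 ** (-q / 10)


def expected_errors(qualities):
    """Expected number of wrong bases in a read, given per-base Phred scores."""
    return sum(phred_to_error_prob(q) for q in qualities)


def expected_error_rate(qualities):
    """Expected fraction of wrong bases in the read."""
    return expected_errors(qualities) / len(qualities)
```

A 252-base amplicon at a flat Q20 has an expected error rate of 1% (about 2.5 wrong bases), which sits inside the 3% divergence tolerated by 97% clustering; reads hovering right at Q20 leave limited headroom, though.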
All times are GMT -8.

