SEQanswers

Old 07-28-2012, 10:35 AM   #1
vs92
Member
 
Location: Cambridge, MA

Join Date: Jul 2012
Posts: 10
Default Time & Cost of using 1 MiSeq Machine to do 16s rDNA (V2/V4) Seq on 300 Samples/Month

Hello Everyone, I am trying to plan for a large experiment of 300 human stool samples over the next 1 month to identify the population of bacterial species living in each stool sample (using 1 MiSeq machine). Based on time and cost, should I go with MiSeq or the 454 sequencer, especially if I have to do 300 samples again per month for the next few months? Thanks so much for your insights. I'll update this post every time I receive more feedback and information. Thanks!

Last edited by vs92; 08-10-2012 at 03:22 AM.
Old 07-29-2012, 08:46 AM   #2
koadman
Member
 
Location: Sydney, Australia

Join Date: May 2010
Posts: 65
Default

Well, this is a pretty massive question, but in our lab at UC Davis we have both 454 and MiSeq and we don't use the 454 for projects like this anymore. Everything from sample prep to analysis is easier on the MiSeq. There is no denoising, and with careful design you can avoid chimeric amplicon issues too. I've never tried the cloud analysis with MiSeq. We use custom primers, so I'm not sure whether that would work...
Old 07-29-2012, 10:04 AM   #3
vs92
Member
 
Location: Cambridge, MA

Join Date: Jul 2012
Posts: 10
Default

Thanks for the helpful information

Last edited by vs92; 08-10-2012 at 03:21 AM.
Old 08-11-2012, 07:33 PM   #4
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162
Default

Hi vs92,

We've been playing around with both the 16S V4 protocol by Caporaso et al. as well as an expanded V4/5 version using a pseudo 2x250 setup for a few months now. Based on our current experience with the MiSeq compared to doing 454, I'd say the MiSeq is the better choice.

Both 454 and the MiSeq would necessitate a lot of custom indexed primers, PCR, and cleanup, but the MiSeq has the advantage of not requiring an emPCR step and has a much easier setup (although to be fair I haven't actually done a 454 run in over 2 years, so it may have gotten better). The MiSeq also has virtually no homopolymer issue, which means you can proceed with your data analysis much faster without having to do a computationally expensive denoising step. Max throughput on 454 is ~600K reads, while even with a 50% spike of phiX, which is necessary for amplicon sequencing on the MiSeq, you should still get > 2 million reads/run. Given that a 300-cycle kit from Illumina costs ~$1000, you have a drastically reduced cost/sample compared to the 454.
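As a rough back-of-the-envelope sketch of the figures above (not a definitive costing): the read counts and kit price come from the post, while pooling all 300 samples on a single run is purely an assumption for illustration, and library prep, primers, and labor are ignored.

```python
def reads_per_sample(reads_per_run, samples_per_run):
    """Average reads each sample receives if the run is split evenly."""
    return reads_per_run // samples_per_run


def kit_cost_per_sample(kit_cost_usd, samples_per_run):
    """Sequencing-kit cost per sample (reagent kit only)."""
    return kit_cost_usd / samples_per_run


SAMPLES = 300  # assumption: every sample is pooled on one run

# Figures quoted in the post: > 2 million reads/run and ~$1000 per kit (MiSeq),
# ~600K reads/run maximum (454).
miseq_reads = reads_per_sample(2_000_000, SAMPLES)
miseq_cost = kit_cost_per_sample(1000.0, SAMPLES)
reads_454 = reads_per_sample(600_000, SAMPLES)

print(f"MiSeq: {miseq_reads} reads/sample, ${miseq_cost:.2f}/sample (kit only)")
print(f"454:   {reads_454} reads/sample")
```

Whether a few thousand reads per sample is deep enough depends on the community and the question being asked, so in practice the samples might be split across several runs.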

Now, one active topic of debate is how well the short reads from the MiSeq are able to capture your community compared to 454. My feeling is that it's a bit of a moot point, since neither is 100% accurate and each has its own error sources. Given the higher throughput and drastically reduced cost/sample, I expect a lot of people to give up on 454 and switch to Illumina. With the imminent release of 500-cycle kits capable of doing 2x250 bp reads, combined with read pair merging, you'll soon be getting high quality ~400bp 16S amplicons that will completely supplant 454.
Old 10-22-2012, 05:15 PM   #5
capsicum
Member
 
Location: Earth

Join Date: Jul 2012
Posts: 13
Default

Can anyone share some run statistics for 16S runs on the MiSeq? What cluster density are you aiming for? What PhiX spike-in proportion are you using? What cluster density, sequence yield and sequence quality are you getting back for these runs?
Old 10-22-2012, 05:31 PM   #6
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162
Default

Quote:
Originally Posted by capsicum View Post
Can anyone share some run statistics for 16S runs on the MiSeq? What cluster density are you aiming for? What PhiX spike-in proportion are you using? What cluster density, sequence yield and sequence quality are you getting back for these runs?
This is a very tricky question to answer given the software issues with the latest version of RTA since the hardware upgrade. Prior to the upgrade, we would use anywhere from 40-60% phiX (the indexed version to improve index read quality). We would always try to get a cluster density of around 800, but our best result was only 650 using 8pM for the library and phiX. Yields and quality pre-upgrade were pretty good considering the phiX takes up a fair bit of yield, but you could expect at least 1.5 million reads w/ > q25 average base quality.

Post-upgrade, it's a whole different ball game. We've been able to get good data when using 90% phiX, but as you can imagine the yield is terrible. Cost is still around $100/sample for ~75K reads based on our results, which is better than 454. There have been a number of "hacks" using hard-coded run parameters that keep the software issue from destroying run quality, but they're not supported by Illumina and our only attempt to try it ended with an instrument failure so we're currently waiting to try again.
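To make the yield penalty concrete, here is a minimal sketch of how the phiX fraction eats into usable amplicon reads. The total of 5,000,000 pass-filter reads is a hypothetical number chosen only for illustration; the 90% phiX fraction and the ~75K reads/sample depth are the figures mentioned above.

```python
def amplicon_reads(total_pf_reads, phix_fraction):
    """Pass-filter reads left for amplicons after the phiX spike-in."""
    return int(round(total_pf_reads * (1.0 - phix_fraction)))


def samples_at_depth(total_pf_reads, phix_fraction, reads_per_sample):
    """How many samples fit on one run at a target per-sample depth."""
    return amplicon_reads(total_pf_reads, phix_fraction) // reads_per_sample


TOTAL_PF_READS = 5_000_000  # hypothetical run yield, not a measured value
TARGET_DEPTH = 75_000       # ~75K reads/sample, as quoted in the post

for phix in (0.5, 0.75, 0.9):
    usable = amplicon_reads(TOTAL_PF_READS, phix)
    n = samples_at_depth(TOTAL_PF_READS, phix, TARGET_DEPTH)
    print(f"{phix:.0%} phiX: {usable} amplicon reads, {n} samples at target depth")
```

The point of the sketch is only the scaling: going from 50% to 90% phiX cuts the usable reads five-fold for the same run cost.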

One way to get around the current software issue is to sequence a metagenome/transcriptome along with your amplicons so you're not wasting reads on phiX. That adds costs in having to prepare those libraries, and the data generally isn't as useful compared to a HiSeq run because of the shallow coverage, but considering phiX gives you nothing it's a worthwhile step in my opinion.
Old 10-22-2012, 06:07 PM   #7
capsicum
Member
 
Location: Earth

Join Date: Jul 2012
Posts: 13
Default

So things have gotten worse since the hardware upgrade? What has caused this... hardware or software? It sounds like just a software issue (well, perhaps a methodological issue and that methodology is implemented in the software). But, why has it gotten worse with the upgrade?

Is the issue solely caused by the poor colour matrix and phasing estimates (assuming cluster identification and image registration are OK)?

Lastly, do you mean that you now have to use 90% instead of the 20-60% that I've seen mentioned before?
Old 10-22-2012, 06:19 PM   #8
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162
Default

Quote:
So things have gotten worse since the hardware upgrade?
Yup.

Quote:
What has caused this... hardware or software? It sounds like just a software issue (well, perhaps a methodological issue and that methodology is implemented in the software). But, why has it gotten worse with the upgrade?
It is indeed a software bug in the Real Time Analysis package. Basically, with low-diversity samples, which 16S amplicons are, the phasing/pre-phasing estimates get all out of whack. Once those values hit .4, RTA goes bonkers and assigns pretty much every base for every remaining cycle really crappy quality scores. The quality really shouldn't be that bad, but you can't trust the data at all when that happens.

Quote:
Is the issue solely caused by the poor colour matrix and phasing estimates (assuming cluster identification and image registration are OK)?
There's really nothing wrong with the crosstalk matrix or phasing/pre-phasing estimates themselves. It's more that once that .4 phasing/pre-phasing threshold is passed, which happens quickly on amplicons, the algorithms that RTA uses to call bases and assign quality lead to pretty much every remaining base being << q 20. Illumina says that they're working on fixing this issue, but there's no timeline for when it will be fixed.

Quote:
Lastly, do you mean that you now have to use 90% instead of the 20-60% that I've seen mentioned before?
With post-upgrade runs, to be sure that we're going to get enough high-quality reads, we've been using 90% phiX. Illumina themselves recommend at least 75%, but depending on how diverse your samples are, that may not be enough. If you're looking at something like the rumen or sewage that has a lot of inherent diversity, then you can probably get away with 75%. If you're doing a lot of samples with low inherent diversity, then you'll have a lot of the same species in high abundance and a very skewed base distribution for each cycle.
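The "skewed base distribution for each cycle" can be illustrated with a small sketch that tabulates base fractions at each cycle position. The reads below are made-up toy sequences, not real data, and this says nothing about how RTA itself estimates phasing.

```python
from collections import Counter


def per_cycle_base_fractions(reads):
    """Fraction of A/C/G/T observed at each cycle across a set of reads."""
    n_cycles = min(len(r) for r in reads)
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(r[cycle] for r in reads)
        total = sum(counts.values())
        fractions.append({b: counts.get(b, 0) / total for b in "ACGT"})
    return fractions


def max_base_fraction(reads):
    """Largest single-base fraction at each cycle (1.0 = no diversity)."""
    return [max(f.values()) for f in per_cycle_base_fractions(reads)]


# Toy data (hypothetical): an amplicon-like pool where nearly every read is
# identical, and a balanced pool where each base appears equally per cycle.
low_diversity = ["GTGCCAGC"] * 9 + ["GTGCCAGA"]
balanced = ["ACGTACGT", "CGTACGTA", "GTACGTAC", "TACGTACG"]

print(max_base_fraction(low_diversity))
print(max_base_fraction(balanced))
```

In the amplicon-like pool most cycles are dominated by a single base, which is the situation a phiX spike-in (or a co-sequenced metagenome) is meant to dilute.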

So far Illumina has been very good at working with my group at figuring out how to work around this RTA issue, but it's not easy and there are a lot of big labs that this is really causing issues for.

One thing I have heard is that this issue does not affect the HiSeq at all. You don't get the 2x250 read lengths, and it's a much higher investment, but it does work with only 40% phiX according to the people I've talked with who've tried it.

Last edited by mcnelson.phd; 10-22-2012 at 06:23 PM.
Old 10-23-2012, 05:50 AM   #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,079
Default

Quote:
Originally Posted by mcnelson.phd View Post

The quality really shouldn't be that bad, but you can't trust the data at all when that happens.
While it is true that the quality values are poor I am not sure that you can't trust the data.

With the 2 x 250 bp reads we get a significant overlap in the middle of the reads (without any errors in the majority of reads). So if we set the scores aside there appears to be no problem with the sequence itself. At least in the case we are looking at (16S multiplexed, no phiX because of custom primer, hardcoded matrix/phasing).
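The kind of check described here, comparing the overlapping region of a read pair while ignoring quality scores, can be sketched roughly as follows. This is a naive illustration under stated assumptions (ungapped overlap, longest acceptable overlap wins, toy sequences), not the procedure actually used for this run; a dedicated read-merging tool would normally be used on real data.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))


def merge_pair(fwd, rev, min_overlap=10, max_mismatch_frac=0.0):
    """Merge read 1 with the reverse complement of read 2.

    Returns (merged_sequence, overlap_length, mismatches), or None if no
    ungapped overlap of at least min_overlap bases is acceptable.
    Quality scores are deliberately not used.
    """
    rev_rc = reverse_complement(rev)
    # Try the longest possible overlap first and shrink from there.
    for overlap in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        tail = fwd[-overlap:]
        head = rev_rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * overlap:
            return fwd + rev_rc[overlap:], overlap, mismatches
    return None


# Toy 30 bp "amplicon" (made up) read as 2 x 20 bp, so the reads share 10 bp.
amplicon = "ACGTTGCAAGGCTTACCGATGGATCCTAGA"
read1 = amplicon[:20]
read2 = reverse_complement(amplicon[-20:])

result = merge_pair(read1, read2)
print(result)
```

Counting how many pairs merge, and how many mismatches fall inside the overlap, gives an error estimate that is independent of the reported quality values.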

Last edited by GenoMax; 10-23-2012 at 05:57 AM.
Old 10-23-2012, 06:15 AM   #10
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162
Default

Quote:
Originally Posted by GenoMax View Post
With the 2 x 250 bp reads we get a significant overlap in the middle of the reads (without any errors in the majority of reads). So if we set the scores aside there appears to be no problem with the sequence itself. At least in the case we are looking at (16S multiplexed, no phiX because of custom primer, hardcoded matrix/phasing).
Can you give some metrics on what percent are overlapping and how much overlap you're using? I never looked into seeing if the sequences were good but the quality was wrong because our FAS said that with the RTA error they can't make any guarantees about basecalling being accurate. I have looked at the phiX from poor runs, and do see a lot more base errors than one should normally see.

Also, how are you getting away with no phiX at all? Are you doing multiple different V-regions of the 16S so that the cluster recognition isn't affected? That's something that we are considering, but it still seems risky to not use any phiX (I'd at least use a 1% spike as a sequencing control like Illumina recommends).
Old 10-23-2012, 07:09 AM   #11
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,079
Default

Quote:
Originally Posted by mcnelson.phd View Post
Can you give some metrics on what percent are overlapping and how much overlap you're using? I never looked into seeing if the sequences were good but the quality was wrong because our FAS said that with the RTA error they can't make any guarantees about basecalling being accurate. I have looked at the phiX from poor runs, and do see a lot more base errors than one should normally see.
The overlap was in 170 bp range and more than 90% of reads overlapped for all samples.

The "poor" runs you are referring to are those based on # of reads passing filter or quality scores?

Quote:
Originally Posted by mcnelson.phd View Post
Also, how are you getting away with no phiX at all? Are you doing multiple different V-regions of the 16S so that the cluster recognition isn't affected? That's something that we are considering, but it still seems risky to not use any phiX (I'd at least use a 1% spike as a sequencing control like Illumina recommends).
It is some kind of custom primer strategy and multiplexed samples (sorry but I do not know the specifics) which precludes use of phiX. Cluster recognition is definitely working (it is probably affected compared to "normal" samples) without phiX. We got about 20 million reads from the last 2 x 250 bp run.

Last edited by GenoMax; 10-23-2012 at 07:36 AM. Reason: added info
Old 10-23-2012, 11:40 AM   #12
bstamps
Member
 
Location: University of Oklahoma

Join Date: Oct 2012
Posts: 40
Default

Our lab was getting ready to do a MiSeq run (16S 2x250) and I had some questions as well. I planned on using barcoded primers (12 bp Golay) but allowing the sequencing center to index our reads (A and B tags). Is this doable? I would imagine that in post-processing I should be able to overlap the reads, strip the barcodes, and send it through QIIME without having to order primers similar to Caporaso et al., in which they had very large primers with Illumina adaptor/index/spacer/barcode/primer (which look to be very, very expensive as opposed to barcode/spacer/primer, then allowing our center to prep the libraries to add adaptors and indices as necessary).
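A minimal sketch of the "strip the barcodes" step, assuming the barcode sits at the very start of the read followed by a fixed-length spacer and primer. The barcodes, sample names, spacer, and primer below are hypothetical placeholders, and only exact barcode matches are handled; the error correction that Golay codes make possible is not implemented here.

```python
def demultiplex(read, barcode_to_sample, barcode_len=12, trim_len=0):
    """Assign a read to a sample by its leading barcode (exact match only).

    Returns (sample_name, trimmed_read) or None if the barcode is unknown.
    trim_len is the number of spacer + primer bases to drop after the barcode.
    """
    barcode = read[:barcode_len]
    sample = barcode_to_sample.get(barcode)
    if sample is None:
        return None
    return sample, read[barcode_len + trim_len:]


# Hypothetical 12 bp barcodes and sample names, for illustration only.
barcodes = {
    "AACCGGTTAACC": "stool_01",
    "TTGGCCAATTGG": "stool_02",
}
spacer = "GT"                   # placeholder spacer
primer = "GTGCCAGCAGCCGCGGTAA"  # placeholder primer sequence
insert = "TACGGAGG"             # placeholder start of the amplicon

read = "AACCGGTTAACC" + spacer + primer + insert
assigned = demultiplex(read, barcodes, trim_len=len(spacer) + len(primer))
print(assigned)
```

The trimmed reads, grouped by sample, would then be what gets handed to the downstream analysis.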
Old 10-23-2012, 04:46 PM   #13
ScottC
Senior Member
 
Location: Monash University, Melbourne, Australia.

Join Date: Jan 2008
Posts: 246
Default

GenoMax:

There will always be reads that are genuinely of poor quality that need to be trimmed or discarded, even in a 'good' run. If it's true that the sequence is actually OK, but the quality scores are just incorrect/miscalculated, and you then discard this information, then what do you do about downstream processing? If you're using this data for 16S tag sequencing, then how do you pre-process the reads?

PS: Perhaps you already know about this, and/or perhaps your system also precludes the use of the Illumina sequencing primer. If not, then you can usually use PhiX, even in a custom-primed run, by simply adding the custom primer to the existing primer tube on the MiSeq cartridge, rather than one of the custom tubes. Then you're doing a sequencing reaction using several different primers at once and only the relevant primers will bind to the relevant clusters (the MiSeq cartridge already contains lots of different primers, anyway). But, maybe you don't need it.

Quote:
Our lab was getting ready to do a MiSeq run (16S 2x250) and I had some questions as well. I planned on using barcoded primers (12 bp Golay) but allowing the sequencing center to index our reads (A and B tags). Is this doable? I would imagine that in post-processing I should be able to overlap the reads, strip the barcodes, and send it through QIIME without having to order primers similar to Caporaso et al., in which they had very large primers with Illumina adaptor/index/spacer/barcode/primer (which look to be very, very expensive as opposed to barcode/spacer/primer, then allowing our center to prep the libraries to add adaptors and indices as necessary).
You can use the Illumina indexes if you like, but then you have to pay for a library prep too. The Caporaso method allows you to simply amplify, clean and sequence. If you send unindexed PCR product, then you'll have to run it through a (probably slightly modified) sample prep. The oligos only cost a few hundred dollars for 24 or so barcodes, and there's enough there to run many, many reactions.
Old 10-29-2012, 11:21 AM   #14
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482
Default

Quote:
Originally Posted by mcnelson.phd View Post
Post-upgrade, it's a whole different ball game. We've been able to get good data when using 90% phiX, but as you can imagine the yield is terrible. Cost is still around $100/sample for ~75K reads based on our results, which is better than 454. There have been a number of "hacks" using hard-coded run parameters that keep the software issue from destroying run quality, but they're not supported by Illumina and our only attempt to try it ended with an instrument failure so we're currently waiting to try again.
Our instrument was upgraded a month ago and I just did our first 16S amplicon sequence run and the data was fine (greater than 90% over Q30).

You must be overloading. We run 5pM and 30% PhiX.
Old 10-29-2012, 02:18 PM   #15
capsicum
Member
 
Location: Earth

Join Date: Jul 2012
Posts: 13
Default

Quote:
Originally Posted by NextGenSeq View Post
Our instrument was upgraded a month ago and I just did our first 16S amplicon sequence run and the data was fine (greater than 90% over Q30).

You must be overloading. We run 5pM and 30% PhiX.
What cluster density are you getting on these runs?
Old 10-30-2012, 12:42 AM   #16
Vinz
Member
 
Location: Germany

Join Date: Dec 2010
Posts: 80
Default

Quote:
Originally Posted by NextGenSeq View Post
Our instrument was upgraded a month ago and I just did our first 16S amplicon sequence run and the data was fine (greater than 90% over Q30).

You must be overloading. We run 5pM and 30% PhiX.
That sounds very promising. What setup do you use? How many different primers? Was this a 2x250bp run? Could you post a picture of your %base graph?
Old 10-30-2012, 08:10 AM   #17
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482
Default

5.5 million reads passing filter. I'll try to post a pic.
Old 01-09-2013, 12:05 PM   #18
costamc
Junior Member
 
Location: Canada

Join Date: Mar 2012
Posts: 6
Default

Hi,
I am new to using the Illumina MiSeq. Does anybody have any suggestions on which region of the 16S rRNA gene to use with the new chemistry that sequences 2x250bp? I was hoping to get an overlap of around 50bp.
Thanks in advance.
Old 01-09-2013, 12:13 PM   #19
bstamps
Member
 
Location: University of Oklahoma

Join Date: Oct 2012
Posts: 40
Default

Just replied to this in http://seqanswers.com/forums/showthread.php?t=16812
Old 01-09-2013, 12:15 PM   #20
costamc
Junior Member
 
Location: Canada

Join Date: Mar 2012
Posts: 6
Default

Quote:
Originally Posted by mcnelson.phd View Post
Hi vs92,

We've been playing around with both the 16S V4 protocol by Caporaso et al. as well as an expanded V4/5 version using a pseudo 2x250 setup for a few months now.
mcnelson.phd,

Which set of primers are you using for this region and what is the amplicon size? How many bp overlap when you use the 2x250 setup?

Many thanks.
All times are GMT -8.