  • Time & Cost of using 1 MiSeq Machine to do 16S rDNA (V2/V4) Seq on 300 Samples/Month

    Hello everyone, I am planning a large experiment on 300 human stool samples over the next month to identify the bacterial species living in each sample (using one MiSeq machine). Based on time and cost, should I go with the MiSeq or the 454 sequencer, especially if I have to run another 300 samples per month for the next few months? Thanks so much for your insights. I'll update this post every time I receive more feedback and information. Thanks!
    Last edited by vs92; 08-10-2012, 02:22 AM.

  • #2
    Well, this is a pretty massive question, but in our lab at UC Davis we have both 454 and MiSeq and we don't use the 454 for projects like this anymore. Everything from sample prep to analysis is easier on the MiSeq. There's no denoising, and with careful design you can avoid chimeric amplicon issues too. I've never tried the cloud analysis with MiSeq. We use custom primers, so I'm not sure whether that would work...


    • #3
      Thanks for the helpful information
      Last edited by vs92; 08-10-2012, 02:21 AM.


      • #4
        Hi vs92,

        We've been playing around with both the 16S V4 protocol by Caporaso et al. as well as an expanded V4/5 version using a pseudo 2x250 setup for a few months now. Based on our current experience with the MiSeq compared to doing 454, I'd say the MiSeq is the better choice.

        Both 454 and the MiSeq would necessitate a lot of custom indexed primers, PCR, and cleanup, but the MiSeq has the advantage of not requiring an emPCR step and having a much easier setup (although to be fair I haven't actually done a 454 run in over 2 years, so it may have gotten better). The MiSeq also has virtually no homopolymer issue, which means you can proceed with your data analysis much faster without having to do a computationally expensive denoising step. Max throughput on 454 is ~600K reads, while even with a 50% spike of phiX, which is necessary for amplicon sequencing on the MiSeq, you should still get > 2 million reads/run. Given that a 300-cycle kit from Illumina costs ~$1000, you have a drastically reduced cost/sample compared to the 454.
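
As a rough back-of-the-envelope check of the throughput figures above, the sketch below spreads reads and kit cost across samples. It assumes all 300 samples are multiplexed into a single run and that the quoted read counts are usable amplicon reads; both are illustrative assumptions, and the cost covers the sequencing kit only (no primers, PCR, or cleanup).

```python
# Figures quoted in the post above; the multiplexing level is an assumption.
SAMPLES_PER_RUN = 300              # assumption: every sample goes into one run
MISEQ_READS_PER_RUN = 2_000_000    # "> 2 million reads/run" with a 50% phiX spike
ROCHE454_READS_PER_RUN = 600_000   # "~600K reads" maximum throughput
MISEQ_KIT_COST_USD = 1000          # "~$1000" for a 300-cycle kit


def per_sample(total: float, samples: int) -> float:
    """Divide a per-run quantity evenly across the samples in that run."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    return total / samples


miseq_reads_per_sample = per_sample(MISEQ_READS_PER_RUN, SAMPLES_PER_RUN)
roche454_reads_per_sample = per_sample(ROCHE454_READS_PER_RUN, SAMPLES_PER_RUN)
miseq_kit_cost_per_sample = per_sample(MISEQ_KIT_COST_USD, SAMPLES_PER_RUN)

print(f"MiSeq reads/sample: {miseq_reads_per_sample:.0f}")
print(f"454 reads/sample:   {roche454_reads_per_sample:.0f}")
print(f"MiSeq kit cost/sample: ${miseq_kit_cost_per_sample:.2f}")
```

Even distribution across samples is optimistic; real runs show uneven barcode representation, so treat these as upper-level estimates.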

        Now, one active topic of debate is how well the short reads from the MiSeq are able to capture your community compared to 454. My feeling is that it's a bit of a moot point, since neither is 100% accurate and each has its own error sources. Given the higher throughput and drastically reduced cost/sample, I expect a lot of people to give up on 454 and switch to Illumina. With the imminent release of 500-cycle kits capable of doing 2x250 bp reads, combined with read pair merging, you'll soon be getting high-quality ~400 bp 16S amplicons that will completely supplant 454.


        • #5
          Can anyone share some run statistics for 16S runs on the MiSeq? What cluster density are you aiming for? What PhiX spike-in proportion are you using? What cluster density, sequence yield and sequence quality are you getting back for these runs?


          • #6
            Originally posted by capsicum
            Can anyone share some run statistics for 16S runs on the MiSeq? What cluster density are you aiming for? What PhiX spike-in proportion are you using? What cluster density, sequence yield and sequence quality are you getting back for these runs?
            This is a very tricky question to answer given the software issues with the latest version of RTA since the hardware upgrade. Prior to the upgrade, we would use anywhere from 40-60% phiX (the indexed version to improve index read quality). We would always try to get a cluster density of around 800, but our best result was only 650 using 8pM for the library and phiX. Yields and quality pre-upgrade were pretty good considering the phiX takes up a fair bit of yield, but you could expect at least 1.5 million reads with > Q25 average base quality.

            Post-upgrade, it's a whole different ball game. We've been able to get good data when using 90% phiX, but as you can imagine the yield is terrible. Cost is still around $100/sample for ~75K reads based on our results, which is better than 454. There have been a number of "hacks" using hard-coded run parameters that keep the software issue from destroying run quality, but they're not supported by Illumina and our only attempt to try it ended with an instrument failure so we're currently waiting to try again.
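
To make the yield penalty of a heavy phiX spike concrete, here is a minimal sketch of the arithmetic. It assumes the total number of passing clusters per run stays the same regardless of the spike-in level, which is a simplification; the function names are made up for illustration.

```python
def amplicon_share(phix_percent: int) -> float:
    """Fraction of reads left for amplicons at a given phiX spike-in percentage."""
    if not 0 <= phix_percent < 100:
        raise ValueError("phix_percent must be in [0, 100)")
    return (100 - phix_percent) / 100


def relative_yield(new_phix_percent: int, old_phix_percent: int) -> float:
    """Amplicon yield at the new spike-in level relative to the old one,
    assuming the same total number of clusters in both runs."""
    return amplicon_share(new_phix_percent) / amplicon_share(old_phix_percent)


# Figures quoted in the post above.
COST_PER_SAMPLE_USD = 100
READS_PER_SAMPLE = 75_000
cost_per_1000_reads = COST_PER_SAMPLE_USD / (READS_PER_SAMPLE / 1000)

print(f"90% vs 50% phiX: {relative_yield(90, 50):.0%} of the amplicon reads")
print(f"90% vs 75% phiX: {relative_yield(90, 75):.0%} of the amplicon reads")
print(f"Cost per 1000 reads: ${cost_per_1000_reads:.2f}")
```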

            One way to get around the current software issue is to sequence a metagenome/transcriptome along with your amplicons so you're not wasting reads on phiX. That adds costs in having to prepare those libraries, and the data generally isn't as useful compared to a HiSeq run because of the shallow coverage, but considering phiX gives you nothing it's a worthwhile step in my opinion.


            • #7
              So things have gotten worse since the hardware upgrade? What has caused this... hardware or software? It sounds like just a software issue (well, perhaps a methodological issue and that methodology is implemented in the software). But why has it gotten worse with the upgrade?

              Is the issue solely caused by the poor colour matrix and phasing estimates (assuming cluster identification and image registration are OK)?

              Lastly, do you mean that you now have to use 90% instead of the 20-60% that I've seen mentioned before?


              • #8
                So things have gotten worse since the hardware upgrade?
                Yup.

                What has caused this... hardware or software? It sounds like just a software issue (well, perhaps a methodological issue and that methodology is implemented in the software). But, why has it gotten worse with the upgrade?
                It is indeed a software bug in the Real Time Analysis package. Basically, with low-diversity samples, which 16S amplicons are, the phasing/pre-phasing estimates get all out of whack. Once those values hit 0.4, RTA goes bonkers and assigns pretty much every base for every remaining cycle really crappy quality scores. The quality really shouldn't be that bad, but you can't trust the data at all when that happens.

                Is the issue solely caused by the poor colour matrix and phasing estimates (assuming cluster identification and image registration are OK)?
                There's really nothing wrong with the crosstalk matrix or phasing/pre-phasing estimates themselves. It's more that once that 0.4 phasing/pre-phasing threshold is passed, which happens quickly on amplicons, the algorithms that RTA uses to call bases and assign quality lead to pretty much every remaining base being << Q20. Illumina says that they're working on fixing this issue, but there's no timeline for when it will be fixed.

                Lastly, do you mean that you now have to use 90% instead of the 20-60% that I've seen mentioned before?
                With post-upgrade runs, in order for us to be sure that we're going to get enough high-quality reads, we've been using 90% phiX. Illumina themselves recommend at least 75%, but depending on how diverse your samples are, that may not be enough. If you're looking at something like the rumen or sewage that has a lot of inherent diversity, then you can probably get away with 75%. If you're doing a lot of samples with low inherent diversity, then you'll have a lot of the same species in high abundance and a very skewed base distribution for each cycle.
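
The "skewed base distribution for each cycle" can be illustrated with a toy calculation. The sketch below tallies base composition per cycle for a set of reads; the read sequences are made-up stand-ins (identical amplicon starts versus a balanced mix playing the role of phiX), not real data, and this is not how RTA itself measures diversity.

```python
from collections import Counter


def per_cycle_base_fractions(reads):
    """For each cycle (read position), return the fraction of reads showing A, C, G, T."""
    n_cycles = min(len(read) for read in reads)
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads)
        fractions.append({base: counts.get(base, 0) / len(reads) for base in "ACGT"})
    return fractions


def max_base_fraction(reads):
    """Fraction of the most common base at each cycle (1.0 means no diversity at all)."""
    return [max(cycle.values()) for cycle in per_cycle_base_fractions(reads)]


# Toy data: amplicon reads all start with the same sequence,
# while the "diverse" reads are balanced at every position.
amplicon_like = ["GTGCCAGC"] * 8
diverse = ["ACGTACGT", "CGTACGTA", "GTACGTAC", "TACGTACG"] * 2
mixed = amplicon_like + diverse  # a 50% spike of diverse reads

print("amplicon only:", max_base_fraction(amplicon_like))
print("diverse only: ", max_base_fraction(diverse))
print("50/50 mix:    ", max_base_fraction(mixed))
```

Raising the share of diverse reads pulls the dominant-base fraction at each cycle down toward 0.25, which is the effect a larger phiX spike is meant to achieve.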

                So far Illumina has been very good at working with my group at figuring out how to work around this RTA issue, but it's not easy and there are a lot of big labs that this is really causing issues for.

                One thing I have heard is that this issue does not affect the HiSeq at all. You don't get the 2x250 read lengths, and it's a much higher investment, but it does work with only 40% phiX according to the people I've talked with who've tried it.
                Last edited by mcnelson.phd; 10-22-2012, 05:23 PM.


                • #9
                  Originally posted by mcnelson.phd

                  The quality really shouldn't be that bad, but you can't trust the data at all when that happens.
                  While it is true that the quality values are poor, I am not sure that you can't trust the data.

                  With the 2 x 250 bp reads we get a significant overlap in the middle of the reads (without any errors in the majority of reads). So if we set the scores aside, there appears to be no problem with the sequence itself. At least in the case we are looking at (16S multiplexed, no phiX because of custom primer, hardcoded matrix/phasing).
                  Last edited by GenoMax; 10-23-2012, 04:57 AM.


                  • #10
                    Originally posted by GenoMax
                    With the 2 x 250 bp reads we get a significant overlap in the middle of the reads (without any errors in the majority of reads). So if we set the scores aside, there appears to be no problem with the sequence itself. At least in the case we are looking at (16S multiplexed, no phiX because of custom primer, hardcoded matrix/phasing).
                    Can you give some metrics on what percent are overlapping and how much overlap you're using? I never looked into whether the sequences were good and only the quality was wrong, because our FAS said that with the RTA error they can't make any guarantees about basecalling being accurate. I have looked at the phiX from poor runs, and do see a lot more base errors than one should normally see.

                    Also, how are you getting away with no phiX at all? Are you doing multiple different V-regions of the 16S so that the cluster recognition isn't affected? That's something that we are considering, but it still seems risky to not use any phiX (I'd at least use a 1% spike as a sequencing control like Illumina recommends).


                    • #11
                      Originally posted by mcnelson.phd
                      Can you give some metrics on what percent are overlapping and how much overlap you're using? I never looked into seeing if the sequences were good but the quality was wrong because our FAS said that with the RTA error they can't make any guarantees about basecalling being accurate. I have looked at the phiX from poor runs, and do see a lot more base errors than one should normally see.
                      The overlap was in 170 bp range and more than 90% of reads overlapped for all samples.
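
For readers who want to check overlap agreement independently of the quality scores, here is a minimal sketch of pair merging by exact comparison. It is not the method used in this thread or by any particular merging tool; the thresholds are arbitrary assumptions, the amplicon is randomly generated, and the simple search assumes the amplicon is longer than a single read.

```python
import random


def reverse_complement(seq: str) -> str:
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def merge_pair(fwd: str, rev: str, min_overlap: int = 20,
               max_mismatch_fraction: float = 0.05):
    """Merge a read pair by trying the longest overlap first.

    Returns (merged_sequence, overlap_length, mismatches), or None if no
    overlap within the mismatch budget is found. Quality scores are ignored.
    """
    rev_rc = reverse_complement(rev)
    for overlap in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(fwd[-overlap:], rev_rc[:overlap]))
        if mismatches <= max_mismatch_fraction * overlap:
            # Keep the forward read whole and append the non-overlapping tail.
            return fwd + rev_rc[overlap:], overlap, mismatches
    return None


# Simulated error-free 2 x 250 bp pair from a 330 bp amplicon:
# 250 + 250 - 330 = 170 bp of overlap, matching the figure quoted above.
rng = random.Random(0)
amplicon = "".join(rng.choice("ACGT") for _ in range(330))
fwd_read = amplicon[:250]
rev_read = reverse_complement(amplicon[-250:])

merged, overlap, mismatches = merge_pair(fwd_read, rev_read)
print(len(merged), overlap, mismatches)
```

Counting mismatches inside the overlap gives a per-pair error estimate that does not depend on the reported quality values, which is the comparison being described here.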

                      The "poor" runs you are referring to are those based on # of reads passing filter or quality scores?

                      Originally posted by mcnelson.phd
                      Also, how are you getting away with no phiX at all? Are you doing multiple different V-regions of the 16S so that the cluster recognition isn't affected? That's something that we are considering, but it still seems risky to not use any phiX (I'd at least use a 1% spike as a sequencing control like Illumina recommends).
                      It is some kind of custom primer strategy and multiplexed samples (sorry but I do not know the specifics) which precludes use of phiX. Cluster recognition is definitely working (it is probably affected compared to "normal" samples) without phiX. We got about 20 million reads from the last 2 x 250 bp run.
                      Last edited by GenoMax; 10-23-2012, 06:36 AM. Reason: added info


                      • #12
                        Our lab was getting ready to do a MiSeq run (16S, 2x250) and I had some questions as well. I planned on using barcoded primers (12 bp Golay) but letting the sequencing center index our reads (A and B tags). Is this doable? I would imagine that in post-processing I should be able to overlap the reads, strip the barcodes, and send the data through QIIME without having to order primers like those in Caporaso et al., which are very large primers with Illumina adaptor/index/spacer/barcode/primer. Those look to be very, very expensive compared with barcode/spacer/primer, with our center then prepping the libraries to add adaptors and indices as necessary.
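
The "strip the barcodes" step described above could look roughly like the sketch below. It is a plain lookup with a small mismatch allowance, not real Golay decoding and not QIIME's implementation; the barcode sequences and sample names are hypothetical placeholders.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(read: str, barcode_to_sample: dict, barcode_len: int = 12,
                max_mismatches: int = 1):
    """Assign a read to a sample by its leading barcode and strip the barcode.

    Returns (sample, remaining_sequence), or (None, read) when the barcode
    is missing, unknown, or matches more than one sample.
    """
    if len(read) < barcode_len:
        return None, read
    tag, remainder = read[:barcode_len], read[barcode_len:]
    matches = [sample for barcode, sample in barcode_to_sample.items()
               if hamming(tag, barcode) <= max_mismatches]
    if len(matches) == 1:
        return matches[0], remainder
    return None, read


# Hypothetical 12 bp barcodes (NOT real Golay codes) and sample names.
barcodes = {
    "AACCGGTTAACC": "sample_A",
    "TTGGCCAATTGG": "sample_B",
}

print(demultiplex("AACCGGTTAACC" + "GTGCCAGC", barcodes))  # exact barcode match
print(demultiplex("AACCGGTTAACG" + "GTGCCAGC", barcodes))  # one mismatch tolerated
print(demultiplex("GGGGGGGGGGGG" + "GTGCCAGC", barcodes))  # unknown barcode
```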


                        • #13
                          GenoMax:

                          There will always be reads that are genuinely of poor quality that need to be trimmed or discarded, even in a 'good' run. If it's true that the sequence is actually OK, but the quality scores are just incorrect/miscalculated, and you then discard this information, then what do you do about downstream processing? If you're using this data for 16S tag sequencing, then how do you pre-process the reads?

                          PS: Perhaps you already know about this, and/or perhaps your system also precludes the use of the Illumina sequencing primer. If not, then you can usually use PhiX, even in a custom-primed run, by simply adding the custom primer to the existing primer tube on the MiSeq cartridge, rather than one of the custom tubes. Then you're doing a sequencing reaction using several different primers at once and only the relevant primers will bind to the relevant clusters (the MiSeq cartridge already contains lots of different primers, anyway). But, maybe you don't need it.

                          Our lab was getting ready to do a MiSeq run (16S, 2x250) and I had some questions as well. I planned on using barcoded primers (12 bp Golay) but letting the sequencing center index our reads (A and B tags). Is this doable? I would imagine that in post-processing I should be able to overlap the reads, strip the barcodes, and send the data through QIIME without having to order primers like those in Caporaso et al., which are very large primers with Illumina adaptor/index/spacer/barcode/primer. Those look to be very, very expensive compared with barcode/spacer/primer, with our center then prepping the libraries to add adaptors and indices as necessary.
                          You can use the Illumina indexes if you like, but then you have to pay for a library prep too. The Caporaso method allows you to simply amplify, clean and sequence. If you send unindexed PCR product, then you'll have to run it through a (probably slightly modified) sample prep. The oligos only cost a few hundred dollars for 24 or so barcodes, and there's enough there to run many, many reactions.


                          • #14
                            Originally posted by mcnelson.phd
                            Post-upgrade, it's a whole different ball game. We've been able to get good data when using 90% phiX, but as you can imagine the yield is terrible. Cost is still around $100/sample for ~75K reads based on our results, which is better than 454. There have been a number of "hacks" using hard-coded run parameters that keep the software issue from destroying run quality, but they're not supported by Illumina and our only attempt to try it ended with an instrument failure so we're currently waiting to try again.
                            Our instrument was upgraded a month ago and I just did our first 16S amplicon sequence run and the data was fine (greater than 90% over Q30).
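
A figure like "greater than 90% over Q30" can be recomputed from FASTQ quality strings. The sketch below assumes Phred+33 encoding and uses made-up quality strings; it is a minimal illustration, not the instrument software's own calculation.

```python
def phred_scores(quality_string: str, offset: int = 33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def fraction_at_least(quality_strings, threshold: int = 30) -> float:
    """Fraction of all bases whose Phred score is >= threshold."""
    scores = [q for qs in quality_strings for q in phred_scores(qs)]
    if not scores:
        raise ValueError("no quality scores supplied")
    return sum(q >= threshold for q in scores) / len(scores)


def mean_quality(quality_string: str) -> float:
    """Average Phred score of one read."""
    scores = phred_scores(quality_string)
    return sum(scores) / len(scores)


# Made-up quality strings: 'I' = Q40, '?' = Q30, '5' = Q20, '#' = Q2.
example_qualities = ["IIII?", "II5#I"]

print(f"Bases >= Q30: {fraction_at_least(example_qualities):.0%}")
print(f"Mean quality of read 2: {mean_quality(example_qualities[1]):.1f}")
```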

                            You must be overloading. We run 5pM and 30% PhiX.


                            • #15
                              Originally posted by NextGenSeq
                              Our instrument was upgraded a month ago and I just did our first 16S amplicon sequence run and the data was fine (greater than 90% over Q30).

                              You must be overloading. We run 5pM and 30% PhiX.
                              What cluster density are you getting on these runs?
