
  • #31
    Originally posted by massspecgeek
    Sorry, should have said that support will continue. Only sales of new instruments affected.
    Taking out HiSeq 2500 would leave a gap in the continuum for "Illumina"verse between NextSeq 550 and HiSeq 4K/NovaSeq 5000.

    Perhaps we will see a new sequencer (or two) slot in between there, in future.
    Last edited by GenoMax; 01-12-2017, 09:40 AM.

    Comment


    • #32
      Reagent cost is $6375 per flowcell for HiSeq X. If the price of the new reagent is 80% of HiSeq X, then it is $5100 per flowcell for NovaSeq 6000.

      This means that the new reagent cost is $1.7/Gbp which is a huge drop from the previous $7/Gbp. Correct?
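The arithmetic behind those figures can be sketched as follows. The yield per flowcell is not stated in the post; roughly 900 Gb for a HiSeq X flowcell and 3,000 Gb for a NovaSeq S4 flowcell are assumptions, chosen here because they reproduce the $7/Gbp and $1.7/Gbp quoted above.

```python
# Reagent cost per Gbp, assuming the "80%" applies to the flowcell kit price.
# The yields per flowcell are assumptions, not figures from the thread.

HISEQ_X_FLOWCELL_USD = 6375.0         # from the post
HISEQ_X_GB_PER_FLOWCELL = 900.0       # assumed yield
NOVASEQ_S4_GB_PER_FLOWCELL = 3000.0   # assumed yield


def cost_per_gb(flowcell_usd: float, gb_per_flowcell: float) -> float:
    """Return reagent cost in USD per Gbp for one flowcell."""
    return flowcell_usd / gb_per_flowcell


novaseq_flowcell_usd = 0.80 * HISEQ_X_FLOWCELL_USD
hiseq_x_per_gb = cost_per_gb(HISEQ_X_FLOWCELL_USD, HISEQ_X_GB_PER_FLOWCELL)
novaseq_per_gb = cost_per_gb(novaseq_flowcell_usd, NOVASEQ_S4_GB_PER_FLOWCELL)

print(f"NovaSeq flowcell: ${novaseq_flowcell_usd:.0f}")
print(f"HiSeq X: ${hiseq_x_per_gb:.2f}/Gbp")
print(f"NovaSeq: ${novaseq_per_gb:.2f}/Gbp")
```

Under these assumptions most of the drop in cost per Gbp comes from the larger assumed yield per flowcell, not from the 20% kit discount.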

      Comment


      • #33
        I'm pretty sure they meant 80% of the running cost (per Gb), not 80% of the specific kit cost. However, we've still only seen hints at specific pricing, so we can't say for sure.
        AllSeq - The Sequencing Marketplace
        [email protected]
        www.AllSeq.com

        Comment


        • #34
          Originally posted by AllSeq
          I'm pretty sure they meant 80% of the running cost (per Gb), not 80% of the specific kit cost. However, we've still only seen hints at specific pricing, so we can't say for sure.
          Thanks for your reply.

          Then from the cost perspective, it is not that impressive.

          A big jump in throughput is always welcomed by the big genome centers. However, if base accuracy is down due to the new chemistry, then that won't even be a plus.

          Anyway, I think we need to wait a little bit more to assess this new toy.

          Comment


          • #35
            Yeah, if you already have a HiSeq X then the only major advantage is that there are no library type limitations on the NovaSeq.
            What NovaSeq does is offer the average core a shot at a price per base previously only available to those with the throughput to need 5+ HiSeq X.
            That said, you would need to run S4 reagents to get that price per base and:
            (1) S4 won't be ready until late 2017
            (2) It will generate 3 Tb of data in a single run == a single lane (logically, if not physically).

            --
            Phillip

            Comment


            • #36
              Added some information from webinar to the original post.

              Comment


              • #37
                A couple of things have changed on this lately.

                1 - S4 flow cells are now slated to ship in Q3 this year.
                2 - S4 reagent kits will only be 20% cheaper than HiSeq X if you buy 5 NovaSeq instruments. Bleh. Still about half the cost per Gb versus HiSeq 4000.

                Comment


                • #38
                  I did a comparison of duplicate rates on HiSeq2500 and NovaSeq, using Illumina's public data on BaseSpace:



                  NovaSeq seems to have a problem, but it's not clear why. These are not normal optical/well duplicates; they are extremely remote. It looks like during colony formation, some reads break off and reattach to an empty well somewhere else. The farthest-right point (at 25000) is not for distance 25000 but for distance infinity, including inter-tile duplicates.

                  These libraries are PCR-free WGS and thus should not really have more than a tiny fraction of duplicates, as seen on the HiSeq. Does anyone have any idea what's causing this? Does my hypothesis sound reasonable? Previous Illumina platforms had a very obvious distance cutoff where the number of duplicates increases rapidly up to a point, then plateaus (which is true for this HiSeq data, at around dist=45, but you can't see it in this graph). That is not the case for NovaSeq - it just keeps ascending, and there is no clear cutoff. It gradually bends, so there is no clear inflection point like there is on other platforms.

                  For reference, the libraries are both human NA12878 runs. NovaSeq is 2x150 and HiSeq 2500 is 2x100. Pairs are considered duplicates when the distance between colony centers is at most the stated distance, and both R1 and R2 match with some number of substitutions allowed, to account for sequencing error (8 for 150bp reads and 5 for 100bp reads). The insert sizes are quite large on average (>500bp) which reduces the rate of coincidental duplicates. HS2500 is ~10x and NovaSeq is ~30x coverage so the coincidental duplicate rate should be extremely low in both cases.
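The duplicate criterion described above could be sketched roughly like this. This is not the code used to generate the graph; the names `ReadPair`, `mismatches` and `is_duplicate` are hypothetical, the substitution budget is assumed to apply to each read separately, and both pairs are assumed to lie on the same tile so that their coordinates are comparable.

```python
# Illustrative pair-level duplicate test: two read pairs count as duplicates
# when their clusters are within max_dist of each other AND both R1 and R2
# match within a substitution budget. All names here are hypothetical.
from dataclasses import dataclass
from math import hypot


@dataclass
class ReadPair:
    x: float   # cluster x coordinate on the tile
    y: float   # cluster y coordinate on the tile
    r1: str    # forward read sequence
    r2: str    # reverse read sequence


def mismatches(a: str, b: str) -> int:
    """Count substitutions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must have equal length")
    return sum(1 for p, q in zip(a, b) if p != q)


def is_duplicate(p: ReadPair, q: ReadPair, max_dist: float, max_subs: int) -> bool:
    """True when the clusters are within max_dist and both mates match."""
    close = hypot(p.x - q.x, p.y - q.y) <= max_dist
    return (close
            and mismatches(p.r1, q.r1) <= max_subs
            and mismatches(p.r2, q.r2) <= max_subs)


a = ReadPair(0.0, 0.0, "ACGTACGT", "TTTTGGGG")
b = ReadPair(3.0, 4.0, "ACGTACGA", "TTTTGGGG")
print(is_duplicate(a, b, max_dist=5.0, max_subs=1))
```

Sweeping `max_dist` upward and counting the pairs flagged at each value would give a cumulative curve of the kind discussed here; the "infinity" point would correspond to dropping the distance check entirely.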

                  P.S. This is an underestimate of the duplicate rate for both platforms, as it was generated in a way that is not robust to sequencing error. I will regenerate the data, but it won't change the discrepancy, just the magnitude.
                  Last edited by Brian Bushnell; 03-01-2017, 07:43 PM.

                  Comment


                  • #39
                    Was there a higher phiX concentration in the NovaSeq run? Wouldn't phiX produce pseudo-duplicates given the small genome, especially if library prep had a biased fragmentation?

                    I agree with your "fragment break-off" possibility. We were just chatting about that idea recently over here regarding the HiSeq4000.
                    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

                    Comment


                    • #40
                      There was zero PhiX in the NovaSeq data. I was wondering a bit about mitochondrial content, but still, the source DNA is the same for both platforms. Anyway, coincidental duplicates won't follow the pattern in the graph, a curve with a negative second derivative (it flattens as distance grows). They would cause a positive second derivative because the number of potential matches increases with the square of the radius, so random matches would yield a curve that looks like Y=X^2, whereas the curve I plotted looks like... nothing with which I am familiar.
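The Y=X^2 argument can be made concrete with a back-of-the-envelope sketch. It assumes clusters are scattered uniformly over a tile and ignores edge effects; the function name and all parameter values are hypothetical and not taken from the runs discussed here.

```python
# Expected number of coincidental duplicate pairs within a given radius,
# assuming clusters are placed uniformly at random and that any two pairs
# match in sequence by chance with probability p_seq_match.
# Names and numbers are illustrative assumptions only.
from math import pi


def expected_coincidental_pairs(n_clusters: int, tile_area: float,
                                radius: float, p_seq_match: float) -> float:
    """Expected count of chance duplicates closer than `radius`."""
    total_pairs = n_clusters * (n_clusters - 1) / 2
    # Fraction of the tile covered by a disc of this radius (capped at 1).
    p_within = min(1.0, pi * radius ** 2 / tile_area)
    return total_pairs * p_within * p_seq_match


base = expected_coincidental_pairs(1_000_000, 1e8, 100.0, 1e-9)
doubled = expected_coincidental_pairs(1_000_000, 1e8, 200.0, 1e-9)
print(doubled / base)  # doubling the radius quadruples the expectation
```

A curve driven by chance matches would therefore bend upward with distance until the disc covers the whole tile, which is the opposite of the flattening curve described for NovaSeq.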

                      Edit:

                      Or, maybe, I should say it looks a bit like a step function plus a linear, or square-root, or X^Y function where Y is between 0.5 and 1. The step function has a steep increase until a point (say, 2500 for NovaSeq), which models "traditional" optical- or well-duplicates. The other function models "drifters" that break off and land in remote wells.
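One way to write down that two-component shape is sketched below. It is only an illustration of the description above, not a fitted model: the function name and every parameter value are hypothetical, and the "step" is modelled as a linear ramp that saturates at the cutoff.

```python
# Two-component sketch of the cumulative duplicate curve:
#   local    - rises steeply up to `cutoff`, then plateaus (optical/well dups)
#   drifters - keeps growing as scale * dist**exponent, with 0.5 <= exponent <= 1
# All parameters are hypothetical placeholders, not fitted values.

def duplicate_fraction_model(dist: float, plateau: float, cutoff: float,
                             scale: float, exponent: float) -> float:
    """Return the modelled duplicate fraction at a given distance."""
    if dist < 0 or cutoff <= 0:
        raise ValueError("dist must be >= 0 and cutoff must be > 0")
    local = plateau * min(dist / cutoff, 1.0)
    drifters = scale * dist ** exponent
    return local + drifters


for d in (0.0, 1250.0, 2500.0, 5000.0, 25000.0):
    print(d, duplicate_fraction_model(d, 0.01, 2500.0, 1e-5, 0.75))
```

Past the cutoff only the drifter term changes, so the slope there would indicate which exponent describes the remote duplicates.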
                      Last edited by Brian Bushnell; 03-01-2017, 11:18 PM.

                      Comment


                      • #41
                        Here is a zoomed-in image of HiSeq 2500 duplicates for the same genome (it's an immortal human cell line that does not need amplification, or so I'm told).



                        This is not the same as the other image, as the x-axis is logarithmic rather than linear. But the important point in my opinion is that there is a rapid increase in duplicates detected up to a point (~45) and subsequently it is completely flat for a long time. That is what I expect from a platform that occasionally identifies oddly-shaped clusters as two clusters, or in which a well occasionally migrates to an adjacent well.

                        At ~1000, it starts going up again. I'm not sure about that - I would expect it to be sub-linear on the log scale, but then, I'm not sure what's happening in that region. The salient point is that there is a sharp increase over roughly the width of a cluster, and then a plateau, and finally another increase due to the increasing range. After dist=1000, I can't explain the slope. But, the graph only shows duplicates of less than 0.02% of reads, so it's not very important in practice. Still, it would be great if there was one less unsolved mystery.
                        Last edited by Brian Bushnell; 03-02-2017, 12:32 AM.

                        Comment


                        • #42
                          Hi Brian,
                          Are you scoring the same number of reads with HiSeq/NovaSeq? If the number of reads for the NovaSeq were an order of magnitude higher, then for repetitive or mitochondrial DNA you might be able to use up all of the possible start sites.

                          Are you scoring clusters as a duplicate only if both forward and reverse reads are the same? Or are you only checking one side?

                          BTW, yes, a typical DNA prep from cell culture would yield enough DNA to make it unnecessary to amplify the library.

                          --
                          Phillip

                          Comment


                          • #43
                            Originally posted by pmiguel
                            Hi Brian,
                            Are you scoring the same number of reads with HiSeq/NovaSeq? If the number of reads for the NovaSeq were an order of magnitude higher, then for repetitive or mitochondrial DNA you might be able to use up all of the possible start sites.
                            Probably not, since the best NovaSeq sample posted on BaseSpace has 1.6 billion reads. (The individual R1 and R2 files are 300 GB each when uncompressed! We may be looking at uncompressed read files of 1 TB each when S4 flow cells roll around later this year.)
                            Are you scoring clusters as a duplicate only if both forward and reverse reads are the same? Or are you only checking one side?
                            That should be a yes, since @Brian is probably using Clumpify, which takes both reads into account.

                            I am wondering if we are sampling the libraries so thoroughly on a NovaSeq that we have duplicates showing up due to oversampling.
                            Last edited by GenoMax; 03-02-2017, 08:14 AM.

                            Comment


                            • #44
                              Brian, your hypothesis is reasonable as there is no other possibility to explain the duplicate rate. Not surprisingly, we see similar duplicates on HiSeq 4000, as this 'characteristic' of ExAmp isn't limited to NovaSeq.

                              Comment


                              • #45
                                Originally posted by pmiguel
                                Hi Brian,
                                Are you scoring the same number of reads with HiSeq/NovaSeq? If the number of reads for the NovaSeq were an order of magnitude higher, then for repetitive or mitochondrial DNA you might be able to use up all of the possible start sites.
                                I might try running again after removing the mito, but it's not like mito accounts for >12% of the reads anyway. The number of reads was different, but this NovaSeq library only has twice the reads of the HiSeq library, so that doesn't explain the result.

                                Are you scoring clusters as a duplicate only if both forward and reverse reads are the same? Or are you only checking one side?
                                As Genomax indicated, yes, with this methodology both reads in a pair are required to match for the pair to be considered a duplicate. Due to the large insert size and variance this is unlikely to occur by chance.

                                Originally posted by misterc
                                Brian, your hypothesis is reasonable as there is no other possibility to explain the duplicate rate. Not surprisingly, we see similar duplicates on HiSeq 4000, as this 'characteristic' of ExAmp isn't limited to NovaSeq.
                                I wonder if this is a fundamental limitation of patterned flowcells, one that becomes more pronounced as the dots shrink. When the colony is growing, once a dot is filled, the amplification continues but there is nowhere for the clones on the edges to attach, so some of them break off and drift around. In that case, presumably increasing the loading concentration would reduce the duplicate rate...

                                But, it makes me wonder what the duplicate rates of the high-throughput flowcells will look like.

                                Comment
