  • gringer
    David Eccles (gringer)
    • May 2011
    • 845

    Re-calling the same event model -- improvements over 10 months

    People seem to be interested in the error rate of the MinION. I'd like to put this image up to demonstrate one of the reasons why error rate is a fickle beast to calculate:



    This is exactly the same event signal model (combination of current and dwell time inside the pore) re-called at three separate times over the past year. I've selected a small region covering a homopolymer sequence to make the mapping changes more impressive and easier to see. The reference sequence is shown in the middle (at the 0 line), with changes shown above and below the sequence.
  • ymc
    Senior Member
    • Mar 2010
    • 496

    #2
    How do I make sense of this graph? What software was used to generate it?

    What are the versions of Flowcell, SQK-MAP, Metrichor for each of them?


    • gringer
      David Eccles (gringer)
      • May 2011
      • 845

      #3
      I used a custom R script for generating the graph, which works on a pairwise alignment of two sequences. The reference sequence appears at the 0 line in the graph (the top letters), and any substitutions appear underneath that, colour-coded according to the three different types of substitution (purine/pyrimidine, methyl/keto, strong/weak). Insertions appear as chartreuse wedges above the reference sequence, and deletions are steel blue triangles that exclude reference sequences.
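
As a rough illustration of the classification described above, here is a minimal Python sketch (it is not the attached R script). It assumes each substitution is named after the IUPAC ambiguity code that covers both bases: A/G (R) and C/T (Y) count as purine/pyrimidine, A/C (M) and G/T (K) as methyl/keto, and C/G (S) and A/T (W) as strong/weak. The function name `classify_alignment`, the lookup table and the toy sequences are all hypothetical.

```python
# Hypothetical sketch; not the script attached to the post.
# Assumption: a substitution is labelled by the IUPAC ambiguity code
# that covers both the reference base and the read base.
SUBSTITUTION_CLASS = {
    frozenset("AG"): "purine/pyrimidine",  # R
    frozenset("CT"): "purine/pyrimidine",  # Y
    frozenset("AC"): "methyl/keto",        # M
    frozenset("GT"): "methyl/keto",        # K
    frozenset("CG"): "strong/weak",        # S
    frozenset("AT"): "strong/weak",        # W
}


def classify_alignment(ref_aln, query_aln):
    """Label every difference between two gapped alignment rows.

    Returns (reference_position, event, detail) tuples. Positions are
    1-based and count only non-gap reference bases, so an insertion is
    reported at the reference base it follows.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("alignment rows must have the same length")
    events = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == q:
            continue  # match (or a column that is a gap in both rows)
        if r == "-":
            events.append((ref_pos, "insertion", q))
        elif q == "-":
            events.append((ref_pos, "deletion", r))
        else:
            kind = SUBSTITUTION_CLASS.get(frozenset((r, q)), "other")
            events.append((ref_pos, kind, f"{r}>{q}"))
    return events


# Toy alignment (made up, not data from this thread).
for event in classify_alignment("ACGAAAA-AAT", "ACGAAAATAGT"):
    print(event)
# (7, 'insertion', 'T')
# (9, 'purine/pyrimidine', 'A>G')
```

Each tuple could then be mapped to a colour and a position above or below the 0 line when plotting.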

      I've attached the script I used to create an earlier graph with the same appearance (Image 5 in that script).

      Flow cell and sequencing kits are obviously the same for all sequences, and were current on 2014-Oct-03: R7.3 flow cell, and I think SQK-MAP003.

      I'm not sure about Metrichor; it was just whatever was current at the time. According to the Fast5 files, the first sequence was called with chimaera v1.2.2, the middle sequence with chimaera v1.6.3, and the third sequence with chimaera v1.14.4 and dragonet v1.14.2.
      Last edited by gringer; 09-14-2015, 06:32 PM.


      • ymc
        Senior Member
        • Mar 2010
        • 496

        #4
        Thanks for your reply. The third graph doesn't seem to be an obvious improvement over the second one. It seems to me it just substituted one type of error with another type.


        • gringer
          David Eccles (gringer)
          • May 2011
          • 845

          #5
          The improvement is that it has detected a single base insertion in the homopolymer region, which is a nice result given that our sample had a single base insertion in that region. There are substitution errors, and the inserted base is incorrect (T instead of A), but it suggests to me that things are moving in the right direction. It also demonstrates that it might be possible to call sequences across long homopolymer regions after all, despite the theoretical model suggesting that there should be no difference in signal between adjacent events in the middle of the region.
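
To illustrate why a purely current-based model predicts no signal change in the middle of a long homopolymer, here is a small Python sketch. It assumes the pore current depends only on the k bases inside the pore, with k = 5 (an assumption, not something stated in this thread), and it ignores dwell time entirely. The function names and flanking sequences are made up for the example.

```python
def kmers_over_region(sequence, k=5):
    """Return the successive k-mers that pass through a k-base pore."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def collapse_repeats(kmers):
    """Merge runs of identical consecutive k-mers.

    This mimics an idealised event detector that only sees a new event
    when the expected current changes, i.e. when the k-mer changes.
    """
    collapsed = []
    for kmer in kmers:
        if not collapsed or collapsed[-1] != kmer:
            collapsed.append(kmer)
    return collapsed


# Made-up flanking bases around homopolymers of 8 and 9 A's.
flank_left, flank_right = "GCTC", "TGCA"
short = flank_left + "A" * 8 + flank_right
long_ = flank_left + "A" * 9 + flank_right

# The 9-base run yields one more AAAAA k-mer than the 8-base run ...
print(kmers_over_region(short).count("AAAAA"))  # 4
print(kmers_over_region(long_).count("AAAAA"))  # 5

# ... but after collapsing identical neighbours the two traces match.
same = collapse_repeats(kmers_over_region(short)) == collapse_repeats(kmers_over_region(long_))
print(same)  # True
```

Under these assumptions only the time spent in the AAAAA state differs between the two runs, which is why detecting the extra base at all is a pleasant surprise.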


          • ymc
            Senior Member
            • Mar 2010
            • 496

            #6
            Do you mean the T insertion between 9825 and 9826 is real? I thought you were just re-sequencing a reference sample. Did you actually sequence a sample from the same strain as the reference, but not the same sample?


            • gringer
              David Eccles (gringer)
              • May 2011
              • 845

              #7
              It should be an 'A' insertion, but yes, it's real. We were sequencing 4T1 cancer cells, which have a few variants different from the reference sequence. You can see the paper for more details:



              ResearchGate link if you don't have direct access to the paper through Cell:

              https://www.researchgate.net/publication/270582858


              • nucacidhunter
                Jafar Jabbari
                • Jan 2013
                • 1250

                #8
                Originally posted by gringer
                People seem to be interested in the error rate of the MinION. I'd like to put this image up to demonstrate one of the reasons why error rate is a fickle beast to calculate:



                This is exactly the same event signal model (combination of current and dwell time inside the pore) re-called at three separate times over the past year. I've selected a small region covering a homopolymer sequence to make the mapping changes more impressive and easier to see. The reference sequence is shown in the middle (at the 0 line), with changes shown above and below the sequence.
                I think it would be a good idea to declare a conflict of interest when praising a platform. Are you involved in the MinION Analysis and Reference Consortium (MARC)?


                • gringer
                  David Eccles (gringer)
                  • May 2011
                  • 845

                  #9
                  Originally posted by nucacidhunter
                  I think it would be a good idea to declare conflict of interest when praising a platform. Are you involved in MinION Analysis and Reference Consortium (MARC)?
                  Yes, and I've also been part of the MAP since the start, and have mentioned my involvement with MAP previously on SEQanswers. It's silly to repeat that every time I talk about the MinION, because everyone who has access to a MinION sequencer has received some amount of shipping-cost-only flow cells and reagents from Oxford Nanopore.

                  The only way you're going to find an interest-free analysis is if someone from outside MAP takes some of the publicly available data and does their own analysis on that. Based on how much feedback I've got on the mitochondrial data I released last year (i.e. none), don't get your hopes up on that.

                  It's also currently impossible to re-call event data without having access to Metrichor, so unless someone from outside MAP writes their own base caller everyone is stuck with what ONT throws at them.

                  Perhaps our MARC paper will change that, because it's a bit more public and has a lot more pre-analysed and mapped data for other people to look at.
                  Last edited by gringer; 10-17-2015, 04:23 AM.


                  • ymc
                    Senior Member
                    • Mar 2010
                    • 496

                    #10
                    Originally posted by gringer
                    It should be an 'A' insertion, but yes, it's real. We were sequencing 4T1 cancer cells, which have a few variants different from the reference sequence. You can see the paper for more details:



                    ResearchGate link if you don't have direct access to the paper through Cell:

                    https://www.researchgate.net/publication/270582858
                    Does it make sense to use long read technology to study somatic mutations?

                    I think the Illumina and X10 combo should work better because I have yet to encounter a somatic repeat that can take advantage of the true long read technology.


                    • gringer
                      David Eccles (gringer)
                      • May 2011
                      • 845

                      #11
                      Originally posted by ymc
                      Does it make sense to use long read technology to study somatic mutations?
                      Yes, because we were able to do a whole-mitochondria run on two amplified 8kb fragments of mitochondrial DNA for about $100 (approximate cost of non-ONT reagents and shipping-cost-only flow cells). Illumina is overkill for mitochondrial sequencing, so it makes sense to use something cheaper when available. Even without barcoding, we can get at least 4 mitochondrial runs done on the MinION by using wash buffer between runs and running for 1-4 hours.

                      Originally posted by ymc
                      I think the Illumina and X10 combo should work better because I have yet to encounter a somatic repeat that can take advantage of the true long read technology.
                      The MinION does a reasonable job with SNPs and small INDELs. It's just not (yet) great for long homopolymers as demonstrated here. I found a few other mitochondrial SNPs that did work well with the MinION, and were supported by IonTorrent sequencing.


                      • nucacidhunter
                        Jafar Jabbari
                        • Jan 2013
                        • 1250

                        #12
                        Originally posted by gringer
                        Yes, and I've also been part of the MAP since the start, and have mentioned my involvement with MAP previously on SEQanswers. It's silly to repeat that every time I talk about the MinION, because everyone who has access to a MinION sequencer has received some amount of shipping-cost-only flow cells and reagents from Oxford Nanopore.

                        The only way you're going to find an interest-free analysis is if someone from outside MAP takes some of the publicly available data and does their own analysis on that. Based on how much feedback I've got on the mitochondrial data I released last year (i.e. none), don't get your hopes up on that.

                        It's also currently impossible to re-call event data without having access to Metrichor, so unless someone from outside MAP writes their own base caller everyone is stuck with what ONT throws at them.

                        Perhaps our MARC paper will change that, because it's a bit more public and has a lot more pre-analysed and mapped data for other people to look at.
                        Only a subset of MAP participants and ONT-paid consultants are involved with MARC, and I think this makes it different from ordinary MAPers.


                        • gringer
                          David Eccles (gringer)
                          • May 2011
                          • 845

                          #13
                          Originally posted by nucacidhunter
                          Only a subset of MAP participants and ONT-paid consultants are involved with MARC, and I think this makes it different from ordinary MAPers.
                          This comment suggests that MARC is some exclusive club, but it's not. Anyone can be part of MARC, even those outside MAP. There's no fee to pay, and no one cares if members don't say anything on the mailing lists or check in during the meetings.

                          MARC is no different from the rest of MAP in that ONT will give free-excluding-shipping flow cells to anyone who wants to try out a big experiment and publish a paper or present at a meeting. There is some collective bargaining advantage, but we're still all waiting for flow cells to arrive, and are stuck behind the queue of commercial customers just like everyone else in MAP. Any people who pay $1000 (or $500 in bulk) for each flow cell will get faster access to ONT services than anyone in MARC.

                          Anyone inside MAP can see the results that MARC is producing (they're on the MAP wiki), and (when I've got a bit of spare time to write) can also see the minutes of the teleconferences that we have.

                          If anyone else wants to join, just let Ewan Birney know (birney at ebi.ac.uk), and he can add another email address to the mailing list.


                          • nucacidhunter
                            Jafar Jabbari
                            • Jan 2013
                            • 1250

                            #14
                            Thanks for providing more info on MARC.


                            • ymc
                              Senior Member
                              • Mar 2010
                              • 496

                              #15
                              "behind the queue of commercial customers" - What does this mean? Does it mean if you pay (how much?), then you can get a box really quick? Can you elaborate? Thanks
