  • Ion Torrent claims of MiSeq showing post-homopolymer substitution errors

    I'm hoping to start some discussion here about perhaps the most interesting thing going on in the sequencing marketing world this week (while we wait for Roche to up its bid for ILMN or bail).

    Ion Torrent posted an analysis of public MiSeq data on the Ion Community, and is presenting an analysis that describes a "clear systematic bias within MiSeq® data". A choice quote is below (PDF export of the post is attached...you know, for openness):

    "These substitution errors often fall to the last base of a homopolymer region - based on the direction of the read. For example, in a stretch of three G bases, the fourth base is often erroneously called a G. This strand-specific pattern is wide spread, and explains 49.9% and 51.8% of MiSeq® substitution errors overall in DH10B and K12, respectively. This dominant error profile that can be found so frequently next to homopolymer regions suggests a clear systematic bias within MiSeq® data."
    Keith Robison and Monkol Lek have taken a look at the claims on their respective blogs.
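
    For anyone who wants to check the claimed pattern against their own mismatch calls, here is a minimal sketch in Python. The function names and the `min_run` parameter are hypothetical, and this is only one reading of the quoted description (a substitution that repeats the base of the homopolymer immediately upstream in read direction), not Ion Torrent's actual analysis.

```python
# Hypothetical helper names; a sketch of the quoted pattern, not Ion Torrent's pipeline.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_post_homopolymer_error(ref, pos, called, min_run=3):
    """True if the substitution at `pos` repeats the homopolymer just before it.

    ref    -- reference sequence, oriented in the direction of the read
    pos    -- 0-based index of the mismatching base in `ref`
    called -- base reported by the read at that position
    """
    if pos < min_run or called == ref[pos]:
        return False  # not enough upstream context, or not a substitution
    # The `min_run` reference bases before `pos` must all equal the called base.
    return ref[pos - min_run:pos] == called * min_run


ref = "ATGGGCA"
print(is_post_homopolymer_error(ref, 5, "G"))  # C read as G right after GGG
print(is_post_homopolymer_error(ref, 5, "T"))  # a substitution, but not this pattern
```

    For reads mapped to the reverse strand, the reference would first be passed through `reverse_complement` and the position flipped, since the pattern is defined relative to the direction of the read.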

  • #2
    I love it. The good folks at Life Tech may have in fact helped to make analysis pipelines for MiSeq better by publicizing a bias that is probably much more fixable than the homopolymer issues on their own platform. Keep up the good work Ion.

    • #3
      Surely, as this is strand specific, it's not too big a problem. You just need to be sure that any SNP is visible in both forward and reverse reads. If it's only seen in reads from one direction, then you should ignore it, treat it with caution, or at least give it a really low mappability score, something I think most aligners do (correct me if I'm wrong).

      The only problem is if you had a single base flanked by homopolymers in both directions. Then the base would be miscalled on both strands.
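
      As a rough illustration of the both-strands check described above, here is a minimal Python sketch. The dictionary keys (`fwd_alt`, `rev_alt`) and the `min_per_strand` threshold are assumptions made up for this example, not fields or defaults of any particular caller.

```python
def supported_on_both_strands(fwd_alt, rev_alt, min_per_strand=2):
    """True if the alternate allele is seen often enough on each strand."""
    return fwd_alt >= min_per_strand and rev_alt >= min_per_strand


def filter_calls(calls, min_per_strand=2):
    """Keep only candidate SNPs with alternate-allele support on both strands.

    Each call is a dict with hypothetical keys:
      pos     -- position of the candidate SNP
      fwd_alt -- number of forward-strand reads showing the alternate allele
      rev_alt -- number of reverse-strand reads showing the alternate allele
    """
    return [
        c for c in calls
        if supported_on_both_strands(c["fwd_alt"], c["rev_alt"], min_per_strand)
    ]


candidates = [
    {"pos": 101, "fwd_alt": 12, "rev_alt": 9},   # seen in both directions
    {"pos": 250, "fwd_alt": 15, "rev_alt": 0},   # forward reads only
    {"pos": 377, "fwd_alt": 1, "rev_alt": 14},   # essentially reverse reads only
]
print([c["pos"] for c in filter_calls(candidates)])
```

      A filter like this would not help in the flanked case mentioned above, where the same base is miscalled on both strands.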

      • #4
        Is anyone else getting tired of companies trying to prove the weaknesses of the opponent rather than focusing on their own system?

        • #5
          So ONLY NOW someone finally realizes weakness of opponent is not a proper subject? How convenient is the timing ...

          BTW, trading sensitivity for specificity is always a great solution.

          • #6
            I wonder if this is related to the fast chemistry times of Illumina's newest platforms? Seems odd that such a prevalent error profile would go unnoticed.

            • #7
              Let me discard the previous post.

              IonTorrent is finding something real. However, I think this is not caused, or at least not mainly caused, by the homopolymer run, but by the "GGC" motif and/or the inverted repeat artifact [PMID:21576222]. This region is particularly enriched with GGC on both forward and backward strands. In addition, the screenshot is exaggerating the Illumina problem a little bit: they disabled shading in IGV; the majority of mismatches have quality below 10 and are barely visible under the IGV default setting. Some mismatches do get Q20 recurrently, which is worrying.
              Last edited by lh3; 02-01-2012, 07:54 PM.
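
              To see whether a region is enriched with GGC on both strands, a quick count along these lines may help. This is only a sketch: `ggc_counts` is a hypothetical helper name, and it says nothing about what counts as "enriched" for a given genome.

```python
def ggc_counts(seq):
    """Count GGC motifs on the forward and reverse strands of `seq`.

    A GGC on the reverse strand appears as GCC when read along the
    forward strand, so both counts can be taken from one string.
    """
    seq = seq.upper()
    forward = seq.count("GGC")
    reverse = seq.count("GCC")  # reverse complement of GGC
    return forward, reverse


region = "GGCAGCCGGC"
print(ggc_counts(region))
```

              Neither motif can overlap with itself, so `str.count` (which counts non-overlapping matches) does not miss any occurrences here.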

              • #8
                Originally posted by sinaian
                So ONLY NOW someone finally realizes weakness of opponent is not a proper subject? How convenient is the timing ...

                BTW, trading sensitivity for specificity is always a great solution.
                just poking through their documentation, there are several publications that have found this before.

                • #9
                  Originally posted by snetmcom
                  just poking through their documentation, there are several publications that have found this before.
                  Yes, I think the MIRA creator, Bastien Chevreux, noticed it first -- and changed MIRA to compensate for the Illumina GGCxG issue. Bizarre that Illumina has not fixed it themselves, but there are a handful of issues Illumina seems blind to.

                  --
                  Phillip

                  • #10
                    The systematic bias does exist, but it is usually very small: no more than 1 in 1,000 detected SNVs are caused by systematic errors, so few people notice it.

                    However, it is fatal when you are looking for novel SNPs that cause rare diseases. Systematic errors occur randomly across the whole genome, and since known SNPs occupy only about 1/100 of genome base positions (dbSNP 135: ~30M/3G), most of the erroneous SNVs fall at novel sites. That leads to a high false positive rate among your novel SNPs.
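
                    The arithmetic behind that can be sketched in a few lines of Python. The 30M known sites, 3G genome size, and 1-in-1,000 error fraction come from the figures above; the total number of calls and the fraction of true SNVs that are novel (`true_novel_fraction`) are purely hypothetical values chosen for illustration.

```python
def novel_error_share(known_sites, genome_size):
    """Fraction of uniformly distributed errors that land on novel positions."""
    return 1 - known_sites / genome_size


def novel_false_positive_rate(total_calls, error_fraction, true_novel_fraction,
                              known_sites, genome_size):
    """False positive rate among calls at novel sites.

    total_calls         -- number of detected SNVs (hypothetical)
    error_fraction      -- share of detected SNVs caused by systematic errors
    true_novel_fraction -- share of real SNVs at novel sites (hypothetical)
    """
    errors = total_calls * error_fraction
    true_calls = total_calls - errors
    novel_errors = errors * novel_error_share(known_sites, genome_size)
    novel_true = true_calls * true_novel_fraction
    return novel_errors / (novel_errors + novel_true)


share = novel_error_share(30e6, 3e9)
fpr = novel_false_positive_rate(
    total_calls=1_000_000,      # assumed for illustration
    error_fraction=0.001,       # 1 in 1,000 detected SNVs
    true_novel_fraction=0.01,   # assumed for illustration
    known_sites=30e6,
    genome_size=3e9,
)
print(f"errors at novel sites: {share:.0%}, FPR among novel calls: {fpr:.1%}")
```

                    Under these assumed inputs, an error rate of 0.1% across all calls becomes roughly 9% among the novel calls, because nearly all errors but only a small share of true variants land at novel sites.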

                    This problem can be far worse if you want to find common novel SNPs in population samples of size >= 3. In fact, in 2010 we found a terrible FPR (>98%) among the common novel SNVs detected in a whole exome sequencing project (family samples, size = 3, sequence generated by one GAII). However, it is important to note that not all of our Illumina sequence data have such a high error rate.

                    In my observation, the preceding homopolymer leads to most of the false positives, while the GGC problem is minor. I think it may depend on sample properties and other factors.

                    As you may already have found, many articles have introduced methods to address the system bias problems of NGS instruments, such as GATK variant recalibration, VarScan, CRISP, SERVIC4E, etc. Unfortunately, there is no consensus on which method provides the best solution. No offense, but I personally had a bad experience with old versions of GATK, which crashed again and again and were too picky about BAM files exported by other aligners. I have not tried the other tools yet, and I am still using my own scripts to filter the false positives.

                    • #11
                      Originally posted by sinaian
                      weakness of opponent
                      Is this a popular strategy in the US at the moment due to your 57th presidential election?

                      Bashing opponents always makes juicier headlines than demonstrating minor improvements to your own system. I would very much prefer to hear Ion discussing the very real improvements they have made. The technology has raced forward as fast as we hoped it would.

                      • #12
                        Originally posted by james hadfield
                        Bashing opponents always makes juicier headlines than demonstrating minor improvements to your own system. I would very much prefer to hear Ion discussing the very real improvements they have made. The technology has raced forward as fast as we hoped it would.
                        Fully agreed. But it is just interesting to compare the atmosphere when one party came out bashing the other, versus when the opponent actually answers back.

                        • #13
                          Can anyone verify that this is the old "GGCxG" issue?

                          If so, I have my doubts that Illumina will address the issue on the basis of LifeTech pointing it out. Seems either to be firmly in their corporate blind spot or an intractable issue.

                          --
                          Phillip

                          • #14
                            Originally posted by pmiguel
                            Can anyone verify that this is the old "GGCxG" issue?

                            If so, I have my doubts that Illumina will address the issue on the basis of LifeTech pointing it out. Seems either to be firmly in their corporate blind spot or an intractable issue.

                            --
                            Phillip
                            This system bias problem probably can never be completely solved. But I believe new algorithms will help distinguish the error calls.
