
  • PacBio error pattern

    Hello, I am wondering why the errors in PacBio raw data are dominated by indels. Could someone help explain this?
    Thanks in advance.

  • #2
    Originally posted by juckdnarocks View Post
    Hello, I am wondering why the errors in PacBio raw data are dominated by indels. Could someone help explain this?
    Thanks in advance.
    I think it's due to both the 'single molecule' and 'real time' nature of the system. The RS is essentially taking a video of the polymerase adding nucleotides in real time. Because this happens very quickly, the imaging system might miss an incorporation (or an unlabeled nucleotide might slip into the system), making it look like a small deletion. I'm not sure what would lead to a small insertion (but I'm also not sure that error mode really happens with PacBio).



    • #3
      The insertion errors result from a nucleotide entering the detection zone for a significant amount of time, but without being incorporated.
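
The two mechanisms described so far (a missed incorporation reads as a deletion, a lingering but unincorporated nucleotide reads as an insertion) can be sketched with a toy pulse caller. This is only an illustration, not PacBio's basecaller: the function `call_bases`, the dwell-time threshold, and all the numbers below are hypothetical.

```python
def call_bases(events, min_dwell_ms):
    """Toy pulse caller (hypothetical, not the real PacBio algorithm).

    events: list of (base, dwell_ms, incorporated) tuples, in time order.
    A base is called whenever its labeled nucleotide stays in the
    detection zone for at least min_dwell_ms, whether or not the
    polymerase actually incorporated it.
    """
    called = "".join(base for base, dwell, _ in events if dwell >= min_dwell_ms)
    truth = "".join(base for base, _, incorporated in events if incorporated)
    return called, truth


# Made-up dwell times, for illustration only.
events = [
    ("A", 80, True),   # normal incorporation, detected
    ("C", 5, True),    # incorporated too quickly to register: deletion
    ("G", 60, False),  # lingered without being incorporated: insertion
    ("T", 90, True),   # normal incorporation, detected
]

called, truth = call_bases(events, min_dwell_ms=20)
print("template:", truth)
print("called:  ", called)
```

Neither error replaces one base with another, which is consistent with the raw errors being dominated by indels and not substitutions.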



      • #4
        Originally posted by flxlex View Post
        The insertion errors result from a nucleotide entering the detection zone for a significant amount of time, but without being incorporated.
        Ah, that makes perfect sense - thanks!



        • #5
          Indeed. The measured signal is based on the residence time of the tagged nucleotide rather than on the tagged end of a terminal incorporation. You can also have problems with pulses merging together when you hit a long homopolymer.
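
A rough sketch of the pulse-merging effect in a homopolymer, under the assumption that two same-base pulses cannot be told apart when the gap between them is shorter than some resolution limit. The function `call_pulses`, the `min_gap` parameter, and the timings are hypothetical and only illustrate the idea.

```python
def call_pulses(pulses, min_gap):
    """Toy model of pulse merging (hypothetical, for illustration).

    pulses: list of (base, start, end) tuples, in time order.
    Two consecutive pulses of the same base separated by less than
    min_gap are treated as a single pulse, so only one base is called.
    """
    calls = []
    last_end = None
    for base, start, end in pulses:
        if calls and calls[-1] == base and start - last_end < min_gap:
            # Gap too short to resolve: this pulse extends the previous one.
            last_end = end
            continue
        calls.append(base)
        last_end = end
    return "".join(calls)


# Template AAAC: the first two A pulses are only 2 time units apart.
pulses = [("A", 0, 10), ("A", 12, 20), ("A", 40, 50), ("C", 55, 60)]
print(call_pulses(pulses, min_gap=5))
```

In this toy model the merge only happens between identical adjacent bases, which is why the effect shows up in homopolymer runs and appears as a deletion.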



          • #6
            Originally posted by ELoomis View Post
            Indeed. The measured signal is based on the residence time of the tagged nucleotide rather than on the tagged end of a terminal incorporation. You can also have problems with pulses merging together when you hit a long homopolymer.
            Does that mean SMRT has the same problem with long homopolymers as 454? I thought it had a random error model.



            • #7
              Originally posted by juckdnarocks View Post
              Does that mean SMRT has the same problem with long homopolymers as 454? I thought it had a random error model.
              "Problem" is relative. You'll have a harder time calling the exact number of nt's in the homopolymer run, but the polymerase keeps running through it, so you can still get an accuracy improvement from CCS (if you really want to know how many nt's are there) or from the flanking sequence on either side (if you just want to map your read). If the exact number of nt's in a homopolymer run is your thing, you could also delve into the basecalling parameters to improve/optimize them, since this isn't a major priority for the default user...
              In my experience, the SMRT error profile is remarkably stable through very extreme sequence compositions (100% GC, trinucleotide repeats) and all the way to the end of the raw read.



              • #8
                Homopolymers

                PacBio's sequencing errors in homopolymers are still stochastic (random), just at a higher rate. With enough coverage, the consensus accuracy across homopolymers approaches 100%, just like in non-homopolymer regions. This is the case for de novo assembly, resequencing, and also single molecules (circular consensus).

                By contrast, systematic errors don't go away with coverage, and they limit the ultimate consensus accuracy. That's why many sequencing experiments plateau at Q40 or so. Because its errors are random, PacBio has demonstrated results greater than Q50 for a range of bacterial genomes and BACs.
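
The difference between random and systematic errors can be made concrete with a small calculation. This is a simplified sketch, not a model of any real consensus caller: it assumes each read is independently wrong at a position with probability `p`, that the consensus is a strict majority vote (ties count as errors), and that a systematic error affects a fixed fraction of positions in every read. The 15% raw error rate and the 1e-4 systematic rate are assumptions chosen for illustration.

```python
import math


def majority_error(p, n):
    """Probability that a strict majority vote over n independent reads
    is wrong, when each read is wrong with probability p (ties are errors)."""
    first_losing = n - n // 2  # fewest wrong reads that block a correct majority
    return sum(
        math.comb(n, w) * p**w * (1 - p) ** (n - w)
        for w in range(first_losing, n + 1)
    )


def consensus_error(p_random, n, p_systematic=0.0):
    """Consensus error when a fraction p_systematic of positions is wrong
    in every read, so that extra coverage cannot correct them."""
    return p_systematic + (1 - p_systematic) * majority_error(p_random, n)


def phred(p_err):
    """Phred-scaled quality: Q = -10 * log10(error probability)."""
    return -10 * math.log10(p_err)


for n in (1, 11, 51):
    q_random = phred(consensus_error(0.15, n))
    q_with_systematic = phred(consensus_error(0.15, n, p_systematic=1e-4))
    print(f"coverage {n:3d}: random only Q{q_random:.1f}, "
          f"with systematic floor Q{q_with_systematic:.1f}")
```

Under these assumptions the random-only quality keeps climbing with coverage, while the version with a systematic component cannot exceed the quality implied by the systematic rate (Q40 for 1e-4), no matter how many reads are added.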

                See this blog from PacBio for a better explanation:

