SEQanswers

03-25-2013, 02:11 AM   #1
juckdnarocks
Member
Location: china
Join Date: Sep 2012
Posts: 10
PacBio error pattern

Hello, I am wondering why the error pattern of PacBio raw data is dominated by indels. Could someone help explain it?
Thanks in advance.

03-25-2013, 09:09 AM   #2
scbaker
Shawn Baker
Location: San Diego
Join Date: Aug 2008
Posts: 84

Quote:
Originally Posted by juckdnarocks
Hello, I am wondering why the error pattern of PacBio raw data is dominated by indels. Could someone help explain it?
Thanks in advance.

I think it's due to both the 'single molecule' and 'real time' nature of the system. The RS is essentially taking a video of the polymerase adding nucleotides in real time. As this happens very quickly, the imaging system might miss an addition (or maybe an unlabeled nucleotide slipped into the system), making it look like a small deletion. I'm not sure what would lead to a small insertion (but I'm also not sure that error model really happens with PacBio).

03-26-2013, 01:05 AM   #3
flxlex
Moderator
Location: Oslo, Norway
Join Date: Nov 2008
Posts: 415

The insertion errors result from a nucleotide entering the detection zone for a significant amount of time, but without being incorporated.

03-26-2013, 10:48 AM   #4
scbaker
Shawn Baker
Location: San Diego
Join Date: Aug 2008
Posts: 84

Quote:
Originally Posted by flxlex
The insertion errors result from a nucleotide entering the detection zone for a significant amount of time, but without being incorporated.

Ah, that makes perfect sense - thanks!

03-26-2013, 10:47 PM   #5
ELoomis
Junior Member
Location: London, UK
Join Date: Sep 2011
Posts: 9

Indeed. The measured signal is based on the residence time of the tagged nucleotide rather than on the tagged end of a terminal incorporation. You can also have problems with pulses merging together when you hit a long homopolymer.
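
The three mechanisms mentioned so far in this thread (a missed incorporation, a nucleotide dwelling in the detection zone without being incorporated, and pulses merging in a homopolymer) can be illustrated with a toy pulse caller. This is only a sketch: the `Pulse` record, the `call_bases` function, the timings, and the `min_duration` and `min_gap` thresholds are all invented for illustration and do not describe PacBio's actual basecaller.

```python
from dataclasses import dataclass

@dataclass
class Pulse:
    base: str           # fluorescent label seen in the detection zone
    start: float        # start time, ms (made-up numbers)
    duration: float     # residence time, ms (made-up numbers)
    incorporated: bool  # True if the polymerase actually added this base

def call_bases(pulses, min_duration=20.0, min_gap=5.0):
    """Toy caller: emit one base per pulse that is long enough to be seen.

    Hypothetical thresholds: pulses shorter than min_duration are missed,
    and two same-base pulses closer than min_gap are read as one pulse.
    """
    called = []
    prev_base, prev_end = None, None
    for p in pulses:
        if p.duration < min_duration:
            continue  # too brief to detect: a deletion if it was a real incorporation
        if prev_base == p.base and p.start - prev_end < min_gap:
            prev_end = p.start + p.duration  # merged with previous pulse: homopolymer deletion
            continue
        called.append(p.base)  # the caller cannot tell whether the base was incorporated
        prev_base, prev_end = p.base, p.start + p.duration
    return "".join(called)

pulses = [
    Pulse("A", 0, 50, True),
    Pulse("C", 80, 8, True),     # fast incorporation, missed
    Pulse("G", 120, 40, True),
    Pulse("A", 190, 35, False),  # dwells without incorporation, called anyway
    Pulse("T", 250, 40, True),
    Pulse("T", 292, 40, True),   # 2 ms after the previous T, merged
    Pulse("A", 360, 45, True),
]

truth = "".join(p.base for p in pulses if p.incorporated)
read = call_bases(pulses)
print(truth, read)
```

In this toy run the true sequence loses a C and one T (deletions) and gains an A (insertion), with no substitutions, which is the qualitative pattern being discussed.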

03-27-2013, 07:35 PM   #6
juckdnarocks
Member
Location: china
Join Date: Sep 2012
Posts: 10

Quote:
Originally Posted by ELoomis
Indeed. The measured signal is based on the residence time of the tagged nucleotide rather than on the tagged end of a terminal incorporation. You can also have problems with pulses merging together when you hit a long homopolymer.

Does this mean that SMRT has the same problem with long homopolymers as 454? I thought it had a random error model.

03-28-2013, 09:03 AM   #7
ELoomis
Junior Member
Location: London, UK
Join Date: Sep 2011
Posts: 9

Quote:
Originally Posted by juckdnarocks
Does this mean that SMRT has the same problem with long homopolymers as 454? I thought it had a random error model.

"Problem" is relative. You'll have a harder time calling the exact number of nt's in the homopolymer run, but the polymerase keeps running through it, so you can still get an accuracy improvement with CCS (if you really want to know how many nt's are there) or from flanking sequence on either side (if you just want to map your read). If the exact number of nt's in a homopolymer run is your thing, you could also delve into the basecalling parameters to improve/optimize, since this isn't a major priority for the default user...
In my experience, the SMRT error profile is remarkably stable through very extreme sequence compositions (100% GC, trinucleotide repeats) and all the way to the end of the raw read.
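
As a rough illustration of how repeated passes over the same molecule can pin down a homopolymer length, here is a minimal sketch that anchors on the flanking sequence and takes the most common run length across passes. The function names, the example sequences, and the simple majority vote are assumptions made for illustration; actual CCS consensus calling uses a statistical model rather than a vote, and real flanks would contain errors too.

```python
import re
from collections import Counter

def run_length_between_flanks(read, left, right, base):
    """Length of the `base` run between two exact flanking anchors, or None if not found."""
    pattern = re.escape(left) + "(" + re.escape(base) + "*)" + re.escape(right)
    match = re.search(pattern, read)
    return len(match.group(1)) if match else None

def consensus_run_length(passes, left, right, base):
    """Most common run length observed across passes (toy stand-in for a consensus)."""
    lengths = [
        n for n in (run_length_between_flanks(p, left, right, base) for p in passes)
        if n is not None
    ]
    return Counter(lengths).most_common(1)[0][0] if lengths else None

# Hypothetical passes over one molecule whose true A-run is 8 nt long;
# individual passes are off by one in either direction.
left, right = "GATTC", "CGGTA"
passes = [left + "A" * n + right for n in (7, 8, 8, 9, 8)]

consensus = consensus_run_length(passes, left, right, "A")
print(consensus)
```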

03-28-2013, 10:08 AM   #8
jbingham
Member
Location: Silicon Valley
Join Date: Jul 2011
Posts: 24
Homopolymers

PacBio's sequencing errors in homopolymers are still stochastic (random), just at a higher rate. With enough coverage, the consensus accuracy across homopolymers approaches 100%, just like in non-homopolymer regions. This is the case for de novo assembly, resequencing, and also single molecules (circular consensus).

Systematic errors, by contrast, don't go away with coverage, and they limit the ultimate consensus accuracy. That's why many sequencing experiments plateau at Q40 or so. Because PacBio's errors are random, PacBio has demonstrated results greater than Q50 for a range of bacterial genomes and BACs.
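
The difference between random and systematic errors can be sketched numerically. The example below assumes a simplified model that is not from PacBio: each read is independently wrong at a position with probability `p_random`, all wrong reads are pessimistically assumed to agree, the consensus is a majority vote, and a `systematic_floor` is the fraction of positions where every read makes the same error. The 15% raw error rate and the 1e-4 floor are illustrative values only.

```python
import math

def phred(error_prob):
    """Phred quality score: Q = -10 * log10(error probability)."""
    return -10.0 * math.log10(error_prob)

def majority_error(p, n):
    """Probability that more than half of n independent reads are wrong (n odd)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

def consensus_error(p_random, n, systematic_floor=0.0):
    """Random errors shrink with coverage n; the systematic floor does not."""
    random_part = majority_error(p_random, n)
    return systematic_floor + (1 - systematic_floor) * random_part

for n in (1, 15, 51, 101):
    q_random = phred(consensus_error(0.15, n))
    q_floor = phred(consensus_error(0.15, n, systematic_floor=1e-4))
    print(f"coverage {n:3d}: random only Q{q_random:.0f}, with systematic floor Q{q_floor:.0f}")
```

Under these assumptions the random-only consensus keeps improving as coverage grows, while the version with a 1e-4 systematic component can never exceed Q40, whatever the coverage.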

See this blog from PacBio for a better explanation:

http://blog.pacificbiosciences.com/2...in-pacbio.html