SEQanswers

09-21-2015, 01:22 PM   #1
MinnSeq
Junior Member
 
Location: Midwest

Join Date: Sep 2015
Posts: 2
Amplicon consensus calling from sub-reads

Hi everyone,

I just got my first PacBio data and would like some advice on how to handle it. I submitted a 1 kb amplicon library of ~5K variants and requested that it be run in CCS mode (90 min movie). Instead, the sequencing facility ran the library as a 240 min movie (P6 C2 chemistry). For this run, the P1 productivity was 56% and the P2 productivity was 37%. The quality of "Reads of Insert" was 0.96. I was planning to use the Long Amplicon Analysis in the SMRT Portal once I got the CCS data, but now I am a bit unsure about how to work with the data.

Can my data (from the 240 min movie) be treated the same way as a CCS run for data analysis? Or does a CCS run and its subsequent analysis treat the data differently from assembling a consensus from the sub-reads of a traditional run?

Many thanks from a PacBio newbie!
09-21-2015, 03:03 PM   #2
ndelaney
Member
 
Location: Cambridge, MA

Join Date: May 2011
Posts: 19

A four-hour movie will give you better data than a 1.5-hour movie, and from the software's perspective they are treated the same for input/output (a four-hour movie just means you had more passes over the template sequence).

CCS and LAA are different programs. CCS is for when you want the consensus sequence for each 1 kb template that was sequenced in a ZMW (with ~56% loading, this should be roughly 50,000 CCS reads, depending on the quality and numpasses thresholds that you use).

Since you only have ~5K variants, LAA is likely more appropriate. This program takes all the reads from ZMWs with the same amplicon and combines them to produce one polished read. The advantage is that if one ZMW has 20X coverage and another has 20X coverage of the same template sequence, they can be combined into a consensus at 40X instead of giving two reads at 20X each. Ideally, LAA will just give you back the original ~5K sequences.

So, in answer to your question: yes, the data are treated exactly the same from the software's perspective, and you can just run LAA (https://github.com/PacificBioscience...-Documentation).
09-21-2015, 03:06 PM   #3
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 318

I would argue that the sequencing facility did the correct thing. The argument for running shorter movies is that at some point you hit a plateau in the consensus accuracy of a CCS read. The longer movie gives you more of a chance to reach this plateau; the only cost is machine time. CCS data filtered for a given accuracy will be equivalent.
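To illustrate the plateau argument, here is a rough sketch of how consensus accuracy at a single position might saturate as the number of passes grows. It assumes a simple majority vote over independent passes, which is not how the CCS software actually computes its consensus, and the per-pass accuracy of 0.85 is an illustrative value, not a number from this run.

```python
from math import comb


def majority_vote_accuracy(per_pass_accuracy: float, passes: int) -> float:
    """Probability that a strict majority of independent passes calls a base correctly.

    Toy model only: real consensus callers weigh per-base evidence rather than
    counting votes, and errors between passes are not fully independent.
    """
    needed = passes // 2 + 1  # smallest count that is a strict majority
    return sum(
        comb(passes, k)
        * per_pass_accuracy ** k
        * (1 - per_pass_accuracy) ** (passes - k)
        for k in range(needed, passes + 1)
    )


# Assumed per-pass accuracy (illustrative); odd pass counts avoid ties.
for n in (1, 3, 5, 9, 15, 25):
    print(n, round(majority_vote_accuracy(0.85, n), 6))
```

Under these assumptions the gain per additional pass shrinks quickly, which is the sense in which extra movie time beyond the plateau mostly costs machine time.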
LAA (Long Amplicon Analysis) does not use CCS data. For a 1 kb amplicon that has many possible variants, simply generate the CCS reads using a high accuracy filter, then cluster.
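As a minimal sketch of the "high accuracy filter" step, the snippet below keeps only reads whose quality string implies a mean predicted accuracy above a threshold. It assumes the CCS reads were exported as four-line FASTQ with Phred+33 qualities; the function names and the 0.99 cutoff are hypothetical choices for illustration, not SMRT Portal settings.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a strict four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def predicted_accuracy(qual: str, offset: int = 33) -> float:
    """Mean per-base accuracy implied by the Phred scores (assumes Phred+33)."""
    error_probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return 1.0 - sum(error_probs) / len(error_probs)


def filter_by_accuracy(
    records: Iterable[Record], min_accuracy: float = 0.99
) -> Iterator[Record]:
    """Keep records whose predicted accuracy meets the (assumed) cutoff."""
    for name, seq, qual in records:
        if qual and predicted_accuracy(qual) >= min_accuracy:
            yield name, seq, qual


# Toy input: 'I' encodes Q40 and '+' encodes Q10 under Phred+33.
toy = io.StringIO("@good\nACGT\n+\nIIII\n@poor\nACGT\n+\n++++\n")
kept = [name for name, _, _ in filter_by_accuracy(read_fastq(toy))]
print(kept)
```

The surviving reads would then go into whatever clustering tool you prefer; the filter only decides which reads are trustworthy enough to cluster.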
09-21-2015, 03:17 PM   #4
ndelaney
Member
 
Location: Cambridge, MA

Join Date: May 2011
Posts: 19

Just to clarify my comment: unless the 5K variants are very divergent templates or are barcoded, CCS + clustering is the better choice, as rhall said. If the templates aren't divergent enough, LAA will simply merge the similar templates together rather than return distinct sequences.
09-21-2015, 03:44 PM   #5
MinnSeq
Junior Member
 
Location: Midwest

Join Date: Sep 2015
Posts: 2

Thanks @ndelaney and @rhall for the answers! I am quite happy to learn that I have better coverage with the longer movie than I expected from the 90 min one.

Regarding the suggestions about using CCS or LAA: each member of the library is expected to differ from the rest of the library by about 10 bases (~1% sequence variation over the 1 kb amplicon). However, each member also has a unique 20-nucleotide barcode associated with it (so 5K barcodes total).

Would LAA be good enough to differentiate the variants based on the barcodes alone? Or am I better off performing CCS for each ZMW and then doing clustering-based consensus building, using the barcodes as cluster anchors?

Thanks again!
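The second option in the question (CCS per ZMW, then barcode-anchored consensus) could be sketched roughly as follows. This is only an illustration under strong assumptions: the barcode is taken to sit at a fixed offset in each CCS read, all reads are in the same orientation, barcodes are matched exactly, and the per-barcode consensus is a naive column-wise majority vote over reads of the modal length. Because PacBio errors are mostly insertions and deletions, real data would need a proper multiple sequence alignment before voting. All function names and the `start` offset are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

BARCODE_LENGTH = 20  # from the thread: each variant carries a 20 nt barcode


def extract_barcode(seq: str, start: int = 0) -> str:
    """Return the barcode, assuming it sits at a fixed offset (hypothetical layout)."""
    return seq[start:start + BARCODE_LENGTH]


def group_by_barcode(reads: Iterable[str], start: int = 0) -> Dict[str, List[str]]:
    """Bucket CCS reads by exact barcode match; too-short reads are skipped."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for seq in reads:
        if len(seq) >= start + BARCODE_LENGTH:
            groups[extract_barcode(seq, start)].append(seq)
    return dict(groups)


def column_consensus(seqs: List[str]) -> str:
    """Naive consensus: majority base per column over reads of the modal length.

    No alignment is performed, so reads with indels are simply left out.
    """
    modal_length = Counter(len(s) for s in seqs).most_common(1)[0][0]
    same_length = [s for s in seqs if len(s) == modal_length]
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*same_length)
    )


def consensus_per_barcode(reads: Iterable[str], start: int = 0) -> Dict[str, str]:
    """Map each observed barcode to the naive consensus of its reads."""
    return {
        barcode: column_consensus(members)
        for barcode, members in group_by_barcode(reads, start).items()
    }
```

Exact matching will drop reads with a sequencing error inside the barcode, so in practice you would probably allow a small edit distance, or filter the CCS reads for high accuracy first so that most barcodes come through intact.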

Tags
amplicon sequencing, consensus, pacbio
