**rhall** · 06-30-2015, 01:14 PM

I'm not sure I understand: are the 1-3 kb amplicons tiled, and are you trying to assemble a longer sequence? If not, using the quality filter in the reads_of_insert protocol you can generate 99.9% accurate amplicon sequences. Or are you trying to cluster the amplicon sequences?
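
For reference, 99.9% accuracy corresponds to Phred QV30 (error probability 10^-3). The Python sketch below shows that conversion plus a hypothetical per-read filter on the mean expected accuracy computed from FASTQ quality strings (offset 33, matching the qin=33 used later in the thread). It is only an illustration; it is not how the reads_of_insert protocol computes its predicted accuracy, and all function names are made up.

```python
import math

def phred_to_error(q):
    """Error probability for a Phred quality score Q: 10^(-Q/10)."""
    return 10 ** (-q / 10)

def accuracy_to_phred(accuracy):
    """Phred score for a given accuracy, e.g. 0.999 -> ~30."""
    return -10 * math.log10(1 - accuracy)

def mean_expected_accuracy(qual, offset=33):
    """Average per-base accuracy implied by a FASTQ quality string."""
    errors = [phred_to_error(ord(c) - offset) for c in qual]
    return 1 - sum(errors) / len(errors)

def filter_reads(records, min_accuracy=0.999, offset=33):
    """Keep (read_id, seq, qual) records at or above min_accuracy (hypothetical filter)."""
    return [rec for rec in records
            if mean_expected_accuracy(rec[2], offset) >= min_accuracy]

if __name__ == "__main__":
    print(round(accuracy_to_phred(0.999)))  # 30
    reads = [("r1", "ACGT", "IIII"), ("r2", "ACGT", "++++")]
    print([r[0] for r in filter_reads(reads)])  # ['r1']
```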

**PatrickV** · 06-30-2015, 11:01 PM

> Originally posted by rhall
>
> I'm not sure I understand, are the 1-3kb amplicons tiled, and you are trying to assemble a longer sequence? Otherwise, using the quality filter in the reads_of_insert protocol you can generate 99.9 accurate amplicons, or are you trying to cluster the amplicon sequences?

No, the amplicons are not tiled. After generating the reads, I would indeed like to cluster identical amplicon sequences together.
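
As a rough illustration of what clustering identical amplicons involves, here is a toy greedy incremental clustering in Python (the general idea behind tools like CD-HIT): sequences are taken longest first, and each one joins the first cluster whose seed is within a maximum edit distance, otherwise it seeds a new cluster. This is a generic, quadratic-time sketch, not the algorithm used by Dedupe, USEARCH or CD-HIT, and `max_edits` is a hypothetical parameter.

```python
def edit_distance(a, b):
    """Levenshtein distance using a single rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # substitution (0 if bases match)
            ))
        prev = curr
    return prev[-1]

def greedy_cluster(seqs, max_edits):
    """Greedy incremental clustering: longest sequences become seeds first."""
    seeds, clusters = [], []
    for seq in sorted(seqs, key=len, reverse=True):
        for idx, seed in enumerate(seeds):
            if edit_distance(seq, seed) <= max_edits:
                clusters[idx].append(seq)
                break
        else:  # no seed was close enough: start a new cluster
            seeds.append(seq)
            clusters.append([seq])
    return clusters

if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
    print(greedy_cluster(reads, max_edits=1))
    # [['ACGTACGT', 'ACGTACGA'], ['TTTTGGGG']]
```

Real tools avoid the all-against-all edit distance with k-mer indexing and banded alignment, which is what makes them usable on full PacBio runs.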

**Brian Bushnell** · 06-30-2015, 11:47 PM

I wrote a tool for clustering PacBio reads of insert. It does not generate a consensus, but it will output the single highest-quality read per cluster... or, you can generate a consensus from the clusters, if you have a good consensus-generation tool. For my application, the single best read was much better than the consensus, which tended to be chimeric.
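
Picking one representative per cluster can be sketched as below. It assumes each cluster is a list of (read_id, sequence, quality) tuples with Phred+33 qualities and ranks reads by mean expected accuracy, breaking ties by length. Both the data layout and the ranking rule are assumptions for illustration; Dedupe's own "best read" criterion may differ.

```python
def mean_expected_accuracy(qual, offset=33):
    """Average per-base accuracy implied by a Phred+offset quality string."""
    errors = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return 1 - sum(errors) / len(errors)

def best_read_per_cluster(clusters, offset=33):
    """Return {cluster_id: best (read_id, seq, qual)} ranked by accuracy, then length."""
    def score(read):
        _, seq, qual = read
        return (mean_expected_accuracy(qual, offset), len(seq))
    return {cid: max(reads, key=score) for cid, reads in clusters.items()}

if __name__ == "__main__":
    clusters = {
        "c1": [("r1", "ACGT", "IIII"), ("r2", "ACGA", "II+I")],
        "c2": [("r3", "GGCC", "5555")],
    }
    best = best_read_per_cluster(clusters)
    print({cid: rec[0] for cid, rec in best.items()})  # {'c1': 'r1', 'c2': 'r3'}
```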

Syntax:

`dedupe.sh in=ros.fq csf=stats.txt outbest=best.fq qin=33 am=f ac=f fo c rnc=f mcs=2 k=27 mo=1400 cc pto nam=4 e=26 pattern=cluster_%.fq`

I've found those specific settings to be extremely good for 16S sequences, which are ~1500 bp long. But if you have variable-size amplicons, you may need to first bin them by size and use different "mo" (min overlap) and "e" (max edit distance) settings for the individual bins.
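
One way to do the size-binning step is sketched below in Python. It groups reads into fixed-width length bins and suggests per-bin mo/e values by scaling the 16S settings above (mo=1400, e=26 at ~1500 bp) linearly to each bin's lower length bound. The 250 bp bin width and the linear scaling are assumptions, not tested recommendations; treat the numbers as starting points to tune.

```python
from collections import defaultdict

# Reference point from the post: ~1500 bp 16S amplicons clustered with mo=1400, e=26.
# Scaling these linearly to other lengths is an assumption to be tuned, not a rule.
REF_LENGTH, REF_MIN_OVERLAP, REF_MAX_EDITS = 1500, 1400, 26

def bin_by_length(reads, bin_width=250):
    """Group (read_id, seq) pairs by the lower bound of their length bin."""
    bins = defaultdict(list)
    for read_id, seq in reads:
        lower = (len(seq) // bin_width) * bin_width
        bins[lower].append((read_id, seq))
    return dict(bins)

def suggest_params(bin_lower):
    """Hypothetical per-bin mo/e, scaled from the shortest length in the bin."""
    length = max(bin_lower, 1)
    mo = int(REF_MIN_OVERLAP * length / REF_LENGTH)
    e = max(1, round(REF_MAX_EDITS * length / REF_LENGTH))
    return mo, e

if __name__ == "__main__":
    reads = [("a", "A" * 1000), ("b", "C" * 1100), ("c", "G" * 2900)]
    for lower, members in sorted(bin_by_length(reads).items()):
        mo, e = suggest_params(lower)
        print(lower, [rid for rid, _ in members], f"mo={mo} e={e}")
    # 1000 ['a', 'b'] mo=933 e=17
    # 2750 ['c'] mo=2566 e=48
```

Each bin could then be written to its own FASTQ and run through dedupe.sh separately with the suggested mo= and e= values. Scaling from the bin's lower bound keeps the minimum overlap below the shortest read in that bin.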

Dedupe is part of the BBTools package.

**rhall** · 07-01-2015, 08:30 AM

To generate a consensus, I would use a clustering tool such as Brian's above (USEARCH and CD-HIT are other options), then generate a reference from the best cluster representatives and use it in a Quiver resequencing job. This approach works best when diversity is limited and each cluster represents a single sequence rather than several closely related ones. Otherwise, as pointed out above, a representative single-molecule consensus (at ~QV30) may be more useful than a heterogeneous multi-molecule Quiver consensus.
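
To see why a multi-molecule consensus over a heterogeneous cluster can come out chimeric, here is a toy column-wise majority-vote consensus in Python over pre-aligned, equal-length sequences. Majority voting is a deliberately simple stand-in; Quiver uses a far more sophisticated model, so this only illustrates the mixing effect. The sequences are made up.

```python
from collections import Counter

def majority_consensus(aligned):
    """Column-wise majority vote over pre-aligned sequences of equal length."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be pre-aligned to the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))

if __name__ == "__main__":
    # Three closely related variants lumped into one cluster (made-up data).
    cluster = ["ATTG"] * 3 + ["CTTT"] * 2 + ["CTTA"] * 2
    consensus = majority_consensus(cluster)
    print(consensus)             # CTTG
    print(consensus in cluster)  # False: no molecule in the cluster has this sequence
```

With a single variant per cluster the vote simply recovers that sequence, which is why the approach works best when diversity is limited.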

PacBio Amplicon reads assembly

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News