11-06-2017, 01:16 AM   #1
Location: Poland

Join Date: Jun 2013
Posts: 36
PBSuite spots ambiguous insertion output value


I originally posted this question elsewhere but could not find an answer.

I am using PBSuite version 15.8.24

`spots` was used to identify structural variations.

Here is one line of the output showing an insertion. My question is: an insertion should be at a single point in the genome, so why does the output contain both a start and an end? (Does the software add the size to the start point to find the end point? Is there a shift across the whole genome based on that?)

1 1211650 1211854 INS 353 zscore=-11.943;szMean=353.000;sz3rdQ=550.000;szCount=5.000;strandCnt=2,3;szMedian=547.000;groupName=1;coverage=14.000;sz1stQ=60.000;mqfilt=0.000
I also looked at a similar question in another post, but the output in that post is:

lambda_NEB3011 29999 29999 INS 86 zscore=-15.744;GT=1/1;seq=ATTTTCACAAGCGTTATCTTTTACAAAACCGATCTCACTCTCCTTTGATGCGAATGCCAGCGTCAGACATCATATGCAGATACTCA;szMean=89.000;szCount=16.000;sz3rdQ=96.000;consensusCreated=1.000;strandCnt=7,9;szMedian=90.000;groupName=lambda_NEB3011;coverage=16.000;sz1stQ=78.000;mqfilt=0.000;GQ=7.321
where the start and end points are the same, `29999`.
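
To compare the two records side by side, here is a minimal Python sketch that splits each line into its columns and computes `end - start`. The field names (`chrom`, `start`, `end`, `type`, `size`, `span`) are my own labels, assumed from the column order in the lines above; they are not taken from the PBSuite documentation.

```python
def parse_spots_line(line):
    """Split one whitespace-delimited spots record into labelled fields.

    The labels are assumptions based on column order, not official names.
    """
    chrom, start, end, sv_type, size, info = line.split(None, 5)
    # The last column is a ';'-separated list of key=value annotations.
    annotations = dict(
        item.split("=", 1) for item in info.strip().split(";") if "=" in item
    )
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "type": sv_type,
        "size": int(size),
        "span": int(end) - int(start),  # width of the reported interval
        "info": annotations,
    }


first = parse_spots_line(
    "1 1211650 1211854 INS 353 zscore=-11.943;szMean=353.000;sz3rdQ=550.000;"
    "szCount=5.000;strandCnt=2,3;szMedian=547.000;groupName=1;coverage=14.000;"
    "sz1stQ=60.000;mqfilt=0.000"
)
second = parse_spots_line(
    "lambda_NEB3011 29999 29999 INS 86 zscore=-15.744;GT=1/1;"
    "seq=ATTTTCACAAGCGTTATCTTTTACAAAACCGATCTCACTCTCCTTTGATGCGAATGCCAGCGTCAGACATCATATGCAGATACTCA;"
    "szMean=89.000;szCount=16.000;sz3rdQ=96.000;consensusCreated=1.000;"
    "strandCnt=7,9;szMedian=90.000;groupName=lambda_NEB3011;coverage=16.000;"
    "sz1stQ=78.000;mqfilt=0.000;GQ=7.321"
)

for record in (first, second):
    print(record["chrom"], record["span"], record["size"],
          record["start"] + record["size"] == record["end"])
```

For my record the interval is 204 bp wide while the reported size is 353, so the end is not simply start plus size; for the record from the other post the interval width is 0.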

Is there any explanation?

Medhat
11-06-2017, 07:33 AM   #2
Location: Cambridge

Join Date: Sep 2010
Posts: 97
Some insertions are flanked by duplication or deletion events...

Quite a few insertion events (especially transposase-induced ones) are flanked by a duplication of the target sequence.
In other cases the insertion can be accompanied by a deletion or replacement of the region, so the event would have both a start and an end.

PS: If you have enough read coverage, assemble the sequences de novo and compare the region of interest in both assemblies using mummer, a dot plot, or (web) BLAST in master/slave mode.

Also, you can take raw read(s) spanning the region of interest and map them with BLAST, BLASR, or a similar alignment tool that can handle the raw PacBio error rate.
Markiyan

