Using the alignment of raw PacBio reads against a reference genome, it is possible to count the number of insertions and deletions at each position (from the mpileup file) and to sum these counts over each homopolymer. It is then possible to calculate the mean error rate per variation type and homopolymer length.
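To make the calculation concrete, here is a minimal Python sketch of how it could be done. It is not the script used for the figures in this post. It assumes the per-position insertion counts, deletion counts and read depth have already been extracted from the mpileup file into plain lists indexed like the reference sequence; the function names, the `min_length` parameter and the choice of dividing summed indel counts by summed depth are all my own assumptions for illustration.

```python
from collections import defaultdict
from itertools import groupby


def homopolymer_runs(reference):
    """Yield (start, length, base) for each run of identical bases."""
    pos = 0
    for base, group in groupby(reference):
        length = sum(1 for _ in group)
        yield pos, length, base
        pos += length


def mean_indel_rates(reference, insertions, deletions, depth, min_length=2):
    """Mean insertion and deletion rate per homopolymer length.

    For each homopolymer, the rate is (sum of indel counts over the run)
    divided by (sum of depth over the run). This normalisation is an
    assumption; other definitions are possible.
    """
    per_length = defaultdict(lambda: {"ins": [], "del": []})
    for start, length, _base in homopolymer_runs(reference):
        if length < min_length:
            continue  # single bases are not treated as homopolymers here
        end = start + length
        coverage = sum(depth[start:end])
        if coverage == 0:
            continue  # no aligned reads, so no rate can be computed
        per_length[length]["ins"].append(sum(insertions[start:end]) / coverage)
        per_length[length]["del"].append(sum(deletions[start:end]) / coverage)
    # Average the per-homopolymer rates within each length class.
    return {
        length: {kind: sum(values) / len(values) for kind, values in rates.items()}
        for length, rates in sorted(per_length.items())
    }
```

Calling `mean_indel_rates(reference, insertions, deletions, depth)` returns a dictionary keyed by homopolymer length, each entry holding the mean `"ins"` and `"del"` rates. Length classes with very few homopolymers will give noisy means, so it may be worth also reporting how many runs contributed to each class.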
The results for one of our genomes of interest are presented below.
We clearly see that the insertion proportion decreases with homopolymer length while, on the contrary, the deletion proportion increases.
http://genoweb.toulouse.inra.fr/~klo...ion_pacbio.png
NB1: since this genome has only a few homopolymers longer than 7 base pairs, the corresponding figures should be taken with caution.
NB2: We see the same shapes for other genomes.
PacBio data is meant to have random errors, which does not seem to be true looking at these figures.
I'm interested in any findings or comments about this issue.