  • phleroy
    Junior Member
    • Jan 2019
    • 4

    PacBio consensus quality

    Hello

    I have sequenced a BAC clone with a PacBio RSII.
    For the assembly I used Falcon through pbbioconda, and for polishing I used quiver.

    To estimate the consensus quality, I remapped the original BAM reads file against the consensus.

    How can I estimate a mean quality value, in other words a consensus Phred score for the base calls of the consensus? :-)

    Thank you in advance
    Philippe
    Last edited by phleroy; 01-25-2019, 01:07 AM.
  • SNPsaurus
    Registered Vendor
    • May 2013
    • 525

    #2
    We polish with arrow and just list one of the outputs as fastq "-o sample_consensus.fastq" and it generates a fastq file with a consensus for each contig and the quality score. You might check if quiver has the same option, or switch to arrow (here's a blog about doing so https://dazzlerblog.wordpress.com/tag/arrow/ ).
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


    • phleroy
      Junior Member
      • Jan 2019
      • 4

      #3
      Thank you very much for this suggestion.
      We can indeed obtain a fastq file with quiver using the option -o out.fastq, as you mentioned for arrow.

      The question is then: how do you recover the mean QV for the consensus?

      In the fastq file we can see:
      @000000F|quiver
      ATCATTGTTACTACTAGAGGAAGAATCTTTCTTG ...
      +
      "RQQPQQQRQQQQQSRRQSTSSQRQRSSSRRRQQRSRRRQRSRQ ...

      I guess the quality value for each consensus nucleotide is encoded in the line after the "+"? But how do I calculate it?

      Thank you again for any help
      Philippe
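As a minimal sketch of how the quality line can be read: each character encodes one Phred QV for the base at the same position. The example below assumes the common Phred+33 (Sanger) offset, which is an assumption about the quiver output rather than something stated in this thread; the function name `phred33_to_qv` is made up for illustration.

```python
def phred33_to_qv(quality_line):
    """Decode a FASTQ quality string into integer Phred QVs.

    Assumes Phred+33 encoding (ASCII code minus 33), which is an
    assumption here, not something confirmed for this particular file.
    """
    return [ord(ch) - 33 for ch in quality_line]


# Start of the quality line quoted in the post above.
qual = '"RQQPQQQRQQQQQSRRQSTSSQRQRSSSRRRQQRSRRRQRSRQ'
qvs = phred33_to_qv(qual)
print(qvs[:6])  # [1, 49, 48, 48, 47, 48]
```

Under that assumption, most bases in the excerpt sit around QV 47 to 51, while the first base has a very low QV of 1.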


      • Magdoll
        Member
        • Aug 2011
        • 30

        #4
        You can convert the Phred QV scores to error probabilities, then sum the probabilities over the entire sequence to get the expected number of errors.

        You can use this Python script to calculate the expected accuracy from a FASTQ file (this repo is meant for PacBio transcriptome data, but the script is generic):
        Miscellaneous collection of Python and R scripts for processing Iso-Seq data - Magdoll/cDNA_Cupcake
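The calculation described above can be sketched in a few lines. This is not the cDNA_Cupcake script itself, only an illustration of the idea, assuming Phred+33 encoding and the standard relation p = 10^(-QV/10); the function names are hypothetical.

```python
def expected_errors(quality_line, offset=33):
    """Sum of per-base error probabilities, p = 10 ** (-QV / 10).

    `offset=33` assumes Phred+33 encoding (an assumption).
    """
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality_line)


def expected_accuracy(quality_line, offset=33):
    """1 minus the expected fraction of erroneous bases in the sequence."""
    return 1 - expected_errors(quality_line, offset) / len(quality_line)


# Toy quality string: ten bases, all '+' (ASCII 43, i.e. QV 10, p = 0.1).
toy = "+" * 10
print(expected_errors(toy))    # about 1.0 expected error
print(expected_accuracy(toy))  # about 0.9
```

Because the probabilities are summed, a handful of very low-QV bases can dominate the expected error count even when most bases have a high QV.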


        • phleroy
          Junior Member
          • Jan 2019
          • 4

          #5
          Thank you so much, I will try this option as soon as possible and let you know :-)


          • phleroy
            Junior Member
            • Jan 2019
            • 4

            #6
            I tried the Python script (calc_expected_accuracy_from_fastq.py) on our fastq consensus sequence, which was obtained with quiver, and got the expected accuracy: expected_accuracy=0.997

            In a previous analysis I used two smrtlink Python scripts to estimate the mean_QV:
            - summarize_coverage.py to obtain an alignment summary gff file
            - polished_assembly.py to obtain the csv file, which gives a mean_qv of 48.65

            I have the feeling that the two values measure different things? I am not a specialist in this area and would welcome any remarks or suggestions.

            Nevertheless, these two values, mean_qv and expected_accuracy, should both give an estimate of the quality of the consensus assembly. I just need to understand precisely how to interpret each value.

            Thank you in advance
            Philippe
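One way to see why the two numbers can disagree is to put them on the same scale. The sketch below assumes that mean_qv is an arithmetic mean of QVs (how polished_assembly.py actually computes it is not stated in this thread) and that expected_accuracy is 1 minus the mean per-base error probability. The toy QV list is invented for illustration and is not data from this assembly.

```python
import math


def accuracy_to_qv(accuracy):
    """Phred-scale an accuracy: QV = -10 * log10(1 - accuracy)."""
    return -10 * math.log10(1 - accuracy)


def qv_to_error(qv):
    """Error probability for a Phred QV: p = 10 ** (-QV / 10)."""
    return 10 ** (-qv / 10)


# The two values reported above, converted to the other scale.
print(round(accuracy_to_qv(0.997), 1))  # 25.2
print(qv_to_error(48.65))               # roughly 1.4e-05

# Hypothetical toy consensus: 999 bases at QV 50 plus one base at QV 1.
toy_qvs = [50] * 999 + [1]
toy_mean_qv = sum(toy_qvs) / len(toy_qvs)
toy_accuracy = 1 - sum(qv_to_error(q) for q in toy_qvs) / len(toy_qvs)
print(toy_mean_qv)   # close to 50: the single bad base barely moves the mean
print(toy_accuracy)  # close to 0.9992: the single bad base dominates the errors
```

Averaging QVs is an average on a log scale, so it is barely affected by a few poor bases, whereas averaging error probabilities is dominated by them. Under these assumptions, a mean QV near 49 and an expected accuracy of 0.997 (about QV 25) are not contradictory.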


            • rhall
              Senior Member
              • Aug 2012
              • 324

              #7
              If you assemble a set of reads, then use them to polish the assembly, there is no way to measure any truly meaningful consensus quality without an orthogonal datatype, or knowledge of ground truth. The expected accuracy from the fastq that results from polishing is highly dependent on the consensus algorithm and may not be a true indication of the quality of the consensus.

