  • Illumina quality score

    Hi everyone,

    I recently started studying DNA assembly and found something I could not understand in the following paper:



    Figure 1 in the paper shows the relationship between the quality score and the number of errors in reads. The authors say that a quality score of 40 means an error probability of 0.01%, which is true if the following equation is used (with log taken to base 10):

    Q = -10 log10(p / (1 - p))

    However, according to the graph, only about 65% of the bases with a score of 40 are correct. Moreover, the percentage of correct bases is almost 0 for every score below 40. I wonder whether this trend is usual or not.
    Thank you.
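For reference, the equation quoted above can be checked numerically. The sketch below converts a quality score to an error probability under two definitions: the odds-based form from the post, and the standard Phred form Q = -10 log10(p), which is not mentioned in the post and is included here only for comparison. The function names are illustrative, not taken from the paper or from any Illumina tool.

```python
def phred_error_prob(q):
    """Error probability under the standard Phred form Q = -10*log10(p)."""
    return 10 ** (-q / 10)


def odds_error_prob(q):
    """Error probability under the odds-based form Q = -10*log10(p / (1 - p))."""
    odds = 10 ** (-q / 10)      # p / (1 - p)
    return odds / (1 + odds)    # solve the odds for p


if __name__ == "__main__":
    # Print both probabilities side by side for a few scores.
    for q in (10, 20, 30, 40):
        print(q, phred_error_prob(q), odds_error_prob(q))
```

At Q = 40 both forms give an error probability of roughly 0.01%, so a correct-base fraction of about 65% at that score is far from what either definition predicts. The two forms only diverge noticeably at low scores.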

  • #2
    I can see your confusion. There is something very strange about figure 1 in their paper. Panel (b) might make sense if what they were plotting was error rate relative to base position for a 40bp read. How can panel (a) even go up to 41 if the max Illumina quality score is 40 as they themselves state in the text?

    Look at these papers for better treatment of the question of quality scores vs error rates:
    An international, peer-reviewed genome sciences journal featuring outstanding original research that offers novel insights into the biology of all organisms




    • #3
      Hi obig,

      The largest value in graph (a) may really be 40, with 41 displayed only because of a wrong axis setting in Excel. In any case, they did not explain the number.
      Thank you for recommending the papers. I appreciate it.

      Yun



      • #4
        Illumina quality scores and deviations from ref sequence

        I have collected some data on the distribution of Illumina quality scores as a function of base position, and on the actual number of deviations from the reference sequence, for an exome-enriched sample (read length 101). At base position 2, the lowest quality score is assigned in 0.04% of cases. At base position 75 this is already 19%, and at base position 100 it is 49%. This does not correlate well with the observed rates of differences from the human ref37 genome, with 80% of reads mapped with an exact four single base maximum error model. At base position 2 there are 0.5% deviations, at base position 75 there are 1%, and at base position 100 there are 3.5%. You may interpolate the intermediate positions for a reasonable fit.

        My conclusion is that the Illumina quality score has a very limited relation to the observed deviations from the reference sequence. Most deviations are actually errors, because the mutational load in the human exome is much lower than the deviation rate observed in exome sequencing. To be useful for base calling, a quality score should differentiate much better in the lower quality range.
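The figures above can be put on the quality-score scale with a few lines of Python. This is only a sketch under stated assumptions: every deviation is treated as a sequencing error, the standard Phred form Q = -10 log10(p) is used for the conversion, and intermediate positions are filled in by straight-line interpolation between the three quoted positions (the post suggests interpolating but does not say how). The names `observed`, `empirical_q` and `interpolate_rate` are hypothetical.

```python
import math

# Observed deviation rates quoted in the post: base position -> fraction of bases.
observed = {2: 0.005, 75: 0.01, 100: 0.035}


def empirical_q(rate):
    """Phred-scaled quality implied by an observed error rate (assumes Q = -10*log10(p))."""
    return -10 * math.log10(rate)


def interpolate_rate(pos, points=observed):
    """Linearly interpolate the deviation rate at `pos` (assumption: piecewise linear).

    Positions outside the measured range are clamped to the nearest measured value.
    """
    xs = sorted(points)
    if pos <= xs[0]:
        return points[xs[0]]
    if pos >= xs[-1]:
        return points[xs[-1]]
    for lo, hi in zip(xs, xs[1:]):
        if lo <= pos <= hi:
            frac = (pos - lo) / (hi - lo)
            return points[lo] + frac * (points[hi] - points[lo])


if __name__ == "__main__":
    # Print the interpolated rate and its implied quality at a few positions.
    for pos in (2, 50, 75, 90, 100):
        rate = interpolate_rate(pos)
        print(pos, round(rate, 4), round(empirical_q(rate), 1))
```

Under these assumptions a 1% deviation rate corresponds to an empirical quality of 20. Comparing such values with the scores the instrument reports at the same positions is one way to quantify the mismatch described above.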
