  • Questions about solexa quality score!

    reads.fq file:

    @4:1:518:715
    GATACCATAAAAGCTGGATCCTTCTTCAAGCATAA
    +4:1:518:715
    hhhhhhhhhhhhhhhdhhhhhhhhhhhdRehdhhP

    1. How do I convert a quality character (like 'e' or 'h') to a quality score?

    2. What does this score mean, and what is the formula for computing it?

  • #2
    For a Fastq file, if the quality character is $q the corresponding Phred quality can be calculated with the following Perl code:

    $Q = ord($q) - 33;
    Farhat Habib
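For readers working in Python, here is a minimal sketch of the same offset-33 calculation. The function names are illustrative only, and the error-probability step assumes the standard Phred definition Q = -10 log10(p), which the post does not spell out.

```python
def phred_from_char(q, offset=33):
    """Decode one FASTQ quality character to an integer score."""
    return ord(q) - offset


def phred_error_probability(quality):
    """Assumes the standard Phred definition: Q = -10 * log10(p_error)."""
    return 10 ** (-quality / 10)


print(phred_from_char("I"))         # 'I' is ASCII 73, so 73 - 33 = 40
print(phred_error_probability(40))  # about 0.0001
print(phred_from_char("h"))         # 'h' is ASCII 104, so 104 - 33 = 71
```

With offset 33 the 'h' characters in the question decode to 71, far above the usual maximum of 40, which suggests the file uses a different offset.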


    • #3
      This is correct if you are using quality scores encoded in "fastq" format. I believe the Illumina pipeline used a different ASCII offset (64), according to their pipeline documentation. A value of zero = ASCII 64 ('@'). The ASCII value for a qv is therefore qv+64. So "h" = 104 - 64 = 40
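As a quick check of the offset-64 arithmetic, this sketch decodes the quality line from the first post. `decode_qualities` is an illustrative helper, not part of any pipeline.

```python
# Quality line copied from the reads.fq example in the first post.
QUAL = "hhhhhhhhhhhhhhhdhhhhhhhhhhhdRehdhhP"


def decode_qualities(qual, offset=64):
    """Convert each quality character to an integer using the given ASCII offset."""
    return [ord(c) - offset for c in qual]


scores = decode_qualities(QUAL)
print(scores)
print(min(scores), max(scores))  # 16 40
```

Here 'h' decodes to 40, 'd' to 36, 'e' to 37, 'R' to 18 and 'P' to 16, all inside the expected range.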


      • #4
        Dupe. Deleted.
        Last edited by Farhat; 06-17-2008, 07:20 AM.
        Farhat Habib


        • #5
          Originally posted by SoupDragon
          This is correct if you are using quality scores encoded in "fastq" format. I believe the Illumina pipeline used a different ASCII offset (64), according to their pipeline documentation. A value of zero = ASCII 64 ('@'). The ASCII value for a qv is therefore qv+64. So "h" = 104 - 64 = 40
          You are right. 'h' would make the quality way beyond 40 by my calculation.
          Farhat Habib


          • #6
            Thanks.

            What's the range of this score? (0 to 40?)

            What's the meaning of this score?


            • #7
              Solexa Quality Score

              The range is from -5 to 40

              If P is the probability that the called base is correct, then the Solexa quality is 10 log10(P/(1-P))

              A quality of -5 corresponds to P=0.25
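A small sketch of that formula, to check the P=0.25 figure. `solexa_quality` is an illustrative name.

```python
import math


def solexa_quality(p):
    """Solexa quality for a base whose probability of being correct is p."""
    return 10 * math.log10(p / (1 - p))


print(round(solexa_quality(0.25), 2))    # -4.77, which rounds to -5
print(round(solexa_quality(0.5), 2))     # 0.0
print(round(solexa_quality(0.9999), 2))  # 40.0
```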


              • #8
                Originally posted by sparks
                The range is from -5 to 40

                If P is the probability that the called base is correct, then the Solexa quality is 10 log10(P/(1-P))

                A quality of -5 corresponds to P=0.25
                In my datasets the range has been from -40 to 40.
                Farhat Habib


                • #9
                  Quality Score Range

                  Farhat's right for Solexa prb files from the base caller, but for the fastq files the OP asked about, the range should be -5 to 40


                  • #10
                    Originally posted by sparks
                    Farhat's right for Solexa prb files from the base caller, but for the fastq files the OP asked about, the range should be -5 to 40
                    Yes, that's right. In a Solexa prb file the probabilities of A, C, G and T are given separately and can be very low, whereas in fastq the lowest probability is 0.25, implying equal probability for any nucleotide.
                    Farhat Habib
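The difference between the two ranges can be sketched as follows. This assumes one prb entry holds four Solexa scores for a single cycle in A, C, G, T order; the column order and the helper names are assumptions, not taken from the pipeline documentation.

```python
import math

BASES = "ACGT"  # assumed column order of the four scores in a prb entry


def solexa_to_probability(q):
    """Invert Q = 10 * log10(P / (1 - P)) to recover P."""
    odds = 10 ** (q / 10)
    return odds / (1 + odds)


def call_base(cycle_scores):
    """Return (base, score) for the highest of the four per-base scores."""
    best = max(range(4), key=lambda i: cycle_scores[i])
    return BASES[best], cycle_scores[best]


print(call_base([-40, 40, -40, -40]))        # ('C', 40)
print(round(solexa_to_probability(-40), 6))  # about 0.0001

# Four probabilities that sum to 1 cannot all be below 0.25,
# so the best base never scores below this floor.
print(round(10 * math.log10(0.25 / 0.75), 2))  # -4.77
```

Individual prb entries can go as low as -40, but the score kept for the called base is the best of the four, which is why fastq qualities bottom out near -5.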


                    • #11
                      $sQ = -10 * log($e / (1 - $e)) / log(10);  # Perl's log() is the natural log, so divide by log(10) to get log10

                      when $sQ =40, $e=0.0001

                      when $sQ=0, $e=0.5

                      0.5>0.25


                      when $sQ=-4 $e=0.72



                      What's the probability of error?


                      • #12
                        If you are talking about fastq format and have a quality of -4, then the probability of the called base is 0.28 and the probability that it is any one of the other 3 bases is 0.72.

                        If you see a -4 in a prb format file then the probability of the base is 0.28 and the other bases will each have their own prb/qual value.
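The numbers in the last two posts can be reproduced by inverting the Solexa formula for the error probability e. This is a sketch; `solexa_error_probability` is an illustrative name.

```python
def solexa_error_probability(sq):
    """Solve sQ = -10 * log10(e / (1 - e)) for e."""
    return 1 / (1 + 10 ** (sq / 10))


for sq in (40, 0, -4, -5):
    print(sq, round(solexa_error_probability(sq), 4))
```

This gives roughly 0.0001 at 40, 0.5 at 0, 0.72 at -4 and 0.76 at -5. At -5 the called base keeps a probability of about 0.24, matching the P=0.25 floor mentioned earlier in the thread.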


                        • #13
                          "quala" files

                          A Solexa run generated a "quala" file in the following format:

                          >sequence_0
                          40 40 40 19 7 40 40 40 40 40 31 40 40 40 40 40 40 40 40 40 40 11 40 40 40
                          36 40 12 40 21 39 1 4 40 40 15 40 40 4 40 40 10 40 40 40 40 40 2 4 10
                          1

                          >sequence_1
                          40 40 8 13 12 40 40 40 40 17 27 40 25 17 4 40 40 40 21 40 40 37 40 40 37
                          4 40 33 40 25 40 3 20 40 40 20 40 40 4 40 8 7 40 40 15 4 10 1 5 20
                          1

                          etc...

                          Does anybody know what those numbers mean? Are they simply the Solexa quality scores per base? The range seems to be 1-40. Why isn't it -5 to 40 as in fastq?


                          • #14
                            Fastq file outside of GERALD

                            Hi,
                            Does anyone know an easy way or an existing program to convert all the .prb files from one particular lane into one fastq file? Similar to the s_1_sequence.txt file but with no filters applied?
                            We have tried hacking around the Perl scripts within GERALD, but it looks like you need an intermediate seqpre.tmp file, which I think gets deleted after GERALD completes.

                            We know this is possible by just running GERALD with the fastq parameter. However, we would like to generate a fastq file that is not affected by GERALD's filters. That way we can set up our own quality filters.

                            Any ideas?
                            Do I go ahead and write one?

                            Thanks,
                            Victor
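One possible starting point is sketched below. It rests on assumptions about the file layouts that should be checked against real files: a seq.txt line is taken to hold lane, tile, x, y and the read, separated by tabs; a prb.txt line is taken to hold one tab-separated entry per cycle, each with four space-separated scores; and qualities are written with the offset of 64 discussed earlier in the thread. `fastq_record` is a hypothetical helper, not part of GERALD.

```python
def fastq_record(seq_line, prb_line, offset=64):
    """Build one unfiltered FASTQ record from matching seq and prb lines."""
    lane, tile, x, y, bases = seq_line.split()
    name = f"{lane}:{tile}:{x}:{y}"
    quals = []
    for cycle in prb_line.strip().split("\t"):
        scores = [int(s) for s in cycle.split()]
        # Keep the best of the four per-base scores as the quality of the call.
        quals.append(chr(max(scores) + offset))
    qual_string = "".join(quals)
    return f"@{name}\n{bases}\n+{name}\n{qual_string}\n"


# Made-up three-cycle read, for illustration only.
seq = "4\t1\t518\t715\tGAT"
prb = "-40 -40 40 -40\t40 -40 -40 -40\t-40 -40 -40 36"
print(fastq_record(seq, prb), end="")
```

Because no reads are dropped here, any quality filter can be applied afterwards to the resulting file.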


                            • #15
                              I wrote my own very simple script, but James Bonfield's script is here:

                              Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


                              The only problem that I see is this line

                              foreach (glob("$fn/*seq.txt")) {

                              which is going to get every single .seq in the directory, not just the ones from a single lane. So you'll have to fix that.
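A sketch of a lane-restricted version of that glob, written in Python rather than Perl. It assumes the per-tile files are named like `s_<lane>_<tile>_seq.txt`; that naming pattern and the `lane_seq_files` helper are assumptions to verify against the actual run directory.

```python
import glob
import os


def lane_seq_files(run_dir, lane):
    """List only the seq.txt files for one lane, in sorted order."""
    # The underscore after the lane number keeps lane 1 from matching lane 10.
    pattern = os.path.join(run_dir, f"s_{lane}_*_seq.txt")
    return sorted(glob.glob(pattern))


print(lane_seq_files(".", 1))
```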
