  • baohua100
    Senior Member
    • Jun 2008
    • 103

    Questions about Solexa quality scores!

    reads.fq file:

    @4:1:518:715
    GATACCATAAAAGCTGGATCCTTCTTCAAGCATAA
    +4:1:518:715
    hhhhhhhhhhhhhhhdhhhhhhhhhhhdRehdhhP

    1. How do I convert a character (like 'e' or 'h') to a quality score?

    2. What is the meaning of this score? How is it computed (what is the formula)?
  • Farhat
    Member
    • Apr 2008
    • 21

    #2
    For a Fastq file, if the quality character is $q the corresponding Phred quality can be calculated with the following Perl code:

    $Q = ord($q) - 33;
    Farhat Habib

    • SoupDragon
      Junior Member
      • Jun 2008
      • 1

      #3
      This is correct if you are using quality scores encoded in "fastq" format. I believe the Illumina pipeline used a different ASCII offset (64), according to their pipeline documentation. A value of zero = ASCII 64 ('@'). The ASCII value for a qv is therefore qv + 64. So "h" = 104 - 64 = 40.
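The offset arithmetic described in this post can be sketched in a few lines of Python. This is only an illustration: the function name `decode_qualities` and its `offset` parameter are made up for the example, and the default of 64 is the offset quoted above from the pipeline documentation, not something verified here.

```python
def decode_qualities(quality_string, offset=64):
    """Convert each quality character to an integer score.

    The score is the character's ASCII code minus the offset
    (64 as described in this post, 33 in the Perl line from post #2).
    """
    return [ord(ch) - offset for ch in quality_string]


# Quality line of the read posted at the top of the thread.
scores = decode_qualities("hhhhhhhhhhhhhhhdhhhhhhhhhhhdRehdhhP")
print(min(scores), max(scores))
```

Passing `offset=33` reproduces the `ord($q) - 33` calculation from post #2, so the same helper covers both encodings.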

      • Farhat
        Member
        • Apr 2008
        • 21

        #4
        Dupe. Deleted.
        Last edited by Farhat; 06-17-2008, 07:20 AM.
        Farhat Habib

        • Farhat
          Member
          • Apr 2008
          • 21

          #5
          Originally posted by SoupDragon:
          This is correct if you are using quality scores encoded in "fastq" format. I believe the Illumina pipeline used a different ASCII offset (64), according to their pipeline documentation. A value of zero = ASCII 64 ('@'). The ASCII value for a qv is therefore qv + 64. So "h" = 104 - 64 = 40.
          You are right. 'h' would make the quality way beyond 40 by my calculation.
          Farhat Habib

          • baohua100
            Senior Member
            • Jun 2008
            • 103

            #6
            Thanks.

            What's the range of this score? (0 to 40?)

            What's the meaning of this score?

            • sparks
              Senior Member
              • Mar 2008
              • 126

              #7
              Solexa Quality Score

              The range is from -5 to 40

              If P is the probability that the called base is correct, then the Solexa quality is 10 * log10(P / (1 - P))

              A quality of -5 corresponds to P=0.25
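As a quick check of the formula above, here is a minimal Python sketch. The function name `solexa_quality` is hypothetical, and P is taken to be the probability that the called base is correct.

```python
import math


def solexa_quality(p):
    """Solexa quality 10 * log10(p / (1 - p)) for a probability 0 < p < 1."""
    return 10 * math.log10(p / (1 - p))


# P = 0.25 gives about -4.77, which rounds to the -5 quoted as the lower bound.
print(round(solexa_quality(0.25), 2))
```

With P = 0.5 the odds are even and the score is 0. A P of 0.9999 gives a score just under 40.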

              • Farhat
                Member
                • Apr 2008
                • 21

                #8
                Originally posted by sparks:
                The range is from -5 to 40

                If P is the probability that the called base is correct, then the Solexa quality is 10 * log10(P / (1 - P))

                A quality of -5 corresponds to P=0.25
                In my datasets the range has been from -40 to 40.
                Farhat Habib

                • sparks
                  Senior Member
                  • Mar 2008
                  • 126

                  #9
                  Quality Score Range

                  Farhat's right for Solexa prb format files from the base caller, but for the fastq format files the OP asked about, the range should be -5 to 40.

                  • Farhat
                    Member
                    • Apr 2008
                    • 21

                    #10
                    Originally posted by sparks:
                    Farhat's right for Solexa prb format files from the base caller, but for the fastq format files the OP asked about, the range should be -5 to 40.
                    Yes, that's right, because for a Solexa prb file the probability of A, C, G, or T is given separately and can be really low, whereas for fastq the lowest probability is 0.25, implying equal probability for any nucleotide.
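To make the difference concrete, here is a rough Python sketch of picking the called base from one cycle of per-base scores. It assumes the four scores arrive in A, C, G, T order. That layout, and the names `call_base`, `solexa_to_prob`, and `LOWEST_CALLED_QUALITY`, are assumptions made for illustration and are not taken from the pipeline documentation.

```python
import math

BASES = "ACGT"  # assumed column order of the four per-base scores


def solexa_to_prob(q):
    """Invert Q = 10 * log10(P / (1 - P)) to recover P."""
    odds = 10 ** (q / 10)
    return odds / (1 + odds)


def call_base(scores):
    """Return (base, score) for the highest of four per-base Solexa scores."""
    best = max(range(4), key=lambda i: scores[i])
    return BASES[best], scores[best]


# If the four probabilities sum to 1, the best one is at least 0.25,
# so the called base's quality cannot fall below this value (about -4.77).
LOWEST_CALLED_QUALITY = 10 * math.log10(0.25 / 0.75)

print(call_base((-40, 40, -40, -40)))
```

The three losing bases can carry scores as low as -40, which is where the wider range seen in prb files comes from. Only the winning score is carried over to fastq.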
                    Farhat Habib

                    • baohua100
                      Senior Member
                      • Jun 2008
                      • 103

                      #11
                      $sQ = -10 * log($e / (1 - $e)) / log(10);   # base-10 log; Perl's log() is the natural log

                      When $sQ = 40, $e = 0.0001.

                      When $sQ = 0, $e = 0.5.

                      0.5 > 0.25

                      When $sQ = -4, $e = 0.72.

                      What's the probability of error?

                      • sparks
                        Senior Member
                        • Mar 2008
                        • 126

                        #12
                        If you are talking about fastq format and have a quality of -4, then the probability of the base called is 0.28 and the probability that it is any one of the other 3 bases is 0.72.

                        If you see a -4 in a prb format file then the probability of the base is 0.28 and the other bases will each have their own prb/qual value.
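The numbers in this exchange can be reproduced with a short Python sketch that inverts the Solexa formula. The function names are hypothetical. The Phred conversion is included only for comparison and assumes the usual definition of Phred quality as -10 * log10(error probability).

```python
import math


def error_probability(solexa_q):
    """Probability that the called base is wrong, from Q = 10 * log10(P / (1 - P))."""
    return 1 / (1 + 10 ** (solexa_q / 10))


def solexa_to_phred(solexa_q):
    """Equivalent Phred score, -10 * log10(error probability)."""
    return -10 * math.log10(error_probability(solexa_q))


for q in (40, 0, -4):
    print(q, round(error_probability(q), 4), round(solexa_to_phred(q), 2))
```

An error probability of 0.5 at a score of 0 does not conflict with the 0.25 floor. The floor applies to the probability that the called base is correct, which gives an error probability of at most 0.75.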

                        • mikertesz
                          Junior Member
                          • Sep 2008
                          • 1

                          #13
                          "quala" files

                          The output of a Solexa run generated a "quala" file of the following format:

                          >sequence_0
                          40 40 40 19 7 40 40 40 40 40 31 40 40 40 40 40 40 40 40 40 40 11 40 40 40
                          36 40 12 40 21 39 1 4 40 40 15 40 40 4 40 40 10 40 40 40 40 40 2 4 10
                          1

                          >sequence_1
                          40 40 8 13 12 40 40 40 40 17 27 40 25 17 4 40 40 40 21 40 40 37 40 40 37
                          4 40 33 40 25 40 3 20 40 40 20 40 40 4 40 8 7 40 40 15 4 10 1 5 20
                          1

                          etc...

                          Does anybody know what those numbers mean? Are they simply the Solexa quality scores per base? The range seems to be 1 to 40. Why isn't it -5 to 40 as in fastq?

                          • vruotti
                            Member
                            • Feb 2008
                            • 13

                            #14
                            Fastq file outside of GERALD

                            Hi,
                            Does anyone know an easy way or an existing program to convert all the .prb files from one particular lane into one fastq file? Similar to the s_1_sequence.txt file but with no filters applied?
                            We have tried hacking around the Perl scripts within GERALD, but it looks like you need an intermediate seqpre.tmp file, which I think gets deleted after GERALD completes.

                            We know this is possible by just running GERALD with the fastq parameter. However, we would like to generate a fastq file that is not affected by GERALD's filters. That way we can set up our own quality filters.

                            Any ideas?
                            Do I go ahead and write one?

                            Thanks,
                            Victor

                            • swbarnes2
                              Senior Member
                              • May 2008
                              • 910

                              #15
                              I made my own very simple script, but here's one of James Bonfield's:

                              Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


                              The only problem that I see is this line:

                              foreach (glob("$fn/*seq.txt")) {

                              which is going to get every single *seq.txt file in the directory, not just the ones from a single lane. So you'll have to fix that.
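One way to restrict the match to a single lane is to put the lane number into the pattern. The Python sketch below assumes per-tile files are named like `s_<lane>_<tile>_seq.txt`. That naming, and the helper `files_for_lane`, are assumptions for illustration, so check them against your own run folder.

```python
import fnmatch


def files_for_lane(names, lane):
    """Keep only the seq files whose name starts with the given lane number."""
    pattern = f"s_{lane}_*_seq.txt"
    return sorted(n for n in names if fnmatch.fnmatch(n, pattern))


# Hypothetical directory listing, used only to show the filtering.
listing = [
    "s_1_0001_seq.txt",
    "s_2_0001_seq.txt",
    "s_1_0002_seq.txt",
    "s_1_0001_prb.txt",
]
print(files_for_lane(listing, 1))
```

The same idea would apply inside the Perl script by narrowing the glob pattern from `*seq.txt` to one that includes the lane prefix.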
