  • #16
    Originally posted by jkbonfield
    Do they still emit varying quality values for N bases? That always confused me. Most were 4 I think, but we'd occasionally see N with quality all the way up to 10. I can only assume they change bases to N at some stage, but don't do anything with the Q value. It seemed broken at the time anyway, but maybe it's a bit saner now.
    I agree with your experience. I never understood what an "N" of quality "10" actually meant; Q10 means probability of error is 10% so does that mean there is a 90% chance it is not an "N" ? :-)
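For reference, the usual Phred relationship between a quality score and the implied error probability can be sketched as below. The function name `phred_to_error_prob` is only an illustrative choice, and the sketch assumes the scores are true Phred values rather than the older Solexa log-odds scores.

```python
def phred_to_error_prob(q):
    """Error probability implied by a Phred score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# The Q values mentioned in this post, from the 'rejected' marker up to Q15.
for q in (2, 4, 10, 15):
    print(f"Q{q}: P(error) = {phred_to_error_prob(q):.3f}")
```

Under this definition a Q10 call is nominally 90% likely to be correct, which is exactly why it sits oddly on a base reported as "N".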

    I just wrote a quick Perl script to check which qualities are assigned to N bases by a recent Pipeline 1.6 run, using the first 2M reads of a random FASTQ file from the run (QVALUE => FREQUENCY):

    '2' => 281517,
    '4' => 51799,
    '5' => 72,
    '6' => 7,
    '7' => 57,
    '8' => 62,
    '9' => 80,
    '10' => 23,
    '11' => 22,
    '12' => 18,
    '13' => 3,
    '14' => 5,
    '15' => 1

    As you can see, most are Q2, which is "B" and marks the 'rejected section' of the read, so they can be ignored. Most of the true Ns are Q4 ("D"), as they were in your experience; however, there are still smatterings of Ns with qualities all the way up to Q15!
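The original Perl script is not shown in the thread, so the following is only a rough Python sketch of the same kind of tally. It assumes four-line FASTQ records with no line wrapping and the Illumina 1.5+ offset of 64; the function name `n_quality_counts` and the demo reads are made up for illustration.

```python
import io
from collections import Counter
from itertools import islice


def n_quality_counts(handle, offset=64, max_reads=2_000_000):
    """Tally the quality values attached to N bases in a 4-line FASTQ stream."""
    counts = Counter()
    reads = 0
    while reads < max_reads:
        # Take the next record; strip line endings so they are never scored.
        record = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if len(record) < 4:
            break
        _header, seq, _plus, qual = record
        for base, qchar in zip(seq, qual):
            if base == "N":
                counts[ord(qchar) - offset] += 1
        reads += 1
    return counts


# Hypothetical two-record input, made up for illustration only.
demo = io.StringIO(
    "@read1\nNACGN\n+\nBhhhD\n"
    "@read2\nACNTT\n+\nhhDhh\n"
)
for q, n in sorted(n_quality_counts(demo).items()):
    print(f"Q{q}: {n}")
```

Sorting the tally by Q value makes the two dominant classes (Q2 and Q4) and the thin tail above them easier to see.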

    *sigh*


    • #17
      Quality score of -54(10)

      Before mapping and before subtracting 64, I checked the distribution of quality scores for my reads (Pipeline 1.6). I noticed what everyone mentioned here (quality scores starting at 66 - 64 = 2).

      However, I also noticed thousands of quality scores of 10 - 64 = -54. I thought negative quality scores were "phased out" according to the Wiki? What are these? More importantly, do they say anything about run quality? In my paired-end run, the second end has more -54 quality bases than the first in every lane; what does that mean?

      Second question: do any of the current mapping programs (Bowtie, BWA, BFAST, SOAP, etc.) automatically end-clip "B" quality bases at the ends of reads? I am guessing that the -54 scores are converted to zero.

      Cheers,
      Juan


      • #18
        Solexa's negative quality scores only went down to -5, so something else is going on.
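To put the -5 floor in context, here is a small sketch of the Solexa log-odds definition, Q = -10 * log10(p / (1 - p)), converted to an error probability and to the equivalent Phred score. The function names are illustrative only.

```python
import math


def solexa_to_error_prob(q_solexa):
    """Invert the Solexa log-odds score Q = -10 * log10(p / (1 - p))."""
    odds = 10 ** (-q_solexa / 10)  # p / (1 - p)
    return odds / (1 + odds)


def solexa_to_phred(q_solexa):
    """Phred score carrying the same error probability as a Solexa score."""
    return -10 * math.log10(solexa_to_error_prob(q_solexa))


p = solexa_to_error_prob(-5)
print(f"Solexa -5: P(error) = {p:.2f}, Phred = {solexa_to_phred(-5):.1f}")
```

A Solexa score of -5 already implies an error probability of about 0.76, close to the 0.75 expected from a random call among four bases, so a value like -54 cannot be a genuine score under either scheme.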

        Could you post a couple of reads with these funny quality scores? Wrap it in [ code ] and [ /code ] tags for display in the forum.


        • #19
          Originally posted by jkbonfield
          Do they still emit varying quality values for N bases? That always confused me. Most were 4 I think, but we'd occasionally see N with quality all the way up to 10. I can only assume they change bases to N at some stage, but don't do anything with the Q value. It seemed broken at the time anyway, but maybe it's a bit saner now.

          If I understand what they have done here: they take low-scoring bases and convert them to N (rather than calling the highest signal with a low score)? When you align these reads, are the Ns counted as errors, or ignored?


          • #20
            Originally posted by maubp
            Solexa's negative quality scores only went down to -5, so something else is going on.

            I figured it out: 10 is the ASCII code for newline. It was a bug in the code, not a bizarre quality score.
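The actual script is not shown in the thread, but the effect is easy to reproduce. A minimal sketch, assuming the quality line was read with its trailing newline still attached (the quality string itself is made up):

```python
# A quality line as read from a file, trailing newline included.
quality_line = "hhhhBB\n"

# Scoring every character, newline included, yields a spurious -54.
naive = [ord(c) - 64 for c in quality_line]

# Stripping the line ending first leaves only real quality values.
clean = [ord(c) - 64 for c in quality_line.rstrip("\r\n")]

print(naive)
print(clean)
```

Each record then contributes exactly one -54, which matches "thousands" of such values showing up in a distribution.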


            • #21
              Isolated B's

              Hi, hopefully I can revive this old thread for a little. I just got a big dataset from a HiSeq2000 machine at Berkeley. I'm not sure which version of the Illumina pipeline was used, but I do see single "B" qualities in some reads, e.g.:


              @HS1_0077:5:1101:1205:2082#0/1
              NCCCCAAAGCATGATGTTTCCACCCCCATGCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCAGACCGATATCGTATGCCGTCTTCCGC
              +HS1_0077:5:1101:1205:2082#0/1
              BTSTTV[VYYc_ac_cccccccc[YUYV_cBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
              @HS1_0077:5:1101:1231:2094#0/1
              NTGTGGTATATATGCATGTAGTTACTTGGCCANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCTCTCCACGATCTCCACACACACCCTCT
              +HS1_0077:5:1101:1231:2094#0/1
              BUPNUSSUUUcccac_c_ccccc_ccc_caBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB


              SamH


              • #22
                As discussed earlier in this thread, you can have lone B qualities (PHRED 2) in the quality string, and a trailing block of B markers as well. The second example here specifically copes with this:
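Separately from the example referred to above, which is not reproduced here, the distinction between lone Bs and a trailing B block can be sketched as follows. This is a hypothetical helper (`trim_trailing_b` is not from any tool named in this thread) and the read is made up.

```python
def trim_trailing_b(seq, qual, marker="B"):
    """Clip the trailing run of 'B' qualities, leaving isolated Bs in place."""
    keep = len(qual.rstrip(marker))
    return seq[:keep], qual[:keep]


# Made-up read: lone Bs at positions 1 and 5, then a trailing block of three.
seq, qual = trim_trailing_b("NACGTACGTA", "BhhhBhhBBB")
print(seq, qual)
```

Note that this clips any run of B at the 3' end, including a run of length one, so a lone B in the final position is treated as part of the trailing block.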


                • #23
                  Hi:

                  I have a question about the so-called Sanger format where Q can be in [0;93] represented by [!-~]. Since this is the set of all visible ASCII characters, then it looks like there is no symbol reserved for a missing value. Does it mean that specifying missing values for the quality of individual base pairs is impossible? If yes, why?


                  • #24
                    Originally posted by NikTuzov
                    Hi:

                    I have a question about the so-called Sanger format where Q can be in [0;93] represented by [!-~]. Since this is the set of all visible ASCII characters, then it looks like there is no symbol reserved for a missing value. Does it mean that specifying missing values for the quality of individual base pairs is impossible? If yes, why?
                    Yes, although in this case you could use PHRED quality 0 (and I recall some tools may use the upper bound 93 as a special value).

                    Why? None of the early technologies needed a missing value quality score.
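A short sketch of the Sanger encoding (offset 33) shows why no character is left over for a missing value. The function names are illustrative only.

```python
import string


def sanger_encode(q):
    """Encode a Phred score in [0, 93] as one Sanger FASTQ character."""
    if not 0 <= q <= 93:
        raise ValueError("Sanger FASTQ qualities must be in [0, 93]")
    return chr(q + 33)


def sanger_decode(char):
    """Decode one Sanger FASTQ quality character back to a Phred score."""
    return ord(char) - 33


# All 94 scores map onto '!' through '~', the printable non-space ASCII range.
alphabet = "".join(sanger_encode(q) for q in range(94))
printable = set(string.digits + string.ascii_letters + string.punctuation)
print(alphabet[0], alphabet[-1], set(alphabet) == printable)
```

Any sentinel for "missing" therefore has to reuse a valid score such as 0 or 93, by convention between the tools that write and read the file.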


                    • #25
                      Originally posted by maubp
                      Yes, although in this case you could use PHRED quality 0 (and I recall some tools may use the upper bound 93 as a special value).
                      Thanks. I take it the encoding of missing values depends on the sequencer and it cannot be changed downstream. Therefore, I don't see how some downstream tool can use 93 as a missing value when it is not seen as such by the sequencing instrument.
