  • Illumina/Solexa quality values

    Hi everyone,

    I have some Illumina GA fastq files with base quality values that don't span the full range that I expect.

    The quality values for each of five lanes have the following ranges:
    lane 1: 2 to 27
    lane 2: -1 to 26
    lane 3: 1 to 24
    lane 4: 1 to 27
    lane 5: 0 to 30
    with the majority of bases in all lanes having quality values 22 or 23.

    I got the values above by subtracting the offset 64 (ASCII '@') from the ASCII value of each character in the FASTQ quality strings.
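For anyone who wants to repeat this check, here is a minimal Python sketch of the calculation. It assumes Illumina 1.x-style FASTQ with an ASCII offset of 64 and plain four-line records (no wrapped sequence or quality lines). The function names are made up for illustration and are not part of any Illumina tool.

```python
# Assumed offset for Illumina pipeline 1.x FASTQ: ASCII 64 ('@') decodes to Q = 0.
SOLEXA_OFFSET = 64


def decode_quality(qual_line, offset=SOLEXA_OFFSET):
    """Convert one FASTQ quality string into a list of integer scores."""
    return [ord(c) - offset for c in qual_line]


def quality_range(fastq_lines, offset=SOLEXA_OFFSET):
    """Return (min, max) score over all records of a four-line-per-record FASTQ.

    Returns (None, None) if no quality strings are found.
    """
    lo, hi = None, None
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        scores = decode_quality(line.rstrip("\n"), offset)
        if not scores:
            continue
        lo = min(scores) if lo is None else min(lo, min(scores))
        hi = max(scores) if hi is None else max(hi, max(scores))
    return lo, hi


# Usage on a real file (path is hypothetical):
# with open("s_1_sequence.txt") as fh:
#     print(quality_range(fh))
```

With this decoding, the characters mentioned in this thread come out as '@' = 0, 'V' = 22, '?' = -1 and 'Y' = 25.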

    These ranges don't seem to be consistent with anything I've seen elsewhere. For example, with Solexa quality values I think the range should go from -5 to 40, and for Phred quality values 0 to 40.
    [ Side note: I am not certain whether my files contain Solexa or Phred-based quality values. I see that the quality value output in GERALD fastq files has changed since Illumina pipeline 1.3 (http://seqanswers.com/forums/showthread.php?t=1110). Since lane 2 contains some -1s, and Phred scores cannot be negative, I assume my quality values are on the Solexa scale. ]
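The two scales are related: a Solexa score is Q = -10 log10(p/(1-p)) and a Phred score is Q = -10 log10(p), where p is the estimated error probability. A rough Python sketch of converting between them, plus the negative-value heuristic from the side note, might look like this. The helper names are hypothetical, and the heuristic assumes the offset-64 decoding is already right.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa score, -10*log10(p/(1-p)), to the Phred scale, -10*log10(p).

    From p = 1 / (1 + 10**(Q_solexa/10)) it follows that
    Q_phred = 10*log10(10**(Q_solexa/10) + 1).
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def looks_like_solexa(scores):
    """Heuristic: Phred scores are never negative, so any score < 0
    (after subtracting offset 64) points to the Solexa scale.
    An all non-negative range does not settle the question either way."""
    return any(q < 0 for q in scores)
```

With this formula, Solexa -5 corresponds to about Phred 1.2 and Solexa -1 to about 2.5, while above roughly Q20 the two scales differ by less than 0.05, so high scores look nearly identical on either scale.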

    Anyone have any ideas about what could be happening here? Why don't I see any bases with qualities higher than 30?

    Thanks!
    Dan
    Last edited by d17; 01-19-2011, 01:56 AM.

  • #2
    Originally posted by d17
    Anyone have any ideas about what could be happening here? Why don't I see any bases with qualities higher than 30?
    Perhaps a problem with the instrument itself? Have you previously had high quality runs, and if so has anything changed with your hardware or software?
    @1
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    +
    """"""""""""""""""""""""""""""""""""

    • #3
      Originally posted by TylerBackman
      Perhaps a problem with the instrument itself? Have you previously had high quality runs, and if so has anything changed with your hardware or software?
      Hmm, we have had high-quality runs in the past (i.e., quality values from -5 to 40, with most bases called as 40). I'll definitely have to check whether anything has changed with the machine's hardware or software (it's actually not our machine, and these files are a couple of months old now, so that may be hard to track down). Has anyone else come across quality values that look remotely like these?


      • #4
        Hi Dan,

        I was just about to post about the very same problem. Most of my quality scores are "V"s, which convert to Q22 on the Illumina scale (ASCII 86 minus the offset 64), if I have that right (I'm new to this). I'd be interested to know if you find an explanation.

        Thanks,

        Dion



        • #5
          Originally posted by d17
          Anyone have any ideas about what could be happening here? Why don't I see any bases with qualities higher than 30?
          For Solexa scores, the estimated probability of a base-call error at Q30 is about 0.001, i.e. the call is correct with 99.9% probability. This is actually not too bad.
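To put numbers on this, here is a small sketch converting a score to an error probability under each definition. The helper names are illustrative only.

```python
def solexa_error_prob(q):
    """Error probability p for a Solexa score, Q = -10*log10(p/(1-p))."""
    return 1 / (1 + 10 ** (q / 10))


def phred_error_prob(q):
    """Error probability p for a Phred score, Q = -10*log10(p)."""
    return 10 ** (-q / 10)
```

Solexa Q30 gives p = 1/1001 (about 0.000999), and Phred Q40 gives p = 0.0001, i.e. 99.99% correct. Note that on the Solexa scale Q0 corresponds to p = 0.5, not p = 1.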

          In our runs, we get similar quality ranges to what you list, although it is rare to get values below 0; in fact, bases called as "N" usually have Q=0, which doesn't make much sense to me. Yes, this was GAPipeline 1.0.

          As I said, the quality isn't that bad. The reason you aren't seeing higher is almost certainly due to the prep and/or instrument. For example, if you generate too many clusters on the flowcell (high density), you just won't get high confidence in base calls. It's a touchy tradeoff between density/yield and quality/ability to discern clusters.



          • #6
            Torst, thanks for your input:

            Originally posted by Torst
            For Solexa scores, the estimated probability of a base-call error at Q30 is about 0.001, i.e. the call is correct with 99.9% probability. This is actually not too bad.
            Yes, you're absolutely right... but I would be happier if we had Q40s that were correct with 99.99% probability!

            Originally posted by Torst
            In our runs, we get similar quality ranges to what you list, although it is rare to get values below 0; in fact, bases called as "N" usually have Q=0, which doesn't make much sense to me. Yes, this was GAPipeline 1.0.
            One strange thing we see is that bases called as "N" don't always have the same quality value: in the five lanes I posted about, the quality values of "N" bases range from -1 to +3. Of course the -1 doesn't make any sense whatsoever, but at least the others are consistent with the base having a low probability of being correct.
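To tabulate which quality values the "N" calls actually receive in a file, something along these lines should work. It is a sketch that assumes offset-64 FASTQ with four-line records and no-calls written as "N"; the function name is hypothetical.

```python
from collections import Counter


def n_base_qualities(fastq_lines, offset=64):
    """Count decoded quality scores assigned to 'N' calls.

    Assumes plain four-line FASTQ records (header, sequence, '+', quality).
    """
    counts = Counter()
    lines = [line.rstrip("\n") for line in fastq_lines]
    for i in range(0, len(lines) - 3, 4):
        seq, qual = lines[i + 1], lines[i + 3]
        for base, ch in zip(seq, qual):
            if base in "Nn":
                counts[ord(ch) - offset] += 1
    return counts
```

A result such as `Counter({0: ..., -1: ..., 3: ...})` would show directly how often each score is given to a no-call.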

            Originally posted by Torst
            The reason you aren't seeing higher is almost certainly due to the prep and/or instrument.
            Does anyone know how much of the variation in the prep is stochastic? (i.e., is there a definite problem that I need to hunt down here, or did we just get unlucky compared with previous runs that had higher quality values?)


            • #7
              Was your image analysis done with IPAR, or with the Illumina pipeline? The first time we used our IPAR unit, it needed "calibration" and produced reads with very low quality scores. Re-running the image analysis with Firecrest gave higher-quality reads.


              • #8
                I'm seeing exactly the same thing: quality values from -1 ('?', ASCII 63) to 25 ('Y', ASCII 89), with most of the calls being 23 ('W', ASCII 87). Tyler, how exactly was your IPAR unit 'recalibrated'?



                • #9
                  Originally posted by sjackman
                  I'm seeing exactly the same thing: quality values from -1 ('?', ASCII 63) to 25 ('Y', ASCII 89), with most of the calls being 23 ('W', ASCII 87). Tyler, how exactly was your IPAR unit 'recalibrated'?
                  The scores were only incorrect for the first run with the IPAR unit, and were then correct for all subsequent runs.