  • Illumina/Solexa quality values

    Hi everyone,

    I have some Illumina GA fastq files with base quality values that don't span the full range that I expect.

    The quality values for each of five lanes have the following ranges:
    lane 1: 2 to 27
    lane 2: -1 to 26
    lane 3: 1 to 24
    lane 4: 1 to 27
    lane 5: 0 to 30
    with the majority of bases in all lanes having quality values 22 or 23.

    I got the values above by subtracting the offset 64 (the ASCII code of '@') from the ASCII values of the characters in the fastq files.
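The decoding described above can be sketched roughly as follows. This is only an illustration, not the poster's actual script: it assumes strict four-line FASTQ records (no wrapped sequence or quality lines), and the function name `quality_summary` and the example record are made up.

```python
from collections import Counter


def quality_summary(fastq_lines, offset=64):
    """Return (min, max, most common) quality value over all bases.

    Assumes each record is exactly four lines, so the quality
    string is every fourth line (index 3, 7, 11, ...).
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:
            # Subtract the offset from each character's ASCII code.
            counts.update(ord(ch) - offset for ch in line.rstrip("\n"))
    if not counts:
        raise ValueError("no quality lines found")
    return min(counts), max(counts), counts.most_common(1)[0][0]


# Hypothetical one-record example: 'V' is 22, 'W' is 23, '?' is -1, '^' is 30.
example = ["@read1\n", "ACGTN\n", "+\n", "VVW?^\n"]
print(quality_summary(example))
```

For a real file, something like `quality_summary(open("lane1.fastq"))` should work, where the file name is a placeholder.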

    These ranges don't seem to be consistent with anything I've seen elsewhere. For example, with Solexa quality values I think the range should go from -5 to 40, and for Phred quality values 0 to 40.
    [ Side note: I am not certain whether my files contain Solexa or Phred-based quality values. I see that the quality value output in GERALD fastq files has changed since Illumina pipeline 1.3 (http://seqanswers.com/forums/showthread.php?t=1110). Since lane 2 contains some -1's, I assume my quality values are Solexa ]
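For reference, the two scales are usually defined from the error probability p as Q_phred = -10 log10(p) and Q_solexa = -10 log10(p / (1 - p)). A Phred value can therefore never be negative, which is consistent with reading the -1's as Solexa values. A small sketch of the conversion between the scales (the function names are illustrative only):

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled quality to the Phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_solexa(q_phred):
    """Convert a Phred quality to the Solexa scale.

    Undefined for q_phred <= 0 (error probability of 1 or more).
    """
    return 10 * math.log10(10 ** (q_phred / 10) - 1)


# The scales differ noticeably at the low end and converge above ~Q10.
for q in (-5, -1, 0, 10, 30, 40):
    print(q, round(solexa_to_phred(q), 2))
```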

    Anyone have any ideas about what could be happening here? Why don't I see any bases with qualities higher than 30?

    Thanks!
    Dan
    ________
    Last edited by d17; 01-19-2011, 01:56 AM.

  • #2
    Originally posted by d17
    Anyone have any ideas about what could be happening here? Why don't I see any bases with qualities higher than 30?
    Perhaps a problem with the instrument itself? Have you previously had high quality runs, and if so has anything changed with your hardware or software?
    @1
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    +
    """"""""""""""""""""""""""""""""""""


    • #3
      Originally posted by TylerBackman
      Perhaps a problem with the instrument itself? Have you previously had high quality runs, and if so has anything changed with your hardware or software?
      Hmm, we have had high quality runs in the past (i.e. quality values from -5 to 40, most bases called as 40). I'll definitely have to check into whether anything has changed with the machine's hardware or software (it's actually not our machine, and these files are a couple of months old now, so that may be hard to track down). I wonder if anyone else has come across quality values that look remotely like these?
      Last edited by d17; 01-19-2011, 01:56 AM.


      • #4
        Hi Dan,

        I was just about to post on the very same problem. Most of my quality scores are "V"s, which convert to Q22 on the Illumina scale (offset 64), if I have that correct (I'm new to this). I'd be interested to know if you find an explanation.

        Thanks,

        Dion


        • #5
          Originally posted by d17
          Anyone have any ideas about what could be happening here? Why don't I see any bases with qualities higher than 30?
          For Solexa, the estimated probability of a base call error for Q30 is about 0.001, i.e. the call is correct with 99.9% probability. This is actually not too bad.
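As a rough check of those numbers, here is a minimal sketch using the standard definitions of the two scales (p = 10^(-Q/10) for Phred, p = 1 / (1 + 10^(Q/10)) for Solexa). The function names are made up for illustration.

```python
def phred_error_prob(q):
    """Error probability implied by a Phred quality value."""
    return 10 ** (-q / 10)


def solexa_error_prob(q):
    """Error probability implied by a Solexa quality value."""
    return 1 / (1 + 10 ** (q / 10))


# Compare the values mentioned in this thread on both scales.
for q in (22, 23, 30, 40):
    print(q, phred_error_prob(q), solexa_error_prob(q))
```

At Q30 the two scales agree to within about 0.1%, so the 99.9% figure holds either way. At Q22 and Q23 the error probability is roughly 0.5 to 0.6%.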

          In our runs, we get similar quality ranges to what you list, although it is rare to get values below 0 - in fact bases called as "N" usually have Q=0 ... which doesn't make much sense to me. Yes, this was GAPipeline 1.0.

          As I suggested, the quality isn't that bad. The reason you aren't seeing higher is almost certainly due to the prep and/or instrument. E.g. if you generate too many clusters on the flowcell (high density), you just won't get high confidence in base calls. It's a touchy tradeoff between density/yield and quality/ability to discern clusters.


          • #6
            Torst, thanks for your input:

            Originally posted by Torst
            For Solexa, the estimated probability of a base call error for Q30 is about 0.001, i.e. the call is correct with 99.9% probability. This is actually not too bad.
            Yes, you're absolutely right ... but I would be happier if we had Q40's that were correct with 99.99% probability!

            Originally posted by Torst
            In our runs, we get similar quality ranges to what you list, although it is rare to get values below 0 - in fact bases called as "N" usually have Q=0 ... which doesn't make much sense to me. Yes, this was GAPipeline 1.0.
            One strange thing we have is that bases called as "N" don't always have the same quality value: in the five lanes I posted about the quality values of "N" bases range from -1 to +3. Of course the -1 doesn't make any sense whatsoever, but at least the others are consistent with the base having a low probability of being correct.
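A tally like this can be produced with something along the following lines. This is a sketch only: it assumes four-line FASTQ records and an offset of 64, and `n_base_qualities` and the example reads are hypothetical.

```python
from collections import Counter


def n_base_qualities(fastq_lines, offset=64):
    """Count the quality values assigned to bases called as 'N'."""
    counts = Counter()
    lines = [line.rstrip("\n") for line in fastq_lines]
    # Step through complete four-line records: header, sequence, '+', quality.
    for i in range(0, len(lines) - 3, 4):
        seq, qual = lines[i + 1], lines[i + 3]
        for base, ch in zip(seq, qual):
            if base == "N":
                counts[ord(ch) - offset] += 1
    return counts


# Hypothetical reads: the N bases carry '?' (-1), '@' (0) and 'C' (3).
example = ["@r1", "ACNT", "+", "VV?W",
           "@r2", "NNGA", "+", "@CVV"]
print(sorted(n_base_qualities(example).items()))
```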

            Originally posted by Torst
            The reason you aren't seeing higher is almost certainly due to the prep and/or instrument.
            Does anyone know how much variation in the prep is stochastic? (i.e. Is there a definite problem that I need to hunt down here, or did we just get unlucky compared with previous runs that had higher quality values?)
            ________
            Last edited by d17; 01-19-2011, 01:56 AM.


            • #7
              Is your image analysis done with IPAR, or with the Illumina pipeline? The first time we used our IPAR unit, it needed "calibration" and produced reads with very low quality scores. Re-running the image analysis with Firecrest gave higher quality reads.
              @1
              NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
              +
              """"""""""""""""""""""""""""""""""""


              • #8
                I'm seeing the exact same thing. I'm seeing quality values from -1 ('?' or ASCII 63) to 25 ('Y' or ASCII 89), with most of the calls being 23 ('W' or ASCII 87). Tyler, how was your IPAR unit "recalibrated" exactly?
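The lowest ASCII code seen also gives a rough hint about which encoding a file uses. The sketch below is a heuristic only, based on the commonly cited ranges: Sanger Phred+33 starts at ASCII 33, Solexa+64 at ASCII 59 (Q = -5), and Illumina 1.3+ Phred+64 at ASCII 64. A file containing only high-quality bases can be misclassified, and the function name is made up.

```python
def guess_encoding(quality_strings):
    """Guess the FASTQ quality encoding from the lowest character seen."""
    lowest = min(ord(ch) for q in quality_strings for ch in q)
    if lowest < 59:
        # Below ';' (ASCII 59): not possible with an offset of 64.
        return "Sanger (Phred+33)"
    if lowest < 64:
        # Below '@' (ASCII 64): negative values, so Solexa-scaled.
        return "Solexa (Solexa+64)"
    return "Solexa+64 or Illumina 1.3+ (Phred+64); ambiguous"


# '?' is ASCII 63, i.e. -1 with offset 64, as reported above.
print(guess_encoding(["?WWY"]))
```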


                • #9
                  Originally posted by sjackman
                  I'm seeing the exact same thing. I'm seeing quality values from -1 ('?' or ASCII 63) to 25 ('Y' or ASCII 89), with most of the calls being 23 ('W' or ASCII 87). Tyler, how was your IPAR unit "recalibrated" exactly?
                  The scores were only incorrect for the first run with the IPAR unit, and were then correct for all subsequent runs.
                  @1
                  NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
                  +
                  """"""""""""""""""""""""""""""""""""
