  • Hi-Seq quality score behavior

    Hey all,
    We're doing mRNA-seq on the Illumina Hi-Seq, and when we look at the quality score as a function of read position, we see that the quality initially increases, then decreases. We also see this when we do miRNA-seq on the Hi-Seq. With the GAII we see the quality decreases as you travel further out on the read, which is what I would expect. Has anyone else seen this on their Hi-Seq?

    What we see on the Hi-Seq:


    What we see on the GAII:

    (Plots generated using fastqc)
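For anyone who wants the numbers behind a per-position quality plot, here is a minimal sketch of the calculation. It is not how fastqc does it internally, just an illustration. It assumes plain 4-line FASTQ records and Phred+33 encoded qualities; the function name and the `offset` parameter are made up for this example, and older Illumina pipelines used an offset of 64.

```python
import io
from collections import defaultdict


def mean_quality_by_position(handle, offset=33):
    """Return the mean Phred score at each read position.

    Assumes 4-line FASTQ records; `offset` is the ASCII offset of the
    quality encoding (33 assumed here, 64 for older Illumina pipelines).
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for i, line in enumerate(handle):
        if i % 4 != 3:  # only the 4th line of each record holds qualities
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [totals[p] / counts[p] for p in sorted(counts)]


# Two toy reads (made-up data, not from a real run)
example = io.StringIO(
    "@r1\nACGT\n+\n5?II\n"
    "@r2\nACGT\n+\n7AIG\n"
)
print(mean_quality_by_position(example))
```

Plotting the returned list against position gives the same kind of curve discussed in this thread, although fastqc reports box plots per position rather than a bare mean.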

  • #2
    Originally posted by rkusko
    Hey all,
    We're doing mRNA-seq on the Illumina Hi-Seq, and when we look at the quality score as a function of read position, we see that the quality initially increases, then decreases. We also see this when we do miRNA-seq on the Hi-Seq. With the GAII we see the quality decreases as you travel further out on the read, which is what I would expect. Has anyone else seen this on their Hi-Seq?

    What we see on the Hi-Seq:


    What we see on the GAII:

    (Plots generated using fastqc)
    Hypothesis: the Hi-Seq seems to use an algorithm that takes several iterations of the chemistry to determine the precise dimensions of the clusters, then continues to refine them as the run goes along, since by default it can't afford to store all of the images on the local drive from beginning to end. The worst case is when the bases are homogeneous at the start: no clusters can be identified at first, although it can recover somewhat. I am guessing that even with non-homogeneous sequences it can define the clusters better with more iterations, until the chemistry starts to deteriorate.

    One way to test this hypothesis is to store the images and do an analysis with the complete set of images.



    • #3
      The most recent version of the software that came with the v3 chemistry downgrades the quality of the early bases of a read. Not sure what the basis for this is.

      Here is a plot from a v3 run we are in the midst of:



      (Color denotes raw cluster density)

      Looks like there is a population of tiles in early cycles that have low signal-to-noise. Perhaps these are the source of the lower quality values?

      --
      Phillip



      • #4
        Thanks for the explanation Phillip. I've also been seeing this pattern of quality scores in my latest Hi-Seq data (exome-sequencing).



        • #5
          We are seeing the same thing with exactly the same step pattern of quality. I'm not sure how to handle this downstream since it essentially means that the quality scores in the first three bases, and possibly those in the next 2 blocks of five bases, cannot be relied upon to be accurate. Should I just be trimming off the first three bases as a matter of course?
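Whether trimming is actually warranted is not settled in this thread, but for reference, here is a minimal sketch of what dropping the first few bases would look like. It assumes plain 4-line FASTQ records; `trim_fastq` and its parameter `n` are hypothetical names for this example, not part of any existing tool.

```python
import io
from itertools import islice


def trim_fastq(in_handle, out_handle, n=3):
    """Copy a 4-line-per-record FASTQ, removing the first n bases of each read.

    The sequence and quality lines are trimmed by the same amount so they
    stay the same length. An incomplete trailing record is silently dropped.
    """
    while True:
        record = [line.rstrip("\n") for line in islice(in_handle, 4)]
        if len(record) < 4:
            break
        header, seq, plus, qual = record
        out_handle.write(f"{header}\n{seq[n:]}\n{plus}\n{qual[n:]}\n")


# Toy record (made-up data)
src = io.StringIO("@r1\nACGTACGT\n+\nIIIIHHHH\n")
dst = io.StringIO()
trim_fastq(src, dst, n=3)
print(dst.getvalue(), end="")
```

With real files, the two handles would be opened with `open(path)` and `open(path, "w")`. Established trimming tools offer the same operation and are probably the safer choice for production data.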


          • #6
            Reminds me of the old Steve Martin routine on his stereo system...

            Q30 is still very high quality (1 error per 1000 base calls). Sure we would love to have more of those Q40 (1 error per 10,000 base calls), but have some perspective. Mean quality values on an Ion Torrent don't tend to go above Q20 (1 error per 100 base calls).

            --
            Phillip
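The error rates quoted for Q20, Q30 and Q40 follow from the standard Phred definition, Q = -10 * log10(p), where p is the probability that the base call is wrong. A short sketch of the conversion (the function name is just for illustration):

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the implied error probability."""
    return 10 ** (-q / 10)


for q in (20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: about 1 error per {round(1 / p):,} base calls")
```

Every 10 points of Q is a tenfold change in the implied error rate, so a drop from Q40 to Q30 only matters if the reported scores are calibrated in the first place.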



            • #7
              Originally posted by pmiguel
              Reminds me of the old Steve Martin routine on his stereo system...

              Q30 is still very high quality (1 error per 1000 base calls). Sure we would love to have more of those Q40 (1 error per 10,000 base calls), but have some perspective. Mean quality values on an Ion Torrent don't tend to go above Q20 (1 error per 100 base calls).

              --
              Phillip
              My concern is whether those Q30-32 scores for the first 3 bases have anything to do with reality. Another user here has some data from amplicon sequencing that suggests they do not (variation from the known primer sequence is much more than 1 in 1000).



              • #8
                Originally posted by greigite
                My concern is whether those Q30-32 scores for the first 3 bases have anything to do with reality. Another user here has some data from amplicon sequencing that suggests they do not (variation from the known primer sequence is much more than 1 in 1000).
                Reasonable concern. But that is not a good test of the quality values. It probably reflects the oligo synthesis error rate.

                Maybe look at the phiX error rate vs quality values?

                --
                Phillip



                • #9
                  I have this pattern in all of my HiSeq runs, even in the phiX lanes. I was initially worried and sent the Qscore by Cycle plots to Illumina. They responded:

                  "Your Illumina FAS asked me to follow up with you on some questions you had about the Qscore pattern you are seeing in your sequencing data, particularly at the very start of the run. I can confirm that this pattern is normal, and the lower scores at the beginning 15 cycles or so is typical. I've attached an example of a typical Qscore heat map from a run with the same software versions as those you are using. "

                  http://dl.dropbox.com/u/30955182/Typ...0HCS%2014x.pdf



                  • #10
                    So the question is whether runs prior to the v3 chemistry software change have quality values that are inaccurate during the 1st 15 cycles -- too high.

                    Partially related: for some reason we add an extra cycle to our reads, 101 bases instead of 100. The quality values for base 101 are always substantially lower than for base 100. My suspicion is that this is a bogus downgrading of the quality of that base.

                    --
                    Phillip



                    • #11
                      Originally posted by pmiguel
                      So the question is whether runs prior to the v3 chemistry software change have quality values that are inaccurate during the 1st 15 cycles -- too high.

                      Partially related: for some reason we add an extra cycle to our reads, 101 bases instead of 100. The quality values for base 101 are always substantially lower than for base 100. My suspicion is that this is a bogus downgrading of the quality of that base.

                      --
                      Phillip
                      Phillip,

                      According to our FAS, Illumina determined that their earlier error model was in fact overestimating the quality of the base calls at the beginning of the read. This was determined by plotting expected vs. observed error rates. They have adjusted the error model so that the called Q-scores more closely match the observed error rates. He did not explain why the error rate for the first 10-15 bases was higher than for later bases. He also mentioned that their studies indicated they had previously been underestimating Q-scores for later cycles, so those have been adjusted upward in the current error model.

                      Regarding the last cycle: phasing/prephasing numbers cannot be included in the error calculation for the last cycle of a read, since for any cycle n you need data from cycle n+1 to estimate (pre)phasing. This is the rationale for adding the extra cycle to the run and trimming it from the final output. The Q-score is lower because there is incomplete data to fit to the error model, and thus lower confidence in the result.
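An expected vs. observed comparison of the kind described above can be sketched as follows. This is not Illumina's error model, only an illustration of the idea: group base calls by their reported Q-score, count how many were actually wrong (for example against a known reference such as phiX), and compare. The function name and the `(reported_q, is_error)` input format are assumptions made for this example, and the toy data is made up.

```python
import math
from collections import Counter


def calibration_table(calls):
    """Compare reported Q-scores with observed error rates.

    `calls` is an iterable of (reported_q, is_error) pairs, one per base
    call. Returns {q: (n_calls, expected_rate, observed_rate, empirical_q)}.
    """
    totals = Counter()
    errors = Counter()
    for q, is_error in calls:
        totals[q] += 1
        errors[q] += bool(is_error)
    table = {}
    for q in sorted(totals):
        observed = errors[q] / totals[q]
        expected = 10 ** (-q / 10)
        # Empirical Q is undefined when no errors were seen in the bin.
        empirical_q = -10 * math.log10(observed) if observed > 0 else None
        table[q] = (totals[q], expected, observed, empirical_q)
    return table


# Toy data: 1000 calls reported as Q30, of which 10 are wrong
toy_calls = [(30, i < 10) for i in range(1000)]
for q, (n, expected, observed, empirical_q) in calibration_table(toy_calls).items():
    print(q, n, expected, observed, empirical_q)
```

If the empirical Q in a bin is well below the reported Q, the scores in that bin are overestimates, which is the situation described for the early cycles under the older error model. Restricting the input to calls from one cycle at a time would give the per-cycle view.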



                      • #12
                        Ah, that makes sense.

                        As for error rate vs. quality scores: the SAV (Illumina's "Sequencing Analysis Viewer") allows some possibly informative plots, like cycle vs. "error rate" on a phiX lane. It seems the error rate bottoms out around cycle 5 or so, just below 0.1%. To my mind that would be Q30 average quality, whereas for that lane the median Q value at that cycle looks to be around 35.

                        Is that what you see?

                        --
                        Phillip



                        • #13
                          Originally posted by pmiguel
                          Ah, that makes sense.

                          As for error rate vs. quality scores: the SAV (Illumina's "Sequencing Analysis Viewer") allows some possibly informative plots, like cycle vs. "error rate" on a phiX lane. It seems the error rate bottoms out around cycle 5 or so, just below 0.1%. To my mind that would be Q30 average quality, whereas for that lane the median Q value at that cycle looks to be around 35.

                          Is that what you see?

                          --
                          Phillip
                          Phillip,

                          I'm dealing with a small sample size: our HiSeq was just installed and it's still in the middle of read 2 of its setup run (2 x 101 cycles) with the PhiX flowcell. That said, I would concur with your assessment that the error rate bottoms out at ~cycle 4-5, but I would eyeball it at ~0.05%. It stays at about this level up through cycle 25 and then slowly climbs. At cycle 101 the median error rate was ~0.8% overall; the lowest lane was ~0.6% and the highest ~1.2%.

                          Kevin
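To put the eyeballed phiX error rates on the same scale as the reported Q-scores, the Phred formula can be run in the other direction, Q = -10 * log10(error rate). A small sketch using the percentages quoted in the posts above (the function name is only for illustration):

```python
import math


def error_rate_to_phred(rate):
    """Convert an observed error rate (as a fraction, not percent) to a Q-score."""
    return -10 * math.log10(rate)


# Percent error rates quoted in this thread
quoted = [
    ("cycle ~5, first estimate", 0.1),
    ("cycle ~4-5, second estimate", 0.05),
    ("cycle 101, median", 0.8),
    ("cycle 101, lowest lane", 0.6),
    ("cycle 101, highest lane", 1.2),
]
for label, percent in quoted:
    print(f"{label}: {percent}% -> Q{error_rate_to_phred(percent / 100):.1f}")
```

An error rate of 0.05% works out to roughly Q33 and 0.8% to roughly Q21, so these are the values to hold against the median reported Q at the same cycles.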
