  • invu
    Junior Member
    • Apr 2020
    • 5

    Poor seq quality due to low diversity sample

    Hi,

    I have some sets of HiSeq data that I am analyzing and the sequencing quality turned out quite bad. I attach the "per base seq quality" diagram and the "per tile seq quality" diagram for one of those sets, generated using FastQC.

    I contacted the service provider, and they say it's due to my sample having low diversity especially at the beginning. (I also attached the seq content diagram.)
    Based on some searches and reading of Illumina tech notes, I see that the diversity at the first several bases is quite important for the system to "calibrate" correctly for quality base calls for later bases.
    My first question is, is this roughly a correct interpretation? And is there any way to "post-process" maybe the raw(er) data to correct/improve the seq reads?

    Second, what I still don't understand is why it affects the per-tile seq quality. How does low diversity at the initial bases have anything to do with the spatial variation in seq quality?
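For anyone who wants to look at the per-tile numbers outside FastQC: each read header records the tile it came from, so quality can be grouped by tile directly from the FASTQ file. The following is a minimal sketch, assuming Casava 1.8+ style headers (`@instrument:run:flowcell:lane:tile:x:y ...`) and Phred+33 quality encoding. The function names and the demo records are made up for illustration, and this is a plain per-tile average, not FastQC's own calculation.

```python
from collections import defaultdict


def tile_of(header):
    """Return 'lane:tile' from a header such as
    @INSTRUMENT:RUN:FLOWCELL:LANE:TILE:X:Y 1:N:0:INDEX (assumed layout)."""
    fields = header[1:].split(" ")[0].split(":")
    return f"{fields[3]}:{fields[4]}"


def mean_quality_per_tile(fastq_lines, offset=33):
    """Average Phred score over all bases, grouped by lane:tile."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    lines = [ln.rstrip("\n") for ln in fastq_lines]
    # FASTQ records are 4 lines each: header, sequence, '+', qualities.
    for i in range(0, len(lines) - 3, 4):
        header, qual = lines[i], lines[i + 3]
        tile = tile_of(header)
        sums[tile] += sum(ord(c) - offset for c in qual)
        counts[tile] += len(qual)
    return {t: sums[t] / counts[t] for t in sums if counts[t]}


# Hypothetical demo records: two reads on tile 1101, one on tile 2201.
demo = [
    "@HWI:1:FC1:1:1101:1000:2000 1:N:0:ACGT", "ACGT", "+", "IIII",
    "@HWI:1:FC1:1:1101:1500:2500 1:N:0:ACGT", "ACGT", "+", "IIII",
    "@HWI:1:FC1:1:2201:1000:2000 1:N:0:ACGT", "ACGT", "+", "####",
]
per_tile = mean_quality_per_tile(demo)
for tile, q in sorted(per_tile.items()):
    print(tile, round(q, 1))
```

On a real file, pass an open file handle instead of `demo`; tiles whose average falls well below the rest are the ones that show up as poor in a per-tile plot.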

    What do you guys think?
    What should I argue when replying to my service provider? Should I ask for a re-run?

    Any note will be greatly appreciated!
    Thanks.
    Last edited by invu; 04-22-2020, 11:06 AM.
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Yikes, this is a really low-diversity sample. Do you know how much phiX (if any) was added to this sample? Did you not tell the sequencing provider that these were low diversity? If you did not, it would be hard to make a case for them to re-sequence this sample for free. You may have to pay for a re-run with a significant percentage of phiX (10-20% or more) if you want improved Q-scores.

    It is possible that in spite of the bad Q-scores etc your sequence may still be usable. Have you looked at that?
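One rough way to check usability is to count how many reads clear a mean-quality cutoff. The sketch below assumes Phred+33 encoding and uses a mean Q20 cutoff purely as an illustrative threshold; the function names and demo records are hypothetical. Note that it only summarizes the reported Q-scores. If the expected sequences are known, comparing reads against them is a more direct test of whether the base calls themselves are correct.

```python
def mean_phred(qual, offset=33):
    """Mean Phred score of one quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def fraction_passing(fastq_lines, min_mean_q=20.0):
    """Fraction of reads whose mean quality is at least min_mean_q."""
    lines = [ln.rstrip("\n") for ln in fastq_lines]
    # Every 4th line starting at index 3 is a quality string.
    quals = [lines[i + 3] for i in range(0, len(lines) - 3, 4)]
    if not quals:
        return 0.0
    passing = sum(1 for q in quals if q and mean_phred(q) >= min_mean_q)
    return passing / len(quals)


# Hypothetical demo: mean qualities of 40, 20 and 2.
demo_reads = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "ACGT", "+", "5555",
    "@r3", "ACGT", "+", "####",
]
frac = fraction_passing(demo_reads)
print(f"{frac:.2f} of reads have mean Q >= 20")
```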


    • invu
      Junior Member
      • Apr 2020
      • 5

      #3
      Thanks for your reply, GenoMax!
      The sample is a custom set of sequences with well-defined regions (hence those low-diversity regions). I had declined PhiX spike-in to obtain as many valid read lines as possible w/o sacrificing any to PhiX. I hadn't told them about the diversity because I had no idea about this kind of issue before; that being said, my old results for samples similar to this (even though they did have a few degenerate bases at the beginning) didn't have this problem (at least weren't as bad as this). Will I really need PhiX if I get to repeat something like this? Which way will I lose more data -- 10-20% loss by PhiX or less well-defined loss by poor quality reads like this?

      I am looking at the data, and a big portion of the lines do seem valid and usable, but ideally I'd need more lines. More importantly, if this issue caused extra base call errors even among the lines that look okay, that's a separate problem, and one that is quite hard to detect just from looking at those lines.
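The PhiX trade-off asked about above comes down to simple arithmetic: usable reads = total clusters x (1 - PhiX fraction) x fraction of sample reads that are usable. A minimal sketch follows; the function names are hypothetical and every number in the demo is an illustrative placeholder, not a measurement from this run.

```python
def usable_reads(total_clusters, phix_fraction, usable_fraction):
    """Sample reads that are both non-PhiX and usable."""
    return total_clusters * (1.0 - phix_fraction) * usable_fraction


def break_even_usable_fraction(phix_fraction, usable_fraction_with_phix):
    """Usable fraction a run WITHOUT PhiX must reach to match the
    yield of a run with the given PhiX spike-in."""
    return (1.0 - phix_fraction) * usable_fraction_with_phix


# Placeholder scenario: 1000 clusters, 20% PhiX with 90% of sample
# reads usable, versus no PhiX with only 60% of reads usable.
with_phix = usable_reads(1000, 0.20, 0.90)
without_phix = usable_reads(1000, 0.00, 0.60)
threshold = break_even_usable_fraction(0.20, 0.90)
print(round(with_phix), round(without_phix), round(threshold, 2))
```

Under these placeholder numbers the spiked run comes out ahead, and a run without PhiX would need more than 72% usable reads to break even. The comparison ignores the risk of a complete run failure, which would weigh further against skipping the spike-in.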

      Do you happen to know if someone looks at the rawer data (e.g., imaging data? if they're preserved? sorry I'm not really familiar with the details of the seq machines..) whether they could correct or improve the base calls throughout the seq data even now? Or is everything done real time by the machine and there's nothing that can be done to improve this?
      Also, do you know if this issue caused by low diversity would also cause the tile-dependent quality loss shown in my diagram? (This is something I am having a hard time understanding, and something I'm trying to argue about.)
      Last edited by invu; 04-22-2020, 02:33 PM.


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        You really should have asked for phiX to be added. Consider that this run could have failed completely if it had been a bit overloaded, leaving you with no data. Raw image data is generally not stored nowadays, so there is not much you can do with it afterwards. If you need more data, consider sequencing an additional lane rather than taking a chance like this.


        • invu
          Junior Member
          • Apr 2020
          • 5

          #5
          Ha, I see. Lesson learned. Thanks for your help, GenoMax!


          • ATϟGC
            Member
            • Jun 2013
            • 56

            #6
            If these are amplicon libraries and you want to minimize the amount of PhiX, you can add "stagger" or "offset" nucleotides between the Illumina sequencing primer region (like the Nextera or TruSeq tail) and your locus-specific primer in order to create base diversity. These stagger nucleotides can also be added to restriction-digest adapters to increase base diversity.

            I always add staggers to my amplicon primers and sequence multiple amplicons per run to increase diversity, but I still always add 5-12% PhiX just to be sure.
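To illustrate the stagger idea described above, here is a minimal sketch that builds a set of primers with 0 to 3 offset bases between the sequencing-primer tail and the locus-specific primer. All sequences below are hypothetical placeholders (not real Nextera or TruSeq tails), and the spacer choice is only an example, not a validated design.

```python
def staggered_primers(seq_primer_tail, locus_primer,
                      spacers=("", "T", "GA", "CTG")):
    """Return one primer per spacer: tail + spacer + locus primer."""
    return [seq_primer_tail + s + locus_primer for s in spacers]


# Placeholder sequences, for illustration only.
tail = "ACACGACG"         # stands in for the sequencing-primer tail
locus = "ATGGCCAAGT"      # stands in for the locus-specific primer

primers = staggered_primers(tail, locus)

# The first sequenced base is the one right after the tail.
first_cycle = [p[len(tail)] for p in primers]
print(first_cycle)
```

With these placeholders, the four primers start the read with four different bases at cycle 1, whereas the unstaggered primer alone would give the same base in every cluster.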


            • invu
              Junior Member
              • Apr 2020
              • 5

              #7
              Thanks, ATϟGC, that's a good suggestion.
              Looking back, the adapter-primers I had used for my older runs, when I didn't have this issue, did have some degenerate bases in between for different purposes, and I think that was key in preventing this issue.

              Still, adding a minimal portion of PhiX is a good suggestion, too.
              Thanks!!


              • cement_head
                Senior Member
                • Mar 2012
                • 264

                #8
                I'd have to agree with GenoMax; it is super important to consult the sequencing center about the library composition and ask what they recommend. You probably should have had a 10% PhiX spike-in added. HiSeqs are terrible at dynamic calibration; MiSeqs are better (to a point).


                • invu
                  Junior Member
                  • Apr 2020
                  • 5

                  #9
                  I see. Next time I will consider PhiX spike-in. Thanks, cement_head!


                  • ATϟGC
                    Member
                    • Jun 2013
                    • 56

                    #10
                    I agree that it would be best to discuss these issues with your sequencing provider.

                    If you do choose to use staggered bases, I recommend making an alignment to check base diversity in the first 12-20 base pairs of read 1. This alignment should be made with respect to the Illumina sequencing primer. For my amplicon libraries, this means I anchor it on the left at the Nextera Read 1 sequence. You then only need to consider the base diversity of your staggered and/or unstaggered primers or adapters (I use a mix of both in my round 1 PCR reactions). I do this in Microsoft Excel so that I can calculate and optimize the base diversity of all the amplicons that will be pooled in my run.
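The same left-anchored check can be done in a few lines of code instead of a spreadsheet. This is a minimal sketch, not the workflow described above: it assumes every construct contributes equally to the pool, takes the sequence that follows the sequencing primer for each construct, and flags cycles where a single base exceeds an arbitrary 50% cutoff. The function names, sequences and cutoff are all illustrative assumptions.

```python
from collections import Counter


def base_fractions_per_cycle(read_starts, n_cycles=12):
    """Per-cycle fraction of A/C/G/T across left-anchored sequences."""
    table = []
    for cycle in range(n_cycles):
        bases = [s[cycle] for s in read_starts if len(s) > cycle]
        counts = Counter(bases)
        total = len(bases)
        table.append({b: (counts[b] / total if total else 0.0)
                      for b in "ACGT"})
    return table


def low_diversity_cycles(table, max_fraction=0.5):
    """1-based cycles where one base exceeds max_fraction."""
    return [i + 1 for i, fr in enumerate(table)
            if max(fr.values()) > max_fraction]


# Placeholder read starts: the same locus with 0-3 stagger bases,
# versus four copies of the unstaggered construct.
staggered = ["ATGGCC", "TATGGC", "GAATGG", "CTGATG"]
unstaggered = ["ATGGCC"] * 4

stag_table = base_fractions_per_cycle(staggered, n_cycles=6)
flat_table = base_fractions_per_cycle(unstaggered, n_cycles=6)
print(low_diversity_cycles(stag_table))
print(low_diversity_cycles(flat_table))
```

For pools where amplicons are mixed at unequal ratios, each sequence could be repeated in the input list in proportion to its share of the pool.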

                    Adding stagger bases has the potential to introduce biases in your libraries due to secondary structures or other priming phenomena. If you use the same mix of staggers for all samples the bias should be the same in theory.

                    I have only sequenced amplicons on MiSeq and NovaSeq, and 5-12% PhiX has been enough for me on those platforms, so I cannot comment on HiSeq.
