  • casbon
    Junior Member
    • Sep 2011
    • 7

    Sequencing low complexity libraries: effects on data

    I am planning some experiments that involve sequencing products that have a standard adaptor sequence at the start.

    Now I know that cluster identification uses bases 1-5, so I have thought about putting an NNNNN stretch after the sequencing primer. This should ensure that clusters are identified correctly.

    However, for bases 6-15 all clusters have the same base. This will produce a single colour per cycle, and there may be optical effects due to saturation. Now, I don't really care about these bases; I am only interested in the genomic bases after the adaptor. So my question is: will the later bases be sequenced OK given that the early bases may have these problems?

    Also, what will happen for the paired end read if that also has low complexity bases at the start? Since the cluster identification happens during the first read, the effect should be the same?
  • casbon
    Junior Member
    • Sep 2011
    • 7

    #2
    PS this thread was useful: http://seqanswers.com/forums/showthread.php?t=9150

    but that deals with deferring cluster identification till after the low complexity bases. I want to know the effect of low complexity bases after a successful cluster identification.


    • fkrueger
      Senior Member
      • Sep 2009
      • 627

      #3
      Hi casbon,

      if all of your sequences have the same kind of adapter sequence at the start, can't you just avoid the whole low complexity issue by using a custom sequencing primer for that lane so that you start reading straight into the genomic sequence?

      In our experience, low complexity after the initial bases is not that much of a problem, and is certainly not nearly as bad as having it right at the start. If uniform base composition were a major problem in general, the shuffling process would not work very well either. It does work quite well, even though the qualities generally do not quite reach the standards of a normal run (this is most likely due to phasing/prephasing, though).

      And yes, paired-end reads would only suffer slightly from technical issues with basecalling, but not from any influence on cluster detection.


      • casbon
        Junior Member
        • Sep 2011
        • 7

        #4
        Thanks, fkrueger.

        There are slight complications with dealing with a custom sequencing primer that I didn't disclose.

        In light of your comments, I think I might just try a lane and see how it turns out.


        • fkrueger
          Senior Member
          • Sep 2009
          • 627

          #5
          In any case, if you could convince your sequencing provider to keep hold of the images of the run, this might help you if you want to reprocess the data, e.g. by including only cycles 1-5 and 16-end in the basecalling procedure. Or bareback shuffling of the first 15 bp for that matter... Good luck!


          • NextGenSeq
            Senior Member
            • Apr 2009
            • 482

            #6
            The HiSeq doesn't save any of the images, so the above suggestion would only work on the GAIIx.


            • huguesparri
              Member
              • May 2008
              • 97

              #7
              You can also try the following:
              - increase the amount of PhiX you're spiking into your library prior to hybridization on the flow cell. For some really low complexity libraries, you can go up to 50% PhiX. This should be really useful when sequencing libraries where all your fragments start with the same bases.
              - try to dilute your libraries a bit more than usual before you hybridize them on the flow cell (4 pM as opposed to the usual 6 to 8 pM, for example). You will end up with fewer sequences, but you should avoid some of the identification problems.
              Both these methods were given to us by Illumina's tech support. We have tried the second one so far with some success, and we are going to try the first one soon.
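
To put a rough number on the spike-in suggestion, here is a minimal sketch of the arithmetic. It assumes that every library cluster shows the same base at a given cycle and that PhiX clusters show that base about a quarter of the time; the 0.25 figure is an idealisation, not a measured value, and the function name `dominant_base_fraction` is made up for this illustration.

```python
def dominant_base_fraction(phix_fraction, phix_base_fraction=0.25):
    """Fraction of clusters showing the library's fixed base at one cycle.

    Assumption: all library clusters show the same base at this cycle, and
    PhiX clusters show that base with probability phix_base_fraction
    (0.25 is an idealised balanced composition, not a measurement).
    """
    if not 0.0 <= phix_fraction <= 1.0:
        raise ValueError("phix_fraction must be between 0 and 1")
    library_fraction = 1.0 - phix_fraction
    return library_fraction + phix_fraction * phix_base_fraction


# Compare a few spike-in levels, up to the 50% mentioned above.
for spike in (0.0, 0.05, 0.25, 0.50):
    share = dominant_base_fraction(spike)
    print(f"{spike:.0%} PhiX: {share:.1%} of clusters share one base")
```

Under these assumptions a 50% spike-in still leaves well over half of the clusters showing the same base at the fixed cycles, and half of the reads in the lane are PhiX rather than library.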


              • simonandrews
                Simon Andrews
                • May 2009
                • 870

                #8
                There are basically two problems with biased libraries. Firstly, a lack of diversity in the first few bases means that overlapping clusters can't be separated, so the region of measurement identified can span two clusters, leading to mixed signals when the sequences later diverge. Secondly, the highly biased sequence composition messes up the signal intensity calibration, so the quality of called bases can suffer.
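
One way to see which cycles are affected is to tabulate base composition per cycle from reads you already have (a pilot lane, say). The following is only a sketch: the function names and the 0.9 threshold are arbitrary choices for illustration, not anything the Illumina pipeline uses.

```python
from collections import Counter


def per_cycle_composition(reads):
    """Return one Counter of base counts per cycle (read position)."""
    n_cycles = max((len(r) for r in reads), default=0)
    counts = [Counter() for _ in range(n_cycles)]
    for read in reads:
        for cycle, base in enumerate(read.upper()):
            counts[cycle][base] += 1
    return counts


def low_diversity_cycles(reads, threshold=0.9):
    """1-based cycles where a single base exceeds `threshold` of all calls."""
    flagged = []
    for cycle, counter in enumerate(per_cycle_composition(reads), start=1):
        total = sum(counter.values())
        if total and counter.most_common(1)[0][1] / total > threshold:
            flagged.append(cycle)
    return flagged


# Toy reads: two variable bases, a fixed "ACG" adaptor, then variable insert.
toy_reads = ["GTACGTT", "CAACGGA", "TGACGCC", "ACACGAG"]
print(low_diversity_cycles(toy_reads))  # prints [3, 4, 5]
```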

                The solution to the first problem is either to dilute your library to the point where very few overlapping clusters are found, or to do the cluster calling from a later set of cycles, either by specifying the cycles to use when setting up the run (with a limited range of options), or by saving images and using something like bareback to shuffle the order in which they're presented to the cluster calling program.

                The solution to the second problem is either to increase the diversity of your library through the introduction of more random sequences, or to use an external calibration, either a standard fixed one, or one derived from a different diverse lane on the same flowcell.

                Adding PhiX attempts to solve both of these problems in one step - reducing the effective concentration of the biased library, and introducing some added diversity. Alternatively you could just dilute your library more and use a control lane elsewhere on the flowcell. Either of these approaches will yield substantially less data than a deferred cluster calling but they're much better than doing a standard analysis on a biased high density library which can, in extreme cases, return no data at all.

                In your specific case, if you introduce random bases at the start so that the clusters are called correctly, you may still find that all of your sequences end up being rejected due to the compositional bias later in the read. Actually the calls for your later bases will probably be OK, but one of the Illumina filters looks for deteriorating quality and then flags all remaining bases with low quality scores, even if the quality later improves (the so-called 'killer Bs'). You can turn this off using the undocumented parameter NO-EAMSS when processing, which will preserve the original qualities. If you then trim your sequences to just your bases of interest, the qualities there should be OK.
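
The trimming step, and a quick check for B-flagged tails, could be sketched like this. It assumes plain sequence and quality strings already parsed out of FASTQ, a 15-base prefix (5 random bases plus 10 fixed ones, as in the original question), and that flagged bases appear as the character 'B' in the quality string, as in Phred+64 Illumina output of that era; the helper names are hypothetical.

```python
def trim_leading(seq, qual, n_leading=15):
    """Drop the first n_leading cycles from a read and its quality string."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return seq[n_leading:], qual[n_leading:]


def b_tail_length(qual, flag_char="B"):
    """Length of the run of flag_char at the 3' end of a quality string.

    Assumption: flagged bases are encoded as 'B' (Phred+64 style output).
    """
    return len(qual) - len(qual.rstrip(flag_char))


# Toy read: 5 random bases, 10 fixed adaptor bases, then the insert.
seq = "NNNNN" + "ACGTACGTAC" + "GATTACAGATTACA"
qual = "h" * len(seq)
trimmed_seq, trimmed_qual = trim_leading(seq, qual)
print(trimmed_seq)                # prints GATTACAGATTACA
print(b_tail_length("hhhhBBB"))   # prints 3
```

A read whose B tail covers the whole insert after trimming is one that the filter has effectively discarded, which is the case the NO-EAMSS suggestion is meant to avoid.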
