  • Duplicated bases in 100 bp GA2 reads

    Hi All,

    I recently found an odd artifact in some 100 bp Illumina GA2 reads we got from our sequencing provider. After some initial consternation, I realized that all the raw data contained duplicated bases at specific cycle numbers. More precisely, every sequence read in two of the samples that were run side-by-side had an insertion at the 37th and 74th positions that duplicated the base at the 36th and 73rd positions, respectively. A third sample, run at a later time, had an insertion at the 51st position that was identical to the base at the 50th position in every single read. When I removed the 37th and 74th bases from all the reads in the first two datasets, and the 51st base from the reads in the third dataset, everything looked OK.
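The check described above (find the cycles at which every read repeats the previous base, then delete that cycle from both the sequence and the quality string) can be sketched in Python. This is a minimal illustration, not the poster's actual procedure: the function names are hypothetical, reads are assumed to be already parsed into (header, sequence, quality) tuples, and a position is flagged only when every read repeats the previous base, since with many reads a chance match in all of them is very unlikely.

```python
from typing import List, Tuple

# A FASTQ record reduced to (header, sequence, quality); the '+' line is omitted.
Record = Tuple[str, str, str]


def find_duplicated_positions(seqs: List[str]) -> List[int]:
    """Return 1-based positions p at which every read's base p equals its base p-1."""
    if not seqs:
        return []
    length = min(len(s) for s in seqs)
    hits = []
    for i in range(1, length):  # i is the 0-based index of the candidate repeat
        # No-calls ('N') are not counted as a genuine repeat (an arbitrary choice).
        if all(s[i] == s[i - 1] and s[i] != "N" for s in seqs):
            hits.append(i + 1)
    return hits


def drop_positions(rec: Record, positions: List[int]) -> Record:
    """Delete the given 1-based positions from both the sequence and quality strings."""
    header, seq, qual = rec
    drop = {p - 1 for p in positions}
    keep = [i for i in range(len(seq)) if i not in drop]
    return (
        header,
        "".join(seq[i] for i in keep),
        "".join(qual[i] for i in keep),
    )


if __name__ == "__main__":
    # Toy reads in which position 4 repeats position 3 in every read.
    reads = [("@r1", "ACGGTA", "IIIIHH"),
             ("@r2", "TTAATC", "IIIIHH"),
             ("@r3", "GCTTAG", "IIIIHH")]
    dup = find_duplicated_positions([seq for _, seq, _ in reads])
    print(dup)  # [4]
    for rec in reads:
        print(drop_positions(rec, dup))
```

On the datasets described in this post, a check like this would be expected to report positions 37 and 74 for the first two samples and position 51 for the third.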

    Has anyone else experienced this type of artifact before? Any idea what could cause it? When I brought this to their attention, I pointed out that the positions of the inserted bases lined up suspiciously well with the standard 36 bp and 50 bp read lengths, but they insisted their machines were working properly and that no one else had complained about the data. Thoughts? Thanks.

  • #2
    Ah, the old "nobody else has experienced this" dodge. HORSE PUCKEY!!

    Contact your sequencing provider and ask them for details about these runs. Did they notice a significant drop in intensity across all channels and all lanes at the cycles you mentioned? Do your q-scores tank after this?

    We observed these two symptoms during a number of runs at one point. I later correlated duplicated base calls like the ones you describe with the anomalous cycles observed during those runs.

    Here's what we (and by we I mean me and the Illumina engineers) believe was happening. Something prevented dye-terminator cleavage after an imaging cycle, so no new base could be incorporated, and at the next imaging cycle we were simply detecting the bases from the previous cycle a second time. Because the dyes had already been photobleached during the first round of imaging, the signals were lower. The chemistry then appeared to return to normal for subsequent cycles. However, because of the anomalous intensities in that one cycle (and, we believe, significant phasing introduced because the effect was not 100%), the Q-scores took a nose dive after this cycle. If this occurred early enough in the run, we could extend the run with additional cycles, and I would then re-run the pipeline offline starting at a cycle after the bad one. This helped clean up the data somewhat.
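The failure mode described above can be illustrated with a toy model: if the terminator is not cleaved after an imaging cycle, the next cycle incorporates nothing and re-images the same base, so every later base is shifted by one position. This is an idealized sketch under stated assumptions (the effect is treated as 100% complete, with no phasing or intensity loss), and `simulate_read` is a hypothetical helper, not part of any Illumina software.

```python
from typing import Set


def simulate_read(template: str, n_cycles: int, failed_after: Set[int]) -> str:
    """Toy model of a sequencing-by-synthesis read.

    failed_after holds 1-based cycle numbers after whose imaging the
    terminator was NOT cleaved. On the following cycle no new base is
    incorporated, so the previous base is simply imaged again.
    Assumes the effect is complete (no phasing, no partial failure).
    """
    read = []
    pos = 0  # index of the next template base available for incorporation
    for cycle in range(1, n_cycles + 1):
        if (cycle - 1) in failed_after and read:
            read.append(read[-1])  # blocked strand: the same base is re-imaged
        else:
            if pos >= len(template):
                raise ValueError("template shorter than the number of cycles needs")
            read.append(template[pos])
            pos += 1
    return "".join(read)


if __name__ == "__main__":
    print(simulate_read("ACGTACGTAC", 8, {3}))  # ACGGTACG
```

With cleavage failing after cycles 36 and 73 of a 100-cycle run, this model produces reads whose 37th and 74th bases duplicate the 36th and 73rd, which is the pattern reported in the first post.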

    What finally resolved this issue for us was to have Illumina replace the VICI valve AND the controller board for this valve.

    I can't say for certain that this is what is going on with your data. You need to talk to your sequencing provider and have them discuss this with Illumina.



    • #3
      @kmcarr, thanks for the detailed response. I took a closer look at the raw FASTQ data they gave me. For the sample with two separate insertions spaced 36 bp apart, the second insertion had a clear q-score drop. For the sample with one insertion around base 50, there was also a clear q-score drop. So I think you're on to something.
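The q-score comparison described above amounts to computing the mean quality at each cycle across all reads and looking for cycles where it drops sharply. A minimal sketch follows; the function names are hypothetical, the Phred offset is an assumption (GA-era Illumina 1.3 to 1.7 FASTQ used offset 64, while Sanger-style and Illumina 1.8+ files use 33, so check which your provider delivered), and the 5-point drop threshold is arbitrary.

```python
from statistics import mean
from typing import List


def per_cycle_mean_quality(quals: List[str], offset: int = 64) -> List[float]:
    """Mean Phred score at each cycle over all quality strings.

    offset is an assumption: 64 for Illumina 1.3-1.7 FASTQ, 33 for Sanger/1.8+.
    Only cycles present in every read (up to the shortest length) are reported.
    """
    length = min(len(q) for q in quals)
    return [mean(ord(q[i]) - offset for q in quals) for i in range(length)]


def flag_quality_drops(means: List[float], min_drop: float = 5.0) -> List[int]:
    """1-based cycles whose mean quality is at least min_drop below the previous cycle."""
    return [i + 1 for i in range(1, len(means)) if means[i - 1] - means[i] >= min_drop]


if __name__ == "__main__":
    quals = ["IIII5III", "IIII5III"]  # offset-33 strings: 'I' = Q40, '5' = Q20
    means = per_cycle_mean_quality(quals, offset=33)
    print(flag_quality_drops(means))  # [5]
```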

      For the sake of argument, is this result consistent with the machine originally being set up for a 36 bp or 50 bp run, and then more data being acquired after the run ended, once the operator remembered it was supposed to be a 100 bp run? Does the instrument perform dye-terminator cleavage after what is supposed to be the last imaging cycle?

      Thanks,
      wraithnot



      • #4
        We once had a run in which the alignment percentage was very low for all three lanes we ran. However, when I clipped the reads to bases 1 to 28 (counting from the 5' end) before aligning, the alignment rate went up to what we would expect.

        The reason turned out to be that when the image files were copied over to our server, the folders for two cycles, 29 and 32, were not copied for one of the lanes. This resulted in sequences that were 34 bases long. So in this case there ended up being a deletion, but it affected alignment for all lanes.
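A simple pre-flight check could catch this kind of problem: confirm that every lane has an image folder for every expected cycle before running the pipeline. The folder naming pattern below (`C<cycle>.1`) is an assumption about the image layout, so adjust the regular expression to whatever your instrument actually writes; the function names are hypothetical.

```python
import os
import re
from typing import Iterable, List

# Assumed cycle-folder naming, e.g. "C29.1"; adjust to your instrument's layout.
CYCLE_DIR = re.compile(r"^C(\d+)\.1$")


def missing_cycles(dir_names: Iterable[str], expected_cycles: int) -> List[int]:
    """Return the expected cycle numbers that have no matching folder."""
    present = set()
    for name in dir_names:
        m = CYCLE_DIR.match(name)
        if m:
            present.add(int(m.group(1)))
    return [c for c in range(1, expected_cycles + 1) if c not in present]


def check_lane(lane_dir: str, expected_cycles: int) -> List[int]:
    """Convenience wrapper: list one lane's image directory and report gaps."""
    return missing_cycles(os.listdir(lane_dir), expected_cycles)


if __name__ == "__main__":
    # Simulate the case above: a 36-cycle lane missing cycles 29 and 32.
    names = [f"C{c}.1" for c in range(1, 37) if c not in (29, 32)]
    print(missing_cycles(names, 36))  # [29, 32]
```

Running `check_lane` on each lane directory after the copy, and refusing to start the pipeline if any list is non-empty, would avoid relying on a warning that scrolls past.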

        When running the Illumina pipeline, a warning was printed on the screen, but it scrolled past so quickly that it was difficult to read.



        • #5
          Originally posted by wraithnot
          For the sake of argument, is this result consistent with originally setting the machine up to perform a 36 bp or 50 bp run, and then acquiring more data after the run ended and the operator remembered it was supposed to be a 100 bp run? Does the instrument perform dye-terminator cleavage after what is supposed to be the last imaging cycle?
          I am pretty certain that there is no cleavage after the last cycle. (Really, what would be the point? It would just be a waste of time and reagents.) So I suppose there is some plausibility to what you suggest, but it would be a big hassle to do. You would need to aggregate all of the images into a single folder, rename all of the files and folders from the added cycles to fit a unified cycle-numbering scheme, and then run OLB on these merged image folders.
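The renumbering step mentioned above is mechanical: if the added cycles were numbered from 1 in a separate run, each one needs to be shifted by the length of the original run. The sketch below only computes the folder-name mapping as a dry run; the `C<cycle>.1` naming and the function name are assumptions, and any cycle numbers embedded in the file names inside the folders would need the same treatment.

```python
import re
from typing import Dict, List

# Assumed cycle-folder naming, e.g. "C1.1"; adjust to the real layout.
CYCLE_DIR = re.compile(r"^C(\d+)\.1$")


def unified_cycle_names(first_run_cycles: int, extra_dir_names: List[str]) -> Dict[str, str]:
    """Map folder names from an add-on run (numbered from 1) to merged-run names.

    Cycle n of the add-on run becomes cycle n + first_run_cycles.
    Non-matching names are ignored. Returns a dry-run mapping only;
    nothing on disk is renamed.
    """
    mapping = {}
    for name in extra_dir_names:
        m = CYCLE_DIR.match(name)
        if m:
            mapping[name] = f"C{int(m.group(1)) + first_run_cycles}.1"
    return mapping


if __name__ == "__main__":
    print(unified_cycle_names(36, ["C1.1", "C2.1", "Thumbnails"]))
    # {'C1.1': 'C37.1', 'C2.1': 'C38.1'}
```

Keeping the mapping separate from the actual moves makes it easy to review before applying it with `os.rename` into the merged image folder.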

          It is possible to extend a recipe in progress, but RTA stops at the originally planned last cycle. The SBS and imaging cycles continue normally to the new run length, and you then analyze the run offline. Of course, this only works if you can save all images, so it's no longer an option with SCS 2.8.
