  • Samples artificially grouped by i5 inline barcodes, in Illumina HiSeq 2500

    Hi,
    I am using single-digest RAD-Seq to analyse the population structure of a protist species, with dual barcoding by i7 index barcodes and i5 inline barcodes. The first two libraries, consisting of 9 samples, were sequenced on Illumina MiSeq (2x150 and 2x250), and the samples separated biogeographically as expected. However, in two subsequent runs of libraries (80 samples) sequenced on Illumina HiSeq 2500 (2x100), the samples were artificially grouped by i5 barcodes.

    I would be very grateful for any help or hint, as I have no idea how this structure could arise. Of course, the indexes were removed prior to analysing the data. The library preparation pipeline is the same for the MiSeq and HiSeq libraries, the sole difference being the number of samples pooled together. Concerning the sequencing, one difference I am aware of is the number of reads. MiSeq used three reads (two sequence reads, one index read), whereas HiSeq used four (R1 + R4 sequence reads, R2 i7 index read, R3 i5 index read; note that we do not have any i5 index barcode).

    Attached is a file showing our results in detail. Thank you in advance for any feedback.

    Pavel

  • #2
    It could be a result of how the HiSeq runs were demultiplexed. The i5 index read needs to be masked during demultiplexing; otherwise the i5 read sequences will be added to the 5' end of read 2 (R4).


    • #3
      Thank you for the feedback!
      In fact, I had already wondered whether the i5 read might influence the data (in particular, after reading this: https://sequencing.qcfail.com/articl...uddle-samples/).
      However, both runs were also analysed using single reads only (R1), and the pattern persists.


      • #4
        When I read your post I wondered whether index hopping could be causing the problem, and your attachment shows an enormous level of hopping. So wouldn't samples that share a P1 inline barcode be swapping reads as well, making all the samples that share the inline barcode more similar to each other? This can drive a genetic structure signal if some loci are polymorphic for presence/absence, with the hopped reads filling in for the missing locus. Remember, if an "absent" index combo has 500,000 reads, then the "good" combo with 1.5M reads is probably 1M actual sample reads and 500k swapped reads.

        You expected that swapping should occur across all samples, but the classic swap comes from the wrong index primer binding to a template and labelling that fragment as the wrong sample. The inline bcs cannot be misprimed this way, so any sample that has a given P1 inline bc will remain in that group, but can pick up the wrong P2 index.

        This isn't great proof, but look at your table: P1 inline bc 1 has a sample with 25M reads (P2-4) and has empty combos with 1 to >2M reads. Inline bc 3 has all poorly sequenced samples, and the empty combos have just 200-300k reads. The rest play out the same way.
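The back-of-the-envelope arithmetic above (an empty combo with 500,000 reads suggesting that a 1.5M-read "good" combo holds roughly 1M genuine reads) can be sketched in Python. This is only an illustration of that heuristic, not a validated estimator: the function name `estimate_swapped_reads`, the input layout (a dict keyed by `(P1 inline barcode, P2 index)`), and the use of the mean empty-combo count within each inline-barcode group as the background are all assumptions made for the example.

```python
from statistics import mean


def estimate_swapped_reads(counts, used):
    """Rough per-combo estimate of swapped reads (hypothetical helper).

    counts: dict mapping (p1_inline_bc, p2_index) -> read count
    used:   set of (p1_inline_bc, p2_index) combos actually present in the library

    Assumption: within one P1 inline-barcode group, every used combo received
    about as many swapped reads as the average empty combo in that group.
    """
    # Group the read counts by P1 inline barcode.
    by_p1 = {}
    for (p1, p2), n in counts.items():
        by_p1.setdefault(p1, []).append(((p1, p2), n))

    report = {}
    for p1, entries in by_p1.items():
        # Reads in combos that should not exist are taken as the background.
        empty = [n for combo, n in entries if combo not in used]
        background = mean(empty) if empty else 0
        for combo, n in entries:
            if combo in used:
                report[combo] = {
                    "observed": n,
                    "est_swapped": min(n, background),
                    "est_true": max(n - background, 0),
                }
    return report


# Illustrative numbers taken from the example in the post, not from real data.
example_counts = {("bc1", "P2-1"): 1_500_000, ("bc1", "P2-2"): 500_000}
example_used = {("bc1", "P2-1")}
print(estimate_swapped_reads(example_counts, example_used))
```

With these illustrative inputs the used combo is reported with about 1M estimated genuine reads. The background is computed per inline-barcode group because, as argued above, swapped reads are expected to stay within their P1 group.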

        So the question to me is less how this clustering by inline barcode is happening...I think you answered that with your hopping analysis, but more why in the world is this hopping happening at such levels? Particularly since this is a HiSeq 2500...the index hopping came to attention with ExAmp Illumina sequencers that use a patterned flow cell. We see a few hundred reads in non-existent dual index combinations (nextRAD, not RAD-Seq) on a 2500, and thousands of reads on a 4000...but usually less than 1% of reads.
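To compare a run against the "usually less than 1% of reads" figure, one could compute the share of all demultiplexed reads that land in non-existent barcode combinations. The sketch below assumes the same hypothetical input layout as a per-combo count table keyed by `(P1 inline barcode, P2 index)`; the function name `unused_combo_fraction` is made up for this example.

```python
def unused_combo_fraction(counts, used):
    """Fraction of reads assigned to combos that were never in the library.

    counts: dict mapping (p1_inline_bc, p2_index) -> read count
    used:   set of combos actually present in the library
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    unused = sum(n for combo, n in counts.items() if combo not in used)
    return unused / total


# Illustrative numbers only, not real data.
counts = {("bc1", "P2-1"): 1_500_000, ("bc1", "P2-2"): 500_000}
print(f"{unused_combo_fraction(counts, {('bc1', 'P2-1')}):.1%}")
```

This is a lower bound on the true swapping rate, since swapped reads that happen to land in a combo that is in use are not counted.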

        You have some samples with very high read counts (20M instead of 1-5M). Do the non-existent barcode combos group with those samples tightly within an inline bc group? I assume the hopping reads will be mostly from these dominant samples.

        You say you checked that your P1 inline barcodes were not cross-contaminated. But you'd get this pattern if the P2s were cross-contaminated. Did you check those? I guess the other possibility is the amplification step: when we first developed RAD-Seq, we amplified pools of samples. How did you amplify your samples? In pools grouped by P1 inline bcs? Individually? How did you remove the P2 primers, and when?
        Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


        • #5
          Thank you so much for the comment!
          The problem now seems to be solved. You are absolutely right that the P2 adapters swap, not the P1 ones. Accordingly, the problem can be explained by the fact that not all P2 adapters were removed prior to PCR. Consequently, they may act as primers, producing PCR chimeras with wrong P1-P2 combinations. We checked our protocols and found that we had increased the concentration of P2 adapters for HiSeq sequencing, as we multiplexed 80 instead of 9 samples. This change may have left residual P2 adapters in the PCR despite our purification step. We have changed our protocol to perform PCRs in pools grouped by P2 adapters, so the swap should be eliminated.
          Attached is the PDF illustrating how our swapping probably occurred.

          • #6
            Hey, great! I was wondering if this was resolved. What you said makes perfect sense, and that's a good solution as well.


            • #7
              Hello, the problem was solved, and we already got nice results from sequencing two libraries, with no swaps at all!
              Thank you again,
              Pavel
