  • Weird overlapping paired ends with bimodal paired-end distance distribution

    Hi Group!

    I'm dealing with quite the strange sequencing run. We don't normally do PE, so we don't have a standard operating procedure to fall back on, and are clearly making mistakes.

    We did 101 bp paired-end sequencing using Nextera chemistry. We expected the distance between the ENDS of the forward and reverse reads to be ~135 bp, but my investigation is telling me that we wound up with a distance of ~65 bp between the STARTS of the forward and reverse reads.

    So, I guess the reads start about 65 bp from each other, say hello on the way by and, in the process, each sequences ~35 bp past the other's start.

    I guess this, in effect, creates single reads of ~135 bp.
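The geometry described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pair_geometry` and its arguments are hypothetical, both mates are assumed untrimmed and of equal length, and `start_distance` is taken to mean the distance between the 5' starts of the two mates. On a real library, bases read past the mate's start would normally come from adapter rather than genome.

```python
def pair_geometry(read_len, start_distance):
    """Describe how two equal-length mates sit on one fragment.

    read_len       -- length of each mate in bp (assumed untrimmed)
    start_distance -- distance in bp between the 5' starts of the mates
    All names here are illustrative, not taken from any tool.
    """
    # Bases each mate reads beyond the other mate's 5' start.
    read_through = max(0, read_len - start_distance)
    # Bases inside the fragment that both mates cover.
    overlap = max(0, min(start_distance, 2 * read_len - start_distance))
    # Unsequenced gap between the 3' ENDS of the two mates.
    gap_between_ends = max(0, start_distance - 2 * read_len)
    # Total footprint from the leftmost to the rightmost sequenced base.
    outer_span = max(read_len, start_distance) - min(0, start_distance - read_len)
    return {
        "read_through": read_through,
        "overlap": overlap,
        "gap_between_ends": gap_between_ends,
        "outer_span": outer_span,
    }


if __name__ == "__main__":
    for distance in (65, 300):
        print(distance, pair_geometry(101, distance))
```

With 101 bp reads, a 65 bp start distance gives 36 bp of read-through per mate and a 137 bp footprint, close to the ~35 and ~135 figures quoted above, while a 300 bp start distance leaves a 98 bp gap between the read ends.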

    Further strangeness: the distance distribution for pairs is bimodal, with the biggest peak being the aforementioned one with ~65 bp between the STARTS of the forward and reverse reads. The second peak is much smaller than the first, but still quite pronounced, and is centered at a much more reasonable distance, with 100 bp between the ENDS of the forward and reverse reads (closer to what we expected, which was 135 bp between ENDS).

    This latter peak reflects a distance that is more what we were shooting for, but when I set distances using this, we break ~90% of our pairs.

    Does anybody have any perspective on what is happening here? I'd be quite appreciative of any ideas at all.

    Thanks for reading!
    Last edited by Mouth_Breather; 08-24-2012, 11:11 AM. Reason: aCCURACY

  • #2
    Sounds like adapter trimming is working and you're getting a proper insert size distribution.

    What did your library look like size-wise?


    • #3
      Originally posted by ECO
      Sounds like adapter trimming is working and you're getting a proper insert size distribution.

      What did your library look like size-wise?

      Hi ECO, thanks for the response - truly appreciated! I'm gathering info on library size from my teammates, but in the interim, could you please explain how the bimodal distribution I described shows that adapter trimming is working?

      We are trimming first on quality and then on transposon contamination. We have some wicked 3' contamination. We also trim the 5' ends, and it does do some cutting, but to a much lesser extent than at the 3' end. Our average read length winds up being 60-70 bp or so.

      Of course, I had to leave reads trimmed down to nothing to preserve the order of the reads for pairing - could that be a factor?
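As an aside on keeping mates in sync, one common alternative to leaving zero-length reads in place is to filter both files together after trimming, dropping a pair whenever either mate is too short. The following is a minimal sketch, not the poster's pipeline: the function name `drop_short_pairs`, the `(name, sequence, quality)` record layout, and the 20 bp cutoff are all assumptions made for illustration.

```python
def drop_short_pairs(mate1, mate2, min_len=20):
    """Filter two parallel lists of trimmed reads without breaking pairing.

    mate1, mate2 -- lists of (name, sequence, quality) tuples, in the same
                    order, one entry per original pair (hypothetical layout)
    min_len      -- illustrative cutoff; choose what suits your aligner
    """
    if len(mate1) != len(mate2):
        raise ValueError("mate lists must have the same length and order")

    kept1, kept2 = [], []
    for rec1, rec2 in zip(mate1, mate2):
        # Keep the pair only if BOTH mates survived trimming.
        if len(rec1[1]) >= min_len and len(rec2[1]) >= min_len:
            kept1.append(rec1)
            kept2.append(rec2)
    return kept1, kept2


if __name__ == "__main__":
    r1 = [("p1/1", "ACGT" * 10, "I" * 40), ("p2/1", "", "")]
    r2 = [("p1/2", "TTGA" * 10, "I" * 40), ("p2/2", "ACGT" * 10, "I" * 40)]
    kept_r1, kept_r2 = drop_short_pairs(r1, r2)
    print(len(kept_r1), len(kept_r2))
```

Because both output lists are built in lockstep, record i in one list is always the mate of record i in the other, so pair order is preserved without carrying empty reads into alignment.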

      Each read is only 101 bp, so I'm not sure how trimming would change the paired-end distance... are you suggesting that we are trimming so much from the 5' end that it makes it appear in the distance distribution statistics that we have overlapping reads?

      I'm having a hard time wrapping my brain around the possibilities that would explain potential overlapping reads, or the appearance thereof, and the bimodal distributions of paired end distances...


      • #4
        First, it is not clear what your method is for determining the insert sizes. Depending on how you are doing it, your 135 bp sequences may be adapter/PCR dimers, meaning your adapter clipping/trimming has nearly completely failed you.
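For readers wondering what a transparent method might look like, here is one possible sketch that tallies template lengths straight from SAM records. It relies only on the standard SAM columns (FLAG in column 2, TLEN in column 9); the function names `template_lengths` and `binned_counts` and the 10 bp bin width are assumptions for illustration, and it is not known what method the original poster used. Note that how TLEN is computed can differ between aligners, so the numbers are only as good as the alignments behind them.

```python
from collections import Counter


def template_lengths(sam_lines):
    """Yield abs(TLEN) once per mapped pair from an iterable of SAM lines."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        tlen = int(fields[8])
        if not (flag & 0x1):             # read is not paired
            continue
        if (flag & 0x4) or (flag & 0x8):  # read or its mate is unmapped
            continue
        if (flag & 0x100) or (flag & 0x800):  # secondary or supplementary
            continue
        if not (flag & 0x40):            # count each pair once, via mate 1
            continue
        if tlen == 0:                    # TLEN unavailable
            continue
        yield abs(tlen)


def binned_counts(lengths, bin_width=10):
    """Return {bin_start: count}, sorted by bin, for a quick look at modes."""
    counts = Counter((length // bin_width) * bin_width for length in lengths)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    demo = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t60\t65M\t=\t100\t65\t*\t*",
        "r1\t147\tchr1\t100\t60\t65M\t=\t100\t-65\t*\t*",
        "r3\t99\tchr1\t500\t60\t101M\t=\t699\t300\t*\t*",
    ]
    print(binned_counts(template_lengths(demo)))
```

A bimodal library should show up as two separated clusters of bins; a pile-up at very short lengths would instead point towards adapter or primer dimers.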

        Second, I would advise either more contempt for the Illumina Nextera protocol or, should you be so inclined, amazement that this "blind archer" Nextera protocol works at all. That is, sequencing a library without looking at its size distribution strikes me as, well, hubris at best. But doing such a check is optional if one follows the Nextera XT protocol.

        The bi-modality, if it is real, and not just artifactual contamination of your data with primer/adapter dimers, would not be surprising. Epicentre has ostensibly tamed their wild "Nextera" transposable element machinery and bound it to their will. But at heart, transposable elements have their own agendas. Sometimes that will include some insertion preference biases. So don't be too surprised if the elements "ate all the icing" off the genome first then, with some reluctance, finished off the rest. The result would be more heavily chewed-up (smaller) segments of the genome and then another peak representing everything else.

        Then, exacerbating the situation, is PCR's proclivity for shorter amplicons over longer ones.

        Again, that Nextera ever works blind amazes me. My advice: do a size check before the single-stranded normalization step. If there is much of anything towards the smaller size ranges, get rid of it with a size selection. Either that, or optimize up front to avoid them.

        --
        Phillip


        • #5
          Originally posted by ECO
          Sounds like adapter trimming is working and you're getting a proper insert size distribution.

          What did your library look like size-wise?
          We use Pippin Prep for size selection, and according to that, it was 486 bp.


          • #6
            As a first approximation, the answer to any question involving strangeness in Illumina library distributions is to invoke "double peak"/"bubble product"/"bird nesting". Do a Google search on seqanswers.com with one or more of those keywords for lots of background.

            Was a chip or gel run to show the size distribution prior to size selection? If so, please post it. My expectation is that it had 2 peaks, and your Pippin cut was into the higher molecular weight peak. Here is why:

            Again, presuming your clipping software did function correctly, the amplicon size you would back-calculate would be 135 bp insert + 136 bp for both adapters = 271 bp.

            But you cut out fragments that appeared to be 486 bp, including some that were around that size. So how did the 271 bp ones get mixed in?
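The back-calculation is simple enough to write down. The sketch below uses only the figures quoted in this thread (a 135 bp insert, 136 bp for both adapters, and a 486 bp Pippin Prep reading); the function names are hypothetical and the numbers are the posters' estimates, not measurements made here.

```python
def amplicon_size(insert_bp, adapters_bp):
    """Full library molecule length: insert plus both adapters."""
    return insert_bp + adapters_bp


def apparent_to_actual_ratio(apparent_bp, actual_bp):
    """How many times larger a band looks than the molecule really is."""
    return apparent_bp / actual_bp


if __name__ == "__main__":
    actual = amplicon_size(insert_bp=135, adapters_bp=136)
    ratio = apparent_to_actual_ratio(apparent_bp=486, actual_bp=actual)
    print(f"back-calculated amplicon: {actual} bp")
    print(f"apparent / actual size:   {ratio:.2f}")
```

Summing the two quoted figures gives 271 bp, and 486 bp is roughly 1.8 times that, which is the size mismatch the rest of this post sets out to explain.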

            During PCR, template strands may become numerous enough to anneal to one another before primers can anneal. If this happens, instead of primer extension and creation of a nascent strand resulting in the normal double-stranded product, you get a "bubble product". That is, two unrelated library strands annealed only at their adapter termini. Apparently this lack of double-strandedness in the central half (in your case) of the molecule causes it to electrophorese as if it were nearly double its actual molecular weight. (Alternatively, maybe the products are "daisy-chaining" rather than, or in addition to, forming bubble products.)

            To avoid this, you have to back off on the number of cycles of enrichment PCR or add more primers to your PCR reaction. Or, presumably, if you could size select on the denatured single strands somehow, that should work.

            All of this holds only if I am correct in deducing what you have seen, but I would speculate that I am.

            --
            Phillip
            Last edited by pmiguel; 08-21-2012, 12:02 PM.


            • #7
              Phillip,

              I've attached some shots of sizes pre size selection. Lanes 6 and onward.

              Obviously some bimodality there. Does this fit with your suspicions?


              • #8
                Not sure if the image attached correctly, so doing so another way... Lanes of interest are lanes 6 and onward.

                Note that this isn't the exact sample where I saw the exact centers of each mode (~65 and ~300), but most of the samples look the same, as you can see from what is shown.

                Last edited by Mouth_Breather; 08-23-2012, 05:30 PM.


                • #9
                  And let me just add, thanks so much for your perspective. It's really gotten our gears turning over here, and it is sincerely appreciated.


                  • #10
                    Yes, but the lower peak would need to be around 250-300 bp for my "bubble-product" hypothesis to be confirmed as the cause of your 65 bp insert result.

                    But you can see how that would happen if you were tasked with cutting out a band of around 450-500 bp. You see two peaks: one is at 250-300 bp, which is no good, but there is another at around twice that size, so you just take the big one. But the "big" one only appears big; it really just comprises hetero-dimers of library molecules. It has the same linear size as the lower peak, just with a 50% single-stranded (and higher "drag") duplex composition.

                    --
                    Phillip


                    • #11
                      With respect to the gel-sizing step used, the Pippin Prep, I'd like to add one extra point into the discussion.

                      The most popular Pippin Prep cassettes contain a high concentration of ethidium bromide. Binding of ethidium to dsDNA products slows electrophoretic mobility relative to dye-free dsDNA (EtBr is positively charged). The bound EtBr concentration on bubble-products of library elements of the same strand length should be significantly lower, since EtBr binds ssDNA less avidly.

                      So while we agree with Phillip that bubble-products and fully dsDNA with similar strand lengths will migrate very differently in dye-free gels (bubble-product slower in most cases), in ethidium-containing gels the mobility difference will be much smaller.

                      This may help explain the apparent discrepancy between insert size distribution and expected gel mobility of bubble-products pointed out by Phillip this morning (10:23 post).

                      Chris Boles, Sage Science.


                      • #12
                        Thanks for your post, Chris.

                        I was of exactly the same opinion as you; however, I was schooled by others in this forum. They posted a picture of an agarose gel run with ethidium bromide in the gel that showed the same double-peak phenomenon we see on the Bioanalyzer.

                        Of course, a "much smaller" difference in mobility is relative.

                        Also, the Bioanalyzer is not dye-free, but I don't know what dye it uses.

                        --
                        Phillip


                        • #13
                          Yes, "much smaller" is relative. We have heard anecdotal evidence from customers that bubble-products and fully ds amplicons of equal strand length comigrate in the Pippin EtBr cassettes for lengths below 500 bp. The data were indirect: the customers were comparing insert size distributions for Pippin-sized libraries that showed single library peaks on the Bioanalyzer against insert distributions for Pippin-sized libraries that showed double library peaks on the Bioanalyzer. There wasn't much difference in insert size distribution. (I should caution that we haven't polled our customers on this issue, so others might have had a different finding.)

                          Regarding the difference in DNA mobility between EtBr Pippin Prep cassettes and the Bioanalyzer: the Agilent Bioanalyzer has an extremely sensitive detection system, and therefore uses a very low dye concentration. Under the low-dye conditions of the Bioanalyzer, I think that the relative mobilities of ss and ds DNAs are similar to those observed in 1-2% dye-free agarose gels.

                          Chris Boles, Sage Science.
