SEQanswers





Old 08-18-2012, 07:02 PM   #1
Mouth_Breather
Junior Member
 
Location: Right Coast

Join Date: Jul 2011
Posts: 8
Weird overlapping paired ends with bimodal paired end distance distribution

Hi Group!

I'm dealing with quite the strange sequencing run. We don't normally do PE, so we don't have a standard operating procedure to fall back on, and are clearly making mistakes.

We did 101 bp paired-end reads using Nextera chemistry. We expected the distance between the ENDS of the forward and reverse reads to be ~135 bp, but my investigation is telling me that we wound up with ~65 bp between the STARTS of the forward and reverse reads.

So, I guess the reads start about 65 bp from each other, say hello on the way by and in the process, sequence past each of their respective starts by 35 bp.

I guess this in effect, creates basically single reads of ~135 long.
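The geometry described here can be checked with a little arithmetic. The following is only a sketch, assuming both mates are a full 101 bp and that the "distance between STARTS" equals the fragment (insert) length; the function name and the returned keys are made up for illustration.

```python
def pair_geometry(read_len, fragment_len):
    """Describe how two mates read from opposite ends of one fragment relate.

    read_len     -- length of each mate in bp (101 in this run)
    fragment_len -- distance between the two read STARTS (assumed = insert size)
    """
    # Gap between the read ENDS; a negative value means the mates overlap.
    inner_distance = fragment_len - 2 * read_len
    # Insert bases covered by both mates.
    overlap = max(0, min(read_len, fragment_len, 2 * read_len - fragment_len))
    # Bases each mate reads past the other mate's start (i.e. into adapter).
    read_through = max(0, read_len - fragment_len)
    return {
        "inner_distance": inner_distance,
        "overlap": overlap,
        "read_through": read_through,
    }


# The three cases mentioned in the post: starts ~65 bp apart, ends ~100 bp
# apart (fragment = 2 * 101 + 100), and the expected ends ~135 bp apart.
for fragment in (65, 2 * 101 + 100, 2 * 101 + 135):
    print(fragment, pair_geometry(101, fragment))
```

With a 65 bp fragment each mate runs roughly 35 bp past the other mate's start, which is consistent with the "say hello on the way by" picture above.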

Further strangeness: the distance distribution for pairs is bimodal, with the biggest peak being the aforementioned one with ~65 bp between the STARTS of the forward and reverse reads. The second peak is much smaller than the first, but still quite pronounced, and is centered at a much more reasonable distance, with 100 bp between the ENDS of the forward and reverse reads (closer to what we expected, which was 135 bp between ENDS).

This latter peak reflects a distance that is more what we were shooting for, but when I set distances using this, we break ~90% of our pairs.

Does anybody have any perspective on what is happening here? I'd be quite appreciative of any ideas at all..

Thanks for reading!

Last edited by Mouth_Breather; 08-24-2012 at 11:11 AM. Reason: aCCURACY
Old 08-18-2012, 08:40 PM   #2
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

Sounds like adapter trimming is working and you're getting a proper insert size distribution.

What did your library look like size-wise?
Old 08-18-2012, 09:10 PM   #3
Mouth_Breather
Junior Member
 
Location: Right Coast

Join Date: Jul 2011
Posts: 8

Quote:
Originally Posted by ECO View Post
Sounds like adapter trimming is working and you're getting a proper insert size distribution.

What did your library look like size-wise?

Hi ECO, thanks for the response - truly appreciated! I'm gathering info on library size from my teammates, but in the interim, could you please explain how the bimodal distribution I described shows that adapter trimming is working?

We are trimming first on quality and then on transposon contamination. We have some wicked 3' contamination. We also trim the 5' ends, and it does do some cutting, but to a much lesser extent than at the 3' end. Our average read length winds up being 60-70 bp or so.

Of course, I had to leave reads trimmed down to nothing to preserve the order of the reads for pairing - could that be a factor?
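One way to avoid carrying empty placeholder reads while still keeping the two files in step is to drop both mates whenever either one falls below a length cutoff. This is a minimal sketch, not the trimming pipeline used here: the `(name, sequence, quality)` tuple layout, the function name, and the 20 bp cutoff are all assumptions for illustration.

```python
def drop_short_pairs(mates1, mates2, min_len=20):
    """Filter two parallel lists of trimmed reads, keeping them in sync.

    Each read is a (name, sequence, quality) tuple. A pair survives only
    if BOTH mates are at least min_len bases long after trimming.
    """
    if len(mates1) != len(mates2):
        raise ValueError("mate lists differ in length; pairing is already broken")
    kept1, kept2 = [], []
    for r1, r2 in zip(mates1, mates2):
        if len(r1[1]) >= min_len and len(r2[1]) >= min_len:
            kept1.append(r1)
            kept2.append(r2)
    return kept1, kept2


# Toy example: the second pair has a mate trimmed down to nothing.
fwd = [("p1/1", "ACGT" * 10, "I" * 40), ("p2/1", "", "")]
rev = [("p1/2", "TTGCA" * 8, "I" * 40), ("p2/2", "ACGT" * 10, "I" * 40)]
kept_fwd, kept_rev = drop_short_pairs(fwd, rev)
print(len(kept_fwd), len(kept_rev))
```

Because mates are removed together, read order stays identical in both outputs and an aligner never sees a zero-length read.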

Each read is only 101 bp, so I'm not sure how trimming would change the paired end distance... are you suggesting that we are trimming so much from the 5 prime that it is making it appear in distance distribution statistics that we have overlapping reads?

I'm having a hard time wrapping my brain around the possibilities that would explain potential overlapping reads, or the appearance thereof, and the bimodal distributions of paired end distances...
Old 08-19-2012, 08:17 AM   #4
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,313

First, it is not clear what your method is for determining the insert sizes. Depending on how you are doing it, your 135 bp sequences may be adapter/PCR dimers -- meaning your adapter clipping/trimming has nearly completely failed you.
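For reference, one common way to get an insert size distribution is to tally the template length (TLEN, column 9) that the aligner writes for each properly paired read in a SAM file. The sketch below assumes the reads have already been aligned and exported as SAM text; the function name and the toy records are made up for illustration, and it is not necessarily the method used in this thread.

```python
from collections import Counter


def insert_size_histogram(sam_lines):
    """Count |TLEN| for the first mate of every properly paired alignment."""
    sizes = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        # 0x2 = mapped in a proper pair, 0x40 = first mate in the pair.
        # Counting only the first mate avoids tallying each pair twice.
        if flag & 0x2 and flag & 0x40 and tlen != 0:
            sizes[abs(tlen)] += 1
    return sizes


# Toy SAM records (sequence and quality replaced by "*").
demo = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t50M\t=\t115\t65\t*\t*",
    "r1\t147\tchr1\t115\t60\t50M\t=\t100\t-65\t*\t*",
    "r2\t99\tchr1\t500\t60\t50M\t=\t752\t302\t*\t*",
]
for size, count in sorted(insert_size_histogram(demo).items()):
    print(size, count)
```

Note that TLEN depends on where the aligner places each mate, so untrimmed adapter sequence or heavy soft-clipping can distort the picture.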

Second, I would advise either more contempt for the Illumina Nextera protocol or, should you be so inclined, amazement that this "blind archer" Nextera protocol works at all. That is, sequencing a library without looking at its size distribution strikes me as, well hubris, at best. But doing such a check is optional if one follows the Nextera XT protocol.

The bi-modality, if it is real, and not just artifactual contamination of your data with primer/adapter dimers, would not be surprising. Epicentre has ostensibly tamed their wild "Nextera" transposable element machinery and bound it to their will. But at heart, transposable elements have their own agendas. Sometimes that will include some insertion preference biases. So don't be too surprised if the elements "ate all the icing" off the genome first then, with some reluctance, finished off the rest. The result would be more heavily chewed-up (smaller) segments of the genome and then another peak representing everything else.

Then, exacerbating the situation is PCR's proclivity for shorter amplicons over longer ones.

Again, that Nextera ever works blind amazes me. My advice: do a size check before the single-stranded normalization step. If there is much of anything towards the smaller size ranges, get rid of it with a size selection. Either that, or optimize up front to avoid them.

--
Phillip
Old 08-20-2012, 08:04 AM   #5
Mouth_Breather
Junior Member
 
Location: Right Coast

Join Date: Jul 2011
Posts: 8

Quote:
Originally Posted by ECO View Post
Sounds like adapter trimming is working and you're getting a proper insert size distribution.

What did your library look like size-wise?

We use Pippin Prep for size selection, and according to that, it was 486 bp.
Old 08-21-2012, 04:29 AM   #6
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,313

As a first approximation, the answer to any question involving strangeness in Illumina library distributions is to invoke "double peak"/"bubble product"/"bird nesting". Do a Google search on seqanswers.com with one or more of those keywords for lots of background.

Was a chip or gel run to show the size distribution prior to size selection? If so, please post it. My expectation is that it had 2 peaks, and your Pippin cut was into the higher molecular weight peak. Here is why:

Again, presuming your clipping software did function correctly, the amplicon size you would back-calculate would be 135 bp insert + 136 bp for both adapters = 271 bp.

But you cut out fragments that appeared to be 486 bp -- including some that were around that size. So how did the 271 bp ones get mixed in?

During PCR, template strands may become numerous enough to anneal to one another before primers can anneal. If this happens, instead of primer extension and creation of a nascent strand resulting in the normal double-stranded product, you get a "bubble product". That is, two unrelated library strands annealed only at their adapter termini. Apparently this lack of double-strandedness in the central half (in your case) of the molecule causes it to electrophorese as if it were nearly double its actual molecular weight. (Alternatively, maybe the products are "daisy-chaining" rather than, or in addition to, forming bubble products.)
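The back-calculation can be written out as a quick check. This is only a sketch using the figures quoted in this thread (135 bp insert, 136 bp of adapter, a Pippin Prep size of 486 bp); the function names are made up for illustration.

```python
def amplicon_size(insert_bp, adapters_bp):
    """Full library molecule length: insert plus both adapters."""
    return insert_bp + adapters_bp


def apparent_to_actual_ratio(apparent_bp, actual_bp):
    """How many times larger a molecule looks on the gel than it really is."""
    return apparent_bp / actual_bp


INSERT_BP = 135      # insert size inferred from the reads
ADAPTERS_BP = 136    # adapter length quoted in the post
PIPPIN_CUT_BP = 486  # size reported by the Pippin Prep

amplicon = amplicon_size(INSERT_BP, ADAPTERS_BP)
ratio = apparent_to_actual_ratio(PIPPIN_CUT_BP, amplicon)
print(f"amplicon: {amplicon} bp, apparent/actual: {ratio:.2f}")
```

A ratio somewhat below 2 is what the "nearly double" migration of a bubble product would predict, so the numbers are at least compatible with that explanation.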

To avoid this, you have to back off on the number of cycles of enrichment PCR or add more primers to your PCR reaction. Or, presumably, if you could size select on the denatured single strands somehow, that should work.

That is, if I am correct in deducing what you have seen. But I would speculate that I am.

--
Phillip

Last edited by pmiguel; 08-21-2012 at 12:02 PM.
Old 08-23-2012, 05:19 PM   #7
Mouth_Breather
Junior Member
 
Location: Right Coast

Join Date: Jul 2011
Posts: 8

Phillip,

I've attached some shots of sizes pre size selection. Lanes 6 and onward.

Obviously some bimodality there. Does this fit with your suspicions?
Old 08-23-2012, 05:25 PM   #8
Mouth_Breather
Junior Member
 
Location: Right Coast

Join Date: Jul 2011
Posts: 8

Quote:
Originally Posted by Mouth_Breather View Post
Phillip,

I've attached some shots of sizes pre size selection. Lanes 6 and onward.

Obviously some bimodality there. Does this fit with your suspicions?

Not sure if the image attached correctly, so doing so another way... Lanes of interest are lanes 6 and onward.

Note that this isn't the exact sample where I saw the exact centers of each mode (~65 and ~300), but most of the samples look the same, as you can see from what is shown.


Last edited by Mouth_Breather; 08-23-2012 at 05:30 PM.
Old 08-23-2012, 06:20 PM   #9
Mouth_Breather
Junior Member
 
Location: Right Coast

Join Date: Jul 2011
Posts: 8

Quote:
Originally Posted by Mouth_Breather View Post
Not sure if image attached correctly, so doing so another way...Lanes of interest are lanes 6 and onward.

Note that this isn't the exact sample where I saw the exact centers of each mode (~65 and ~300), but most of the samples look the same, as you can see from what is shown.

and let me just add, thanks so much for your perspective - it's really gotten our gears turning over here, and sincerely appreciated.
Old 08-24-2012, 06:23 AM   #10
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,313

Yes, but the lower peak would need to be around 250-300 for my "bubble-product" hypothesis to be confirmed as the cause of your 65 bp insert result.

But can you see how that would happen, if you were tasked with cutting out a band that was around 450-500 bp? You see two peaks. One is in the 250-300 bp range -- that is no good, but there is one around twice that size. "I'll just take the big one." But the "big" one only appears big; it really just comprises hetero-dimers of library molecules. Same linear size as the lower peak, just a 50% single-stranded (and higher "drag") duplex composition.

--
Phillip
Old 08-24-2012, 08:01 AM   #11
Chris Boles
Junior Member
 
Location: Boston

Join Date: Aug 2009
Posts: 3

With respect to the gel-sizing step used, the Pippin Prep, I'd like to add one extra point into the discussion.

The most popular Pippin Prep cassettes contain a high concentration of ethidium bromide. Binding of ethidium to dsDNA products slows electrophoretic mobility relative to dye-free dsDNA (etbr is positively charged). The bound etbr concentration on bubble-products of library elements of the same strand length should be significantly lower, since etbr binds ssDNA less avidly.

So while we agree with Phillip that bubble-products and fully dsDNA with similar strand lengths will migrate very differently in dye-free gels (bubble-product slower in most cases), in ethidium-containing gels the mobility difference will be much smaller.

This may help explain the apparent discrepancy between insert size distribution and expected gel mobility of bubble-products pointed out by Phillip this morning (10:23 post).

Chris Boles, Sage Science.
Old 08-24-2012, 10:23 AM   #12
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,313

Thanks for your post, Chris.

I was of exactly the same opinion as you; however, I was schooled by others in this forum. They posted a picture of an agarose gel run with ethidium bromide in the gel that showed the same double peak phenomenon we see on the Bioanalyzer.

Of course "much smaller" difference in mobility is relative.

Also, the Bioanalyzer is not dye-free, but I don't know what dye it uses.

--
Phillip
Old 08-24-2012, 11:06 AM   #13
Chris Boles
Junior Member
 
Location: Boston

Join Date: Aug 2009
Posts: 3

Yes, "much smaller" is relative. We have heard anecdotal evidence from customers that bubble-products and fully ds amplicons of equal strand length comigrate in the Pippin etbr cassettes for lengths below 500 bp. The data were indirect - the customers were comparing insert size distributions for Pippin-sized libraries that showed single library peaks on Bioanalyzer, vs insert distributions for Pippin-sized libraries that showed double library peaks on Bioanalyzer. There wasn't much difference in insert size distribution. (I should caution that we haven't polled our customers on this issue, so others might have had a different finding.)

Regarding the difference in DNA mobility between etbr Pippin Prep cassettes and Bioanalyzer -- in the Agilent Bioanalyzer, there is an extremely sensitive detection system, and therefore they use a very low dye concentration. Under the low dye conditions of the BA, I think that the relative mobilities of ss and ds DNAs are similar to those observed in 1-2% dye-free agarose gels.

Chris Boles, Sage Science.

All times are GMT -8.