We recently prepped and ran our first (and thus far only) Illumina mate pair run. Seven libraries were prepped, each from genomic DNA of a different E. coli isolate. The Illumina mate pair kit was used and the standard protocol was followed. Both the initial (3 kbp) and the second fragmentation were performed using a nebulizer. According to the lab personnel, nothing unusual was noted during the prep.
After sequencing, an extreme bias for T at the 5' ends of both reads was observed (see the attached % Base Calls plot). These are NOT sequencing errors; the full reads align with high fidelity to the E. coli genome. The reads align fairly uniformly around the whole genome, but the positions of their 5' ends are obviously not random. Mapping shows the vast majority of sequenced fragments to be properly constructed mate pairs: they are separated by ~2,700 to 3,000 bp and point away from each other. All seven of the libraries look like this.
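For anyone who wants to check their own data for the same pattern, here is a minimal sketch of how a per-position base composition table (the numbers behind a % Base Calls plot) could be computed from raw reads. The function names and the toy reads are my own illustration, not output from the Illumina software, and the FASTQ reader assumes plain, uncompressed, four-line records.

```python
from collections import Counter


def read_fastq_sequences(path):
    """Yield the sequence line of each record in a plain 4-line FASTQ file."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of every record
                yield line.strip()


def base_percentages(reads, n_positions=10):
    """Percent of A/C/G/T/N at each of the first n_positions of the reads."""
    counts = [Counter() for _ in range(n_positions)]
    for seq in reads:
        for i, base in enumerate(seq[:n_positions].upper()):
            counts[i][base] += 1
    table = []
    for position_counts in counts:
        total = sum(position_counts.values())
        table.append({
            base: (100.0 * position_counts[base] / total if total else 0.0)
            for base in "ACGTN"
        })
    return table


# Toy reads (hypothetical, not real data) with a T-rich first position.
toy_reads = ["TTACG", "TGACC", "TCAGG", "ATGCA"]
profile = base_percentages(toy_reads, n_positions=2)
for position, row in enumerate(profile, start=1):
    print(position, row)
```

On real data the same function could be fed `read_fastq_sequences("reads_1.fastq")`, where the file name is a placeholder.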
We have a hypothesis about the cause, but I'd like to ask the community whether anyone has ever seen anything like this with an Illumina mate pair library.
Here is our hypothesis: After the large fragments are circularized there is an exonuclease step to degrade any remaining linear DNA. The exonuclease is supplied with the mate pair prep kit, so I don't know specifically which exo it is. After this digestion the sample is heated at 70 °C for 30 min to kill the exo, followed by addition of EDTA to 20 mM. This is followed by nebulization to fragment the circles, then streptavidin selection of the junction fragment and sequencing adapter ligation. We suspect that a very small amount of residual exo activity remained after heat inactivation. The random ends created by the subsequent nebulization were then acted upon by the exo, and we suppose its activity is much lower when it encounters a thymidine (or di-thymidine) at the 5' end of dsDNA, so digestion tends to stop there. Does this sound reasonable (or at least plausible)? Can any of you good folks think of an alternative mechanism?
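One way the T versus di-T part of this idea could be examined is to compare how often each dinucleotide starts a read with how often it occurs in the genome overall. The sketch below is only an illustration under simplifying assumptions: the function names and toy sequences are hypothetical, the background is counted on the forward strand of the reference only, and an enrichment ratio by itself would not prove any particular mechanism.

```python
from collections import Counter


def kmer_frequencies(sequences, k, start_only):
    """Relative k-mer frequencies, either at read starts or over all positions."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        if start_only:
            if len(seq) >= k:
                counts[seq[:k]] += 1
        else:
            for i in range(len(seq) - k + 1):
                counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def start_enrichment(reads, reference, k=2):
    """Ratio of read-start k-mer frequency to background frequency in the reference."""
    at_start = kmer_frequencies(reads, k, start_only=True)
    background = kmer_frequencies([reference], k, start_only=False)
    return {
        kmer: at_start[kmer] / background[kmer]
        for kmer in at_start
        if kmer in background
    }


# Toy inputs (hypothetical): three of the four reads begin with TT.
toy_reference = "ACGTTACGTA"
toy_reads = ["TTAC", "TTGA", "ACGT", "TTCC"]
ratios = start_enrichment(toy_reads, toy_reference, k=2)
for kmer, ratio in sorted(ratios.items()):
    print(kmer, round(ratio, 3))
```

A ratio well above 1 for TT but not for TA, TC, or TG would point toward a di-thymidine effect rather than a single-T effect.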
Thanks.