In response to an off-hand comment by our Illumina FAS, we decided to try paired-end (PE) sequencing of ~1.5 kb inserts from libraries made using the TruSeq DNA Prep kit. I did not expect it to work, but it seemed worth a shot because paired-end libraries are much easier to make than mate-pair libraries.
Instead of failing completely, we got final results indicating that our inserts were ~500 bp shorter than we thought they would be.
We made 6 libraries of this sort, all with similar results. We fragmented the DNA using the 1.5 kb Covaris protocol. Here is the size distribution from an Agilent 7500 DNA chip:
There are some short fragments there, so we used 0.5:1 AMPure:sample volume clean-ups everywhere the protocol called for AMPure. In addition, we did a double 0.5:1 AMPure clean-up prior to ligation. After library construction, but before enrichment PCR, our size distribution looked like this on a DNA High Sensitivity chip:
Story so far: we started with DNA fragmented to a modal length of 1.7 kb. After adding adapters and doing several 0.5:1 AMPure clean-ups, the modal length shifted to 2.25 kb. The adapters add 125 bp and, because they are forked, tend to appear to add more than that. Arguably a consistent result.
[Section added a few hours after initial post:
Here is the chip of the sample after 4 cycles of enrichment PCR:
]
Okay, then we fired up a MiSeq run at half the normal density (4 pM instead of 8 pM), as calculated using a Kapa qPCR kit and adjusting for the relative sizes of the reference library (phiX) and the large-insert libraries.
Cluster density is right where we expect it, around 400K clusters/mm².
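For anyone wanting to redo the size adjustment mentioned above, here is a minimal sketch of the usual correction: the qPCR-derived concentration is scaled by the ratio of the reference fragment length to the library fragment length. The function name and all numbers in the example are hypothetical, not values from this run.

```python
def size_adjusted_conc_pM(qpcr_conc_pM, reference_len_bp, library_len_bp):
    """Scale a qPCR-derived concentration by reference length / library length.

    Assumption: the qPCR signal tracks mass, so a library with longer
    fragments than the reference is over-reported in molar terms and
    must be scaled down by the length ratio.
    """
    if reference_len_bp <= 0 or library_len_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return qpcr_conc_pM * reference_len_bp / library_len_bp


# Hypothetical example: qPCR reports 1000 pM against a 500 bp reference,
# but the library fragments average 2000 bp.
adjusted = size_adjusted_conc_pM(1000.0, 500, 2000)
print(f"size-adjusted concentration: {adjusted:.1f} pM")
```

If the average library length is overestimated (for example, taken from a mass-based chip trace), the adjusted concentration comes out too low, and vice versa.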
We did a 2x150 + index run on the MiSeq. I grabbed the fastq file for the above sample and mapped it back to a previous ABySS assembly of the ~50 million base fungal genome from which the DNA derives. I got high mapping rates. However, the insert size distribution, as determined from the TLEN (column 9) values of all records with TLEN > 0, is this:
BWA also estimated the pair distances:
[infer_isize] (25, 50, 75) percentile: (883, 1045, 1199)
[infer_isize] low and high boundaries: 251 and 1831 for estimating avg and std
[infer_isize] inferred external isize from 82848 pairs: 1032.148 +/- 249.468
[infer_isize] skewness: -0.414; kurtosis: 0.376; ap_prior: 1.00e-05
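The TLEN-based distribution can be reproduced with something like the sketch below, which pulls column 9 from SAM records, keeps only positive values (so each pair is counted once, via its leftmost mate), and reports quartiles for comparison with the BWA percentiles above. The function names are my own, and it assumes plain-text SAM input rather than BAM.

```python
import statistics


def positive_tlens(sam_lines):
    """Yield TLEN (column 9) for SAM alignment records with TLEN > 0."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        tlen = int(fields[8])
        if tlen > 0:
            yield tlen


def quartiles(values):
    """Return the (25, 50, 75) percentiles of the given values."""
    q1, q2, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    return q1, q2, q3
```

To use it, pass any iterable of SAM text lines (an open file handle works) to `positive_tlens` and hand the resulting list to `quartiles`. Note that this does no filtering on mapping quality or proper-pair flags, so the tails can differ from what BWA uses for its estimate.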
Any ideas where we lost the longer inserts? The obvious candidates:
(1) During enrichment PCR there would have been some bias to shorter fragments. But we only did 4 cycles and used long extensions.
(2) During clustering short amplicons may predominate over long amplicons. But to this extent?
(3) Less obvious: Agilent chip size distributions are mass-based, not count-based, so a molar adjustment would tend to shift the mode of the peaks we see to the left.
We did not do any gel cuts, nor any reverse AMPures.
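To make point (3) concrete, here is a rough sketch of the mass-to-molar conversion: divide the signal in each size bin by the fragment length, then renormalize. The trace values below are invented for illustration and are not read off our chips.

```python
def mass_to_molar(sizes_bp, mass_signal):
    """Convert a mass-weighted size distribution to relative molecule counts.

    Each bin's mass signal is divided by its fragment length, and the
    result is normalized to sum to 1.
    """
    counts = [m / s for s, m in zip(sizes_bp, mass_signal)]
    total = sum(counts)
    return [c / total for c in counts]


def mode_size(sizes_bp, weights):
    """Return the size of the bin with the largest weight."""
    best = max(range(len(weights)), key=weights.__getitem__)
    return sizes_bp[best]


# Hypothetical trace: size bins (bp) and mass-based signal per bin.
sizes = [500, 1000, 1500, 2000, 2500, 3000]
mass = [5, 18, 30, 32, 20, 8]

molar = mass_to_molar(sizes, mass)
print("mass-based mode: ", mode_size(sizes, mass))
print("count-based mode:", mode_size(sizes, molar))
```

With these made-up numbers the mode moves from the 2000 bp bin to the 1500 bp bin. How large the shift is for a real library depends on how broad the distribution is.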
Ideas?
--
Phillip