Hi all,
We finally got our SOLiD 5500 up and running and have run into a suite of problems we are currently trying to troubleshoot. There are three specific issues, which I describe below, and I am wondering if anyone else has experienced anything similar and, if so, whether they were able to identify or fix the problem. Keep in mind that everything below relates to RNA-seq protocols, for which all quality controls were well within the expectations outlined in SOLiD's protocol manuals. We ran 4 human libraries and 3 non-model fish libraries in this run, and all subsequent analyses were done in sequence space (not colorspace).
Issue #1) Significant quality degradation across the read.
Boxplots of quality values show rapid degradation across the read starting at 33 bp. We lose 15% of our reads to quality trimming, using a quality score cut-off of 10 and discarding anything with a total read length below 30 bp.
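For anyone wanting to compare numbers, here is a minimal sketch of the kind of filter described above. It is an assumption about the rule, not the exact tool we ran: it cuts each read at the first base whose quality falls below 10 and discards the read if fewer than 30 bases remain. The function name and its parameters are made up for illustration.

```python
def trim_read(quals, min_q=10, min_len=30):
    """Return the number of bases kept, or 0 if the read is discarded.

    Assumed rule (illustrative only): cut at the first base with
    quality below min_q, then drop reads shorter than min_len.
    """
    keep = len(quals)
    for i, q in enumerate(quals):
        if q < min_q:
            keep = i  # everything from this base onward is trimmed
            break
    return keep if keep >= min_len else 0


# A 50 bp read whose quality collapses after base 40 is kept at 40 bp.
kept = trim_read([30] * 40 + [5] * 10)
# A read whose quality collapses after base 20 is discarded.
dropped = trim_read([30] * 20 + [5] * 30)
```

Other trimmers use sliding windows or mean-quality rules, so loss percentages will differ depending on the rule applied.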
Issue #2) We find LOTS of primer sequence in our final sequencing results.
After primer trimming, we lose another 45%-47% of the total reads, using the same minimum length cut-off of 30bp. Of this, between 4%-10% of the total reads are identified as "primer only." Thus, we lose ~60-65% of our data to poor quality and primer trimming. None of the cDNA libraries have "primer" peaks (the one at ~100bp) that exceed the thresholds in the manual, and most have very minor ones.
Issue #3) Lots of ribosomal sequences in final sequences.
Of the ~40% of the data that comes back as "usable", 15%-18% of the total sequence (~40% of the usable reads) has been identified as rRNA. This appears to be the case for both the human and fish samples. All samples were subject to Invitrogen RiboMinus protocols, run by different users with different kits, and neither lab has previously seen this kind of over-representation of rRNA using similar methods, so while it could be a handling/technical issue, it still seems odd. The final cDNA libraries for the human samples tend to show one large peak, which might be indicative of rRNA contamination, but the fish samples look textbook and would not suggest the same relative amount of contamination, even though that is what we find after sequencing.
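To make the bookkeeping explicit, the percentages quoted above can be tallied as below. This is only arithmetic on the figures in this post; the function name is made up for illustration, and all values are percentages of the total reads.

```python
def tally(quality_loss, primer_loss, rrna):
    """Tally read losses, all given as percent of total reads.

    Returns (usable, non_rrna, rrna_share_of_usable).
    """
    usable = 100 - quality_loss - primer_loss  # left after both trimming steps
    non_rrna = usable - rrna                   # left after removing rRNA reads
    rrna_share = 100 * rrna / usable           # rRNA as a percent of usable reads
    return usable, non_rrna, rrna_share


best = tally(15, 45, 15)   # low end of the reported losses
worst = tally(15, 47, 18)  # high end of the reported losses
print(best, worst)
```

With the low-end figures, 40% of reads are usable and 25% remain after rRNA removal; with the high-end figures, 38% are usable and 20% remain.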
A few other things to add: (1) we ran ERCC spike-ins (added before RiboMinus) and they look as expected, with R-squared values around 0.95; (2) all libraries were amplified for 18 cycles because, from past experience, we have found that necessary for visualization (using Bio-Rad Experion) and also to hit the library size targets in the manual; (3) average read length after trimming was around 49bp with a standard deviation of 13.
Obviously, we weren't thrilled to have only about 20%-25% of our data usable at the end of everything. SOLiD has given us a few explanations/potential fixes for what they think might be behind these results: poor efficiency of the RiboMinus kit and a need to RiboMinus twice in the future (frustrating if that's the case); too much primer-dimer in the final library (which doesn't match particularly well with the Experion or Bioanalyzer traces of library size distributions); and primer-dimer disrupting sequencing and driving quality scores down (the lack of 75bp reads is still disturbing). We think it's also possible that the Easy Bead prep did not work properly.
So we're curious: has anyone else had one or more of these problems? Any thoughts or potential insight/wisdom to offer?
Thanks for your time,
Nate