| Thread | Thread Starter | Forum | Replies | Last Post |
| --- | --- | --- | --- | --- |
| short reads from amplicon for 454 using Titanium chemistry | pseudorabies | 454 Pyrosequencing | 9 | 08-15-2011 08:46 AM |
| Titanium kit - short reads | andpet | 454 Pyrosequencing | 14 | 06-19-2009 01:54 PM |
| Short reads in Titanium | jpp | 454 Pyrosequencing | 1 | 03-24-2009 07:01 AM |
| Fraction of reads incorporated in Velvet assemblies | foram | Bioinformatics | 3 | 02-04-2009 06:36 PM |
| High Short Reads % | elly | 454 Pyrosequencing | 2 | 11-10-2008 01:10 PM |
#1
Member
Location: Germany Join Date: Jul 2008
Posts: 26
Hi,
Has anyone experienced a high fraction of really short reads in a Titanium run? We sequenced genomic samples from two fish on the 454 Titanium (1/2 plate per fish), and in both cases about 30-40% of all reads are shorter than 100 bp. The rest of the reads look really good, with lengths of 500 bp and more. I have attached the read profile. Has anyone seen such a bimodal distribution, and does anyone have an idea what the problem could be? Library prep? Sequencing? Thanks, Andreas
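A quick way to quantify the profile Andreas describes (the fraction of reads under 100 bp, plus a coarse length histogram that makes a bimodal distribution obvious) is sketched below in Python. It assumes the reads have already been exported to FASTA, for example with sffinfo; the function names, the 100 bp cutoff and the 50 bp bin width are illustrative choices, not anything Roche's software uses.

```python
from collections import Counter


def read_lengths_from_fasta(lines):
    """Yield the sequence length of each FASTA record in an iterable of lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:  # ignore text before the first header
            length += len(line)
    if length is not None:
        yield length


def length_profile(lengths, short_cutoff=100, bin_size=50):
    """Return (fraction of reads shorter than short_cutoff, {bin_start: count})."""
    lengths = list(lengths)
    if not lengths:
        return 0.0, {}
    short = sum(1 for n in lengths if n < short_cutoff)
    bins = Counter((n // bin_size) * bin_size for n in lengths)
    return short / len(lengths), dict(sorted(bins.items()))
```

For example, `length_profile([45, 60, 520, 480, 610, 75])` reports half of the reads as short, with the counts split between the 0-99 and 450-649 bins.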
#2
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
I've got that one beat. Looking at the untrimmed reads for this run (extracted with sffinfo -s -n <sfffile>) shows that the raw reads average ~650 nt, which is typical, though the distribution in this case is broader and flatter than usual. The reads are being aggressively trimmed back by the Signal Intensity, Valley and/or Q Score filters (I can't tell which). This has happened on the last couple of runs, though this one is by far the worst. We have just opened a support issue with Roche.
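kmcarr's diagnosis comes down to comparing each read's raw length with its length after clipping. In an SFF file each read header stores four clip fields (clip_qual_left, clip_qual_right, clip_adapter_left, clip_adapter_right) as 1-based positions, with 0 meaning "not set". The sketch below only does the arithmetic on those fields; reading the binary file is left to a parser such as Biopython's `Bio.SeqIO` "sff" support, and the `SffClip` class is a hypothetical container. As far as I know, the clip positions record where trimming ended up, not which filter (Signal Intensity, Valley or Q Score) caused it.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SffClip:
    """Hypothetical container for one read's SFF header values (1-based, 0 = unset)."""
    number_of_bases: int
    clip_qual_left: int = 0
    clip_qual_right: int = 0
    clip_adapter_left: int = 0
    clip_adapter_right: int = 0


def trimmed_length(r: SffClip) -> int:
    """Bases left after applying both the quality and the adapter clip points."""
    left = max(1, r.clip_qual_left, r.clip_adapter_left)
    # A right clip of 0 means "no clip", so only non-zero values constrain the end.
    rights = [x for x in (r.clip_qual_right, r.clip_adapter_right) if x > 0]
    right = min(rights + [r.number_of_bases])
    return max(0, right - left + 1)


def trimming_report(reads):
    """Mean untrimmed length, mean trimmed length and mean bases removed (reads: a list)."""
    raw = [r.number_of_bases for r in reads]
    kept = [trimmed_length(r) for r in reads]
    return {
        "mean_untrimmed": mean(raw),
        "mean_trimmed": mean(kept),
        "mean_bases_removed": mean(a - b for a, b in zip(raw, kept)),
    }
```

A run like the one described would show a normal `mean_untrimmed` (~650) alongside a much lower `mean_trimmed`.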
#3
Member
Location: Nijmegen, Netherlands Join Date: Jun 2009
Posts: 22
We've seen these kinds of distributions only when sequencing array-based enrichment samples. We get one peak at about 50nt and another peak at 450bp. The height of the peaks varies: with a bad run the 50nt peak is larger than the 450bp peak. From Roche we heard that such a distribution is normal with array enrichment. We also suspected that the reads are actually longer and get trimmed by the Roche software. But after playing with the filter settings we did not see any real improvements.
#4
Member
Location: Germany Join Date: Jul 2008
Posts: 26
@kmcarr:
Thanks for your fast answer, and wow, that looks really bad! I checked the untrimmed reads and can confirm your observation (see attachment). We will do another run, and if the results stay the same, we will contact Roche. Andreas
#5
Member
Location: Denmark Join Date: Oct 2009
Posts: 12
Hi
I often see the same phenomenon in my Ti454 sequences: a peak of truncated reads at ~50 bp and a lot of short reads <150 bp. It does not appear to be due to an obvious PCR artefact such as primer dimers. The libraries appear OK, as a repeat run of the same sample often gives excellent results. I have attached two figures showing what I mean (note that our samples have very distinct sizes, normally 140-210 bp). It may be related to the Roche kits; some batches seem to give rise to more truncated sequences than others. Any other ideas would be very helpful. andpet and kmcarr: if you get any useful hints from Roche, please post them here. Thanks
Last edited by sulfobus; 10-26-2009 at 04:21 AM.
#6
Senior Member
Location: Purdue University, West Lafayette, Indiana Join Date: Aug 2008
Posts: 2,317
We were having trouble with a high fraction of short reads. It is still an issue with some samples, but we seem to have less trouble by altering a couple of factors.
(1) Scale the amount of adaptor added to the adaptor ligation to the amount of incoming sheared DNA. The Roche protocol does not do this. It probably does not matter when there is plenty of DNA, but otherwise I think adaptor-adaptor ligations can occur.
(2) After the double SPRI clean-up, we look at the size distribution on a lab chip. If we see even a hint of a peak below 50 bases, we gel-isolate the correct size range on agarose. You might think you are fine with a 50 base peak that is only 10% of the area of your 500+ base peak, but think moles, not peak area. That tiny 50 base peak can contain as many library molecules as the big 500+ base peak, because the lab chip signal is generated per base, not per molecule.
This does seem to help, but I am still a little mystified by runs with a high fraction of short reads. Why? Because I would expect problems caused by short library molecules to be labeled "short primer" in the failure metrics. But what we see most in the failure metrics is "short quality". -- Phillip
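Phillip's "think moles" point is simple arithmetic: if the chip signal scales with mass (number of bases), the number of molecules in a peak scales with peak area divided by fragment length. The sketch below assumes ~650 g/mol per base pair of dsDNA, a common approximation. The same mass-to-pmol conversion is what you would use to scale adaptor input to the amount of sheared DNA in point (1), though the molar excess of adaptor to use is a protocol choice not given here.

```python
# Approximate molar mass of one base pair of double-stranded DNA (g/mol).
AVG_BP_MASS_G_PER_MOL = 650.0


def pmol_from_ng(ng: float, length_bp: float) -> float:
    """Convert a dsDNA mass in ng to picomoles of fragments of the given length."""
    # ng * 1e-9 g / (length * 650 g/mol) * 1e12 pmol/mol
    return ng * 1000.0 / (length_bp * AVG_BP_MASS_G_PER_MOL)


def molar_ratio_from_peaks(area_a, length_a, area_b, length_b):
    """Molecules in peak A per molecule in peak B, assuming signal is proportional to mass."""
    return (area_a / length_a) / (area_b / length_b)
```

With the numbers from the post, `molar_ratio_from_peaks(10, 50, 100, 500)` returns 1.0: a 50 base peak with 10% of the area of a 500 base peak holds as many molecules.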
#7
Member
Location: Denmark Join Date: Oct 2009
Posts: 12
Our truncated sequences start off correctly, but then suddenly halt. They are not chimeras with inserted primers or adaptors, just correct sequences that are truncated. We add the Ti454 adaptors to our DNA by PCR and purify the library by gel extraction.
#8
Senior Member
Location: Purdue University, West Lafayette, Indiana Join Date: Aug 2008
Posts: 2,317
Aside from the obvious possibilities that you have probably already considered (forgot to add a component to one of the run reagents, apyrase denatured due to it being held too long by someone with very warm hands, etc.) we had one bad run on the same night our lab temperature went very cool. Since then I've wondered if the GS-FLX relies on ambient temperatures being in a certain range. My guess is that the 454 guys keep their instruments in a very precisely controlled environment. That is just speculation, but if so, the instruments might not do as well outside of certain conditions.
-- Phillip
#9
Member
Location: Germany Join Date: Jul 2008
Posts: 26
We did a second run with half a plate of the fish library and it was even worse (it looked more like kmcarr's example). But the other half of the plate was okay, so now I guess it could have something to do with the library construction. We will sequence a third library.
Another odd thing I observed was that reads that were trimmed too much contained a larger fraction of tandem repeats. I split my data set into reads shorter than 200 bp and reads longer than 200 bp and ran Tandem Repeats Finder on the untrimmed read sets. The set with the shorter reads contained 10x more tandem repeats. My thought is: could 454 sequencing of short tandem repeats be more error-prone or difficult? @sulfobus: Okay, will do. @pmiguel: Well, at least our Solexa is susceptible to temperature changes, so I bet the same is true for the 454. However, extreme temperatures are rather rare in Germany :-). Thanks for the other hints, I will discuss them with our technicians. Andreas
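Andreas's comparison can be approximated without Tandem Repeats Finder. The sketch below swaps in a much cruder check, a regular expression that looks for a 2-6 nt unit repeated at least four times in a row, so its counts will not match TRF's; it only illustrates splitting on trimmed length while scanning the untrimmed sequence. The 200 bp cutoff comes from the post; the regex thresholds are assumptions.

```python
import re

# Crude stand-in for Tandem Repeats Finder: a 2-6 nt unit followed by at least
# 3 more copies (>= 4 copies in total). Long homopolymers also match (e.g. "AA" x 4).
SIMPLE_REPEAT = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def has_simple_repeat(seq: str) -> bool:
    return SIMPLE_REPEAT.search(seq.upper()) is not None


def repeat_fraction_by_length(reads, cutoff=200):
    """Fraction of reads containing a simple repeat, for short and long reads.

    `reads` yields (trimmed_length, untrimmed_sequence) pairs: the split uses
    the trimmed length, the repeat scan uses the untrimmed sequence.
    """
    groups = {"short": [0, 0], "long": [0, 0]}  # [reads with a repeat, total reads]
    for trimmed_len, seq in reads:
        key = "short" if trimmed_len < cutoff else "long"
        groups[key][1] += 1
        groups[key][0] += int(has_simple_repeat(seq))
    return {k: (hits / total if total else 0.0) for k, (hits, total) in groups.items()}
```

A large gap between the two fractions would point in the same direction as Andreas's 10x observation.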
#10
Senior Member
Location: Purdue University, West Lafayette, Indiana Join Date: Aug 2008
Posts: 2,317
Quote:
#11
Senior Member
Location: Cambridge, MA Join Date: Mar 2009
Posts: 141
Hi,
I recently got a surprisingly high number of short reads from Titanium sequencing of some amplicons, and wondered if anyone else is still having the issue discussed in this thread. The double-stranded library did not look skewed toward small fragments (attached), so I'm somewhat mystified why the read length distribution turned out as it did (attached Picture 1). The amplicons can have strong secondary structure, so I thought perhaps longer amplicons were amplified less efficiently during emPCR. Alternatively, it could have something to do with sharing a plate with some non-amplicon samples (separated by MID tags). In general, what happens when you mix amplicon and non-amplicon samples on a plate? I've heard this is not optimal, but since I'm new to the whole business, I don't understand why.
#12
Senior Member
Location: Purdue University, West Lafayette, Indiana Join Date: Aug 2008
Posts: 2,317
So you were using the amplicon procedure? The trace you show (Agilent?) has a smooth size distribution, like a fragment or cDNA library. For amplicons, wouldn't one expect fragments of discrete sizes?
For cDNAs, problems with short read lengths generally stem from polyA tracts in the library molecules. Even with "V"-anchored, interrupted polyT primers used for reverse transcription, I see a preponderance of long polyA-containing library molecules in some libraries. (We check by cloning them into pCR4TOPO and Sanger sequencing them.) -- Phillip
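To look for the polyA problem in reads rather than by cloning, a rough screen is to find the longest single-base run near the 5' end of each read. This is a minimal sketch under stated assumptions: the 60-base window and 15-base minimum run are arbitrary, hypothetical thresholds, and reads from the worst-affected beads may be filtered out entirely and never reach the output, which is one reason cloning and Sanger sequencing can still be needed.

```python
import re

# One match per run of identical bases (any other character breaks a run).
HOMOPOLYMER = re.compile(r"A+|C+|G+|T+")


def longest_homopolymer(seq, window=None):
    """Return (base, run_length, start) for the longest run in the first `window` bases."""
    region = seq.upper() if window is None else seq.upper()[:window]
    best = ("", 0, -1)  # (base, run length, 0-based start)
    for m in HOMOPOLYMER.finditer(region):
        run = m.end() - m.start()
        if run > best[1]:
            best = (m.group()[0], run, m.start())
    return best


def flag_polya_reads(reads, min_run=15, window=60):
    """IDs of (id, sequence) reads with a poly-A or poly-T run >= min_run near the 5' end."""
    flagged = []
    for read_id, seq in reads:
        base, run, _ = longest_homopolymer(seq, window)
        if base in ("A", "T") and run >= min_run:
            flagged.append(read_id)
    return flagged
```

Both A and T are checked because, depending on which end of the library molecule is read, the tract can appear as either base.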
![]() |
![]() |
![]() |
#13 | |
Senior Member
Location: Cambridge, MA Join Date: Mar 2009
Posts: 141
Quote:
#14
Senior Member
Location: Purdue University, West Lafayette, Indiana Join Date: Aug 2008
Posts: 2,317
Quote:
Phillip
#15
Senior Member
Location: Cambridge, MA Join Date: Mar 2009
Posts: 141
#16
Senior Member
Location: Purdue University, West Lafayette, Indiana Join Date: Aug 2008
Posts: 2,317
Quote:
As it is, as far as we know the two plots more or less match one another... -- Phillip
#17
Junior Member
Location: Seattle Join Date: Jun 2010
Posts: 1
Did you ever determine the source of your sequencing problem? We've just sequenced a cDNA library on the Titanium and have nearly the same read-length distribution and dinucleotide repeat issues you describe. We are currently trying to troubleshoot.
Thanks, Meredith
#18
Junior Member
Location: Brisbane, Australia Join Date: Jul 2010
Posts: 1
Hi all
Maybe a related issue, and I'd appreciate any suggestions. We're trying to sequence an amplicon library of a three-base repeat region (essentially, deep sequencing of a microsatellite marker across a population), and are also getting short average read lengths. Smaller numbers of repeats (15-20 copies) aren't too bad, but larger ones (in some cases 40-50 copies or more) don't get through to the end of the repeat region; quality just drops off too much to call. I was wondering if the problem might be polymerase slippage, either during emPCR or during sequencing itself. A colleague has suggested inhibition of emPCR (caused by localised imbalances in dNTP concentrations, since only three of the four are used) that might give poor-quality amplification on the beads. Does anyone have comments on either of these theories (especially how to solve them), or other ideas why repeats are a problem? Cheers, Mark
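To turn "won't get through to the end of the repeat region" into a number per read, one option is to count uninterrupted copies of the repeat unit and check whether the read continues into the sequence just 3' of the repeat. In the sketch below the motif and the right-flank sequence are hypothetical inputs you would take from your own marker; it measures where reads fail but does not distinguish between the slippage and dNTP-imbalance explanations.

```python
import re


def repeat_span(seq, motif, right_flank):
    """Return (copies, spans_repeat) for the longest uninterrupted run of `motif`.

    `spans_repeat` is True when `right_flank` (a short unique sequence just 3'
    of the repeat; hypothetical input) starts immediately after that run,
    i.e. the read made it all the way through the repeat.
    """
    s, unit, flank = seq.upper(), motif.upper(), right_flank.upper()
    best_copies, spans = 0, False
    for m in re.finditer(f"(?:{re.escape(unit)})+", s):
        copies = (m.end() - m.start()) // len(unit)
        if copies > best_copies:
            best_copies = copies
            spans = s.startswith(flank, m.end())
    return best_copies, spans
```

Plotting copy number against `spans_repeat` across reads should show at roughly what repeat length reads stop making it through.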
#19
Senior Member
Location: Purdue University, West Lafayette, Indiana Join Date: Aug 2008
Posts: 2,317
Quote:
Quote:
First, I should point out that things have changed since I wrote that. Roche released an official method for generating cDNA libraries for the GS-FLX that uses random primers for cDNA synthesis, so this issue has greatly diminished. The story is this: sequencers generally have an "Achilles heel" (or "kryptonite", if you prefer), some weakness to which they are particularly subject. The 454's weakness is homopolymers in general, and poly T in the library molecule in particular. The 454 is built to be fast, and it achieves this speed by not using reversible terminator chemistry to precisely control the addition of each base. That gives you speed: there is no need to deblock after scanning, and if you have 2 or 3 identical bases in a row you collect signal from all of them at once. But in this strength lies a weakness: longer homopolymers are hard to tell apart by length. (Is that 9 A's in a row or 8?) Further, in extreme cases, a long stretch of a single base will exhaust all the dNTP being flowed before all the nascent strands on the bead have reached the end of the stretch. This is bad both because the signal produced is so high it can bleed into adjacent wells, and because the next time that base cycles around you get bad "CAFIE" (carry forward and incomplete extension) effects from all the strands that were not fully extended. On top of that, because 454 relies on a chemical cascade to produce the ATP that luciferase uses to generate light, natural dATP cannot be used. The analog used in its place is not incorporated as well as dATP would be. Nevertheless, the conditions work well enough except in extreme cases. Alas, one of those extremes is created by the most common homopolymer in eukaryotic molecular biology: the poly A tail. cDNA protocols frequently prime first-strand synthesis from an oligo dT, so after this cDNA library is ligated to adapters, about half of the molecules may have that stretch of homopolymeric dT right next to the sequencing primer.
These factors combine to make a perfect storm that will sink a run. All your beads pass the key, and then the next X bases incorporated are all "A". So you get the blinding burst of A, plus lots of incompletely extended strands of various lengths, leading to non-synchronized signal in later cycles. There are plenty of ways to work around this issue during library construction. But looking over the wreckage of your run, it is actually difficult to tell what has happened. No warning is generated by the Roche image processing pipeline that says "Your library sucks!" You just get poor results, which are not very diagnostic, since many factors can lead to poor results. If I suspect this is the issue, I have to fire up the RunBrowser, page through the early cycles, and check whether a burst of homopolymer is to blame. Anyway, Roche did finally release a sanctioned cDNA library construction protocol. We don't really run into these issues using it, because first-strand synthesis is primed by random oligos instead of oligo dT. So this is likely not your problem unless someone used a naive oligo-dT-primed first-strand synthesis method to construct your library. Not that likely these days. -- Phillip
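The homopolymer problem Phillip describes follows from how flows work. Nucleotides are flowed one at a time in a fixed cycle (TACG on the GS-FLX and Titanium, with the TCAG key read first), and each flow incorporates every consecutive matching base, so a run of k identical bases is called from a single flow with a signal roughly proportional to k. The Python sketch below computes only these ideal, noise-free flow values; it deliberately does not model CAFIE, dNTP exhaustion or signal bleed, which are what make a 30-base poly-A so hard to call in practice.

```python
FLOW_ORDER = "TACG"  # nucleotide flow cycle on the GS-FLX / Titanium


def ideal_flowgram(read, n_flows=None, flow_order=FLOW_ORDER):
    """Noise-free flow values for `read` under a cyclic flow order.

    Each flow extends through every consecutive base matching the flowed
    nucleotide, so a homopolymer of length k gives one flow with value k.
    """
    read = read.upper()
    if set(read) - set("ACGT"):
        # Any other character would never match a flow and would loop forever.
        raise ValueError("ideal_flowgram only handles A, C, G and T")
    flows, i, f = [], 0, 0
    while i < len(read) and (n_flows is None or f < n_flows):
        nuc = flow_order[f % len(flow_order)]
        run = 0
        while i < len(read) and read[i] == nuc:
            run += 1
            i += 1
        flows.append(run)
        f += 1
    return flows
```

For example, `ideal_flowgram("TCAG" + "A" * 30 + "C")` puts all 30 A's into the tenth flow, between flows of 0 and 1; on a real instrument that single flow is where the dNTP exhaustion and CAFIE trouble start.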