SEQanswers

Old 10-12-2009, 10:48 AM   #1
andpet
Member
 
Location: Germany

Join Date: Jul 2008
Posts: 26
Default Titanium - high fraction of short reads

Hi,

Has anyone experienced a high fraction of really short reads in a Titanium run?

We have sequenced genomic samples from two fish with the 454 Titanium (1/2 plate per fish), and in both cases about 30-40% of all reads are shorter than 100 bp. However, the rest of the reads all look really good, with lengths of 500 bp and more.

I have attached the read profile. Has anyone seen such a bimodal distribution, and does anyone have an idea what the problem could be? Library prep? Sequencing?

Thanks,

Andreas
Attached Images
File Type: jpg reads_length_TCAG_chart_region1.jpg (20.0 KB, 148 views)
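
For anyone who wants to put a number on the "fraction of short reads" described above, here is a minimal sketch, assuming the read lengths have already been collected into a plain Python list. The function names, the 50-base bin width and the example lengths are illustrative assumptions, not data from this run.

```python
from collections import Counter

def short_read_fraction(lengths, cutoff=100):
    """Return the fraction of reads strictly shorter than `cutoff` bases."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n < cutoff) / len(lengths)

def length_histogram(lengths, bin_width=50):
    """Count reads per length bin; each key is the lower edge of its bin."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(counts.items()))

# Made-up lengths that mimic a bimodal run (NOT data from this thread).
example = [45, 52, 60, 80, 480, 510, 520, 530, 540, 550]
print(short_read_fraction(example))  # 0.4
print(length_histogram(example))     # {0: 1, 50: 3, 450: 1, 500: 4, 550: 1}
```

A bimodal run shows up as two well-separated groups of bins with few reads in between.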
Old 10-12-2009, 12:46 PM   #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

I've got that one beat. Looking at the untrimmed reads for this run (extracted with sffinfo -s -n <sfffile>) shows that the raw reads average ~650 nt, which is typical, though the distribution in this case is broader and flatter than usual. The reads are being aggressively trimmed back by the Signal Intensity, Valley or Q Score filter(s) (I can't tell which). This has occurred on the last couple of runs, though this one is by far the worst. We have just opened a support issue with Roche.
Attached Images
File Type: jpg reads_length_TCAG_chart.jpg (13.5 KB, 98 views)
Attached Files
File Type: pdf ReadLengthDistrib2.pdf (473.9 KB, 97 views)
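
The trimmed-versus-untrimmed comparison described above could be sketched roughly as follows. This assumes two FASTA exports of the same run, one made with the -n option shown above and one without it; the parser, the function names and the tiny in-memory example are assumptions for illustration only.

```python
def fasta_lengths(lines):
    """Map read name -> sequence length from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the read name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths

def trimmed_away(raw, trimmed):
    """Bases removed per read; reads missing from `trimmed` are skipped."""
    return {r: raw[r] - trimmed[r] for r in raw if r in trimmed}

# Tiny made-up example; in practice pass open file handles instead of lists.
raw = fasta_lengths([">r1 length=12", "ACGTACGTACGT", ">r2", "ACGT", "ACGT"])
trimmed = fasta_lengths([">r1", "ACGT", ">r2", "ACGTACG"])
print(trimmed_away(raw, trimmed))  # {'r1': 8, 'r2': 1}
```

Reads that lose most of their raw length are the ones being cut back by the filters rather than being short to begin with.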
Old 10-13-2009, 12:13 AM   #3
Tuxido
Member
 
Location: Nijmegen, Netherlands

Join Date: Jun 2009
Posts: 22
Default

We've seen these kinds of distributions only when sequencing array-based enrichment samples. We get one peak at about 50 nt and another at about 450 nt. The height of the peaks varies: in a bad run the 50 nt peak is larger than the 450 nt peak. Roche told us that such a distribution is normal with array enrichment. We also suspected that the reads are actually longer and get trimmed by the Roche software, but after playing with the filter settings we did not see any real improvement.
Old 10-13-2009, 01:59 AM   #4
andpet
Member
 
Location: Germany

Join Date: Jul 2008
Posts: 26
Default Titanium - high fraction of short reads

@kmcarr:

Thanks for your fast answer, and wow, that looks really bad ...

I checked the untrimmed reads and can confirm what you describe (see attachment). We will do another run and, if the results stay the same, contact Roche ...

Andreas
Attached Images
File Type: jpg F3R43UH01_hist.jpg (9.1 KB, 118 views)
Old 10-26-2009, 04:18 AM   #5
sulfobus
Member
 
Location: Denmark

Join Date: Oct 2009
Posts: 12
Question Titanium - high fraction of short reads

Hi

I often see the same phenomenon in my Ti454 sequences: a peak of truncated reads at ~50 bp and a lot of short reads <150 bp. It does not appear to be due to some obvious PCR artefact, such as primer dimers. The libraries appear OK, as a repeat run of the same sample often gives excellent results. I have attached two figures showing what I mean (note that our samples have very distinct sizes, normally 140-210 bp).

It may be related to the Roche kits; it seems some batches give rise to more truncated sequences than others. Any other ideas would be very helpful.

andpet and kmcarr: If you get any useful hints from Roche, please post them here.
Thanks
Attached Images
File Type: jpg badruns.jpg (10.3 KB, 80 views)
File Type: jpg repeatruns.jpg (8.8 KB, 69 views)

Last edited by sulfobus; 10-26-2009 at 04:21 AM.
Old 10-26-2009, 02:03 PM   #6
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

We were having trouble with a high fraction of short reads. It is still an issue with some samples, but we seem to have less trouble by altering a couple of factors.

(1) Scale the amount of adaptor added to the adaptor-ligation to the amount of incoming sheared DNA. The Roche protocol does not do this. Probably does not matter in cases where there is plenty of DNA, but otherwise I think there can be adaptor-adaptor ligations occurring.

(2) After double SPRI clean-up, we look at the size distribution on a lab chip. If we see even a hint of a peak below 50 bases, we agarose gel isolate the correct size range.

You might think you are fine with a 50 base peak being only 10% the area of your 500+ base peak, but think moles, not peak size. That tiny 50 base peak can have an equal number of library molecules in it to the big 500+ base peak because the lab chip signal is generated per base, not per molecule.

This does seem to help, but I am still a little mystified by runs with a high fraction of short reads. Why? Because I would expect issues caused by short library molecules to be labeled as "short primer" in the failure metrics. But what we see most in the failure metrics is "short quality".

--
Phillip
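
The "think moles, not peak size" point above can be checked with a line of arithmetic. A minimal sketch, assuming the lab chip signal is proportional to the total number of bases in a peak; the areas are arbitrary units chosen to match the 10% example, and the function name is made up.

```python
def relative_molecules(peak_area, fragment_length):
    """Relative molecule count: signal scales with bases, so divide by length."""
    return peak_area / fragment_length

# A 50-base peak with 10% of the area of a 500-base peak (arbitrary units).
short_peak = relative_molecules(peak_area=10.0, fragment_length=50)
long_peak = relative_molecules(peak_area=100.0, fragment_length=500)
print(short_peak / long_peak)  # 1.0
```

So the small peak holds as many library molecules as the large one, even though it looks ten times smaller on the trace.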
Old 10-27-2009, 12:58 AM   #7
sulfobus
Member
 
Location: Denmark

Join Date: Oct 2009
Posts: 12
Default Titanium - high fraction of short reads

Our truncated sequences start off correctly, but then suddenly halt. They are not chimeras with inserted primers or adaptors, just truncated correct sequences. We extend our DNA with the Ti454 adaptors by PCR and purify the library by gel extraction.
Old 10-27-2009, 04:08 AM   #8
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Aside from the obvious possibilities that you have probably already considered (forgot to add a component to one of the run reagents, apyrase denatured due to it being held too long by someone with very warm hands, etc.) we had one bad run on the same night our lab temperature went very cool. Since then I've wondered if the GS-FLX relies on ambient temperatures being in a certain range. My guess is that the 454 guys keep their instruments in a very precisely controlled environment. That is just speculation, but if so, the instruments might not do as well outside of certain conditions.

--
Phillip
Old 10-27-2009, 12:20 PM   #9
andpet
Member
 
Location: Germany

Join Date: Jul 2008
Posts: 26
Default

We did a second run with half a plate of the fish and it was even worse (looked more like kmcarr's example). But the other half of the plate was okay, so now I guess it could have something to do with the library construction. We will sequence a third library ..

Another odd thing I observed was that reads that were trimmed too much contained a larger fraction of tandem repeats. I divided my data set into reads shorter than 200 bp and reads of 200 bp or longer, and ran Tandem Repeats Finder on the untrimmed read sets. The set with the shorter reads contained 10x more tandem repeats. My thought is: could 454 sequencing of short tandem repeats be more error-prone or difficult?

@sulfobus: Okay, will do ..

@pmiguel: Well at least our Solexa is susceptible to temperature changes so I bet the same is true for the 454. However extreme temperatures are rather rare in Germany :-). Thanks for the other hints, I will discuss them with our technicians ..

Andreas
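
The short-versus-long comparison above could be approximated along these lines. Note that this is a naive regular-expression stand-in, not Tandem Repeats Finder: it only catches perfect repeats of a 2-6 base unit, ignores homopolymers, and the 5-copy threshold and function names are assumptions.

```python
import re

def has_tandem_repeat(seq, min_copies=5):
    """True if a 2-6 base unit (not a homopolymer) repeats min_copies times in a row."""
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_copies - 1))
    return any(len(set(m.group(1))) > 1 for m in pattern.finditer(seq.upper()))

def repeat_fraction_by_length(reads, length_cutoff=200, min_copies=5):
    """Return (fraction with repeats among short reads, same among long reads)."""
    short = [s for s in reads if len(s) < length_cutoff]
    long_ = [s for s in reads if len(s) >= length_cutoff]

    def fraction(group):
        if not group:
            return 0.0
        return sum(has_tandem_repeat(s, min_copies) for s in group) / len(group)

    return fraction(short), fraction(long_)

# Made-up reads; a small cutoff is used so the example stays readable.
reads = ["ACACACACACAC", "ACGTTGCATCGA", "ACGTTGCATCGATTAGC", "GGATCCTTAGCATGCAAGT"]
print(repeat_fraction_by_length(reads, length_cutoff=15))  # (0.5, 0.0)
```

Running it on the untrimmed reads, split by trimmed length, would mirror the comparison described in the post.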
Old 10-28-2009, 05:24 AM   #10
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by andpet View Post
We did a second run with half a plate of the fish and it was even worse (looked more like kmcarrs example). But the other half of the plate was okay so now I guess it could have to do something with the library construction. We will sequence a third library ..

Andreas

Could you run that library on a pico RNA labchip? It would be interesting to know whether you have a peak below 50 bases...
Old 01-20-2010, 10:11 AM   #11
greigite
Senior Member
 
Location: Cambridge, MA

Join Date: Mar 2009
Posts: 141
Default more short reads with titanium amplicon seq

Hi,
I recently got a surprisingly high number of small reads from Titanium sequencing of some amplicons, and wondered if anyone else is still having the issue addressed in this thread. The double-stranded library did not look skewed to small reads (attached), so I'm somewhat mystified why the read length distribution turned out as it did (attached Picture 1). The amplicons can have high secondary structure, so I thought perhaps longer amplicons were amplified at lower efficiency during emPCR. Alternatively, it could have something to do with sharing a plate with some non-amplicon-based samples (separated by MID tags). In general, what happens when you mix amplicon and non-amplicon samples on a plate? I've heard this is not optimal, but since I'm new to the whole business, I don't understand why.
Attached Images
File Type: jpg Picture 1.jpg (8.6 KB, 73 views)
File Type: jpg cs4dslib.jpg (5.0 KB, 49 views)
Old 01-20-2010, 11:15 AM   #12
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

So you were using the amplicon procedure? The trace you show (Agilent?) has a smooth size distribution like a fragment or cDNA library. For amplicons doesn't one expect fragments of discrete sizes?

For cDNAs, issues with short read lengths generally stem from polyA tracts in the library molecules. Even with "V" anchored, interrupted polyT primers being used for reverse transcription, I see a preponderance of long polyA-containing library molecules in some libraries. (We check by cloning them into pCR4TOPO and Sanger sequencing them.)

--
Phillip
Old 01-20-2010, 11:39 AM   #13
greigite
Senior Member
 
Location: Cambridge, MA

Join Date: Mar 2009
Posts: 141
Default

Quote:
Originally Posted by pmiguel View Post
So you were using the amplicon procedure? The trace you show (Agilent?) has a smooth size distribution like a fragment or cDNA library. For amplicons doesn't one expect fragments of discrete sizes?

For cDNAs, issues with short read lengths generally stem from polyA tracts in the library molecules. Even with "V" anchored, interrupted polyT primers being used for reverse transcription I see a preponderance of long polyA containing library molecule in some libraries. (We check by cloning them into pCR4TOPO and Sanger sequencing them.)

--
Phillip

Yes, usually you would expect discrete fragment sizes for amplicons, but this locus has a wide size range within a mixed population (expansion/contraction of the locus is one of the things we are looking at). This is not a cDNA library, so polyA is not an issue.
Old 01-20-2010, 11:47 AM   #14
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by greigite View Post
yes, usually you would expect discrete fragment sizes for amplicons, but this locus has a wide size range within a mixed population (expansion/contraction of the locus is one of the things we are looking at). This is not a cDNA library so polyA is not an issue.

So what is the library size range? Your plot is labeled in seconds, not in bp.

Phillip
Old 01-20-2010, 12:29 PM   #15
greigite
Senior Member
 
Location: Cambridge, MA

Join Date: Mar 2009
Posts: 141
Default

Quote:
Originally Posted by pmiguel View Post
So what is the library size range? Your plot is labeled in seconds, not in bp.

Phillip

I don't have this info. I got the plot from the person who did the library prep, and unfortunately they didn't change the Bioanalyzer settings to output in bp.
Old 01-20-2010, 12:41 PM   #16
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by greigite View Post
I don't have this info- got the plot from the person who did the library prep and they didn't change the bioanalyzer settings to output in bp- unfortunately.

Hmmm. Well, could they tell you what type of Bioanalyzer chip they ran?

As it is, as far as we know the two plots more or less match one another...


--
Phillip
Old 10-27-2010, 06:05 PM   #17
MEverett
Junior Member
 
Location: Seattle

Join Date: Jun 2010
Posts: 1
Default

Did you ever determine what the source of your sequencing problem was? We've just sequenced a cDNA library on the Titanium and have nearly the exact same read distribution and di-nucleotide repeat issues you describe. We are currently attempting to troubleshoot.

Thanks
Meredith
Old 11-04-2010, 10:26 PM   #18
markc
Junior Member
 
Location: Brisbane, Australia

Join Date: Jul 2010
Posts: 1
Default

Hi all

Maybe a related issue, and I'd appreciate any suggestions. We're trying to sequence an amplicon library of a three-base repeat region (essentially, deep sequencing of a microsatellite marker from a population), and are also getting short average read lengths. Smaller numbers of repeats (15-20 copies) aren't too bad, but larger ones (in some cases, 40-50 copies or more) won't get through to the end of the repeat region - quality just drops off too much to call.

I was wondering if the problem might be polymerase slippage, either during the emPCR or the sequencing itself, and a colleague has suggested inhibition of emPCR (caused by localised imbalances in dNTP concentrations from only three being used) which might give poor quality amplification on the beads. Does anyone have any comments on either of these theories (especially how to solve them) or other ideas why repeats are a problem?

Cheers

Mark
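
To see how many reads actually "get through" a repeat region like the one described above, something along these lines might help. It is only a sketch: the CAG motif, the 10-base flank requirement and the function names are hypothetical, and it only counts perfect, uninterrupted copies.

```python
import re

def longest_motif_run(read, motif):
    """Return (copies, start, end) of the longest uninterrupted run of `motif`."""
    best = (0, -1, -1)
    for m in re.finditer(r"(?:%s)+" % re.escape(motif.upper()), read.upper()):
        copies = (m.end() - m.start()) // len(motif)
        if copies > best[0]:
            best = (copies, m.start(), m.end())
    return best

def spans_repeat(read, motif, min_flank=10):
    """True if at least `min_flank` bases follow the longest repeat run."""
    copies, _start, end = longest_motif_run(read, motif)
    return copies > 0 and len(read) - end >= min_flank

# Made-up amplicon: 10-base flank, 15 copies of a hypothetical CAG repeat, 11-base flank.
full_read = "ACGTACGTAC" + "CAG" * 15 + "TTGACCATGGA"
truncated = full_read[:40]  # read that stops inside the repeat

print(longest_motif_run(full_read, "CAG"))  # (15, 10, 55)
print(spans_repeat(full_read, "CAG"))       # True
print(longest_motif_run(truncated, "CAG"))  # (10, 10, 40)
print(spans_repeat(truncated, "CAG"))       # False
```

Tabulating the spanning fraction against the copy number would show where reads start to drop off.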
Old 11-30-2011, 08:29 AM   #19
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by pmiguel View Post
For cDNAs, issues with short read lengths generally stem from polyA tracts in the library molecules.

I received the following in an email just now:

Quote:
I came across your post about 2 years ago regarding a high fraction of short reads. In your post, you said that "For cDNAs, issues with short read lengths generally stem from polyA tracts in the library molecules." Could you please elaborate a little more on this? My libraries are cDNAs and I have got a lot of short reads from multiple runs. I would really appreciate your suggestions.

My preference is to answer questions of this sort in the forum -- that way they may help others as well.

First, I should point out that things have changed since I wrote that. Roche released an official method for generating cDNA libraries for running on the GS-FLX, that uses random primers for cDNA synthesis, so this issue has greatly diminished.

The story is this: sequencers generally have an "Achilles heel" or "kryptonite", if you prefer -- some weakness to which they are particularly subject. The 454's weakness is homopolymers in general and poly T in the library molecule in particular.

The 454 is built to be fast and achieves this speed by not using reversible terminator chemistry to precisely control addition of each base. This gives you speed -- no need to deblock after scanning plus if you have 2 or 3 bases in a row you collect sequence from all of them at once.

But in this strength lies a weakness: longer homopolymers are difficult to distinguish among. (Is that 9 A's in a row or 8?) Further, in extreme cases, a long stretch of a single base will exhaust all the dNTP being flowed without reaching the end of that stretch of bases on every nascent strand on the bead. This is bad both because the signal produced will be so high it can bleed into adjacent wells and because next time that base cycles around you get bad "CAFIE" effects from all the strands that were not fully extended.
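
The way a homopolymer lands in a single flow can be illustrated with a toy simulation. This is an idealized sketch only: it assumes a repeating T, A, C, G flow order, perfect and complete incorporation in every flow (so no CAFIE effects), and a made-up read that starts with the TCAG key.

```python
def ideal_flowgram(read, flow_order="TACG", max_flows=400):
    """Return (base, bases incorporated) per flow under ideal chemistry."""
    signals = []
    pos = 0
    flow = 0
    while pos < len(read) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        # Without terminators, the whole homopolymer run extends in one flow.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals

# Made-up read: TCAG key, then ten A's, then GT.
flows = ideal_flowgram("TCAG" + "A" * 10 + "GT")
print(flows[9])                    # ('A', 10)
print(max(n for _, n in flows))    # 10
```

All ten A's arrive in one flow, so the base caller has to tell a signal of 10 from 9 or 11, which is where the homopolymer ambiguity comes from.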

On top of that, because 454 relies on a chemical cascade to produce the ATP used by luciferase to generate signal -- natural dATP cannot be used. The analog used instead of dATP is not incorporated as well as dATP would be. Nevertheless, the conditions work well enough except in extreme cases.

Alas, one of those extremes is occasioned by the most common homopolymer in eukaryotic molecular biology: a poly A tail. cDNA production protocols frequently prime first strand synthesis from a dT oligomer. So after ligating this cDNA library to adapters about 1/2 of them may have that stretch of homopolymeric dT right next to the sequencing primer.

These factors combine to make a perfect storm that will sink a run. All your beads key pass -- then the next X bases incorporated are all "A". So you get the blinding burst of A, plus lots of incompletely extended strands of various lengths leading to non-synchronized signal in later cycles.

There are plenty of ways to work around this issue during library construction. But looking over the wreckage of your run, it is actually difficult to tell what has happened. No warning is generated by the Roche image processing pipeline that says "Your library sucks!" You just get poor results -- not very diagnostic, since many factors can lead to poor results. If I suspect this is an issue, I have to fire up the RunBrowser, page through the early cycles, and check whether a burst of homopolymer is to blame.
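
Short of paging through the early cycles by hand, a rough first check on the untrimmed reads could look like the sketch below. It is not a Roche tool: it simply asks what fraction of reads begin, right after the key, with a long run of one base. The TCAG key, the 15-base threshold and the function names are assumptions.

```python
def leading_homopolymer(read, key="TCAG"):
    """Return (base, run_length) of the homopolymer starting right after the key."""
    seq = read.upper()
    if seq.startswith(key):
        seq = seq[len(key):]
    if not seq:
        return ("", 0)
    base = seq[0]
    run = len(seq) - len(seq.lstrip(base))
    return (base, run)

def fraction_with_leading_run(reads, min_run=15, key="TCAG"):
    """Fraction of reads whose first post-key homopolymer is at least min_run long."""
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if leading_homopolymer(r, key)[1] >= min_run)
    return hits / len(reads)

# Made-up untrimmed reads (NOT data from any run discussed here).
reads = [
    "TCAG" + "A" * 20 + "CGT",
    "TCAGACGTACGTTGCA",
    "tcag" + "T" * 18 + "GGC",
    "TCAGGATTACA",
]
print(leading_homopolymer(reads[0]))     # ('A', 20)
print(fraction_with_leading_run(reads))  # 0.5
```

A high fraction would point toward homopolymer-led library molecules, though miscounted homopolymer lengths in the base calls mean the threshold should be treated loosely.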

Anyway, Roche did finally release a sanctioned cDNA library construction protocol. We don't really run into these issues using it because first strand synthesis is primed by random oligos, instead of oligo dT. So -- this is likely not your problem unless someone used a naive oligo dT primed 1st strand synthesis method to construct your library. Not that likely these days.

--
Phillip
All times are GMT -8.