  • Titanium - high fraction of short reads

    Hi,

    Has anyone experienced a high fraction of really short reads in a Titanium run?

    We have sequenced genomic samples from two fish with the 454 Titanium (1/2 plate per fish), and in both cases about 30-40% of all reads are shorter than 100 bp. However, the rest of the reads all look really good, with lengths of 500 bp and more.

    I have attached the read profile. Has anyone seen such a bimodal distribution and has an idea what the problem could be? Library prep? Sequencing?

    Thanks,

    Andreas

  • #2
    I've got that one beat. Looking at the untrimmed reads for this run (extracted with sffinfo -s -n <sfffile>) shows that the raw reads average ~650 nt, which is typical, though the distribution in this case is broader and flatter than usual. The reads are being aggressively trimmed back by the Signal Intensity, Valley or Q Score filter(s) (can't tell which). This has occurred on the last couple of runs, though this one is by far the worst. We have just opened a support issue with Roche.
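
A minimal sketch of how trimmed and untrimmed read-length distributions could be compared, assuming the reads have already been written out as FASTA text (for example with the sffinfo call mentioned above). The function names, the 50 nt bin width and the default 100 bp cutoff (borrowed from the first post) are illustrative choices, not part of any Roche tool.

```python
from collections import Counter


def read_lengths(fasta_lines):
    """Return the sequence length of every record in an iterable of FASTA lines."""
    lengths = []
    current = None  # stays None until the first header line is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def fraction_shorter_than(lengths, cutoff=100):
    """Fraction of reads strictly shorter than cutoff (0.0 for an empty list)."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n < cutoff) / len(lengths)


def length_histogram(lengths, bin_width=50):
    """Count reads per length bin; keys are the lower edge of each bin."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(counts.items()))
```

Running `read_lengths` on both the trimmed and the untrimmed FASTA output and comparing the two histograms would show whether the short reads are short in the raw data or only after trimming.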


    • #3
      We've seen these kinds of distributions only when sequencing array-based enrichment samples. We get one peak at about 50 nt and another peak at about 450 nt. The height of the peaks varies: with a bad run the 50 nt peak is larger than the 450 nt peak. From Roche we heard that such a distribution is normal with array enrichment. We also suspected that the reads are actually longer and get trimmed by the Roche software. But after playing with the filter settings we did not see any real improvements.



      • #4
        @kmcarr:

        Thanks for your fasta answer and wow that looks really bad ...

        I checked the untrimmed reads and could confirm your answer (see attachment). We will do another run and, if the results stay the same, contact Roche.

        Andreas


        • #5
          Hi

          I often see the same phenomenon in my Ti454 sequences: a peak of truncated reads at ~50 bp and a lot of short reads <150 bp. It does not appear to be due to some obvious PCR artefact, such as primer dimers. The libraries appear OK, as a repeat run of the same sample often gives excellent results. I have attached two figures showing what I mean (note that our samples have very distinct sizes, normally 140-210 bp).

          It may be related to the Roche kits: some batches seem to give rise to more truncated sequences than others. Any other ideas would be very helpful.

          andpet and kmcarr: If you get any useful hints from Roche, please post them here.
          Thanks
          Last edited by sulfobus; 10-26-2009, 03:21 AM.



          • #6
            We were having trouble with a high fraction of short reads. It is still an issue with some samples, but we seem to have less trouble by altering a couple of factors.

            (1) Scale the amount of adaptor added to the adaptor-ligation to the amount of incoming sheared DNA. The Roche protocol does not do this. Probably does not matter in cases where there is plenty of DNA, but otherwise I think there can be adaptor-adaptor ligations occurring.

            (2) After double SPRI clean-up, we look at the size distribution on a lab chip. If we see even a hint of a peak below 50 bases, we agarose gel isolate the correct size range.

            You might think you are fine with a 50 base peak being only 10% the area of your 500+ base peak, but think moles, not peak size. That tiny 50 base peak can have an equal number of library molecules in it to the big 500+ base peak because the lab chip signal is generated per base, not per molecule.
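
The arithmetic behind "think moles, not peak size" can be sketched as follows, assuming the lab chip signal is strictly proportional to the total number of bases in a peak (molecules times fragment length). The function name and the exact peak sizes of 50 and 500 bases are illustrative.

```python
def relative_molecules(peak_area, fragment_length):
    """Relative molecule count for a peak whose signal scales with total bases."""
    if fragment_length <= 0:
        raise ValueError("fragment_length must be positive")
    return peak_area / fragment_length


# A 50-base peak with 10% of the area of the 500-base peak.
short_peak = relative_molecules(peak_area=10, fragment_length=50)
main_peak = relative_molecules(peak_area=100, fragment_length=500)

# Ratio of library molecules in the short peak to those in the main peak.
molar_ratio = short_peak / main_peak
```

Under that assumption the ratio comes out at 1: a tenth of the area spread over fragments a tenth as long means the same number of molecules.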

            This does seem to help, but I am still a little mystified by runs with a high fraction of short reads. Why? Because I would expect issues caused by short library molecules to be labeled as "short primer" in the failure metrics. But what we see most in the failure metrics is "short quality".

            --
            Phillip



            • #7
              Our truncated sequences start off correctly, but then suddenly halt. They are not chimeras with insertions of primers or adaptors, just truncated correct sequences. We elongate our DNA with the Ti454 adaptors using PCR and purify the library by gel extraction.



              • #8
                Aside from the obvious possibilities that you have probably already considered (forgot to add a component to one of the run reagents, apyrase denatured due to it being held too long by someone with very warm hands, etc.) we had one bad run on the same night our lab temperature went very cool. Since then I've wondered if the GS-FLX relies on ambient temperatures being in a certain range. My guess is that the 454 guys keep their instruments in a very precisely controlled environment. That is just speculation, but if so, the instruments might not do as well outside of certain conditions.

                --
                Phillip



                • #9
                  We did a second run with half a plate of the fish and it was even worse (looked more like kmcarr's example). But the other half of the plate was okay, so now I guess it could have something to do with the library construction. We will sequence a third library.

                  Another odd thing I observed was that reads that were trimmed too much contained a larger fraction of tandem repeats. I divided my data set into reads shorter than 200 bp and reads longer than 200 bp and ran Tandem Repeats Finder on the untrimmed read sets. The set with the shorter reads contained 10x more tandem repeats. My thought is: could 454 sequencing of short tandem repeats be more error-prone or difficult?
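
A rough sketch of the comparison described above, assuming each read is available as a pair of its trimmed length and its untrimmed sequence. The regular expressions are only a crude stand-in for a real tandem repeat finder: they catch perfect repeats of a 1 to 6 nt unit with at least three copies spanning at least 12 nt. All function names and these thresholds are hypothetical; only the 200 bp split comes from the post.

```python
import math
import re


def _build_patterns(max_unit=6, min_copies=3, min_span=12):
    """One regex per unit length, each requiring enough copies to reach min_span."""
    patterns = []
    for unit_len in range(1, max_unit + 1):
        copies = max(min_copies, math.ceil(min_span / unit_len))
        # The captured unit must be followed by at least (copies - 1) more copies.
        patterns.append(re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, copies - 1)))
    return patterns


_PATTERNS = _build_patterns()


def has_tandem_repeat(sequence):
    """True if the sequence contains a perfect tandem repeat matching any pattern."""
    seq = sequence.upper()
    return any(p.search(seq) for p in _PATTERNS)


def repeat_fraction(sequences):
    """Fraction of sequences containing a tandem repeat (0.0 for an empty set)."""
    sequences = list(sequences)
    if not sequences:
        return 0.0
    return sum(has_tandem_repeat(s) for s in sequences) / len(sequences)


def compare_by_trimmed_length(reads, cutoff=200):
    """reads: iterable of (trimmed_length, untrimmed_sequence) pairs.

    Returns the repeat fraction among reads trimmed below the cutoff,
    followed by the repeat fraction among the remaining reads.
    """
    short, rest = [], []
    for trimmed_length, untrimmed in reads:
        (short if trimmed_length < cutoff else rest).append(untrimmed)
    return repeat_fraction(short), repeat_fraction(rest)
```

Because only perfect repeats are matched, this would undercount compared with a dedicated tool, but it is enough to see whether the two sets differ in the same direction.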

                  @sulfobus: Okay, will do ..

                  @pmiguel: Well at least our Solexa is susceptible to temperature changes so I bet the same is true for the 454. However extreme temperatures are rather rare in Germany :-). Thanks for the other hints, I will discuss them with our technicians ..

                  Andreas



                  • #10
                    Originally posted by andpet View Post
                    We did a second run with half a plate of the fish and it was even worse (looked more like kmcarr's example). But the other half of the plate was okay, so now I guess it could have something to do with the library construction. We will sequence a third library.

                    Andreas
                    Could you run that library on a pico RNA labchip? Would be interesting to know if you have a peak below 50 bases...



                    • #11
                      more short reads with titanium amplicon seq

                      Hi,
                      I recently got a surprisingly high number of small reads from Titanium sequencing of some amplicons, and wondered if anyone else is still having the issue addressed in this thread. The double-stranded library did not look skewed toward small reads (attached), so I'm somewhat mystified why the read length distribution turned out as it did (attached Picture 1). The amplicons can have high secondary structure, so I thought perhaps longer amplicons were amplified at lower efficiency during emPCR. Alternatively, it could have something to do with sharing a plate with some non-amplicon-based samples (separated by MID tags). In general, what happens when you mix amplicon and non-amplicon samples on a plate? I've heard this is not optimal, but since I'm new to the whole business, I don't understand why.


                      • #12
                        So you were using the amplicon procedure? The trace you show (Agilent?) has a smooth size distribution like a fragment or cDNA library. For amplicons doesn't one expect fragments of discrete sizes?

                        For cDNAs, issues with short read lengths generally stem from polyA tracts in the library molecules. Even with "V"-anchored, interrupted polyT primers being used for reverse transcription, I see a preponderance of long polyA-containing library molecules in some libraries. (We check by cloning them into pCR4TOPO and Sanger sequencing them.)

                        --
                        Phillip



                        • #13
                          Originally posted by pmiguel View Post
                          So you were using the amplicon procedure? The trace you show (Agilent?) has a smooth size distribution like a fragment or cDNA library. For amplicons doesn't one expect fragments of discrete sizes?

                          For cDNAs, issues with short read lengths generally stem from polyA tracts in the library molecules. Even with "V"-anchored, interrupted polyT primers being used for reverse transcription, I see a preponderance of long polyA-containing library molecules in some libraries. (We check by cloning them into pCR4TOPO and Sanger sequencing them.)

                          --
                          Phillip
                          Yes, usually you would expect discrete fragment sizes for amplicons, but this locus has a wide size range within a mixed population (expansion/contraction of the locus is one of the things we are looking at). This is not a cDNA library, so polyA is not an issue.



                          • #14
                            Originally posted by greigite View Post
                            Yes, usually you would expect discrete fragment sizes for amplicons, but this locus has a wide size range within a mixed population (expansion/contraction of the locus is one of the things we are looking at). This is not a cDNA library, so polyA is not an issue.
                            So what is the library size range? Your plot is labeled in seconds, not in bp.

                            Phillip



                            • #15
                              Originally posted by pmiguel View Post
                              So what is the library size range? Your plot is labeled in seconds, not in bp.

                              Phillip
                              I don't have this info. I got the plot from the person who did the library prep, and unfortunately they didn't change the Bioanalyzer settings to output in bp.
