  • Titanium kit - short reads

    Hi,

    We upgraded to 454 Titanium in January 2009, and the first runs seemed to be fine (although we saw a small decline in average read length).

    Now we have started a new Titanium sequencing kit, and all runs show massive amounts of short reads. The read lengths are spread almost evenly across the whole range.

    I have attached a histogram of the read lengths:

    [IMG]ftp://genome.imb-jena.de/pub/andpet/titanium_read_length_histo.jpg[/IMG]


    Does anyone have an idea what could cause this, or has anyone experienced the same before?

    Thanks a lot,

    Andreas

  • #2
    Hmm,

    inserting a picture does not work with FTP ... why not?

    Andreas

    Comment


    • #3
      Andreas, try attaching it to the post, rather than using the [IMG] tags. As far as I know one cannot in-line display images with FTP.

      Comment


      • #4
        Titanium kit - short reads

        Okay, here is the attached sequence length histogram ...

        Andreas
        Attached Files

        Comment


        • #5
          What type of sequencing was it? Amplicon or shotgun?

          Comment


          • #6
            Shotgun sequencing. We tried to sequence three insects and all runs showed the same read profile.

            We always measure the size distribution of our DNA before sequencing (Agilent Chip) and all three cases showed good results (500 - 2000 bp).

            Andreas

            Comment


            • #7
              We had the same profile doing amplicon sequencing, with amplicons selected to be 200-300 bp (this was before we had the Titanium upgrade).

              After analysing the results further, we discovered that the profile was in fact a combination of a peak around 60 bp and the normal peak around 200-300 bp. The 60 bp peak was caused by primer dimers and priming mismatches. After removing every read with two short primer matches from the dataset, the profile looked OK.
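A minimal sketch of the filtering step described above, assuming a "short priming match" means an exact match to the first few bases of a primer on either strand. The primer sequences, the 12-base seed, and the 100 bp length cutoff are hypothetical illustration values, not the settings used in the run discussed here.

```python
# Sketch: separate likely primer-dimer reads from the rest of a dataset.
# All primer sequences and thresholds below are hypothetical.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_hits(read, primers, seed_len=12):
    """Count exact seed matches (primer 5' end, either strand) in a read."""
    read = read.upper()
    hits = 0
    for primer in primers:
        seed = primer.upper()[:seed_len]
        hits += read.count(seed)
        rc_seed = revcomp(seed)
        if rc_seed != seed:  # avoid double counting palindromic seeds
            hits += read.count(rc_seed)
    return hits


def split_primer_dimers(reads, primers, max_len=100, seed_len=12):
    """Return (kept, dimers): a read is a dimer if short with >= 2 primer hits."""
    kept, dimers = [], []
    for read in reads:
        if len(read) <= max_len and primer_hits(read, primers, seed_len) >= 2:
            dimers.append(read)
        else:
            kept.append(read)
    return kept, dimers


if __name__ == "__main__":
    fwd = "ACGTTGCAAGGCTTAACCGG"   # hypothetical forward primer
    rev = "TTGACCGGTAACGTCCATGA"   # hypothetical reverse primer
    dimer = fwd + revcomp(rev)      # primer joined directly to primer
    normal = fwd + "ACGT" * 60 + revcomp(rev)
    kept, dimers = split_primer_dimers([dimer, normal], [fwd, rev])
    print(len(kept), "kept,", len(dimers), "flagged as primer dimers")
```

Exact seed matching is the simplest choice; reads with sequencing errors inside the seed would be missed, so a tolerant aligner may be preferable on real data.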

              In a later run the PCR was optimised; there were 95% fewer priming mismatches and the profile was OK.

              Your graph looks almost identical to the one we had, so I guess something is wrong with the emulsion PCR. Are there primer contaminations in the room, or something else that would cause primer dimers to appear?

              Are you doing de novo sequencing, or just a resequencing experiment to validate the techniques? It might be useful to map the good reads onto an existing genome and see whether the 60 bp peak shows a different 'mapping behaviour'. We have found that downstream analysis can greatly improve the upfront library preparation. It takes some time to analyse your data or to write scripts that automate the analysis, but future experiments can then be QC-ed in an hour, and you can always use the results to optimise the upfront work.

              Comment


              • #8
                This looks like the profile of a poor Titanium run.

                For good Titanium shotgun runs, the peak read length is usually around 500 bp and the average length is around 400 bp. The short-length region may have a hump, but it should not have a big peak. In other words, the length distribution should be mainly a single peak around 500 bp.
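The rule of thumb above can be checked with a few lines of code. The sketch below summarises a list of read lengths by its most populated length bin, its mean, and the fraction of short reads. The 50 bp bin width and the 100 bp "short" cutoff are assumptions chosen for illustration.

```python
# Sketch: summarise a read-length distribution for a quick sanity check.
# Bin width and short-read cutoff are illustrative assumptions.
from collections import Counter


def summarize_lengths(lengths, bin_width=50, short_cutoff=100):
    """Return the modal length bin, mean length and short-read fraction."""
    if not lengths:
        raise ValueError("no read lengths given")
    # Assign each length to the lower edge of its bin, e.g. 510 -> 500.
    bins = Counter((n // bin_width) * bin_width for n in lengths)
    # Most populated bin; ties are resolved in favour of the longer bin.
    mode_bin = max(bins, key=lambda b: (bins[b], b))
    mean = sum(lengths) / len(lengths)
    short = sum(1 for n in lengths if n < short_cutoff)
    return {
        "mode_bin": mode_bin,
        "mean": mean,
        "short_fraction": short / len(lengths),
    }


if __name__ == "__main__":
    example = [510] * 6 + [460] * 2 + [60] * 2
    print(summarize_lengths(example))
```

A modal bin far below 500 bp, or a large short-read fraction, would point to the kind of run discussed in this thread.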

                Besides the mapping method suggested above, I would use the quality filter metrics output by the 454 pipeline to judge the run.

                The run metrics print out a lot of information; particularly useful is the breakdown into raw reads, filtered reads, dots, mixed, etc. Use sffinfo (part of the offline 454 software package) to see whether the short reads were trimmed by the quality filter. Just run "sffinfo sffile" and you will get the information for all filtered reads, including the untrimmed full-length sequence.

                Another easy way to tell whether something is wrong is to check the read count. A Titanium run should generate more than 1 million reads that pass the quality filter. If you don't get this number of reads per run, then a lot of raw reads were rejected (and trimmed) by the quality filter, meaning something is wrong with the library, the emPCR, or some other upstream procedure.

                If the short reads were not the result of trimming and rejection by the quality filter, then the story may change. The sample may contain a massive amount of short fragments (primer dimers etc.); the 454 machine favors those short fragments and sequences them in large numbers, at high quality.

                Comment


                • #9
                  We are seeing some similar runs, intermittently.

                  Something else we tried, to rule out library prep issues, was to create a filter template and add <doPrimerTrimming>false</doPrimerTrimming> inside <qualityFilter>.

                  This allows you to see the B-adaptor sequence if it is present. We would expect library issues to show up as B-adaptors on short sequences.

                  joa_ds, what did you do for PCR optimization?

                  Comment


                  • #10
                    Well, we did amplicon sequencing and had a PCR reaction prior to the emPCR, which of course amplified problems that were already there.

                    We added a purification step prior to the emPCR. I don't know the details, as I am not in the lab, but I know they are using some sort of chromatographic technique to remove short sequences 50-100 bp long. This simple technique improved our mapping efficiency from around 60% to around 95%.

                    We add MID tags, and detection of an MID is also something we use as a quality check to verify that sequences are OK. We expect very short sequences to have the reverse complement of the MID at the end of the sequence.
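The MID check just described could look roughly like the sketch below, which tests whether the reverse complement of the MID appears near the 3' end of a read. The MID sequence in the example and the few bases of slack allowed after the tag are hypothetical; matching is exact, so reads with errors inside the tag would not be detected.

```python
# Sketch: flag reads that run through a short insert into the opposite MID.
# The MID sequence and the slack value are hypothetical illustration values.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_mid_readthrough(read, mid, slack=4):
    """True if the reverse-complemented MID occurs within the read's 3' end.

    `slack` is the number of extra trailing bases tolerated after the tag.
    """
    window = len(mid) + slack
    tail = read.upper()[-window:]
    return revcomp(mid.upper()) in tail


if __name__ == "__main__":
    mid = "ACGAGTGCGT"  # hypothetical MID tag
    short_read = mid + "GATTACAGATTACA" + revcomp(mid)
    long_read = mid + "GATTACA" * 10
    print(has_mid_readthrough(short_read, mid))
    print(has_mid_readthrough(long_read, mid))
```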

                    Another thing we do is map the reads and check whether the mapped sequence is just the primer sequence. Such reads look OK at first sight, but of course something is wrong with them. After introducing the chromatography step and doing some minor tweaking, the distribution looks the way it should.

                    Comment


                    • #11
                      Following hlu's suggestion (thanks, I did not realize the sff file had the complete reads), I dug into a run that had a similar profile to andpet's.

                      In this case the length is there, it is just low quality, i.e. the untrimmed lengths have a mode right around 500.

                      Looking at some of the trimmed sequences, they were trimmed for a reason: low quality scores are evident. The question remains, though: why do we get reads with low quality?

                      Does anybody know how to dig out the signals for a particular read? I guess I have the pixel coordinates ...
                      Attached Files

                      Comment


                      • #12
                        Hi everybody,
                        we recently upgraded to the Titanium system. What we observed is that the breaking step of the Large Volume emPCR can be tricky.
                        Sometimes we found some oil above the bead pellet (usually more visible at the end of the procedure, when the beads are transferred to a 1.5 mL tube).
                        In the Small Volume preparation (where the breaking is performed using the filters), done to titrate the library, we didn't find oil on the beads.
                        Beads from both the Small and the Large preparation were sequenced. We obtained good results for the "Small" ones (expected number of reads, median length 450 bp) and very bad results for the "Large" ones (40-50% of the expected reads, median length 260 bp).
                        The output graph of the "oily sample" was very similar to the one obtained by Andreas, while for the other beads it was as expected.
                        As the same library was used for both preparations, we hypothesize that the oil could interfere with the enrichment steps or with the sequencing reaction.

                        Comment


                        • #13
                          Hi Bia,

                          Thanks for sharing your emPCR experience. We have the problem of a very low percentage of filter-passed reads, about 25%. One question: was the percentage of filter-passed reads different or similar between your Large and Small emPCR results?

                          Comment


                          • #14
                            hlu,

                            What do you mean by primer dimers in the sequencing reaction? Wouldn't these be removed by SPRI during LC?

                            Comment


                            • #15
                              Originally posted by boss_hoss
                              hlu,

                              what do you mean by primer dimers in the sequencing reaction? wouldnt these be removed by SPRI during LC?

                              Sometimes the reads were very short, around 40 bp, and consisted mostly of a pair of primer sequences.

                              This also happened in the past, in the GS FLX and GS 20 days.

                              Comment
