  • Paired End Merging

    We are going to undertake some sequencing where our insert size is less than the sum of our read-lengths (paired end overlap).
    We would like to merge our paired end reads to create single "pseudo-reads" with high quality all the way along.
    A quick Google search has shown me there is a lot of software available for this.
    Introduction In very simple terms, current sequencing technology begins by breaking up long pieces of DNA into lots more short pieces of ...

    I've had a play with some dummy data using FLASH and PANDA-Seq, and the results differ slightly between them (different final read counts and different distributions) despite using the same parameters.
    Has anyone done a comparison of these packages before, or have a particular feel for what they think is the best performer?
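To make the question concrete, here is a minimal sketch of what overlap merging involves. It is not how FLASH, PANDA-Seq, or any other package works internally; the function names, the `min_overlap` and `max_mismatch_frac` parameters, and the "keep the higher-quality base" rule are all assumptions chosen for illustration. It does not handle inserts shorter than a read (adapter read-through).

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, fwd_qual, rev, rev_qual, min_overlap=10, max_mismatch_frac=0.1):
    """Merge one read pair; return (sequence, qualities) or None if no overlap passes.

    Qualities are lists of integer Phred scores, one per base.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")

    # Put the reverse read on the same strand as the forward read.
    rc = reverse_complement(rev)
    rc_qual = rev_qual[::-1]

    # Try every overlap length, longest first, and keep the one with the
    # lowest mismatch fraction (ties go to the longer overlap).
    best = None
    for overlap in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        start = len(fwd) - overlap
        mismatches = sum(a != b for a, b in zip(fwd[start:], rc[:overlap]))
        frac = mismatches / overlap
        if frac <= max_mismatch_frac and (best is None or frac < best[0]):
            best = (frac, overlap)
    if best is None:
        return None

    overlap = best[1]
    start = len(fwd) - overlap
    seq = list(fwd[:start])
    qual = list(fwd_qual[:start])
    # Inside the overlap, keep whichever read reports the higher quality.
    for i in range(overlap):
        if fwd_qual[start + i] >= rc_qual[i]:
            seq.append(fwd[start + i])
            qual.append(fwd_qual[start + i])
        else:
            seq.append(rc[i])
            qual.append(rc_qual[i])
    seq.extend(rc[overlap:])
    qual.extend(rc_qual[overlap:])
    return "".join(seq), qual


# Made-up 38 bp fragment sequenced as 2 x 25 bp, so the reads overlap by 12 bases.
fragment = "ACGTTGCAAGGCTTACGATCGGATCCTAGGCATGCAAT"
fwd = fragment[:25]
rev = reverse_complement(fragment[-25:])
merged = merge_pair(fwd, [30] * 25, rev, [30] * 25)
print(merged[0] == fragment)
```

Even in a toy like this there are several arbitrary choices (how candidate overlaps are ranked, how ties are broken, how disagreeing bases are resolved), which may be one reason two packages return different read counts from the same input.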

  • #2
    That blog is a great reference. I compared fastq-join, FLASH, and read-linker from the CD-HIT toolkit. Results were consistent, but fastq-join came out fastest when testing with 5% and 20% error tolerance. I've been using it since.


    • #3
      Utility of merging PE reads?

      Are there practical or bioinformatic reasons why merging overlapping reads hasn't been performed more often? As a 'beginner' to NGS interested in applying it to WG shotgun metagenomics, it would seem the obvious thing to do. I plan to use the EBI metagenomics-InterPro portal, which analyzes non-assembled sequences, but wouldn't it be useful to input combined overlapping PE 'virtual' reads?


      • #4
        One practical reason is that the paired reads never come out with all bases the same quality, so after trimming poor-quality ends an overlapping 150x150 PE read can become non-overlapping (say 125x140). Therefore most tools are designed to accept the Fwd and Rev for a PE read separately, along with the expected insert size. The insert size, you will discover, also varies. So keeping all these uncertainties in mind, it's better to work with Fwds and Revs separately. If you can overlap some of the reads, more power to you.
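The arithmetic behind the 150x150 versus 125x140 example can be sketched as follows. The 270 bp insert size is a hypothetical value picked for illustration (the post does not state one), and "insert size" here is assumed to mean the length of the sequenced fragment between the adapters.

```python
def expected_overlap(fwd_len, rev_len, insert_size):
    """Bases covered by both reads; a negative value is the size of the unsequenced gap."""
    return fwd_len + rev_len - insert_size


insert_size = 270  # hypothetical insert size, not taken from the thread

print(expected_overlap(150, 150, insert_size))  # 30: untrimmed reads overlap by 30 bases
print(expected_overlap(125, 140, insert_size))  # -5: after trimming, a 5-base gap remains
```

Because the insert size itself varies from fragment to fragment, the same trimmed read lengths can overlap for one pair and leave a gap for the next.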


        • #5
          I use FLASH and it's spot on for >99% of sequences


          • #6
            Originally posted by suryasaha
            One practical reason is that the paired reads never come out with all bases the same quality, so after trimming poor-quality ends an overlapping 150x150 PE read can become non-overlapping (say 125x140). Therefore most tools are designed to accept the Fwd and Rev for a PE read separately, along with the expected insert size. The insert size, you will discover, also varies. So keeping all these uncertainties in mind, it's better to work with Fwds and Revs separately. If you can overlap some of the reads, more power to you.
            I thought this might be it - a combination of 'poor' sequence quality and the lack of an experimental approach able to isolate fragments of DNA within a tight size range. Isolation from agarose gel slices (whether via a manual approach or something like the Pippin system) and SPRI approaches appear popular - has anyone tried isolating from acrylamide gels rather than agarose as these have greater resolving power?


            • #7
              Originally posted by Coltom
              I thought this might be it - a combination of 'poor' sequence quality and the lack of an experimental approach able to isolate fragments of DNA within a tight size range. Isolation from agarose gel slices (whether via a manual approach or something like the Pippin system) and SPRI approaches appear popular - has anyone tried isolating from acrylamide gels rather than agarose as these have greater resolving power?
              In at least some papers, the agarose method produces remarkably tight library distributions, though I looked at this back when much shorter insert sizes (~120) were common. With the long reads available now it makes sense to go to longer inserts, and perhaps that extreme precision isn't doable there.

              Anything that adds a step may be skipped by some. Also, there is always a risk of paired end merging getting things wrong when dealing with repeats in the same size range as the overlap.
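The repeat problem can be illustrated with a small sketch. The function name, the `min_overlap` parameter, and both toy sequences are made up for this example, and the reverse read is assumed to have already been reverse-complemented onto the forward strand.

```python
def exact_overlaps(fwd, rev_rc, min_overlap=4):
    """List every overlap length at which the end of fwd exactly matches the start of rev_rc."""
    longest = min(len(fwd), len(rev_rc))
    return [k for k in range(min_overlap, longest + 1)
            if fwd[len(fwd) - k:] == rev_rc[:k]]


# Unique sequence in the overlap: only one overlap length fits.
print(exact_overlaps("GGTTCCATGA", "CATGATTGG"))

# A short tandem repeat (AC x 5) spanning the overlap: several lengths fit equally well.
fwd = "GGTTCCAT" + "AC" * 5
rev_rc = "AC" * 5 + "TTGGCCAA"
candidates = exact_overlaps(fwd, rev_rc)
print(candidates)

# Each candidate implies a different merged fragment length.
print([len(fwd) + len(rev_rc) - k for k in candidates])
```

When more than one overlap length matches perfectly, the reads alone cannot say which merged length is correct, so any merger has to guess or discard the pair.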
