  • Results with mate-pair preps

    Hello,

    I've been looking around for information on Illumina mate-pair library preps. Apparently paired-end contamination and chimeric reads are often produced.

    In practice, how frequent are these mishaps? Does the operator have much leverage on these issues, or are they pre-determined by the mechanics of the process?

    Thanks in advance,

    Daniel

  • #2
    Hi Daniel

    Yes, these are two of the problems with large-insert Illumina libraries. A third is low diversity. The first two problems have a common source: the fragmentation of the circularized molecule is random. PE reads should normally be filtered out during a washing step, keeping only the ME reads (which carry the biotin). The wet lab can choose to be very stringent during the wash (or do more than one wash) or not. The higher the stringency, the lower the sequence diversity.

    PE contamination can be quite significant: I'm looking at a dataset with 40-60% PE contamination right now. Baylor tells me that it is usually around 20%. Quantifying it is only possible if you have a reference (in which case you wouldn't be sequencing mate libraries!).

    A workaround is to assemble the ME reads as unpaired reads along with your other data, then use the resulting contigs as a reference: align the pairs to them and determine which are PE. This will not capture everything (especially if the contigs are short or there are many repeats), but one hopes that the short PEs will be identified more easily than the longer MEs.
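To make the orientation check concrete, here is a minimal Python sketch, assuming each read has already been aligned to the contigs and reduced to a contig name, a 0-based leftmost position, an aligned length and a strand flag. The `Alignment` class, the `classify_pair` function and the 1,000 bp `max_pe_insert` cutoff are hypothetical choices for illustration. It relies on the usual convention that Illumina mate pairs map outward (reverse-forward, RF) while PE contamination maps inward (forward-reverse, FR) at a short distance.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    contig: str    # name of the contig the read aligned to
    pos: int       # 0-based leftmost aligned position on that contig
    length: int    # number of contig bases covered by the alignment
    reverse: bool  # True if the read aligned to the reverse strand


def classify_pair(a: Alignment, b: Alignment, max_pe_insert: int = 1000) -> str:
    """Label a read pair as 'PE', 'ME', 'anomalous' or 'unresolved'.

    max_pe_insert is a hypothetical cutoff; tune it to your PE library size.
    """
    if a.contig != b.contig:
        # Mates on different contigs cannot be classified here; these are
        # exactly the pairs a scaffolder would use to join contigs.
        return "unresolved"
    left, right = sorted((a, b), key=lambda aln: aln.pos)
    span = max(left.pos + left.length, right.pos + right.length) - left.pos
    if not left.reverse and right.reverse:
        # Inward-facing (FR) and short: looks like paired-end contamination.
        return "PE" if span <= max_pe_insert else "anomalous"
    if left.reverse and not right.reverse:
        # Outward-facing (RF): the orientation expected for Illumina mate pairs.
        return "ME"
    # Both reads on the same strand: neither library type explains this.
    return "anomalous"
```

The fraction of same-contig pairs labelled "PE" then gives a rough contamination estimate. Keep in mind that long-insert ME pairs land on a single contig less often than short PE pairs, so that fraction is biased upward when contigs are short.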

    Chimaeric molecules are also produced often. You can identify the reads that are not chimaeric with the alignment method above. For the rest, trim back a little and try aligning again. Illumina's standard recommendation is to trim outright to 35 bp (but the repeated alignment method is quite fast).
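The trim-and-realign loop can be sketched in Python as below. The `aligns` callback stands in for whatever aligner you run against the contigs and is hypothetical, as are the function name and the 5 bp step; the 35 bp floor mirrors the Illumina recommendation quoted above. The sketch trims from the 3' end, assuming the chimeric portion of a read lies towards its 3' end (for example where the read runs across the circularization junction).

```python
from typing import Callable, Optional


def trim_until_aligned(
    seq: str,
    aligns: Callable[[str], bool],
    step: int = 5,
    min_len: int = 35,
) -> Optional[str]:
    """Trim `seq` from its 3' end until the (hypothetical) `aligns` accepts it.

    Returns the longest prefix, among the lengths tried, that aligns, or
    None if even the `min_len` prefix does not align.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    current = seq
    while True:
        if aligns(current):
            return current
        if len(current) <= min_len:
            return None
        # Never trim below the floor; the last attempt uses exactly min_len bases.
        current = current[: max(min_len, len(current) - step)]
```

Illumina's outright alternative is simply `seq[:35]`. The loop keeps more sequence for reads whose chimeric segment is short, at the cost of extra aligner calls.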

    I guess Velvet (or any assembler) would benefit from treating ME libraries differently: first use them as unpaired reads and filter out the ones that are obviously PE or chimaeric, then use the rest in the scaffolding step...?

    thanks
    alexie



    • #3
      Ouch. This sounds like a bit of a nightmare. How good is the scaffolding after all that effort?

      And do Illumina recommend a particular assembly package to make use of the data?



      • #4
        Quite a few. With 2,000 USD you get ca. 15 M sequences (more with the HiSeq). Even if you had only 10% success (you don't; it's higher than that), you would get 1.5 M mates. Compare that with what people had to do for the Human Genome Project...
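The back-of-the-envelope yield calculation above, written out as a small Python helper. The function names are illustrative, and the 15 M sequences, 10% success rate and 2,000 USD are the figures quoted in the post, not measured values.

```python
def usable_mates(sequences: int, success_rate: float) -> int:
    """Number of usable mate pairs, given the fraction that turn out to be true mates."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    return round(sequences * success_rate)


def cost_per_usable_mate(cost_usd: float, sequences: int, success_rate: float) -> float:
    """Run cost spread over the usable mates only."""
    return cost_usd / usable_mates(sequences, success_rate)


if __name__ == "__main__":
    # Figures from the post: ca. 15 M sequences for 2,000 USD, pessimistic 10% success.
    print(usable_mates(15_000_000, 0.10))  # 1500000
    print(f"{cost_per_usable_mate(2_000, 15_000_000, 0.10):.5f} USD")  # 0.00133 USD
```

Plugging in a higher, more realistic success rate only improves both numbers.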

        For an application, see the panda genome paper by BGI, which used 20 kb ME libraries but had to throw away a large portion of the reads produced.

        P.S. I should have mentioned that the larger the insert size, the worse these effects get.



        • #5
          I guess I was wondering about the possibilities of misassemblies when doing true de novo sequencing. Could chimeras potentially contribute to misassemblies or are they always easily detected?



          • #6
            Originally posted by nickloman
            I guess I was wondering about the possibilities of misassemblies when doing true de novo sequencing. Could chimeras potentially contribute to misassemblies or are they always easily detected?
            Most programs I have read about use the MP libraries only for contig joining, so if that is the case you wouldn't have problems at the sequence level within contigs. These errors could still screw up which contigs you determine to be adjacent, though.
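One common way to limit the damage from chimeric pairs at the scaffolding stage is to require several independent pairs to support a link between two contigs before trusting it. The sketch below illustrates that idea; it is a generic technique, not the method of any particular program mentioned in this thread, and `supported_links` and the `min_links=3` threshold are hypothetical. It also ignores mate orientation and distance, which a real scaffolder would check.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def supported_links(
    pair_contigs: Iterable[Tuple[str, str]], min_links: int = 3
) -> Dict[Tuple[str, str], int]:
    """Count contig-to-contig links and keep those backed by >= min_links pairs.

    Each input item is (contig of read 1, contig of read 2) for one mate pair.
    """
    counts: Counter = Counter()
    for contig_a, contig_b in pair_contigs:
        if contig_a == contig_b:
            continue  # both mates on one contig: no joining information
        # Store each link under a canonical (sorted) key so A-B and B-A merge.
        counts[tuple(sorted((contig_a, contig_b)))] += 1
    return {link: n for link, n in counts.items() if n >= min_links}
```

With `[("c1", "c2")] * 3 + [("c2", "c1"), ("c1", "c5"), ("c3", "c3")]` as input, only the c1-c2 link survives, with 4 supporting pairs; the single c1-c5 pair, which could be a chimera, is discarded.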

            I am working on a short read simulator that models this type of error:
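The simulator referred to here is not included in the thread. Purely as an illustration of the error model being discussed, here is a minimal, hypothetical Python sketch that draws mate pairs from a reference string, turning a fraction of them into short inward-facing PE contamination and a fraction into chimeras. All names and default parameters are assumptions, and the chimera model is deliberately crude: the two reads simply come from unrelated random positions.

```python
import random
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pairs(
    genome: str,
    n_pairs: int,
    read_len: int = 50,
    mp_insert: int = 3000,
    pe_insert: int = 300,
    p_pe: float = 0.2,
    p_chimera: float = 0.05,
    seed: int = 0,
) -> List[Tuple[str, str, str]]:
    """Return (read1, read2, label) tuples, label in {'ME', 'PE', 'chimera'}."""
    if p_pe + p_chimera > 1.0:
        raise ValueError("p_pe + p_chimera must not exceed 1")
    if min(pe_insert, mp_insert) < read_len or len(genome) < max(pe_insert, mp_insert):
        raise ValueError("genome must exceed the inserts, and inserts the read length")
    rng = random.Random(seed)
    pairs: List[Tuple[str, str, str]] = []
    for _ in range(n_pairs):
        r = rng.random()
        if r < p_chimera:
            # Crude chimera: two reads from unrelated random positions.
            s1 = rng.randrange(len(genome) - read_len + 1)
            s2 = rng.randrange(len(genome) - read_len + 1)
            pairs.append((genome[s1:s1 + read_len], genome[s2:s2 + read_len], "chimera"))
            continue
        is_pe = r < p_chimera + p_pe
        insert = pe_insert if is_pe else mp_insert
        start = rng.randrange(len(genome) - insert + 1)
        left = genome[start:start + read_len]
        right = genome[start + insert - read_len:start + insert]
        if is_pe:
            # Inward-facing (FR): read 1 forward, read 2 reverse-complemented.
            pairs.append((left, revcomp(right), "PE"))
        else:
            # Outward-facing (RF): read 1 reverse-complemented, read 2 forward.
            pairs.append((revcomp(left), right, "ME"))
    return pairs
```

Because every pair carries its true label, output like this can be used to check how well a PE/chimera filter recovers the contamination rate you put in.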



            • #7
              How large is too large for the PE libraries?



              • #8
                Originally posted by KeG
                How large is too large for the PE libraries?
                From what I've been told, you start to get fewer usable read clusters above 600 bases. Certainly 'long' mate-pair lengths (>5 kbp) are well out of range.



                • #9
                  Hi,
                  I think we have a similar problem, but just to make sure: what are you calling "contamination"? Are you talking about inward-mapping reads?
                  Eric
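Assuming "inward" means that the two reads of a pair point toward each other (FR orientation), as in a normal paired-end library, whereas Illumina mate pairs are expected to point away from each other (RF, "outward"), the orientation can be tallied from the standard SAM FLAG bits once pairs are aligned to some reference or draft assembly (0x1 paired, 0x4 unmapped, 0x8 mate unmapped, 0x10 read reverse, 0x20 mate reverse, 0x40 first in pair). The function below is a hypothetical minimal sketch, not a replacement for established tools; it compares leftmost positions only, so pairs whose reads overlap heavily may be misclassified.

```python
from typing import Dict, Iterable


def orientation_counts(sam_lines: Iterable[str]) -> Dict[str, int]:
    """Tally inward (FR), outward (RF) and same-strand pairs from SAM text.

    Each pair is counted once, via its first-in-pair record; pairs with an
    unmapped read or with mates on different references are skipped.
    """
    counts = {"inward": 0, "outward": 0, "same_strand": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        rnext, pnext = fields[6], int(fields[7])
        if not flag & 0x1 or flag & 0x4 or flag & 0x8 or not flag & 0x40:
            continue  # unpaired, unmapped, mate unmapped, or not first in pair
        if rnext not in ("=", rname):
            continue  # mates on different references
        read_rev, mate_rev = bool(flag & 0x10), bool(flag & 0x20)
        if read_rev == mate_rev:
            counts["same_strand"] += 1
            continue
        # For an inward (FR) pair the forward-strand read starts leftmost.
        fwd_pos, rev_pos = (pnext, pos) if read_rev else (pos, pnext)
        counts["inward" if fwd_pos <= rev_pos else "outward"] += 1
    return counts
```

A PE contamination estimate would then be `inward / (inward + outward)`, subject to the reference caveat raised earlier in the thread.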



                  • #10
                    Long-insert mate-pair libraries are a nightmare, with all the sorts of problems you have already discussed. A large amount of starting material, overnight gel separation on LMT agarose, quality checks for adapter dimers, the number of PCR cycles, etc. are all important. It is a labor- and care-intensive procedure.

                    However, there is a good reason we need them: they improve scaffolds significantly. Start with an ABySS assembly and finish scaffolding with SSPACE to improve your assembly.
