  • Do I need 2+ mate pair libraries with different insert sizes?

    Hi all,

    I would like to sequence and de novo assemble a draft genome from a marine invertebrate with a roughly human-sized genome: 3.46 gigabases (as estimated by FIAD). I know nothing about repeat content, etc but I have a muscle tissue transcriptome.

    My goal here is to generate a cost-effective draft assembly and gene models. I realize the assembly will not be spectacular with this approach, but it will be used as preliminary data for a grant proposal that would involve additional approaches to improve the assembly. My tentative strategy is to do one whole run of a HiSeq X Ten: one lane with two different paired-end libraries (one with partially overlapping reads and one with reads further apart, as recommended by the ALLPATHS-LG manual) and the other lane with one or two mate pair libraries. I will try assembly with ALLPATHS-LG and Meraculous.

    My question is: would you expect a better assembly of these libraries if I have two or more different mate pair libraries (say one with a 3 kb insert and one with a 10 kb insert) or should I just go with one library with the longest insert size I can get? If the difference in genome assembly quality with 2+ different mate pair libraries would be negligible, I'm inclined to go with just one mate pair library and save money.

    Any general advice on sequencing strategy for getting the best draft genome I can for under USD $6,000 or so would be greatly appreciated.

    Best,
    Kevin

  • #2
    My general advice is to go with more PE reads -- the 'meat' of the assembly, where it is nice to have high coverage -- and fewer MP reads, which are used to string the PE reads together and need less coverage. Of course it is useful to have MP libraries with multiple insert sizes, but if you can choose only one, aim for around 3 KB inserts. Longer inserts will help span repeats, but you need intermediate insert sizes as well.

    I once had a customer who (on an older machine) generated a single lane of PE data and then a lane of 3 KB MP, 6 KB MP and 10 KB MP. Those MP libraries were basically worthless because we did not have a solid base of PE reads to build from. Eventually we did another two lanes of PE, which helped the assembly, but I suspect that if we had gone with 3 lanes of PE and three 1/3 lanes of MP in the first place, we wouldn't have spent so much time trying to get something useful.

    In your case I suggest 1 1/2 lanes of PE and 1/2 lane of MP.
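
    To see what a lane split like this means in terms of coverage, here is a minimal sketch. The genome size (3.46 Gb) comes from the original post; the per-lane yield is a purely hypothetical placeholder, so substitute whatever your sequencing provider actually quotes.

```python
def coverage(lanes, yield_per_lane_gb, genome_gb):
    """Approximate fold coverage: total sequenced bases / genome size."""
    return lanes * yield_per_lane_gb / genome_gb

GENOME_GB = 3.46            # from the original post (FIAD estimate)
YIELD_PER_LANE_GB = 100.0   # ASSUMPTION: hypothetical yield, not a vendor figure

# The split suggested above: 1 1/2 lanes of PE and 1/2 lane of MP.
pe_cov = coverage(1.5, YIELD_PER_LANE_GB, GENOME_GB)
mp_cov = coverage(0.5, YIELD_PER_LANE_GB, GENOME_GB)

print(f"PE coverage: {pe_cov:.1f}x")
print(f"MP coverage: {mp_cov:.1f}x")
```

    This counts raw sequenced bases only; usable coverage will be lower after trimming and duplicate removal, which tends to matter more for MP libraries.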

    • #3
      It's also possible to use the Nextera Long Mate Pair protocol to get both short and long inserts from a single library. Is that a good idea? I'm not really sure; I've never assembled anything from a single Nextera LMP library. But it's worth considering if you don't have the budget for multiple libraries. IIRC, it produces around half long-mate and half short-insert, though the exact ratio varies.

      The amount that scaffolding will help you really depends on the genome in question... but you may be able to answer it yourself very cheaply:

      1) Take the closest relative with a genome assembly, and hope that the genome structure is similar.
      2) Generate synthetic short reads from the genome, in a fragment library and an LMP library.
      3) Assemble using either one or both libraries, and see how much improvement you got.
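
      Step 2 above can be sketched roughly as follows. This is a toy, error-free simulator, not a replacement for a real read simulator: the function names and parameters are made up for illustration, insert sizes are drawn from a normal distribution, and the mate-pair library is assumed to be in outward-facing (reverse-forward) orientation with no chimeras or short-insert contamination.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]

def simulate_pairs(genome, n_pairs, read_len, insert_mean, insert_sd,
                   mate_pair=False, seed=0):
    """Sample error-free read pairs from `genome` (hypothetical helper).

    Fragment library: forward-reverse orientation (reads face inward).
    Mate-pair library: reverse-forward orientation (reads face outward).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Draw an insert size and clamp it to something the genome can hold.
        insert = int(round(rng.gauss(insert_mean, insert_sd)))
        insert = max(2 * read_len, min(insert, len(genome)))
        start = rng.randint(0, len(genome) - insert)
        left = genome[start:start + read_len]
        right = genome[start + insert - read_len:start + insert]
        if mate_pair:
            pairs.append((revcomp(left), right))
        else:
            pairs.append((left, revcomp(right)))
    return pairs

# Toy usage on a random 20 kb "genome"; use the relative's assembly in practice.
toy_rng = random.Random(42)
toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(20_000))
fragment_lib = simulate_pairs(toy_genome, 1000, 100, 300, 30)
lmp_lib = simulate_pairs(toy_genome, 200, 100, 3000, 300, mate_pair=True)
```

      Write the pairs out as FASTQ (with dummy qualities), assemble with the fragment library alone and then with both, and compare scaffold N50. Real LMP data is messier than this, so treat the simulated improvement as an upper bound.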

      • #4
        Nextera matepair tends to have inserts in the 2kb to 12kb range.

        So one library can be one size fits all...

        If you add a bit more DNA (2x), it will become 3.5kb to 17kb (provided your ligation still works well).

        In any case, do one PCR-free library and one matepair library, and sequence them in 2x250 or 2x300 bp mode (1-2 MiSeq runs).

        It may be very tempting to go with HiSeq 2x125bp, but 100X of 2x125bp gives a way worse assembly than a good 20x of 2x250 or 2x300 bp with a PCR-free library...
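
        For budgeting, it may help to turn those coverage figures into read-pair counts. A quick sketch, using the 3.46 Gb genome size from the original post; the function name is just for illustration, and it only counts raw bases (no allowance for trimming or duplicates).

```python
def pairs_needed(genome_bp, coverage, read_len):
    """Read pairs needed so that 2 * read_len * pairs >= coverage * genome_bp."""
    total_bases = coverage * genome_bp
    bases_per_pair = 2 * read_len
    return -(-total_bases // bases_per_pair)  # ceiling division on integers

GENOME_BP = 3_460_000_000  # from the original post

for label, cov, read_len in [("20x of 2x250", 20, 250),
                             ("100x of 2x125", 100, 125)]:
    print(label, pairs_needed(GENOME_BP, cov, read_len))
```

        Compare the resulting counts against the per-run output you are actually quoted for the instrument before settling on a number of runs.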
