SEQanswers > Illumina/Solexa > Do I need 2+ mate pair libraries with different insert sizes?

kmkocot 06-06-2016 09:09 AM

Do I need 2+ mate pair libraries with different insert sizes?
Hi all,

I would like to sequence and de novo assemble a draft genome of a marine invertebrate with a roughly human-sized genome: 3.46 gigabases (as estimated by FIAD). I know nothing about its repeat content, etc., but I have a muscle tissue transcriptome.
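FIAD (Feulgen image analysis densitometry) measures nuclear DNA content in picograms, so a figure in gigabases implies a pg-to-bp conversion. A minimal sketch, assuming the commonly used factor of about 978 Mb per picogram and that 3.46 Gb is a haploid (1C) value; the function names are hypothetical:

```python
# Commonly used conversion factor: 1 pg of DNA is roughly 978 Mb.
MB_PER_PG = 978

def pg_to_gb(pg):
    """Convert a DNA mass in picograms to gigabases."""
    return pg * MB_PER_PG / 1000

def gb_to_pg(gb):
    """Convert a genome size in gigabases to picograms."""
    return gb * 1000 / MB_PER_PG

print(round(gb_to_pg(3.46), 2))  # 3.54
```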

My goal here is to generate a cost-effective draft assembly and gene models. I realize the assembly will not be spectacular with this approach, but it will be used as preliminary data for a grant proposal that would involve additional approaches to improve the assembly. My tentative strategy is to do one whole run of a HiSeq X Ten: one lane with two different paired-end libraries (one with partially overlapping reads and one with reads from longer inserts, as recommended by the ALLPATHS-LG manual) and the other lane with one or two mate pair libraries. I will try assembly with ALLPATHS-LG and Meraculous.

My question is: would you expect a better assembly from these libraries if I had two or more mate pair libraries with different insert sizes (say, one with a 3 kb insert and one with a 10 kb insert), or should I just go with one library with the longest insert size I can get? If the difference in assembly quality with 2+ mate pair libraries would be negligible, I'm inclined to go with just one mate pair library and save money.

Any general advice on sequencing strategy for getting the best draft genome I can for under US$6,000 or so would be greatly appreciated.


westerman 06-07-2016 01:06 PM

My general advice is to go with more PE reads (the 'meat' of the assembly, where it is nice to have high coverage) and fewer MP reads (used to string the PE reads together, at lower coverage). Of course it is useful to have MP libraries with multiple insert sizes, but given the choice of one, you should aim for inserts of around 3 kb. Longer inserts will help with repeats, but you need intermediate insert sizes as well.
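Why intermediate sizes matter can be illustrated with a toy model (not taken from any assembler): lay hypothetical contigs end to end with fixed gaps, slide a pair of a given insert size along the genome, and tally what its two ends connect. Read lengths, size spread, repeats and misassemblies are all ignored, and every name below is made up for the sketch:

```python
from bisect import bisect_right
from collections import Counter

def build_layout(contig_lengths, gap):
    """Start/end coordinates of contigs laid end to end, separated by a fixed gap."""
    starts, ends, pos = [], [], 0
    for length in contig_lengths:
        starts.append(pos)
        ends.append(pos + length)
        pos += length + gap
    return starts, ends

def contig_at(pos, starts, ends):
    """Index of the contig covering pos, or None if pos falls in a gap."""
    i = bisect_right(starts, pos) - 1
    return i if i >= 0 and pos < ends[i] else None

def link_profile(contig_lengths, gap, insert, step=50):
    """Tally what a pair with this insert links, sliding its left end along the genome."""
    starts, ends = build_layout(contig_lengths, gap)
    tally = Counter()
    for left in range(0, ends[-1] - insert, step):
        a = contig_at(left, starts, ends)
        b = contig_at(left + insert, starts, ends)
        if a is None or b is None:
            tally["end in a gap"] += 1
        elif a == b:
            tally["same contig"] += 1
        elif b == a + 1:
            tally["adjacent contigs"] += 1
        else:
            tally["skips over contigs"] += 1
    return tally

for insert in (3000, 10000):
    print(insert, dict(link_profile([2000] * 20, gap=100, insert=insert)))
```

With 2 kb contigs, a 3 kb insert often joins neighbouring contigs, while a 10 kb insert in this toy never does and only links contigs several positions apart. Longer contigs from a solid PE base shift that balance.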

I once had a customer who (on an older machine) generated a single lane of PE data and then a lane of 3 kb, 6 kb and 10 kb MP libraries. Those MP libraries were basically worthless because we did not have a solid base of PE reads to build from. Eventually we did another two lanes of PE, which helped the assembly, but I suspect that if we had gone with 3 lanes of PE and three 1/3 lanes of MP in the first place, we wouldn't have spent so much time trying to get something useful.

In your case I suggest 1 1/2 lanes of PE and 1/2 lane of MP.
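To see what that split buys, a rough sketch of the arithmetic. The per-lane yield (110 Gb for 2x150 bp) and the usable MP fraction (0.3, after duplicates and pairs without a junction) are hypothetical placeholders, not figures from this thread; substitute your provider's numbers. Sequence coverage counts sequenced bases, while physical (span) coverage counts how many times each base is spanned by a pair's insert, which is why a modest MP share can still give useful scaffolding coverage:

```python
def sequence_coverage(lanes, lane_yield_gb, genome_gb):
    """Mean base coverage: total sequenced bases divided by genome size."""
    return lanes * lane_yield_gb / genome_gb

def physical_coverage(lanes, lane_yield_gb, read_len, insert_bp, genome_gb, usable=1.0):
    """Span coverage: usable pairs times insert size divided by genome size."""
    pairs = lanes * lane_yield_gb * 1e9 / (2 * read_len) * usable
    return pairs * insert_bp / (genome_gb * 1e9)

GENOME_GB = 3.46
LANE_YIELD_GB = 110  # hypothetical per-lane yield for 2x150 bp reads

pe = sequence_coverage(1.5, LANE_YIELD_GB, GENOME_GB)
mp_seq = sequence_coverage(0.5, LANE_YIELD_GB, GENOME_GB)
# Hypothetical: 30% of MP pairs usable, 3 kb inserts.
mp_phys = physical_coverage(0.5, LANE_YIELD_GB, 150, 3000, GENOME_GB, usable=0.3)
print(f"PE: {pe:.1f}x  MP sequence: {mp_seq:.1f}x  MP physical (3 kb): {mp_phys:.1f}x")
# PE: 47.7x  MP sequence: 15.9x  MP physical (3 kb): 47.7x
```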

Brian Bushnell 06-07-2016 01:24 PM

It's also possible to use the Nextera Long Mate Pair protocol to get both short and long inserts from a single library. Is that a good idea? I'm not really sure; I've never assembled anything from a single Nextera LMP library. But it's worth considering if you don't have the budget for multiple libraries. IIRC, it produces around half long-mate and half short-insert, though the exact ratio varies.
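One way to measure that ratio for your own library is to map the pairs to a reference (or a draft assembly) and classify them by orientation. A minimal sketch, assuming, as Nextera mate pair data are usually described, that true mate pairs map outward-facing (RF) and short-insert contaminants map inward-facing (FR). The `Alignment` record, the function names and the 1000 bp threshold are hypothetical; real records would come from a BAM file, and both reads must be on the same reference sequence:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Alignment:
    """Hypothetical minimal record of one mapped read."""
    pos: int       # 0-based leftmost aligned reference position
    reverse: bool  # True if the read aligned to the reverse strand
    length: int    # aligned length in bases

def classify_pair(a, b, min_mate_span=1000):
    """Label a pair mapped to the same reference sequence by orientation and span."""
    left, right = sorted((a, b), key=lambda r: r.pos)
    span = right.pos + right.length - left.pos
    if left.reverse == right.reverse:
        return "same strand"           # neither innie nor outie
    if not left.reverse:               # left forward, right reverse: reads face inward
        return "short insert (innie)"
    # Left reverse, right forward: reads face outward, as expected for true mate pairs.
    return "mate pair (outie)" if span >= min_mate_span else "short outie"

def library_mix(pairs, min_mate_span=1000):
    """Fraction of pairs falling in each class."""
    counts = Counter(classify_pair(a, b, min_mate_span) for a, b in pairs)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

pairs = [
    (Alignment(1000, False, 150), Alignment(1300, True, 150)),  # innie, ~450 bp span
    (Alignment(5000, True, 150), Alignment(8000, False, 150)),  # outie, ~3.15 kb span
]
print(library_mix(pairs))  # {'short insert (innie)': 0.5, 'mate pair (outie)': 0.5}
```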

The amount that scaffolding will help you really depends on the genome in question... but you may be able to answer it yourself very cheaply:

1) Take the closest relative with a genome assembly, and hope that the genome structure is similar.
2) Generate synthetic short reads from the genome, in a fragment library and an LMP library.
3) Assemble using either one or both libraries, and see how much improvement you got.
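Step 2 can be sketched in a few lines. This is an error-free simulator under simple assumptions: uniform start positions, normally distributed insert sizes, and a single input sequence passed in as a string. Dedicated simulators such as wgsim or ART also model sequencing errors. All function names are hypothetical:

```python
import random

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def simulate_pairs(genome, n_pairs, insert_mean, insert_sd, read_len, outward, seed=0):
    """Sample error-free read pairs uniformly from one sequence.

    outward=False: fragment library, reads face inward (read1 forward at the left
                   end, read2 reverse-complemented from the right end).
    outward=True:  mate pair / jumping library, reads face outward (read1
                   reverse-complemented from the left end, read2 forward at the right end).
    """
    if insert_mean > len(genome) or read_len > len(genome):
        raise ValueError("genome is shorter than the requested insert or read length")
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n_pairs:
        insert = max(read_len, round(rng.gauss(insert_mean, insert_sd)))
        if insert > len(genome):
            continue  # resample inserts that do not fit
        start = rng.randrange(len(genome) - insert + 1)
        left = genome[start:start + read_len]
        right = genome[start + insert - read_len:start + insert]
        pairs.append((revcomp(left), right) if outward else (left, revcomp(right)))
    return pairs

def write_fastq(pairs, prefix, qual_char="I"):
    """Write pairs to prefix_1.fq and prefix_2.fq with a constant quality string."""
    with open(f"{prefix}_1.fq", "w") as f1, open(f"{prefix}_2.fq", "w") as f2:
        for i, (r1, r2) in enumerate(pairs):
            f1.write(f"@{prefix}_{i}/1\n{r1}\n+\n{qual_char * len(r1)}\n")
            f2.write(f"@{prefix}_{i}/2\n{r2}\n+\n{qual_char * len(r2)}\n")
```

For a target coverage C of a sequence of length G, request about C * G / (2 * read_len) pairs. Run it once with `outward=False` and a short insert for the fragment library and once with `outward=True` and a multi-kb insert for the LMP library, then assemble with and without the LMP set.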

Markiyan 06-13-2016 03:09 AM

Nextera mate pair libraries tend to have inserts in the 2 kb to 12 kb range.

So one library can be one-size-fits-all...

If you use a bit more input DNA (2x), the range becomes 3.5 kb to 17 kb (provided your ligation still works well).

In any case, make one PCR-free library and one mate pair library, and sequence them in 2x250 or 2x300 bp mode (1-2 MiSeq runs).

It may be very tempting to go with HiSeq 2x125 bp, but 100x of 2x125 bp reads gives a much worse assembly than a good 20x of 2x250 or 2x300 bp reads from a PCR-free library...
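How many runs a given coverage takes depends heavily on the per-run yield and on genome size. A rough check for a 3.46 Gb genome, assuming about 15 Gb per 2x300 bp MiSeq run (roughly the upper end of the v3 kit spec; check the current spec sheet). The function name is hypothetical, and the count ignores duplicates and filtering losses:

```python
import math

def runs_needed(target_cov, genome_gb, yield_per_run_gb):
    """Whole runs needed to reach a target mean coverage."""
    return math.ceil(target_cov * genome_gb / yield_per_run_gb)

GENOME_GB = 3.46
MISEQ_2X300_GB = 15  # assumed yield per 2x300 bp run

bases_20x = 20 * GENOME_GB
bases_100x = 100 * GENOME_GB
print(f"20x needs {bases_20x:.1f} Gb; 100x needs {bases_100x:.1f} Gb")
print(f"2x300 bp runs for 20x: {runs_needed(20, GENOME_GB, MISEQ_2X300_GB)}")
```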

All times are GMT -8.
