SEQanswers

Old 06-06-2016, 08:09 AM   #1
kmkocot
Member
 
Location: Alabama

Join Date: Jun 2009
Posts: 48
Do I need 2+ mate pair libraries with different insert sizes?

Hi all,

I would like to sequence and de novo assemble a draft genome from a marine invertebrate with a roughly human-sized genome: 3.46 gigabases (as estimated by FIAD). I know nothing about repeat content, etc., but I do have a muscle tissue transcriptome.

My goal here is to generate a cost-effective draft assembly and gene models. I realize the assembly will not be spectacular with this approach, but it will be used as preliminary data for a grant proposal that would involve additional approaches to improve the assembly. My tentative strategy is to do one whole run of a HiSeq X Ten: one lane with two different paired-end libraries (one with partially overlapping reads and one with reads further apart, as recommended by the ALLPATHS-LG manual) and the other lane with one or two mate pair libraries. I will try assembly with ALLPATHS-LG and Meraculous.

My question is: would you expect a better assembly of these libraries if I have two or more different mate pair libraries (say one with a 3 kb insert and one with a 10 kb insert) or should I just go with one library with the longest insert size I can get? If the difference in genome assembly quality with 2+ different mate pair libraries would be negligible, I'm inclined to go with just one mate pair library and save money.

Any general advice on sequencing strategy for getting the best draft genome I can for under USD $6,000 or so would be greatly appreciated.

Best,
Kevin
Old 06-07-2016, 12:06 PM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

My general advice is to go with more PE reads -- the 'meat' of the assembly, where it is nice to have high coverage -- and fewer MP reads, which are used to string the PE reads together and can get by with lower coverage. Of course it is useful to have MP libraries with multiple insert sizes, but if you can only choose one then you should aim for around 3 KB inserts. Longer inserts will help with repeats, but you need intermediate results as well.

I once had a customer who (on an older machine) generated a single lane of PE data and then a lane split between 3 KB MP, 6 KB MP and 10 KB MP. Those MP libraries were basically worthless because we did not have a solid base of PE reads to build from. Eventually we did another two lanes of PE, which helped the assembly, but I suspect that if we had gone with 3 lanes of PE and three 1/3 lanes of MP in the first place we wouldn't have spent so much time trying to get something useful.

In your case I suggest 1 1/2 lanes of PE and 1/2 lane of MP.
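
A rough way to sanity-check a split like this is to turn lanes into fold coverage. The sketch below is only illustrative: the genome size is the 3.46 Gb estimate from the first post, but the per-lane yield (`ASSUMED_LANE_YIELD_BP`) is a made-up placeholder, not a figure from this thread or an instrument specification, so substitute whatever your sequencing provider quotes. The function names are hypothetical as well.

```python
GENOME_SIZE_BP = 3_460_000_000  # 3.46 Gb estimate quoted in the first post


def sequence_coverage(lanes, lane_yield_bp, genome_size_bp=GENOME_SIZE_BP):
    """Average fold coverage from `lanes` lanes (fractions of a lane allowed)."""
    return lanes * lane_yield_bp / genome_size_bp


def split_coverage(pe_lanes, mp_lanes, lane_yield_bp, genome_size_bp=GENOME_SIZE_BP):
    """Fold coverage for the paired-end (PE) and mate pair (MP) shares of a run."""
    return {
        "PE": sequence_coverage(pe_lanes, lane_yield_bp, genome_size_bp),
        "MP": sequence_coverage(mp_lanes, lane_yield_bp, genome_size_bp),
    }


if __name__ == "__main__":
    # ASSUMPTION: placeholder yield per lane, NOT taken from the thread.
    ASSUMED_LANE_YIELD_BP = 100_000_000_000
    for name, cov in split_coverage(1.5, 0.5, ASSUMED_LANE_YIELD_BP).items():
        print(f"{name}: {cov:.1f}x")
```

Changing the two lane arguments lets you compare the 1 1/2 + 1/2 split against an even 1 + 1 split for the same assumed yield.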
Old 06-07-2016, 12:24 PM   #3
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

It's also possible to use the Nextera Long Mate Pair protocol to get both short and long inserts from a single library. Is that a good idea? I'm not really sure; I've never assembled anything from a single Nextera LMP library. But it's worth considering if you don't have the budget for multiple libraries. IIRC, it produces around half long-mate and half short-insert, though the exact ratio varies.

The amount that scaffolding will help you really depends on the genome in question... but you may be able to answer it yourself very cheaply:

1) Take the closest relative with a genome assembly, and hope that the genome structure is similar.
2) Generate synthetic short reads from the genome, in a fragment library and an LMP library.
3) Assemble using either one or both libraries, and see how much improvement you got.
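
Step 2 above can be prototyped in a few lines before reaching for a dedicated read simulator. This is a minimal sketch, assuming error-free reads, normally distributed insert sizes, forward-reverse orientation for the fragment library and reverse-forward (outward-facing) orientation for the LMP library. `simulate_pairs` and its parameters are hypothetical names, and the toy random sequence stands in for the relative's assembly.

```python
import random

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pairs(genome, n_pairs, read_len, insert_mean, insert_sd,
                   orientation="fr", seed=0):
    """Return a list of (start, insert_size, read1, read2) tuples.

    orientation="fr": reads face inward (typical fragment library).
    orientation="rf": reads face outward (simplified mate pair model).
    """
    if orientation not in ("fr", "rf"):
        raise ValueError("orientation must be 'fr' or 'rf'")
    rng = random.Random(seed)
    pairs = []
    attempts = 0
    while len(pairs) < n_pairs:
        attempts += 1
        if attempts > 1000 * n_pairs:
            raise RuntimeError("insert sizes rarely fit inside the genome")
        insert = int(round(rng.gauss(insert_mean, insert_sd)))
        if insert < read_len or insert > len(genome):
            continue  # reject inserts that cannot hold a read or exceed the genome
        start = rng.randint(0, len(genome) - insert)
        left = genome[start:start + read_len]
        right = genome[start + insert - read_len:start + insert]
        if orientation == "fr":
            read1, read2 = left, reverse_complement(right)
        else:
            read1, read2 = reverse_complement(left), right
        pairs.append((start, insert, read1, read2))
    return pairs


if __name__ == "__main__":
    toy_rng = random.Random(42)
    toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(20000))
    fragment = simulate_pairs(toy_genome, 1000, 100, 400, 40, "fr", seed=1)
    lmp = simulate_pairs(toy_genome, 200, 100, 3000, 300, "rf", seed=2)
    for name, lib in (("fragment", fragment), ("LMP", lmp)):
        mean_insert = sum(p[1] for p in lib) / len(lib)
        print(f"{name}: {len(lib)} pairs, mean insert {mean_insert:.0f} bp")
```

For a real test you would load the relative's FASTA instead of the toy sequence, write the pairs out as FASTQ, and run the assembler on the fragment library alone and then on both libraries, as in step 3. A proper simulator would also model sequencing errors and quality scores, which this sketch ignores.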
Old 06-13-2016, 02:09 AM   #4
Markiyan
Senior Member
 
Location: Cambridge

Join Date: Sep 2010
Posts: 115
Nextera matepair tends to have inserts from 2kb to 12kb.

Nextera matepair tends to have inserts in the 2kb to 12kb range.

So one library can be one size fits all...

If you add a bit more DNA (2x), the range will shift to 3.5kb to 17kb (provided your ligation still works well).

In any case, do one PCR-free library and one matepair library, and sequence them in 2x250 or 2x300 bp mode (1-2 MiSeq runs).

It may be very tempting to go with HiSeq 2x125bp, but 100x of 2x125bp gives a far worse assembly than a good 20x of 2x250 or 2x300 bp from a PCR-free library...
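
To see what those coverage figures mean in read pairs for the 3.46 Gb genome from the first post, the arithmetic is pairs = coverage x genome size / (2 x read length). A minimal sketch, with hypothetical function names, assuming every base of every read counts toward coverage (no trimming, duplicates or failed reads):

```python
import math

GENOME_SIZE_BP = 3_460_000_000  # 3.46 Gb estimate quoted in the first post


def total_bases(coverage, genome_size_bp=GENOME_SIZE_BP):
    """Bases of sequence needed for a given average fold coverage."""
    return coverage * genome_size_bp


def pairs_needed(coverage, genome_size_bp, read_len):
    """Read pairs needed, where each pair contributes 2 * read_len bases."""
    return math.ceil(coverage * genome_size_bp / (2 * read_len))


if __name__ == "__main__":
    for coverage, read_len in ((20, 250), (20, 300), (100, 125)):
        n = pairs_needed(coverage, GENOME_SIZE_BP, read_len)
        gb = total_bases(coverage) / 1e9
        print(f"{coverage}x at 2x{read_len} bp: {n:,} pairs, {gb:.1f} Gb")
```

Comparing the resulting totals with the per-run yield quoted for whichever instrument you end up using shows how many runs or lanes a given target actually requires.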
All times are GMT -8.