SEQanswers

Old 01-24-2016, 11:25 PM   #1
lamz138138
Member
 
Location: beijing

Join Date: Mar 2015
Posts: 10
library design in de novo assembly, thanks!

Hi, everyone!

I have a question about choosing library sizes and coverage for de novo assembly. Usually there are short fragment libraries (250bp-800bp) and long fragment libraries (2K-40K), with a coverage ratio of about 3:1 for the 250bp to 500bp libraries and about 2:1 for the 2K to 5K libraries. Why are these ratios used? Can I just sequence every library to the same coverage, or do you have any other suggestion?

Thanks in advance!

Best wishes!

XM Zhong
Old 01-25-2016, 07:52 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

This is for Illumina, of course. I am not sure what you mean by the 3:1 and 2:1 ratios.

The short fragment libraries are most commonly used to create contigs. However, short fragments have problems dealing with repeats and duplicated regions of the genome, so even if you have high coverage the contigs will be truncated. The long fragment libraries are then used to deal with these problem areas by stringing the contigs together into scaffolds.

You can sequence both libraries at the same coverage. However, since less information is needed to create scaffolds than to create contigs, it is usually better to put more effort into the short fragment libraries. We usually recommend about a 2:1 ratio. In other words, if it takes 1 HiSeq lane to come up with the number of bases needed for a given species' short fragments (an organism with a genome of approximately 1 Gb), then we would do 1/2 of a HiSeq lane for the long fragments. Multiple long fragment libraries can be useful.
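The lane arithmetic in this reply can be sketched in a few lines of Python. This is only an illustration: the per-lane yield and the target coverage below are assumed placeholder values, not figures from the post, and the function name lanes_needed is made up for the example. Substitute the yield your sequencing facility actually quotes.

```python
def lanes_needed(genome_size_bp, target_coverage, lane_yield_bp):
    """Lanes required to reach a target fold coverage (may be fractional)."""
    bases_required = genome_size_bp * target_coverage
    return bases_required / lane_yield_bp

GENOME_SIZE_BP = 1_000_000_000    # ~1 Gb genome, as in the post
LANE_YIELD_BP = 50_000_000_000    # ASSUMPTION: usable bases per lane
SHORT_COVERAGE = 50               # ASSUMPTION: target coverage for short libraries

short_lanes = lanes_needed(GENOME_SIZE_BP, SHORT_COVERAGE, LANE_YIELD_BP)
long_lanes = short_lanes / 2      # the 2:1 short:long rule of thumb

print(f"short fragment lanes: {short_lanes}")
print(f"long fragment lanes:  {long_lanes}")
```

With these placeholder numbers the short libraries need one lane and the long libraries half a lane, which matches the example in the reply.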
Old 01-25-2016, 09:20 AM   #3
lamz138138
Member
 
Location: beijing

Join Date: Mar 2015
Posts: 10

Hi, westerman, thank you very much for your reply!

Take the paper titled "Whole-genome sequencing of the snub-nosed monkey provides insights into folivory and evolutionary history" as an example: the sequencing coverage of the 180bp, 500bp, 2K and 5K libraries was 57.3, 22.9, 19.5 and 10.7 respectively. That is where I got the ratios of 3:1 and 2:1, which could also be written as 6:2:2:1.
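For reference, the ratios can be recomputed directly from the four coverage values quoted above. The snippet below is a minimal sketch; the dictionary keys and the function name relative_to_smallest are just labels chosen for the example.

```python
# Fold coverage per library, as quoted from the paper in this post.
coverage = {"180bp": 57.3, "500bp": 22.9, "2K": 19.5, "5K": 10.7}

def relative_to_smallest(cov):
    """Express each library's coverage as a multiple of the smallest one."""
    smallest = min(cov.values())
    return {lib: round(x / smallest, 2) for lib, x in cov.items()}

ratios = relative_to_smallest(coverage)
short_pair = round(coverage["180bp"] / coverage["500bp"], 2)
long_pair = round(coverage["2K"] / coverage["5K"], 2)

print(ratios)
print("180bp : 500bp =", short_pair)
print("2K : 5K       =", long_pair)
```

The exact values come out near 5.4 : 2.1 : 1.8 : 1, with the 180bp to 500bp ratio closer to 2.5:1 than 3:1, so the 6:2:2:1 figure is a generous rounding.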

Other papers seem to use similar ratios, so I wonder whether this is the optimal ratio for de novo assembly or just the coverage they happened to get at the time. In other words, if I want to perform de novo assembly with 200bp, 500bp, 2K and 5K libraries, and one paper has suggested that 45X is suitable, should I sequence all of these libraries to 45X, or to 45X, 16X, 16X and 8X respectively? Going by your suggestion above, I think it would be 45X for the 200bp and 500bp libraries and 22X for the 2K and 5K libraries. Am I right? Or could you give me other suggestions?

Thanks for any suggestion!

Best wishes!

XM Zhong
Old 01-25-2016, 10:48 AM   #4
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

The answer is both "because of coverage at the moment" and "because someone came up with the ratios they used". I do not think that there is a universally accepted ratio between short libraries and long libraries. We do want more data from the short libraries than from the long ones, but beyond that it becomes more of an "I like this ratio" statement than anything else.

For technical reasons, libraries are harder to construct the longer they become, and the insert-size error also becomes worse, so saying "we need more 2K long library sequences than 20K sequences" is reasonable. Those 20K sequences may be mostly worthless anyway. Also be aware that upon filtering and QC, long (mate-pair) libraries will lose many more sequences/bases than short (paired-end) libraries, so you need to order correspondingly more lanes than expected. Thus my general 2:1 ratio.

Looking at the paper, they did 41 lanes of sequencing (HiSeq2000). Not surprisingly, they do not mention how many lanes they allocated to each library, so we cannot tell if the drop-off in the number of reads from the 2K, 5K, 10K, and 20K libraries is due to sequencing loss or to some pre-set ratio of lanes ordered. Probably both. It may also have been affected by budget constraints and/or budget re-allocations mid-stream within the project.

My suggestion is to order enough lanes to do 50x coverage for the short libraries (the contigs) and enough lanes to do (at least in theory) 25x coverage for the long libraries, and then just be satisfied with what you get from the long libraries (which will be less than 25x). The mate-pair libraries will tend to lose about 25% of their reads, so the final ratio will be closer to 50:19, or roughly 2.7:1.
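The arithmetic behind this suggestion can be written out as a small Python sketch. The 50x, 25x and 25% figures are taken from the reply; the function name expected_ratio is made up for the example, and the 25% loss is a rule of thumb that will differ between libraries.

```python
def expected_ratio(short_cov, long_cov_ordered, long_loss_fraction):
    """Return (retained long-library coverage, short:long ratio) after QC loss."""
    if not 0.0 <= long_loss_fraction < 1.0:
        raise ValueError("long_loss_fraction must be in [0, 1)")
    long_cov_retained = long_cov_ordered * (1.0 - long_loss_fraction)
    return long_cov_retained, short_cov / long_cov_retained

# 50x short, 25x long ordered, ~25% of mate-pair reads lost in filtering/QC.
retained, ratio = expected_ratio(50.0, 25.0, 0.25)

print(f"retained long-library coverage: {retained:.2f}x")
print(f"short:long ratio: {ratio:.2f}:1")
```

With these inputs the long libraries retain about 18.75x, giving a short:long ratio of about 2.67:1, in the same range as the rounded figures in the reply.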

But it does depend on your budget. The 41 lanes that they ordered would be somewhere on the order of USD $100,000. Most of the plant and animal projects I work with have a much smaller budget and thus skimp on the number of lanes ordered. Generally I am lucky to have 2 lanes of paired-end plus 1 lane of a single mate-pair library. It is possible to get by with fewer mate-pair library reads. Having multiple mate-pair libraries is wonderful but not at the cost of going less than 10x coverage per library. When in doubt order more of the short library.
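Two of the budget rules of thumb above are easy to check numerically. The sketch below is illustrative only: the function names are made up, the cost per lane is simply the quoted total divided by the lane count, and the library count assumes mate-pair coverage is split evenly across libraries, which is an assumption rather than something stated in the post.

```python
def cost_per_lane(total_cost_usd, lanes):
    """Average cost of one lane, given a total budget and lane count."""
    return total_cost_usd / lanes

def max_mate_pair_libraries(total_mp_coverage, min_cov_per_library=10.0):
    """How many mate-pair libraries fit without dropping below the minimum
    coverage per library (ASSUMPTION: coverage is split evenly)."""
    return int(total_mp_coverage // min_cov_per_library)

lane_cost = cost_per_lane(100_000, 41)      # figures quoted in the reply
n_libraries = max_mate_pair_libraries(25.0) # 25x total mate-pair coverage

print(f"approx. cost per lane: ${lane_cost:,.0f}")
print(f"mate-pair libraries at >= 10x each: {n_libraries}")
```

On the quoted figures a lane works out to roughly $2,400, and 25x of total mate-pair coverage supports at most two libraries at 10x or more each.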

Bottom line: the number of lanes to order is not an exact science and, to a large extent, depends on your budget. If I were able to order up 41 lanes for a project ... gee ... I'd go hog wild in my ratios.
Old 01-26-2016, 04:26 AM   #5
lamz138138
Member
 
Location: beijing

Join Date: Mar 2015
Posts: 10

Hi, westerman, thank you very much for your detailed reply, especially for sharing your experience with sequencing coverage in de novo assembly, which has given me important guidance for my project!

Thanks again!

Best wishes!

XM Zhong