09-15-2014, 07:04 PM   #1
Location: Midwest

Join Date: Mar 2009
Posts: 30
genome assembly with only mate pair reads


I am mostly comfortable with DNA resequencing, mRNA-seq, ChIP-seq, and similar data, but I always find de novo assembly work difficult. It has come my way anyway.

I have a mate pair sequencing data set for a ~1 Gb genome. Coverage is close to 30x after linker removal, and the insert size is about 8 kb. I don't think it is a good idea to use mate pairs alone (I'd rather have libraries of various sizes). Without evidence, I suspect a single mate pair library is worse than paired-end data at the same depth. Let me know if I am wrong.

Now I am asked to get the best out of this data. Without diving in too deep (spending too much time), what best-case (practical) and worst-case scenarios should I prepare the collaborator for?

I have access to a 32-core machine with 512 GB of RAM, and I have Velvet, SOAPdenovo, and SPAdes available, plus a CLC bio license that can be moved to that computer. What methods, programs, and parameters are recommended?

I very much appreciate your thoughts and suggestions!

By the way, I did recommend that they sequence (at least) another 50x in 2x100 to 2x150 bp reads, but I don't think it is going to fly.
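
For a sense of scale, the extra 50x can be turned into a read-pair count with simple arithmetic. This is only a rough sketch: the function name `read_pairs_needed` is made up for illustration, the genome size is taken as exactly 1 Gb, and it ignores reads lost to quality filtering or trimming.

```python
def read_pairs_needed(genome_size_bp: int, coverage: int, read_length_bp: int) -> int:
    """Smallest number of read pairs with 2 * read_length * pairs >= coverage * genome size."""
    total_bases = genome_size_bp * coverage
    bases_per_pair = 2 * read_length_bp
    return -(-total_bases // bases_per_pair)  # integer ceiling division


GENOME_SIZE_BP = 1_000_000_000  # ~1 Gb, as stated in the post (assumed exact here)
TARGET_COVERAGE = 50            # the additional depth recommended in the post

for read_length in (100, 150):
    pairs = read_pairs_needed(GENOME_SIZE_BP, TARGET_COVERAGE, read_length)
    print(f"2x{read_length}: {pairs:,} read pairs")
```

So the request amounts to roughly 170 to 250 million read pairs, depending on read length.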

liux
09-15-2014, 08:10 PM   #2
Junior Member
Location: Nanning, Guangxi, China

Join Date: Sep 2014
Posts: 1

Hello, I'm a newcomer.
HandsonneQin
09-15-2014, 09:41 PM   #3
Location: Australia

Join Date: Aug 2010
Posts: 54

You need to consider several things.
Is it a plant or animal genome? Do you have a reference?
How complex is the genome, i.e. ploidy etc.?
I don't think mate pairs alone can do much, and you have just one mate pair library.
A starting point would be to sequence several paired-end libraries with varying insert sizes (e.g. 180 bp, 300 bp, 600 bp) for the contig-level assembly, and later couple them with several mate pair libraries (e.g. 2 kb, 5 kb, 8 kb) for scaffolding. Longer reads, e.g. PacBio, may also help you resolve large repetitive regions.
You need to carefully plan each stage of your project: sequencing, quality control and error correction of reads, preliminary contig assembly, scaffolding, and gap closing. And of course there is no single best assembler/pipeline for all assembly problems. You need to evaluate multiple assemblers to find the one that gives you the best assembly.
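
One way to see why a mate pair library suits scaffolding rather than contig building is to compare sequence coverage with span (physical) coverage, i.e. how many inserts bridge an average position. The sketch below is only illustrative: the 100 bp read length is an assumption (the thread does not state it), the 300 bp paired-end library is a hypothetical comparison, and `span_coverage` is a name made up here.

```python
def span_coverage(sequence_coverage: float, insert_size_bp: int, read_length_bp: int) -> float:
    """Span coverage = (pairs * insert size) / genome size.

    With pairs = sequence_coverage * genome_size / (2 * read_length),
    the genome size cancels out.
    """
    return sequence_coverage * insert_size_bp / (2 * read_length_bp)


READ_LENGTH_BP = 100  # assumption: not given in the thread

# 8 kb mate pair library at 30x sequence coverage (figures from the first post)
print(span_coverage(30, 8000, READ_LENGTH_BP))

# hypothetical 300 bp paired-end library at the same 30x sequence coverage
print(span_coverage(30, 300, READ_LENGTH_BP))
```

Under these assumptions the same 30x of sequence gives far more long-range linking information from the 8 kb library, but no more bases to build the contigs those links would connect.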
fahmida
09-16-2014, 09:06 AM   #4
Location: Midwest

Join Date: Mar 2009
Posts: 30

Thanks for the reply.

This is exactly what I thought and recommended to the researcher. Unfortunately I have no control over how the sequencing was designed. But I can refuse to perform the analysis without adequate data :-)
liux
09-16-2014, 09:37 AM   #5
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

It sounds like a waste of your time. You'll end up with a bad assembly that they probably won't like.

de novo, mate pair, ngs, velvet