SEQanswers

Old 03-12-2013, 09:23 AM   #1
NGS_New_User
Member
 
Location: USA

Join Date: Sep 2012
Posts: 41
De novo assembly (scaffolding)

I am performing de novo assembly on a non-model organism using the CLC de novo assembly tool. The assembly options let you run the de novo assembly with or without scaffolding. I would like to know what scaffolding means. I am still a newbie at this, and I have been searching for a definition of scaffolding, but I keep finding different ones. Could someone please help me out with this, or direct me to a post that has discussed scaffolding?

Another question I have is: what constitutes a good de novo assembly, especially for an organism with a genome size of ~500 Mb? I have read some papers that say a lower contig count and a large N50, but what else should I look for? In short, could someone please give me a few tips on what steps I should take to get a good assembly, and on what to look for when choosing an optimal assembly? Even if it's just directing me to a specific scientific paper, I will be extremely grateful!
Old 03-12-2013, 09:34 PM   #2
kbradnam
Member
 
Location: Davis, CA

Join Date: May 2011
Posts: 53

Overlapping reads can be merged to produce single contiguous sequences, which are called contigs. The first stage of assembly builds contigs from reads. The next step is to see whether some of these contigs can be joined together. This process is called scaffolding.

E.g., contigs A and B might not overlap, but one read of a mate pair may map to contig A while the other read of the pair maps to contig B. You can then make one scaffold that consists of the two contigs. In this case, the scaffold sequence would be padded with N characters to reflect the estimated length of the unknown region between the two contigs. In extreme cases, a very large assembly of scaffolds might contain a lot of Ns.
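As a rough illustration of the N padding described above, here is a minimal Python sketch. The function names, the toy sequences, and the gap size of 5 are all made up for the example, and this is not how CLC or any particular scaffolder does it internally. It assumes the contig order and gap estimates are already known (in practice they come from the mate-pair mappings).

```python
def build_scaffold(contigs, gaps):
    """Join contigs in order, padding each estimated gap with N characters.

    contigs: list of contig sequences, already ordered and oriented.
    gaps: estimated gap lengths, one per adjacent pair of contigs.
    """
    if len(gaps) != len(contigs) - 1:
        raise ValueError("need exactly one gap estimate between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gaps, contigs[1:]):
        # Assumption for this sketch: always keep at least one N so the
        # join stays visible, even if the estimate is zero or negative.
        parts.append("N" * max(gap, 1))
        parts.append(contig)
    return "".join(parts)


def n_fraction(seq):
    """Fraction of the sequence that is N (unknown bases)."""
    return seq.upper().count("N") / len(seq) if seq else 0.0


# Toy data, purely illustrative.
contig_a = "ACGTACGTAC"
contig_b = "TTGACCGGTA"
scaffold = build_scaffold([contig_a, contig_b], [5])
print(scaffold)
print(f"N fraction: {n_fraction(scaffold):.2f}")
```

The `n_fraction` helper is one simple way to see how much of a scaffold is padding rather than real sequence.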

As for your second question, I suggest checking out the Assemblathon 2 pre-print that is currently available on arxiv.org.

Regards,

Keith
Old 03-13-2013, 08:16 AM   #3
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245

Check out CLC's white paper, which explains their algorithm for assembly and scaffolding:

http://www.clcbio.com/wp-content/upl...assembly-4.pdf

For quality checking, one approach is to map your paired reads back to your assembly (there is an option for this in CLC Workbench). The more paired reads that map back to your contigs/scaffolds, the better the assembly. The N50 and the number of contigs are also ways to check quality, although they do not tell you whether there are any misassemblies.
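To make the two checks above concrete, here is a small Python sketch. It is only an illustration under some assumptions: the function names are made up, the contig lengths and SAM records are toy data, and the pair check relies on the standard SAM flag bits (0x1 paired, 0x2 properly paired, 0x100 secondary, 0x800 supplementary) as set by whichever mapper you use. It is not the CLC Workbench report.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def proper_pair_fraction(sam_lines):
    """Fraction of primary paired reads that the mapper flagged as properly paired."""
    paired = proper = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        flag = int(fields[1])
        if flag & 0x900:
            continue  # skip secondary and supplementary alignments
        if flag & 0x1:
            paired += 1
            if flag & 0x2:
                proper += 1
    return proper / paired if paired else 0.0


# Toy data: five contig lengths and four minimal SAM records.
contig_lengths = [100, 80, 60, 40, 20]
sam_records = [
    "@HD\tVN:1.6",
    "r1\t99\tcontig_1\t10\t60\t5M\t=\t50\t45\tACGTA\tIIIII",
    "r1\t147\tcontig_1\t50\t60\t5M\t=\t10\t-45\tTTGAC\tIIIII",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tGGGTA\tIIIII",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tCCATG\tIIIII",
]
print("contigs:", len(contig_lengths))
print("N50:", n50(contig_lengths))
print("properly paired:", proper_pair_fraction(sam_records))
```

Neither number on its own reveals misassemblies; a high N50 can simply mean contigs were joined aggressively, which is why looking at it next to the pair mapping rate is more informative.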

Regards,
Boetsie
Old 03-13-2013, 07:17 PM   #4
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72

I'd also suggest that N50 and contig number are not the only metrics for how "good" an assembly is, nor even particularly good ones. There are alternatives (e.g. http://genomebiology.com/2013/14/1/R8/abstract).
If you have any smaller known sequences that you absolutely know should be in your genome (e.g. genes from earlier studies), you should check whether they are in there and correctly assembled.
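A very simplified Python sketch of that last check follows. It only looks for exact matches on either strand, which is an assumption made to keep the example short; with real data you would more likely use an aligner (e.g. BLAST) so that small differences do not hide a hit. The function names and sequences are made up for the example.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_known_sequence(known, assembly):
    """Return (contig_name, 0-based position, strand) for every exact match.

    assembly: dict mapping contig/scaffold name to sequence.
    """
    known = known.upper()
    queries = (("+", known), ("-", reverse_complement(known)))
    hits = []
    for name, seq in assembly.items():
        seq = seq.upper()
        for strand, query in queries:
            pos = seq.find(query)
            while pos != -1:
                hits.append((name, pos, strand))
                pos = seq.find(query, pos + 1)
    return hits


# Toy assembly, purely illustrative.
assembly = {
    "contig_1": "TTTACGGATCAAA",
    "contig_2": "GGGTGATCCGTCC",
}
hits = find_known_sequence("ACGGATCA", assembly)
print(hits)
```

An empty result for a gene you are sure is in the genome is a hint that something is missing or broken in that region, though it could also just mean the match is not exact.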
Old 03-18-2013, 10:39 AM   #5
NGS_New_User
Member
 
Location: USA

Join Date: Sep 2012
Posts: 41

Thank you for the responses and the pointers.
Tags
contigs comparison, de novo assembly, scaffolding

All times are GMT -8.