01-23-2014, 09:29 AM   #1
Location: US

Join Date: Jun 2013
Posts: 96
How do I go about evaluating my assembly?

I have made an assembly.

Here are my tasks. I do not know where to go from step 1, and I do not even know how to attempt step 2 and step 3.

1. Align assembly to reference genome.

Grab coordinates of the set of sequences that aligns to the reference genome and grab coordinates of the set of sequences that DO NOT align to the reference genome.

I used MUMmer for this and got a .coords file.

2. Take the sequences that did not align and map them against a given plasmid database, to differentiate between nuclear genome and plasmids. Then take what's left over and map it against a virulence gene database to see which virulence genes are present.

I was told to use BLAST for this but I have no idea what to do.

If there are still unaligned sequences left over, then I have to use a new reference to align the remaining sequences.

3. Gene annotation, obtain gene locations

4. SNP calling


I have Step 4 down.
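For step 1, a minimal sketch of how the .coords file could be turned into "aligned" and "unaligned" contig lists. It assumes the file was written with `show-coords -T -H` (tab-delimited, no header), so that column 7 is percent identity and the last column is the query (contig) ID. The function name, the `min_identity` cutoff, and the example rows are hypothetical, not part of MUMmer.

```python
import csv
import io


def split_contigs(coords_text, all_contig_ids, min_identity=0.0):
    """Return (aligned, unaligned) sets of contig IDs.

    Assumption: coords_text is tab-delimited `show-coords -T -H` output,
    where column 7 is % identity and the last column is the query ID.
    """
    aligned = set()
    for row in csv.reader(io.StringIO(coords_text), delimiter="\t"):
        if len(row) < 9:
            continue  # skip blank or malformed lines
        if float(row[6]) >= min_identity:
            aligned.add(row[-1])
    all_ids = set(all_contig_ids)
    return aligned & all_ids, all_ids - aligned


# Hypothetical example: two contigs in the assembly, one alignment row.
example = "1\t1000\t1\t1000\t1000\t1000\t99.50\tref_chr\tcontig_1\n"
aligned, unaligned = split_contigs(example, ["contig_1", "contig_2"])
```

The unaligned IDs are the ones to pull out of the assembly FASTA for step 2.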
prs321
01-23-2014, 03:05 PM   #2
Location: SE MN

Join Date: Oct 2013
Posts: 44

You can run BLAST online (through NCBI) for single FASTA files, or download and install BLAST locally along with the database and run a search of your contigs against that database.
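A rough sketch of the local route, assuming BLAST+ is installed and the plasmid database is a nucleotide FASTA. The file names, the e-value, and the identity and length cutoffs are placeholders to adjust, not recommendations. The first function only builds the `makeblastdb` and `blastn` command lines (they could be passed to `subprocess.run`); the second reads the default tabular output (`-outfmt 6`) and returns the contigs that have a hit.

```python
def blast_commands(db_fasta, query_fasta, db_name, out_tsv):
    """Build command lines for a local nucleotide BLAST search."""
    make_db = ["makeblastdb", "-in", db_fasta, "-dbtype", "nucl",
               "-out", db_name]
    search = ["blastn", "-query", query_fasta, "-db", db_name,
              "-outfmt", "6", "-evalue", "1e-10", "-out", out_tsv]
    return make_db, search


def contigs_with_hits(blast_tsv_text, min_identity=90.0, min_length=500):
    """Return query IDs with at least one hit passing the cutoffs.

    Default -outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    hits = set()
    for line in blast_tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        pident, length = float(fields[2]), int(fields[3])
        if pident >= min_identity and length >= min_length:
            hits.add(fields[0])
    return hits


# Hypothetical file names and hit table, for illustration only.
make_db, search = blast_commands(
    "plasmids.fasta", "unaligned_contigs.fasta", "plasmid_db", "hits.tsv")
example_hits = (
    "contig_2\tpl_A\t98.7\t1200\t15\t1\t1\t1200\t50\t1249\t0.0\t2100\n"
    "contig_5\tpl_B\t81.0\t150\t28\t0\t1\t150\t10\t159\t1e-20\t110\n"
)
plasmid_like = contigs_with_hits(example_hits)
```

Contigs without a passing hit would then go on to the same kind of search against the virulence gene database.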

Curious whether running something like MetaPhlAn, PhyloPhlAn, or Kraken against your assemblies (and your raw reads, just to check) would tell you what you have. Of course, "clade-specific markers" and k-mer searches are prone to some degree of noise.
ctseto
01-23-2014, 03:54 PM   #3
Location: Guangzhou China

Join Date: Aug 2013
Posts: 82

How did you make your assembly (de novo or reference-guided)?
Is this a meta project?
yueluo
01-23-2014, 09:02 PM   #4
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

If you have a reference (and it appears that you do), I recommend QUAST; it's quite effective!

Even if you don't have a reference, it still tells you things like the number of predicted genes of size >= X; better assemblies tend to have more long genes and fewer short genes.
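To illustrate the "genes of size >= X" idea, here is a small sketch that tallies predicted gene lengths at a few cutoffs so two assemblies can be compared side by side. The cutoffs and the gene lengths are made up for the example; they are not taken from QUAST output.

```python
def genes_at_least(gene_lengths, thresholds=(0, 300, 1500, 3000)):
    """Count predicted genes whose length (bp) is >= each threshold."""
    return {t: sum(1 for g in gene_lengths if g >= t) for t in thresholds}


# Hypothetical predicted gene lengths for two assemblies of the same reads.
assembly_a = genes_at_least([200, 400, 1600, 3500])
assembly_b = genes_at_least([150, 180, 250, 900])
```

Here assembly A keeps more genes at the larger cutoffs, which is the pattern that tends to go with a more contiguous assembly.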

Also, you could try ALE (Assembly Likelihood Evaluator), which does not need a reference and estimates the correctness of an assembly from a SAM file, based on statistics of variations, coverage, and insert size.

ALE is not designed to evaluate the quality of a single assembly, but rather, the relative quality of multiple assemblies from the same set of reads. But that's still quite useful when you have several assemblies and need to pick the best one.

EST capture is also a good method when you have EST data.

You can also capture metrics like the percent of source reads that align to the assembly, and the rate of substitutions/insertions/deletions in those reads. The higher the mapping rate and the lower the error rate, the better the assembly is. For this you should use a normal read aligner, not MUMmer.
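As a sketch of those metrics, the function below reads SAM text and reports the mapping rate plus per-base substitution, insertion, and deletion rates. It assumes the aligner wrote an `NM:i:` (edit distance) tag, so substitutions are estimated as NM minus inserted and deleted bases; secondary and supplementary records are skipped. The function name and the three example reads are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def sam_stats(sam_text):
    """Mapping rate and per-aligned-base error rates from SAM text."""
    total = mapped = aligned = subs = ins = dels = 0
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip header lines
        f = line.split("\t")
        flag = int(f[1])
        if flag & 0x900:
            continue  # skip secondary/supplementary alignments
        total += 1
        if flag & 0x4:
            continue  # read is unmapped
        mapped += 1
        ops = [(int(n), op) for n, op in CIGAR_RE.findall(f[5])]
        i_bases = sum(n for n, op in ops if op == "I")
        d_bases = sum(n for n, op in ops if op == "D")
        aligned += sum(n for n, op in ops if op in "M=X")
        ins += i_bases
        dels += d_bases
        # NM = mismatches + inserted + deleted bases (if the tag exists)
        nm = next((int(t[5:]) for t in f[11:] if t.startswith("NM:i:")), None)
        if nm is not None:
            subs += max(nm - i_bases - d_bases, 0)
    per_base = (lambda x: x / aligned) if aligned else (lambda x: 0.0)
    return {
        "reads": total,
        "mapped": mapped,
        "mapping_rate": mapped / total if total else 0.0,
        "sub_rate": per_base(subs),
        "ins_rate": per_base(ins),
        "del_rate": per_base(dels),
    }


# Hypothetical three-read example: two mapped, one unmapped.
example_sam = "\n".join([
    "@SQ\tSN:contig_1\tLN:1000",
    "r1\t0\tcontig_1\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\tNM:i:1",
    "r2\t0\tcontig_1\t20\t60\t5M1I4M\t*\t0\t0\tACGTACGTAC\t*\tNM:i:1",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
])
stats = sam_stats(example_sam)
```

Running the same reads against each candidate assembly and comparing these numbers gives a quick relative ranking.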

Last edited by Brian Bushnell; 01-23-2014 at 09:08 PM.
