SEQanswers

SEQanswers > Sequencing Technologies/Companies > Complete Genomics



09-17-2013, 02:33 AM   #1
applez
Junior Member
 
Location: Malaysia

Join Date: Feb 2013
Posts: 2
Complete genome validation

Hi guys,

I have a question about how to validate a completed bacterial genome. The sequencing was done on an Illumina GAIIx, and the annotation was done in CLC bio.

I've recently finished gap closing, and I've confirmed the alignment using CLC bio and Clustal Omega.

My supervisor insists that I validate the genome, but I have absolutely no clue how to do that. I've closed all the gaps (resulting in a single-sequence FASTA file as the final output), and there are no longer any ambiguous nucleotides.
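The two properties described here (one sequence in the final FASTA, no ambiguous bases) are easy to re-check whenever the file changes. This is a minimal Python sketch, assuming a plain-text FASTA; the function names are hypothetical. Passing these checks only shows that the file is closed and unambiguous, not that the assembly is correct.

```python
from collections import Counter


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())  # normalise soft-masked (lowercase) bases
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def closure_report(text):
    """Count records, total length, and any non-ACGT characters (N, R, Y, ...)."""
    records = parse_fasta(text)
    non_acgt = Counter()
    for _, seq in records:
        non_acgt.update(base for base in seq if base not in "ACGT")
    return {
        "records": len(records),
        "total_length": sum(len(seq) for _, seq in records),
        "non_ACGT": dict(non_acgt),
    }
```

A finished single-chromosome genome should report `records == 1` and an empty `non_ACGT`; plasmids, if present, would add further records.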

Is there something I'm missing?

Thanks.
09-17-2013, 05:23 AM   #2
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162

You might want to ask your supervisor to clarify, but he might mean that you should map all of your read data back to your closed/circularized genome and see whether there are any mapping issues (areas of low or no coverage, areas where paired reads lose their mates, etc.).
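Once the reads are mapped back (for example, `samtools depth -a` reports depth at every position, including zero-depth ones, as chrom/position/depth columns), low- or no-coverage stretches can be pulled out with a short script. This is a sketch only: the thresholds `min_depth=5` and `min_length=50` are arbitrary assumptions, the function names are hypothetical, and it does not check paired-read behaviour.

```python
def depths_from_samtools(text):
    """Read the depth column (3rd field) from `samtools depth -a` output."""
    return [int(line.split("\t")[2]) for line in text.splitlines() if line.strip()]


def low_coverage_regions(depths, min_depth=5, min_length=50):
    """Return 1-based inclusive (start, end) runs where depth < min_depth
    for at least min_length consecutive positions."""
    regions = []
    run_start = None
    for pos, depth in enumerate(depths, start=1):
        if depth < min_depth:
            if run_start is None:
                run_start = pos  # a low-coverage run begins here
        else:
            if run_start is not None and pos - run_start >= min_length:
                regions.append((run_start, pos - 1))
            run_start = None
    # close a run that extends to the last position
    if run_start is not None and len(depths) - run_start + 1 >= min_length:
        regions.append((run_start, len(depths)))
    return regions
```

Because the chromosome is circular, a low-coverage run spanning the origin will show up as two pieces, one at each end.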

The only other option might be to call ORFs, annotate them, and see whether you're missing any conserved genes (which might suggest assembly issues) or have multiple copies of genes confirmed to be single-copy.
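Given gene names from an annotation, the missing/duplicated check can be sketched as below. The marker list (`rpoB`, `gyrB`, `recA`, `dnaA`) is only an illustrative assumption; a real check would use a curated set of genes known to be single-copy in the relevant taxon.

```python
from collections import Counter


def single_copy_check(annotated_genes, markers):
    """Return (missing, duplicated) marker genes, each as a sorted list.

    annotated_genes: gene names from the annotation (one entry per copy).
    markers: names of genes expected to occur exactly once (assumed list).
    """
    counts = Counter(g for g in annotated_genes if g in markers)
    missing = sorted(m for m in markers if counts[m] == 0)    # possible gap/misassembly
    duplicated = sorted(m for m in markers if counts[m] > 1)  # possible duplication
    return missing, duplicated


markers = {"rpoB", "gyrB", "recA", "dnaA"}  # hypothetical marker set
genes = ["rpoB", "gyrB", "gyrB", "recA", "hypothetical protein"]
print(single_copy_check(genes, markers))
```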

P.S. Your post is in the wrong sub-forum; this one is for discussion of the company Complete Genomics, which has been taken over by BGI.
09-17-2013, 06:02 AM   #3
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

A number of published programs assess assemblies given the read data; trying them out is on my to-do list, so I can't make a specific recommendation:

ALE: Assembly Likelihood Evaluator
CGAL: Computing Genome Assembly Likelihoods
QUAST: Quality Assessment Tool for Genome Assemblies
REAPR
Plantagora
LAP
Mauve
AMOSvalidate
(not claiming this is the full list)

The suggestions from Dr. McNelson are good as well; coverage excesses could indicate collapsed direct repeats, which cannot be resolved with the sequencing technology you used.
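In a collapsed repeat, reads from every copy pile onto the single assembled copy, so its depth is roughly the copy number times the genome-wide baseline. A minimal sketch of that reasoning, assuming a per-base depth list (as from `samtools depth -a`); the window size of 1000 and the ratio cut-off of 1.8 are arbitrary assumptions, and the function name is hypothetical.

```python
from statistics import median


def collapsed_repeat_candidates(depths, window=1000, ratio=1.8):
    """Flag windows whose mean depth is >= ratio x the genome-wide median.

    Returns (start, end, depth_ratio) with 1-based inclusive coordinates;
    depth_ratio approximates how many copies were collapsed into one.
    """
    baseline = median(depths)
    if baseline == 0:
        return []
    hits = []
    for start in range(0, len(depths), window):
        win = depths[start:start + window]
        win_ratio = (sum(win) / len(win)) / baseline
        if win_ratio >= ratio:
            hits.append((start + 1, start + len(win), round(win_ratio, 2)))
    return hits
```

Using the median rather than the mean as the baseline keeps a few very deep repeats from inflating the reference level.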

You should also consider reading the GAGE, Assemblathon 1 and Assemblathon 2 papers, which evaluated a number of assembly programs and illustrate some of the errors to watch for.
09-17-2013, 06:08 AM   #4
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162

Quote:
Originally Posted by krobison
The suggestions from Dr. McNelson are good as well; coverage excesses could indicate collapsed direct repeats, which cannot be resolved with the sequencing technology you used.
Thanks for the compliment, but it's just Dr. Nelson.
09-17-2013, 06:13 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,800

Let me suggest something simple. If a genome of a related species is available (there should be something out there that is close to whatever you have sequenced), you could compare your "genome" to it.

Something like Mauve (http://gel.ahabs.wisc.edu/mauve/) would be a simple start if a closely related genus/species is available at NCBI (http://www.ncbi.nlm.nih.gov/genome/browse/).
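Mauve performs a whole-genome alignment and shows rearrangements. As a much cruder first look (a swapped-in, alignment-free technique, not what Mauve does), one can ask what fraction of the assembly's k-mers also occur in the related genome. A minimal sketch; k=21 is an arbitrary assumption and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k=21):
    """Set of canonical k-mers (lexicographic min of k-mer and reverse complement),
    skipping any k-mer that contains a non-ACGT character."""
    seq = seq.upper()
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            rc = kmer.translate(_COMPLEMENT)[::-1]
            out.add(min(kmer, rc))  # strand-independent representation
    return out


def containment(query, reference, k=21):
    """Fraction of the query's canonical k-mers present in the reference."""
    q = canonical_kmers(query, k)
    if not q:
        return 0.0
    return len(q & canonical_kmers(reference, k)) / len(q)
```

A value near 1 means nearly all of your sequence has an exact counterpart in the relative; it says nothing about order or orientation, which is what an alignment viewer like Mauve is for.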
09-18-2013, 06:02 AM   #6
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

Quote:
Originally Posted by mcnelson.phd
Thanks for the compliment, but it's just Dr. Nelson.
Apologies; as someone whose first & last name are often butchered, I'm usually more careful about this.
09-18-2013, 08:22 AM   #7
cliffbeall
Senior Member
 
Location: Ohio

Join Date: Jan 2010
Posts: 144

A non-computational technique that you might want to look at is optical mapping.

I heard a presentation by OpGen, and it looked useful.
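Optical mapping validates an assembly by comparing measured restriction-fragment sizes with those predicted from the sequence. The prediction half can be sketched as an in silico digest of the circular chromosome. NcoI (CCATGG) is only an assumed example enzyme, the function name is hypothetical, and aligning the predicted fragments against the measured optical map is not shown.

```python
def circular_fragments(genome, site="CCATGG"):
    """Predicted fragment lengths (largest first) for a circular genome.

    Assumes a palindromic recognition site, so scanning one strand is enough.
    Fragments are measured between site start positions; since every cut uses
    the same offset within the site, the fragment lengths are unaffected.
    """
    genome, site = genome.upper(), site.upper()
    n = len(genome)
    # append the first len(site)-1 bases so sites spanning the origin are found
    doubled = genome + genome[:len(site) - 1]
    cuts = []
    pos = doubled.find(site)
    while pos != -1 and pos < n:
        cuts.append(pos)
        pos = doubled.find(site, pos + 1)
    if not cuts:
        return [n]  # no site: the circle stays one molecule
    frags = [cuts[i + 1] - cuts[i] for i in range(len(cuts) - 1)]
    frags.append(n - cuts[-1] + cuts[0])  # fragment wrapping around the origin
    return sorted(frags, reverse=True)
```

Large disagreements between these predicted sizes and the measured map (missing, extra, or resized fragments) point to regions worth re-examining.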
09-18-2013, 06:45 PM   #8
applez
Junior Member
 
Location: Malaysia

Join Date: Feb 2013
Posts: 2

Thanks guys!
11-19-2013, 05:38 PM   #9
Adam Smith
Junior Member
 
Location: UK

Join Date: Oct 2013
Posts: 1

Whole-genome sequencing is becoming commonplace, but the accuracy and completeness of variant calling by the most widely used platforms from Illumina and Complete Genomics have not been reported. Here we sequenced the genome of an individual with both technologies to a high average coverage of ∼76×, and compared their performance with respect to sequence coverage and calling of single-nucleotide variants (SNVs), insertions and deletions (indels). Although 88.1% of the ∼3.7 million unique SNVs were concordant between platforms, there were tens of thousands of platform-specific calls located in genes and other genomic regions. In contrast, 26.5% of indels were concordant between platforms. Target enrichment validated 92.7% of the concordant SNVs, whereas validation by genotyping array revealed a sensitivity of 99.3%. The validation experiments also suggested that >60% of the platform-specific variants were indeed present in the genome. Our results have important implications for understanding the accuracy and completeness of the genome sequencing platforms.
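The concordance figure above (shared calls as a fraction of all unique calls across both platforms) can be sketched as set arithmetic on variant keys. This assumes a variant is identified by (chromosome, position, ref, alt) and that concordance means shared/union; the study's exact matching rules (genotype agreement, indel normalisation) may differ, and the function name is hypothetical.

```python
def concordance(calls_a, calls_b):
    """Compare two SNV call sets given as iterables of (chrom, pos, ref, alt)."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    shared = a & b
    return {
        "unique_total": len(union),
        "concordant": len(shared),
        "concordant_fraction": len(shared) / len(union) if union else 0.0,
        "only_a": len(a - b),  # platform-specific calls from set A
        "only_b": len(b - a),  # platform-specific calls from set B
    }
```

Exact-key matching is the simplest choice; it works reasonably for SNVs but undercounts indel agreement when two callers represent the same event differently.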
11-26-2014, 11:51 AM   #10
thebutcher
Junior Member
 
Location: USA

Join Date: Aug 2014
Posts: 3
CGAL for genome assembly comparison

Hi,

I'm quite new to bioinformatics, so please excuse the simplicity of this post.

I've sequenced a fungal genome (<40 Mbp) using the Ion Torrent platform. I created a fragment library, which means that I should now have single-end reads (right?).

MIRA seemed like a good choice of assembler, so I used it as well as CLC to assemble the reads into contigs, but now I'm stuck. I'd like to compare the quality of the MIRA and CLC assemblies using CGAL, but I have no idea how to use the program.

I've read the CGAL paper, but I'm not sure where to begin running this program on the cluster at my school and I can't find much info on this program anywhere else. Does anyone have any experience/suggestions as to how I should proceed?
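CGAL scores an assembly by the likelihood of the reads given that assembly; how to run it is not covered here. As a quick complement (not CGAL, and saying nothing about correctness), basic contiguity statistics of the kind QUAST reports, such as contig count, total length and N50, can be compared between the MIRA and CLC assemblies. A minimal sketch; the function names are hypothetical.

```python
def contig_lengths(fasta_text):
    """Lengths of each record in FASTA text."""
    lengths, current = [], None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def contig_stats(lengths):
    """Contig count, total length, largest contig and N50.

    N50 is the length of the contig at which the running total, summed from
    the largest contig down, first reaches half of the assembly length.
    """
    lengths = sorted(lengths, reverse=True)
    total = sum(lengths)
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(lengths),
        "total_length": total,
        "largest": lengths[0] if lengths else 0,
        "N50": n50,
    }
```

Running `contig_stats(contig_lengths(text))` on each assembler's FASTA gives a side-by-side view; a higher N50 alone does not mean fewer misassemblies.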

Thanks in advance!
Tags
bacteria, complete, genome, validation

All times are GMT -8.