  • applez
    Junior Member
    • Feb 2013
    • 2

    Complete genome validation

    Hi guys,

    I have a question regarding how to validate a completed bacterial genome. The sequencing technology used was the Illumina GAIIx, and the annotations were done in CLC bio.

    I've recently finished the gap closing, and I've confirmed the alignment using CLC bio and Clustal Omega.

    My supervisor insists that I validate the genome, but I have absolutely no clue how to do that. I've completely closed all the gaps (resulting in a single final FASTA file), and there are no longer any ambiguous nucleotides.

    Is there something I'm missing?

    Thanks.
  • mcnelson.phd
    Senior Member
    • Jul 2011
    • 162

    #2
    You might want to ask your supervisor to clarify, but he might mean that you map all of your read data back to your closed/circularized genome and see if you have any possible mapping issues (areas of low/no coverage, areas where paired reads lose their mates, etc.)
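
A minimal sketch of the coverage part of that check, assuming the reads have already been mapped back to the closed genome and per-base depth has been exported as three tab-separated columns (contig, position, depth), which is the layout `samtools depth -a` writes. The function name `low_coverage_regions` and the `min_depth` threshold are illustrative choices, not something from this thread; it does not look at mate-pair problems.

```python
from typing import Iterable, List, Tuple


def low_coverage_regions(
    depth_lines: Iterable[str], min_depth: int = 10
) -> List[Tuple[str, int, int]]:
    """Merge adjacent positions with depth < min_depth into regions.

    Input lines are "contig<TAB>position<TAB>depth" (1-based positions).
    Returns (contig, start, end) tuples, 1-based and inclusive.
    min_depth=10 is an arbitrary illustrative default.
    """
    regions: List[Tuple[str, int, int]] = []
    current = None  # [contig, start, end] of the run being extended

    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, pos_text, depth_text = line.split("\t")
        pos, depth = int(pos_text), int(depth_text)
        if depth >= min_depth:
            continue
        # Extend the open run only if this position is directly adjacent.
        if current and current[0] == contig and current[2] == pos - 1:
            current[2] = pos
        else:
            if current:
                regions.append((current[0], current[1], current[2]))
            current = [contig, pos, pos]

    if current:
        regions.append((current[0], current[1], current[2]))
    return regions


if __name__ == "__main__":
    # Tiny made-up example; real input would be read from a depth file.
    example = ["chr\t1\t30", "chr\t2\t3", "chr\t3\t0", "chr\t4\t25", "chr\t5\t2"]
    for contig, start, end in low_coverage_regions(example):
        print(f"{contig}:{start}-{end}")
```

Any region this reports is only a candidate for a closer look in a read viewer; a sensible threshold depends on the overall sequencing depth.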

    Only other option might be to call ORFs and then annotate and see if you're missing any conserved genes that might suggest assembly issues or if you have multiple copies of confirmed single copy genes.

    P.S. Your post is in the wrong sub-forum, this is for discussion surrounding the company Complete Genomics, which has been taken over by BGI.

    • krobison
      Senior Member
      • Nov 2007
      • 734

      #3
      A number of published programs assess assemblies given the read data; trying them out is still on my to-do list, so I can't make a specific recommendation:

      ALE: Assembly Likelihood Evaluator
      CGAL: Computing Genome Assembly Likelihoods
      QUAST: Quality Assessment Tool for Genome Assemblies
      REAPR
      Plantagora
      LAP
      Mauve
      AMOSvalidate
      (not claiming this is the full list)

      The suggestions for Dr. McNelson are good as well; coverage excesses could indicate collapsed direct repeats which cannot be resolved with the sequence technology you used.
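
One hedged way to look for the coverage excesses mentioned above is to compare the mean depth of fixed-size windows against the genome-wide median. The sketch below assumes per-base depths for one contig are already in a list; the name `high_coverage_windows`, the window size, and the 1.8-fold cutoff are all illustrative assumptions rather than established thresholds.

```python
from statistics import median
from typing import List, Sequence, Tuple


def high_coverage_windows(
    depths: Sequence[int], window: int = 1000, fold: float = 1.8
) -> List[Tuple[int, int, float]]:
    """Flag non-overlapping windows whose mean depth is >= fold * median depth.

    depths[i] is the read depth at 1-based position i + 1 of one contig.
    Returns (start, end, mean_depth) with 1-based inclusive coordinates.
    A collapsed two-copy repeat would be expected near 2x the median,
    so fold=1.8 is a loose, arbitrary cutoff.
    """
    if not depths:
        return []
    baseline = median(depths)
    if baseline <= 0:
        return []

    flagged: List[Tuple[int, int, float]] = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        mean_depth = sum(chunk) / len(chunk)
        if mean_depth >= fold * baseline:
            flagged.append((start + 1, start + len(chunk), mean_depth))
    return flagged


if __name__ == "__main__":
    # Made-up depths: positions 7-8 carry roughly double coverage.
    toy = [50] * 6 + [100] * 2 + [50] * 4
    print(high_coverage_windows(toy, window=2, fold=1.8))
```

Elevated depth can also come from plasmids or amplification bias, so a flagged window is a prompt for inspection, not proof of a collapsed repeat.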

      You should also consider reading the GAGE, Assemblathon 1 & Assemblathon 2 papers, which evaluated a number of assembly programs and can illustrate some of the errors for which to watch.

      • mcnelson.phd
        Senior Member
        • Jul 2011
        • 162

        #4
        Originally posted by krobison
        The suggestions for Dr. McNelson are good as well; coverage excesses could indicate collapsed direct repeats which cannot be resolved with the sequence technology you used.
        Thanks for the compliment, but it's just Dr. Nelson.

        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #5
          Let me suggest something simple. If a genome of a related species is available (there should be something out there that is close to whatever you have sequenced), you could compare your "genome" to it.

          Something like "mauve" (http://gel.ahabs.wisc.edu/mauve/) would be a simple start if there is a closely related genus/species available at NCBI http://www.ncbi.nlm.nih.gov/genome/browse/.

          • krobison
            Senior Member
            • Nov 2007
            • 734

            #6
            Originally posted by mcnelson.phd
            Thanks for the compliment, but it's just Dr. Nelson.
            Apologies; as someone whose first & last name are often butchered, I'm usually more careful about this.

            • cliffbeall
              Senior Member
              • Jan 2010
              • 144

              #7
              A non-computational technique that you might want to look at is optical mapping.

              I heard a presentation by OpGen and it looked useful.

              • applez
                Junior Member
                • Feb 2013
                • 2

                #8
                Thanks guys!

                • Adam Smith
                  Junior Member
                  • Oct 2013
                  • 1

                  #9
                  Whole-genome sequencing is becoming commonplace, but the accuracy and completeness of variant calling by the most widely used platforms from Illumina and Complete Genomics have not been reported. Here we sequenced the genome of an individual with both technologies to a high average coverage of ∼76×, and compared their performance with respect to sequence coverage and calling of single-nucleotide variants (SNVs), insertions and deletions (indels). Although 88.1% of the ∼3.7 million unique SNVs were concordant between platforms, there were tens of thousands of platform-specific calls located in genes and other genomic regions. In contrast, 26.5% of indels were concordant between platforms. Target enrichment validated 92.7% of the concordant SNVs, whereas validation by genotyping array revealed a sensitivity of 99.3%. The validation experiments also suggested that >60% of the platform-specific variants were indeed present in the genome. Our results have important implications for understanding the accuracy and completeness of the genome sequencing platforms.

                  • thebutcher
                    Junior Member
                    • Aug 2014
                    • 3

                    #10
                    CGAL for genome assembly comparison

                    Hi,

                    I'm quite new to bioinformatics, so please excuse the simplicity of this post.

                    I've sequenced a fungal genome (<40 Mbp) using the Ion Torrent platform. I created a fragment library, which means that I should now have single-end reads (right?).

                    MIRA seemed like a good choice of assembler so I used that as well as CLC to assemble the reads into contigs, but now I'm stuck. I'd like to compare the qualities of the MIRA and CLC assemblies using CGAL, but I have no idea how to use the program.

                    I've read the CGAL paper, but I'm not sure where to begin running this program on the cluster at my school and I can't find much info on this program anywhere else. Does anyone have any experience/suggestions as to how I should proceed?

                    Thanks in advance!
