  • Complete genome validation

    Hi guys,

    I have a question about how to validate a completed bacterial genome. The sequencing was done on an Illumina GAIIx, and the annotation was done in CLC bio.

    I've recently finished the gap closing, and I've confirmed the alignment using CLC bio and Clustal Omega.

    My supervisor insists that I validate the genome, but I have absolutely no clue how to do that. I've completely closed all the gaps (resulting in a final single fasta file output), and there are no longer any ambiguous nucleotides.

    Is there something I'm missing?

    Thanks.

  • #2
    You might want to ask your supervisor to clarify, but he might mean that you map all of your read data back to your closed/circularized genome and see if you have any possible mapping issues (areas of low/no coverage, areas where paired reads lose their mates, etc.)
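
    A minimal sketch of the coverage check described above, assuming the reads have already been mapped back to the closed genome and per-base depth has been written out with `samtools depth -a` (three tab-separated columns: sequence name, 1-based position, depth). The function name and the `min_depth` threshold are illustrative choices, not part of any tool:

```python
from typing import Iterable, List, Tuple


def low_coverage_regions(depth_lines: Iterable[str],
                         min_depth: int = 10) -> List[Tuple[str, int, int]]:
    """Merge consecutive positions with depth < min_depth into intervals.

    depth_lines: lines of `samtools depth -a` output (name, position, depth).
    Returns (name, start, end) tuples using 1-based inclusive coordinates.
    """
    regions = []
    current = None  # [name, start, end] of the interval being extended
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        name, pos, depth = line.split("\t")
        pos, depth = int(pos), int(depth)
        if depth < min_depth:
            if current and current[0] == name and current[2] == pos - 1:
                current[2] = pos  # extend the open interval
            else:
                if current:
                    regions.append(tuple(current))
                current = [name, pos, pos]  # start a new interval
        elif current:
            regions.append(tuple(current))
            current = None
    if current:
        regions.append(tuple(current))
    return regions


example = ["chr\t1\t30", "chr\t2\t3", "chr\t3\t0", "chr\t4\t25", "chr\t5\t2"]
flagged = low_coverage_regions(example, min_depth=10)
```

    On real data you would pass an open file handle for the depth file instead of the toy list. Any interval it reports is worth inspecting in a genome browser, together with the read-pair information.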

    The only other option might be to call ORFs, annotate them, and see whether you're missing any conserved genes (which might suggest assembly issues) or have multiple copies of genes known to be single copy.
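
    The gene-content check could be sketched like this, assuming you can export a plain list of gene names from your annotation and that you supply your own list of genes expected to be single copy in your organism. The gene names in the example are placeholders, not a curated marker set:

```python
from collections import Counter


def check_single_copy(annotated_genes, expected_single_copy):
    """Report expected single-copy genes that are absent or duplicated.

    annotated_genes: iterable of gene names from the annotation (repeats allowed).
    expected_single_copy: gene names you expect exactly once (your own list).
    Returns (missing, duplicated), where duplicated maps gene -> copy count.
    """
    counts = Counter(annotated_genes)
    missing = sorted(g for g in expected_single_copy if counts[g] == 0)
    duplicated = {g: counts[g] for g in sorted(expected_single_copy) if counts[g] > 1}
    return missing, duplicated


# Toy input; replace with the names exported from your own annotation.
annotated = ["recA", "rpoB", "rpoB", "dnaA"]
expected = {"recA", "rpoB", "gyrB"}
missing, duplicated = check_single_copy(annotated, expected)
```

    A missing gene may point to a misassembly or just to an annotation miss, and an extra copy may be real, so treat the output as a list of places to look rather than as proof of an error.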

    P.S. Your post is in the wrong sub-forum; this one is for discussion of the company Complete Genomics, which has been taken over by BGI.



    • #3
      A number of published programs assess assemblies given the read data. Trying them out is still on my to-do list, so I can't make a specific recommendation.

      ALE: Assembly Likelihood Evaluator
      CGAL: Computing Genome Assembly Likelihoods
      QUAST: Quality Assessment Tool for Genome Assemblies
      REAPR
      Plantagora
      LAP
      Mauve
      AMOSvalidate
      (not claiming this is the full list)

      The suggestions from Dr. McNelson are good as well; coverage excesses could indicate collapsed direct repeats which cannot be resolved with the sequencing technology you used.
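
      As a rough illustration of looking for coverage excesses, here is a sketch that averages per-base depth in fixed windows and flags windows well above the genome-wide median. The window size and the 1.8x factor are arbitrary assumptions to tune for your data; a collapsed two-copy repeat would be expected at roughly twice the typical depth:

```python
from statistics import median


def excess_coverage_windows(depths, window=1000, factor=1.8):
    """Flag windows whose mean depth exceeds factor * median window mean.

    depths: per-base read depths for one sequence, in position order.
    Returns (start, end, mean_depth) tuples, 1-based inclusive coordinates.
    """
    if not depths:
        return []
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append((start + 1, start + len(chunk), sum(chunk) / len(chunk)))
    typical = median(m for _, _, m in means)
    return [(s, e, m) for s, e, m in means if m > factor * typical]


# Toy example: a doubled-depth stretch in the middle of a uniform genome.
toy_depths = [50] * 4 + [100] * 2 + [50] * 4
suspects = excess_coverage_windows(toy_depths, window=2, factor=1.8)
```

      Windows flagged this way are only candidates; multi-copy elements such as rRNA operons will show up too, so they need to be checked against the annotation.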

      You should also consider reading the GAGE, Assemblathon 1 & Assemblathon 2 papers, which evaluated a number of assembly programs and can illustrate some of the errors for which to watch.



      • #4
        Originally posted by krobison
        The suggestions from Dr. McNelson are good as well; coverage excesses could indicate collapsed direct repeats which cannot be resolved with the sequencing technology you used.
        Thanks for the compliment, but it's just Dr. Nelson.



        • #5
          Let me suggest something simple. If a genome of a related species is available (there should be something out there that is close to whatever you have sequenced), you could compare your "genome" to it.

          Something like "mauve" (http://gel.ahabs.wisc.edu/mauve/) would be a simple start if there is a closely related genus/species available at NCBI http://www.ncbi.nlm.nih.gov/genome/browse/.



          • #6
            Originally posted by mcnelson.phd
            Thanks for the compliment, but it's just Dr. Nelson.
            Apologies; as someone whose first & last name are often butchered, I'm usually more careful about this.



            • #7
              A non-computational technique that you might want to look at is optical mapping.

              I heard a presentation by Opgen and it looked useful.



              • #8
                Thanks guys!



                • #9
                  Whole-genome sequencing is becoming commonplace, but the accuracy and completeness of variant calling by the most widely used platforms from Illumina and Complete Genomics have not been reported. Here we sequenced the genome of an individual with both technologies to a high average coverage of ∼76×, and compared their performance with respect to sequence coverage and calling of single-nucleotide variants (SNVs), insertions and deletions (indels). Although 88.1% of the ∼3.7 million unique SNVs were concordant between platforms, there were tens of thousands of platform-specific calls located in genes and other genomic regions. In contrast, 26.5% of indels were concordant between platforms. Target enrichment validated 92.7% of the concordant SNVs, whereas validation by genotyping array revealed a sensitivity of 99.3%. The validation experiments also suggested that >60% of the platform-specific variants were indeed present in the genome. Our results have important implications for understanding the accuracy and completeness of the genome sequencing platforms.


                  • #10
                    CGAL for genome assembly comparison

                    Hi,

                    I'm quite new to bioinformatics, so please excuse the simplicity of this post.

                    I've sequenced a fungal genome (<40 Mbp) using the Ion Torrent platform. I created a fragment library, which means that I should now have single end reads (right?)

                    MIRA seemed like a good choice of assembler so I used that as well as CLC to assemble the reads into contigs, but now I'm stuck. I'd like to compare the qualities of the MIRA and CLC assemblies using CGAL, but I have no idea how to use the program.

                    I've read the CGAL paper, but I'm not sure where to begin running this program on the cluster at my school and I can't find much info on this program anywhere else. Does anyone have any experience/suggestions as to how I should proceed?

                    Thanks in advance!
