  • Noa
    Member
    • Jun 2011
    • 62

    Verifying de novo bacterial genome

    I used velvet to assemble a ~4.7 Mbp bacterial genome. I got some seemingly nice results - 4.8 Mbp of genome covered, 100 contigs >1 kbp, largest contig 3.7 Mbp (!!!).

    So now my question is - how do I verify that this is all good?
    Do I do synteny maps with a closely related bacterium?
    Do I finish connecting the genome as best I can (and how?)
    Do I pray?
    Thanks....
  • LeightonP
    Member
    • Feb 2011
    • 29

    #2
    Originally posted by Noa:
    I used velvet to assemble a ~4.7 Mbp bacterial genome. I got some seemingly nice results - 4.8 Mbp of genome covered, 100 contigs >1 kbp, largest contig 3.7 Mbp (!!!).
    Nice! (even though 3.7 Mbp sounds too good to be true, using velvet - but I don't know what your read lengths/coverage were...)

    So now my question is - how do I verify that this is all good?
    You can't really 'verify' without an independently-obtained assembly of the same organism; realistically you can only increase your level of confidence in this assembly. Happily, that's all anyone is likely to want from you.

    Do I do synteny maps with a closely related bacterium?
    That's sensible. Mauve is good for this. Bacterial genomes are prone to rearrangement, though, so a breakdown of synteny does not by itself imply misassembly; you'd want to look for other indicators of genuine rearrangement, and also to inspect the assembly itself for signs of poor quality.

    Do I finish connecting the genome as best I can (and how?)
    You could do that, for example, by designing primers to either side of a 'gap' in the assembly, amplifying up from chromosomal DNA, and sequencing the amplification product. You could do the same for questionable assembly regions, too (use Tablet/some other viewer to inspect the assembly for dips/spikes in coverage and other indicators of misassembly). Depending on what you want from your assembly, this could be unnecessary, or too much effort to be worthwhile.
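The coverage inspection mentioned above can also be scripted as a first pass before opening a viewer. This is only a minimal sketch: it assumes you already have per-base read depth for one contig as a plain list of numbers (however you exported it), and the function name, window size and thresholds are illustrative choices, not recommendations from any tool.

```python
from statistics import median


def flag_coverage_outliers(depths, window=1000, low=0.5, high=2.0):
    """Flag windows whose mean depth is far from the contig-wide median.

    depths: per-base read depth for one contig (assumed input format).
    Returns (start, end, mean_depth, label) tuples, label "dip" or "spike".
    The low/high factors are arbitrary starting points.
    """
    if not depths:
        return []
    baseline = median(depths)
    flagged = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        mean_depth = sum(chunk) / len(chunk)
        if mean_depth < low * baseline:
            flagged.append((start, start + len(chunk), mean_depth, "dip"))
        elif mean_depth > high * baseline:
            flagged.append((start, start + len(chunk), mean_depth, "spike"))
    return flagged


# Toy example: a 3x spike and a deep dip on a ~30x contig.
toy = [30] * 3000 + [90] * 1000 + [30] * 3000 + [5] * 1000
for region in flag_coverage_outliers(toy):
    print(region)
```

Flagged windows are candidates for a closer look, or for the PCR-and-sequence check described above; they are not proof of misassembly.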

    Other approaches you might consider: BLASTing your sequence against the annotated genes of a fully-sequenced, related bacterium, to estimate how much of a comparable gene complement you have recovered; having a quick look at a GC skew plot (window size ≈4 kbp) of your Mauve output to see whether you have a 'sensible' assembly, in the sense that GC skew usually shows a characteristic pattern either side of the origin of replication (positive on one strand, negative on the other); checking evenness of coverage of your assembled/(re-)mapped reads ('spikes' might indicate collapsed repeats), etc...
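For the GC skew check, the per-window value is usually taken as (G - C) / (G + C). A minimal sketch, assuming the contigs have already been ordered and concatenated into one string; the function names are made up for illustration, and the 4000 bp default just mirrors the window size suggested above.

```python
def gc_skew(seq, window=4000):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative_skew(skews):
    """Running sum of the windowed skew; often easier to read than the raw plot."""
    total = 0.0
    out = []
    for value in skews:
        total += value
        out.append(total)
    return out


def sign_changes(skews):
    """Count sign flips between consecutive non-zero windows."""
    signs = [value > 0 for value in skews if value != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Toy sequence: G-rich half followed by C-rich half.
toy = "G" * 8 + "C" * 8
print(gc_skew(toy, window=4))
print(sign_changes(gc_skew(toy, window=4)))
```

On a well-ordered circular chromosome you would expect only a couple of large-scale sign switches (around the origin and terminus); many scattered flips of whole blocks could point to misordered or inverted contigs, although real rearrangements produce the same picture.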

    Do I pray?
    This is one of the least likely routes to an improved assembly.

    L.


    • arolfe
      Member
      • Jul 2011
      • 29

      #3
      Depends on how much work you want to do

      If you have the time and resources for more experiments, you might evaluate your assembly using one or more of the following:

      - paired-end sequencing and looking for indels. There are a number of techniques that look for aberrant distances between the two reads of a pair, given the expected insert size of the library, as an indicator of indels. Such an analysis might identify assembly errors (either mis-joined contigs or missing pieces)

      - array CGH to see whether it indicates copy number changes relative to your assembly, which would indicate missing or duplicated segments in your assembly
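The paired-end idea can be sketched in a few lines. This assumes you have already extracted, for each properly oriented read pair, a leftmost mapping position and the observed insert size on the assembly; the tuple layout, function name and the cut-off of 5 scaled MADs are illustrative assumptions, not the method of any particular published tool.

```python
from statistics import median


def aberrant_pairs(pairs, n_mads=5.0):
    """Flag read pairs whose insert size is far from the library median.

    pairs: list of (position, insert_size) tuples (assumed input format).
    Returns (position, insert_size, label) for outliers, where "stretched"
    means the pair maps further apart than expected and "compressed" closer.
    """
    sizes = [size for _, size in pairs]
    if not sizes:
        return []
    med = median(sizes)
    mad = median([abs(size - med) for size in sizes])
    # 1.4826 scales the MAD to a standard deviation for normal data;
    # the floor of 1 avoids a zero threshold on very uniform libraries.
    scale = 1.4826 * max(mad, 1)
    flagged = []
    for pos, size in pairs:
        if abs(size - med) > n_mads * scale:
            label = "stretched" if size > med else "compressed"
            flagged.append((pos, size, label))
    return flagged


# Toy library: ~300 bp inserts plus one pair spanning 900 bp.
toy = [(i * 100, 300 + (i % 3 - 1) * 10) for i in range(30)] + [(5000, 900)]
print(aberrant_pairs(toy))
```

A single outlier pair means little; it is clusters of stretched or compressed pairs at the same position that suggest extra or missing sequence in the assembly at that point.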

      If you're limited to computational techniques, then synteny is a good idea. I'd also look for coding regions that show substantial differences (especially truncation) to the nearest species for which an annotated genome exists. Such changes may be real, but would be good candidates for resequencing (potentially Sanger) to confirm. Similarly, changes in copy number of genes would be good to confirm.
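Once you have matched your predicted coding regions to their best hits in the related genome, picking out possible truncations is a simple length comparison. A rough sketch, assuming a dictionary of hypothetical gene names mapped to (length in your assembly, length in the reference); the 0.8 ratio is an arbitrary starting point.

```python
def flag_length_outliers(gene_lengths, min_ratio=0.8):
    """Return (gene, ratio) for genes much shorter than their reference match.

    gene_lengths: {gene_name: (assembly_length, reference_length)}
    (assumed format; produce it from whatever homology search you ran).
    """
    flagged = []
    for gene, (asm_len, ref_len) in sorted(gene_lengths.items()):
        if ref_len <= 0:
            continue  # no usable reference length
        ratio = asm_len / ref_len
        if ratio < min_ratio:
            flagged.append((gene, round(ratio, 2)))
    return flagged


# Hypothetical example values.
toy = {"geneA": (900, 1000), "geneB": (450, 1500), "geneC": (1200, 1200)}
print(flag_length_outliers(toy))
```

Genes flagged this way are the candidates for resequencing mentioned above; some will turn out to be real differences between the species.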


      • Bacteria Genomes
        Junior Member
        • Jul 2009
        • 8

        #4
        Another option is to get an optical or restriction map of the physical genome. I used OpGen's service for making optical maps and had good results. I found and corrected a number of misassemblies, and closed a lot of gaps.
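To compare an assembly with a restriction map, the contigs are digested in silico and the predicted fragment sizes are lined up against the measured ones. The sketch below covers only the digestion step; it treats the sequence as linear, searches the forward strand only (sufficient for palindromic sites), and the function name and `cut_offset` parameter are assumptions for illustration, not part of any vendor's software.

```python
def in_silico_digest(seq, site, cut_offset=0):
    """Return the ordered fragment lengths from cutting seq at every site.

    cut_offset is the cut position relative to the start of the site
    (e.g. 1 for an enzyme that cuts after the first base of its site).
    """
    seq = seq.upper()
    site = site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length fragments (a site at the very start or end).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Toy contig with two GAATTC sites.
toy = "AAGAATTCTTTTGAATTCGG"
print(in_silico_digest(toy, "GAATTC"))
print(in_silico_digest(toy, "GAATTC", cut_offset=1))
```

A run of predicted fragments that fails to match the measured map in size or order marks a region worth checking for a mis-join.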



        • flxlex
          Moderator
          • Nov 2008
          • 412

          #5
          Bioinformatically, there are other approaches you could try: AmosValidate and Hawkeye, including the FRC (Feature Response Curve); see these papers: http://bib.oxfordjournals.org/cgi/co...tract/bbr074v1 and http://dx.plos.org/10.1371/journal.pone.0031002. These should allow you to flag potentially problematic regions.


          • wgm
            Junior Member
            • May 2012
            • 1

            #6
            As Bacteria Genomes pointed out (thank you, B.G.!), Whole Genome Mapping (formerly known as "Optical Mapping") by OpGen could certainly help improve your assembly and reduce those 100 contigs to a potentially much lower number. Full disclosure: I work for OpGen. Feel free to contact me, and I'd be happy to put you in touch with the right people if you wish to discuss further.
