  • Verifying de novo bacterial genome

    I used Velvet to assemble a ~4.7 Mb bacterial genome. I got some seemingly nice results: 4.8 Mb of genome covered, 100 contigs >1 kb, largest contig 3.7 Mb (!!!).

    So now my question is: how do I verify that this is all good?
    Do I do synteny maps against a closely related bacterium?
    Do I finish connecting the genome as best I can (and if so, how)?
    Do I pray?
    Thanks....
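The figures quoted above (total length, number of contigs over 1 kb, largest contig) are easy to recompute, together with N50, from the contig FASTA, which helps when comparing assemblies built with different parameters. Below is a minimal Python sketch, assuming the contigs are available as FASTA text; the function names are hypothetical, and N50 is computed only over contigs that pass the length cutoff.

```python
def contig_lengths(fasta_text: str) -> list:
    """Return the length of each record in FASTA-formatted text."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths: list, min_len: int = 1000) -> dict:
    """Count, total length, largest contig and N50 over contigs of at least min_len bp."""
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    total = sum(kept)
    n50 = 0
    running = 0
    for n in kept:
        running += n
        # N50: length of the contig at which the running total first reaches half the assembly
        if 2 * running >= total:
            n50 = n
            break
    return {
        "contigs": len(kept),
        "total_bp": total,
        "largest_bp": kept[0] if kept else 0,
        "n50_bp": n50,
    }
```

For example, run `assembly_stats(contig_lengths(open("contigs.fa").read()))` on the contigs.fa file that Velvet writes to its output directory.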

  • #2
    Originally posted by Noa
    I used Velvet to assemble a ~4.7 Mb bacterial genome. I got some seemingly nice results: 4.8 Mb of genome covered, 100 contigs >1 kb, largest contig 3.7 Mb (!!!).
    Nice! (Although a 3.7 Mbp contig from Velvet sounds too good to be true, I don't know what your read lengths or coverage were...)

    So now my question is - how do I verify that this is all good?
    You can't really 'verify' without an independently obtained assembly of the same organism; realistically, you can only increase your confidence in this assembly. Happily, that's all anyone is likely to want from you.

    Do I do synteny maps against a closely related bacterium?
    That's sensible; Mauve is good for this. Bacterial genomes are prone to rearrangement, though, so a breakdown of synteny does not by itself imply misassembly. You'd want to look for other evidence of genuine rearrangement, and also to inspect the assembly for signs of poor assembly quality.

    Do I finish connecting the genome as best I can (and how?)
    You could do that, for example, by designing primers on either side of a gap in the assembly, PCR-amplifying across the gap from chromosomal DNA, and sequencing the amplification product. You could do the same for questionable regions of the assembly, too (use Tablet or another viewer to inspect the assembly for dips or spikes in coverage and other indicators of misassembly). Depending on what you want from your assembly, this could be unnecessary, or too much effort to be worthwhile.
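Coverage dips and spikes can also be screened for computationally before opening a viewer. Below is a minimal Python sketch, assuming reads have been mapped back to the assembly and per-base depth exported with `samtools depth -a` (the `-a` flag makes samtools also report zero-depth positions, so list indices line up with contig positions). The function names, the window size and the 0.5x/2x thresholds are illustrative assumptions, not established cutoffs; flagged windows are candidates to inspect, not confirmed errors.

```python
from statistics import median


def parse_samtools_depth(lines):
    """Group per-base depths by contig from `samtools depth` lines (contig, position, depth)."""
    by_contig = {}
    for line in lines:
        contig, _pos, depth = line.rstrip("\n").split("\t")[:3]
        by_contig.setdefault(contig, []).append(int(depth))
    return by_contig


def window_means(depths, window):
    """Mean depth over consecutive non-overlapping windows; the last window may be shorter."""
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def flag_windows(depths, window=1000, low=0.5, high=2.0):
    """Return (start, mean_depth, label) for windows far below or above the median window depth.

    start is a 0-based offset into the depth list for that contig.
    """
    means = window_means(depths, window)
    typical = median(means)
    flagged = []
    for i, m in enumerate(means):
        if m > high * typical:
            flagged.append((i * window, m, "spike"))  # possible collapsed repeat
        elif m < low * typical:
            flagged.append((i * window, m, "dip"))  # possible misjoin or poorly supported region
    return flagged
```

Using the median window depth as the baseline keeps a few extreme windows (such as a collapsed rRNA operon) from shifting the reference level.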

    Other approaches you might consider include:

    - BLASTing the annotated genes of a fully sequenced, related bacterium against your assembly, to estimate how completely you have recovered a comparable gene complement;
    - taking a quick look at a GC skew plot (window size ≈4 kbp) of your Mauve output to see whether the assembly is 'sensible': GC skew usually shows a characteristic pattern, changing sign at the origin of replication and again at the terminus (positive on one replichore, negative on the other);
    - checking the evenness of coverage of your assembled/(re-)mapped reads ('spikes' might indicate collapsed repeats), etc.
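Below is a minimal Python sketch of the GC skew check, assuming the contigs have already been ordered and concatenated into one sequence (for example, after reordering them against a reference with Mauve). It computes (G - C)/(G + C) in 4 kbp windows, then uses the cumulative skew, whose minimum and maximum are commonly taken as rough estimates of the origin and terminus of replication. The function names are hypothetical.

```python
from itertools import accumulate


def gc_skew(seq, window=4000):
    """(G - C) / (G + C) for consecutive non-overlapping windows; 0.0 where a window has no G or C."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def estimate_ori_ter(seq, window=4000):
    """Estimate origin and terminus from the minimum and maximum of the cumulative GC skew.

    Positions are window boundaries (the end of the window where the extreme occurs),
    so resolution is limited to the window size.
    """
    cumulative = list(accumulate(gc_skew(seq, window)))
    i_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    i_max = max(range(len(cumulative)), key=cumulative.__getitem__)
    ori = min((i_min + 1) * window, len(seq))
    ter = min((i_max + 1) * window, len(seq))
    return ori, ter
```

In a well-assembled circular chromosome, the estimated origin and terminus usually fall roughly opposite each other. Several extra sign changes in the windowed skew may point to misoriented or misplaced contigs, although real rearrangements can produce the same signal.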

    Do I pray?
    This is one of the least likely routes to an improved assembly.

    L.



    • #3
      It depends on how much work you want to do.

      If you have the time and resources for more experiments, you might evaluate your assembly with one or more of the following:

      - paired-end sequencing, looking for indels. A number of techniques look for aberrant average distances between the reads of a pair, given the expected library insert size, as an indicator of indels. Such an analysis might identify assembly errors (either misjoined contigs or missing pieces).

      - array CGH, to see whether it indicates copy-number changes relative to your assembly, which would point to missing or duplicated segments in the assembly.
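The paired-end analysis in the first point can be sketched as follows. This sketch assumes you have already mapped the pairs back to a contig and extracted each pair's leftmost position and observed insert size (for example, the absolute value of the SAM TLEN field). The class and function names, the 3-standard-deviation cutoff, and the clustering parameters are illustrative assumptions and do not come from any particular published tool.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    start: int   # leftmost mapped position of the pair on the contig
    insert: int  # observed fragment length on the assembly, e.g. abs(TLEN) from a SAM record


def discordant(pairs, mean, sd, k=3.0):
    """Pairs whose observed insert differs from the library mean by more than k standard deviations."""
    return [p for p in pairs if abs(p.insert - mean) > k * sd]


def clusters(flagged, max_gap=500, min_support=3):
    """Group discordant pairs with nearby start positions.

    Returns (first_start, last_start, n_pairs, mean_insert) for groups of at least
    min_support pairs. A mean insert well above the library size suggests extra
    sequence in the assembly at that point; well below suggests missing sequence.
    """
    groups = []
    for p in sorted(flagged, key=lambda p: p.start):
        if groups and p.start - groups[-1][-1].start <= max_gap:
            groups[-1].append(p)
        else:
            groups.append([p])
    return [
        (g[0].start, g[-1].start, len(g), sum(p.insert for p in g) / len(g))
        for g in groups
        if len(g) >= min_support
    ]
```

The sketch requires several supporting pairs per cluster on purpose. Single discordant pairs are common (chimeric fragments, mapping errors), whereas a run of them at one locus is more likely to reflect a real assembly problem.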

      If you're limited to computational techniques, then synteny is a good idea. I'd also look for coding regions that differ substantially (especially by truncation) from their counterparts in the nearest species with an annotated genome. Such differences may be real, but they are good candidates for resequencing (potentially by Sanger) to confirm. Similarly, changes in gene copy number would be worth confirming.
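The coding-region comparison can be sketched in Python as well. This assumes you have predicted genes from your assembly and BLASTed them against the annotated genes of the related species with BLAST+ using `-outfmt "6 qseqid sseqid qlen slen bitscore"`. Queries and subjects should be the same molecule type so that their lengths are comparable. The function names and the 0.8/1.2 length-ratio cutoffs are hypothetical choices.

```python
from collections import Counter


def best_hits(lines):
    """Best-scoring hit per query from BLAST+ tabular output.

    Expects five tab-separated columns: qseqid sseqid qlen slen bitscore.
    """
    best = {}
    for line in lines:
        q, s, qlen, slen, bits = line.rstrip("\n").split("\t")
        row = (s, int(qlen), int(slen), float(bits))
        if q not in best or row[3] > best[q][3]:
            best[q] = row
    return best


def length_changes(best, min_ratio=0.8, max_ratio=1.2):
    """Flag genes much shorter (possible truncation) or longer than their best reference hit."""
    flagged = {}
    for q, (s, qlen, slen, _bits) in best.items():
        ratio = qlen / slen
        if ratio < min_ratio:
            flagged[q] = ("truncated", s, round(ratio, 2))
        elif ratio > max_ratio:
            flagged[q] = ("extended", s, round(ratio, 2))
    return flagged


def copy_number_changes(best):
    """Reference genes that are the best hit of more than one assembled gene."""
    counts = Counter(s for s, *_rest in best.values())
    return {s: n for s, n in counts.items() if n > 1}
```

Genes flagged as truncated that sit at a contig end or next to a coverage dip are natural first candidates for the Sanger confirmation mentioned above.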



      • #4
        Another option is to get an optical or restriction map of the physical genome. I used OpGen's service for optical mapping and had good results: I found a number of misassemblies, which I corrected, and closed a lot of gaps.
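Optical maps are typically compared against an in silico restriction map of the assembly. Below is a minimal Python sketch of that digest step, assuming a palindromic recognition site so that only one strand needs scanning. BamHI's GGATCC is used purely for illustration, and the function names are hypothetical. Real optical-map alignment also has to model fragment-sizing error and missing or extra cuts, which this sketch does not attempt.

```python
def cut_positions(seq, site):
    """0-based start positions of every occurrence of a palindromic recognition site."""
    seq, site = seq.upper(), site.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def fragment_sizes(seq, site, circular=False):
    """Fragment lengths, treating each site start as the cut point (the exact cut offset is ignored)."""
    cuts = cut_positions(seq, site)
    if not cuts:
        return [len(seq)]
    if circular:
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(len(seq) - cuts[-1] + cuts[0])  # fragment spanning the end/start junction
        return sizes
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
```

For example, `fragment_sizes(contig_seq, "GGATCC")` gives the predicted fragment sizes for one contig in order. A run of fragments that matches the optical map on both sides of a junction but not across it hints at a misjoin there.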



        • #5
          Bioinformatically, there are other approaches you could try: AmosValidate and Hawkeye, as well as the FRC (Feature Response Curve); see these papers: http://bib.oxfordjournals.org/cgi/co...tract/bbr074v1 and http://dx.plos.org/10.1371/journal.pone.0031002. These should let you flag potentially problematic regions.
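The Python sketch below is one reading of the FRC, assuming it is defined as in the PLoS ONE paper linked above. Under that definition, each contig carries a count of suspicious 'features' (for example, from AmosValidate), and contigs are taken from longest to shortest. For a feature threshold T, the curve reports the fraction of the genome covered by the contigs included before the cumulative feature count would exceed T. The input format and function name are assumptions; check the paper before relying on this.

```python
def feature_response_curve(contigs, genome_size, thresholds):
    """Approximate genome coverage for each feature threshold.

    contigs: list of (length_bp, n_features) tuples, one per contig.
    Contigs are taken from longest to shortest; for each threshold T, contigs are
    added until the next one would push the cumulative feature count above T.
    """
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)
    curve = []
    for t in thresholds:
        features = 0
        covered = 0
        for length, n in ordered:
            if features + n > t:
                break  # stop at the first contig that would exceed the threshold
            features += n
            covered += length
        curve.append((t, covered / genome_size))
    return curve
```

When comparing two assemblies of the same genome, the one whose curve rises faster covers more of the genome for a given number of suspicious features.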



          • #6
            As Bacteria Genomes pointed out (thank you, B.G.!), Whole Genome Mapping (formerly known as "Optical Mapping") by OpGen could certainly help improve your assembly, potentially reducing those 100 contigs to significantly fewer. Full disclaimer: I work for OpGen. Feel free to contact me, and I'd be happy to put you in touch with the right people if you wish to discuss further.
