  • Verifying de novo bacterial genome

    I used velvet to assemble a ~4.7M bacterial genome. I got some seemingly nice results - 4.8M of genome covered, 100 contigs >1k, largest contig 3.7M (!!!).

    So now my question is - how do I verify that this is all good?
    Do I do synteny maps with a nearby bacteria?
    Do I finish connecting the genome as best I can (and how?)
    Do I pray?
    Thanks....

  • #2
    Originally posted by Noa
    I used velvet to assemble a ~4.7M bacterial genome. I got some seemingly nice results - 4.8M of genome covered, 100 contigs >1k, largest contig 3.7M (!!!).
    Nice! (even though 3.7 Mbp sounds too good to be true, using velvet - but I don't know what your read lengths/coverage were...)

    So now my question is - how do I verify that this is all good?
    You can't really 'verify' without an independently-obtained assembly of the same organism; realistically you can only increase your level of confidence in this assembly. Happily, that's all anyone is likely to want from you.

    Do I do synteny maps with a nearby bacteria?
    That's sensible. Mauve is good for this. Bacterial genomes are prone to rearrangement though, and it's not true that a breakdown of synteny implies misassembly; you'd want to look for other indicators of rearrangement, and also to inspect the assembly for indicators of poor quality assembly.

    Do I finish connecting the genome as best I can (and how?)
    You could do that, for example, by designing primers to either side of a 'gap' in the assembly, amplifying up from chromosomal DNA, and sequencing the amplification product. You could do the same for questionable assembly regions, too (use Tablet/some other viewer to inspect the assembly for dips/spikes in coverage and other indicators of misassembly). Depending on what you want from your assembly, this could be unnecessary, or too much effort to be worthwhile.
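As a rough sketch of the coverage inspection mentioned above, the snippet below flags windows whose mean read depth is far from the typical depth. It assumes you already have a per-base depth list for one contig (for example, parsed from the output of `samtools depth -a`); the function name, window size, and the 0.5x/2x thresholds are illustrative choices, not established cut-offs.

```python
from statistics import median


def flag_coverage_outliers(depths, window=1000, low=0.5, high=2.0):
    """Flag windows whose mean depth is unusually low or high.

    depths: per-base read depth for one contig (list of numbers).
    Returns a list of (start, end, mean_depth, label), where label is
    "dip" or "spike" relative to the median of all window means.
    """
    if not depths:
        return []

    # Mean depth per non-overlapping window (the last window may be shorter).
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append((start, start + len(chunk), sum(chunk) / len(chunk)))

    typical = median(m for _, _, m in means)

    flagged = []
    for start, end, mean in means:
        if mean < low * typical:
            flagged.append((start, end, mean, "dip"))
        elif mean > high * typical:
            flagged.append((start, end, mean, "spike"))
    return flagged


if __name__ == "__main__":
    # Toy depth profile: mostly ~30x, one 90x window, one 5x window.
    toy = [30] * 3000 + [90] * 1000 + [30] * 2000 + [5] * 1000
    for region in flag_coverage_outliers(toy):
        print(region)
```

Flagged windows are only candidates for a closer look in a viewer such as Tablet; a spike may be a collapsed repeat, and a dip may sit near a weak join, but either can also have innocent causes.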

    Other approaches you might consider include: BLASTing your sequence against the annotated genes of a fully-sequenced, related bacterium, to estimate your recovery of a comparable gene complement; having a quick look at a GC skew plot (window size ≈4 kbp) of your Mauve output to see if you have a 'sensible' assembly, in the sense that GC skew usually has a characteristic pattern, switching sign either side of the origin of replication (positive on one strand, negative on the other); and checking evenness of coverage of your assembled/(re-)mapped reads ('spikes' might indicate collapsed repeats).
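For the GC skew check, a minimal sketch is given below. It computes (G - C) / (G + C) in non-overlapping windows, plus the running (cumulative) sum, which tends to make the turning points near the origin and terminus easier to see. It assumes the contigs have already been ordered and concatenated into a single string; the function names and the 4000 bp default are illustrative.

```python
def gc_skew(seq, window=4000):
    """Return [(window_start, skew)] with skew = (G - C) / (G + C)."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows without any G or C (e.g. runs of N) get a skew of 0.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        result.append((start, skew))
    return result


def cumulative_skew(skews):
    """Return [(window_start, running_sum_of_skew)]."""
    total = 0.0
    out = []
    for start, skew in skews:
        total += skew
        out.append((start, total))
    return out


if __name__ == "__main__":
    # Toy sequence: a G-rich half followed by a C-rich half.
    toy = "GGGC" * 2 + "CCCG" * 2
    per_window = gc_skew(toy, window=8)
    print(per_window)
    print(cumulative_skew(per_window))
```

In a well-ordered circular chromosome you would typically expect one long stretch of mostly positive skew and one of mostly negative skew; many sign changes scattered along the sequence may point to misordered or inverted contigs, although real rearrangements can produce the same picture.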

    Do I pray?
    This is one of the least likely routes to an improved assembly

    L.


    • #3
      Depends on how much work you want to do

      If you have the time and resources for more experiments, you might evaluate your assembly by one or several of

      - paired-end sequencing and looking for indels. There are a number of techniques that look for aberrant average distances between the pairs of reads, given the expected library size, as an indicator of indels. Such an analysis might identify assembly errors (either mis-joined contigs or missing pieces)

      - array CGH to see whether it indicates copy number changes relative to your assembly, which would indicate missing or duplicated segments in your assembly
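The paired-end idea in the first bullet above can be sketched as a simple z-score on the local mean insert size. The snippet assumes you have extracted (mapping position, insert size) for properly oriented pairs on one contig, and that you know the library's expected mean and standard deviation; the function name, window size, and cut-offs are hypothetical choices for illustration, not any particular tool's method.

```python
from collections import defaultdict
from math import sqrt


def flag_insert_size_windows(pairs, expected_mean, expected_sd,
                             window=1000, z_cutoff=3.0, min_pairs=5):
    """Flag windows where the mean insert size departs from the library.

    pairs: iterable of (position, insert_size) for one contig.
    Returns [(window_start, local_mean, z, label)] with label
    "stretched" (pairs further apart than expected) or "compressed".
    """
    if expected_sd <= 0:
        raise ValueError("expected_sd must be positive")

    # Group insert sizes by the window containing the pair's position.
    by_window = defaultdict(list)
    for pos, size in pairs:
        by_window[pos // window].append(size)

    flagged = []
    for idx in sorted(by_window):
        sizes = by_window[idx]
        n = len(sizes)
        if n < min_pairs:
            continue  # too few pairs to say anything
        local_mean = sum(sizes) / n
        # z-score of the local mean against the library distribution.
        z = (local_mean - expected_mean) / (expected_sd / sqrt(n))
        if abs(z) >= z_cutoff:
            label = "stretched" if z > 0 else "compressed"
            flagged.append((idx * window, local_mean, z, label))
    return flagged


if __name__ == "__main__":
    toy = ([(100 + i, 300) for i in range(10)]      # as expected
           + [(1500 + i, 400) for i in range(10)]   # too far apart
           + [(2500 + i, 200) for i in range(10)])  # too close together
    for hit in flag_insert_size_windows(toy, expected_mean=300, expected_sd=30):
        print(hit)
```

Roughly speaking, "compressed" windows suggest the assembly is missing sequence at that point (for example a collapsed repeat), while "stretched" windows suggest extra or wrongly joined sequence.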

      If you're limited to computational techniques, then synteny is a good idea. I'd also look for coding regions that show substantial differences (especially truncation) to the nearest species for which an annotated genome exists. Such changes may be real, but would be good candidates for resequencing (potentially Sanger) to confirm. Similarly, changes in copy number of genes would be good to confirm.
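One crude way to shortlist possibly truncated coding regions is sketched below. It assumes you have searched the related species' annotated genes (as queries) against your assembly and saved tabular BLAST+ output with the custom columns `qseqid sseqid pident length qlen` (that is, `-outfmt "6 qseqid sseqid pident length qlen"`); the function name and the 0.9 threshold are illustrative assumptions.

```python
def flag_truncated_genes(blast_lines, min_fraction=0.9):
    """Return genes whose best hit covers less than min_fraction of the query.

    blast_lines: tab-separated lines with the columns
    qseqid, sseqid, pident, length, qlen (in that order).
    Returns a sorted list of (gene, contig, covered_fraction).
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        qseqid, sseqid, _pident, length, qlen = line.split("\t")[:5]
        # Alignment length includes gaps, so this is only approximate.
        fraction = int(length) / int(qlen)
        if qseqid not in best or fraction > best[qseqid][1]:
            best[qseqid] = (sseqid, fraction)

    return sorted(
        (gene, contig, round(frac, 3))
        for gene, (contig, frac) in best.items()
        if frac < min_fraction
    )


if __name__ == "__main__":
    toy = [
        "geneA\tcontig1\t98.5\t900\t900",
        "geneB\tcontig2\t97.0\t450\t1000",
        "geneB\tcontig7\t95.0\t300\t1000",
    ]
    print(flag_truncated_genes(toy))
```

A gene that shows up as two partial hits on different contigs (like the toy `geneB`) may simply span a contig boundary rather than being truly truncated, so hits like this are candidates for inspection or resequencing, not conclusions.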


      • #4
        Another option is to get an optical or restriction map of the physical genome. I used OpGen's service for making optical maps and had good results. I found a number of misassemblies, which I corrected, and closed a lot of gaps.



        • #5
          Bioinformatically, there are other approaches you could try: AmosValidate and Hawkeye, including the FRC (Feature-Response Curve); see these papers: http://bib.oxfordjournals.org/cgi/co...tract/bbr074v1 and http://dx.plos.org/10.1371/journal.pone.0031002. These should allow you to flag potentially problematic regions.


          • #6
            As Bacteria Genomes pointed out (thank you, B.G.!), Whole Genome Mapping (formerly known as "Optical Mapping") by OpGen could certainly help in improving your assembly and reducing those 100 contigs to a potentially much lower number. Full disclosure: I work for OpGen. Feel free to contact me, and I'd be happy to put you in touch with the right people if you wish to discuss further.
