  • mwatson
    Member
    • Aug 2010
    • 13

    Combine de novo and reference assembly

    Hi

    I'd be interested to hear from anyone who knows of software that can combine de novo assembled contigs with a reference assembly.

    I have bacterial genomes and reads between 36 and 72 bp long.

    When I align to the reference, large parts of the genome align perfectly, but then I find gaps. If I do a de novo assembly, I can see that some of the contigs span the gaps, but I am doing this by eye using MUMmer, IGV and a few other bits and bobs.

    It seems obvious to me that someone would have written this, but I can't find anything...

    Mick
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    What assembler are you using for your de novo and reference guided assemblies?

    Have you tried MIRA3?

    • mwatson
      Member
      • Aug 2010
      • 13

      #3
      OK, so I use either SOAPdenovo or Velvet for the de novo stuff, and I currently use Novoalign for the reference assembly.

      The problem I am trying to solve is that from the reference assembly, I have a SAM/BAM file that I can view in IGV, and if I find a gap, I want to know why there is a gap - and many times, one of the de novo contigs will span the gap, indicating that the gap in the reference assembly is due to a real gap in the genome, not just an alignment issue.
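A minimal sketch of the check described above, not an existing tool. It assumes the per-base read depth along the reference has already been extracted from the BAM file into a list, and that the de novo contigs have been aligned to the reference (for example with MUMmer) and reduced to reference coordinate blocks. The function names, the half-open 0-based intervals, and the `flank` parameter are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based reference coordinates


def zero_coverage_gaps(depth: List[int], min_len: int = 1) -> List[Interval]:
    """Return the runs of zero read depth that are at least min_len bases long."""
    gaps: List[Interval] = []
    start = None
    for pos, d in enumerate(depth):
        if d == 0:
            if start is None:
                start = pos  # a new zero-depth run begins here
        elif start is not None:
            if pos - start >= min_len:
                gaps.append((start, pos))
            start = None
    # Close a run that extends to the end of the reference.
    if start is not None and len(depth) - start >= min_len:
        gaps.append((start, len(depth)))
    return gaps


def overlaps(block: Interval, region: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return block[0] < region[1] and block[1] > region[0]


def spanning_contigs(gap: Interval,
                     contig_hits: Dict[str, List[Interval]],
                     flank: int = 20) -> List[str]:
    """Name the contigs that align to the reference on both sides of a gap.

    contig_hits maps a contig name to the reference intervals of its aligned
    blocks. A contig counts as spanning when one block reaches into the
    flank bases left of the gap and another block reaches into the flank
    bases right of it (assumed criterion, adjust as needed).
    """
    left_region = (max(0, gap[0] - flank), gap[0])
    right_region = (gap[1], gap[1] + flank)
    names = []
    for name, blocks in contig_hits.items():
        anchored_left = any(overlaps(b, left_region) for b in blocks)
        anchored_right = any(overlaps(b, right_region) for b in blocks)
        if anchored_left and anchored_right:
            names.append(name)
    return names
```

Calling `zero_coverage_gaps` on the depth list and then `spanning_contigs` on each gap gives a shortlist of gaps worth inspecting in IGV, rather than scanning the whole genome by eye.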

      • Geneious
        Registered Vendor
        • Jul 2010
        • 22

        #4
        You can also try the 14-day free trial of Geneious Pro.

        • maubp
          Peter (Biopython etc)
          • Jul 2009
          • 1544

          #5
          Originally posted by mwatson:
          OK, so I use either SOAPdenovo or Velvet for the de novo stuff, and I currently use Novoalign for the reference assembly.

          The problem I am trying to solve is that from the reference assembly, I have a SAM/BAM file that I can view in IGV, and if I find a gap, I want to know why there is a gap - and many times, one of the de novo contigs will span the gap, indicating that the gap in the reference assembly is due to a real gap in the genome, not just an alignment issue.
          I don't quite understand what you are describing. Are you saying you have "gaps", meaning areas where nothing maps to the reference, yet there is a de novo contig that could be used to span this "gap"? If so, to me it sounds like a region of divergence between the reference genome and your new genome (not a missing section in the new genome). Perhaps you could try adjusting the settings of your guided assembler? Or try including selected de novo contigs in the guided assembly as well?

          • tplsmith
            Junior Member
            • Aug 2008
            • 7

            #6
            MIRA-based pipeline to view scaffolds

            I'm not sure that what you really want is to "combine" a reference and your de novo assembly. Researchers I have been assisting with the same kinds of studies have had good success using the MIRA assembler mentioned earlier, then using bambus to generate a .bnk file. Viewing the results in hawkeye provides a lot of the kind of information you appear to be looking for. It is especially useful when you have a lot of paired-end information and want to see how the scaffolds fit together and where problems may lie. It is definitely useful for seeing whether a piece of the reference that is "missing" in your data is due to assembly quality issues, because you can get an idea of read depth at the ends of the contigs.

            You can then compare the pertinent scaffolds of the assembly to a reference in a variety of ways. A good one that you can easily edit is in the Geneious package advertised in one of the replies, especially because in that single package you can call a variety of aligners (Geneious, ClustalW2, MUSCLE) to see how the choice affects the results.

            Instructions for the MIRA/bambus/hawkeye pipeline can be obtained at the MIRA website; some were written here by my colleague, whose contact information is also on the site. This should help you decide whether your strain has sequence not found in the reference or vice versa. There is also a MIRA discussion group where you can ask specific questions if you have problems.

            • Adjuvant
              Member
              • Sep 2010
              • 13

              #7
              Apparently mwatson and I are interested in the same things.

              I'm also doing bacterial sequencing. I used Novoalign to align my reads to several reference sequences, extracted the unaligned reads and performed a Velvet assembly on those. BLASTing the resulting contigs shows quite a few that have sequence corresponding to my reference sequences at the ends of the contigs, but novel sequence in the middle. So in an effort to combine my alignments and my de novo assemblies, I did a pileup of my Novoalign alignments, dumped the consensus to FASTQ, then separated out the quality data to yield several consensus FASTA sequences (one for each of the reference genomes).
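For readers who want to reproduce the last step, here is a minimal sketch of separating quality data from a consensus FASTQ to get FASTA. It assumes the consensus FASTQ may wrap sequence and quality over several lines, so it counts quality characters instead of assuming four lines per record. The function names are hypothetical, and this is not necessarily how the conversion was done in the post above.

```python
from typing import Dict, Iterable


def fastq_to_fasta_records(lines: Iterable[str]) -> Dict[str, str]:
    """Parse (possibly multi-line) FASTQ and return {record name: sequence}."""
    records: Dict[str, str] = {}
    it = (line.rstrip("\r\n") for line in lines)
    for line in it:
        if not line:
            continue  # tolerate blank lines between records
        if not line.startswith("@"):
            raise ValueError(f"expected a FASTQ header, got: {line!r}")
        fields = line[1:].split()
        name = fields[0] if fields else ""
        # Sequence lines run until the '+' separator line.
        seq_parts = []
        for line in it:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        seq = "".join(seq_parts)
        # Quality lines may themselves start with '@', so consume them by
        # length until they cover the whole sequence.
        qual_len = 0
        while qual_len < len(seq):
            qual = next(it, None)
            if qual is None:
                raise ValueError(f"truncated quality string for {name!r}")
            qual_len += len(qual)
        records[name] = seq
    return records


def format_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Render the records as FASTA text, wrapping sequences at width columns."""
    out = []
    for name, seq in records.items():
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Passing an open file handle to `fastq_to_fasta_records` and writing the result of `format_fasta` to disk yields one FASTA entry per reference genome, with uncovered positions still present as N's.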

              Here's where I get stuck: the pileup fills gaps in the alignment with N's. When I look at my alignment in Tablet, however, I can see that not all gaps are equal. Many are clearly spanned by a lot of paired end reads, whereas others have no spanning pairs and so might be the sites where some of my de novo assembled contigs might fit. They would also be sites where I'd first like to start designing outward directed primers for Sanger sequencing.

              My question is: Is there a way to separate my alignment consensus sequences into contigs separated by these unspanned gaps? The way I'm doing it now is scanning through my alignment in Tablet looking for such gaps, then looking for those gaps in my consensus sequence and manually deleting the N's. It seems like there should be a better way.

              Thanks.
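One way the manual step could be scripted is sketched below, under stated assumptions. It assumes the consensus is available as a string and that, for each properly paired read pair, the end of the left mate and the start of the right mate (0-based reference coordinates) have already been pulled from the alignment. A run of N's is treated as "spanned" when at least `min_pairs` pairs bracket it. The function names and the `min_pairs` threshold are hypothetical.

```python
import re
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based


def count_spanning_pairs(gap: Interval, pairs: List[Tuple[int, int]]) -> int:
    """Count read pairs whose mates sit on opposite sides of the gap.

    Each pair is (left_mate_end, right_mate_start).
    """
    gap_start, gap_end = gap
    return sum(1 for left_end, right_start in pairs
               if left_end <= gap_start and right_start >= gap_end)


def split_at_unspanned_gaps(consensus: str,
                            pairs: List[Tuple[int, int]],
                            min_pairs: int = 2) -> List[Tuple[int, str]]:
    """Split a consensus at runs of N that lack paired-end support.

    Returns (offset, sequence) tuples. Runs of N bracketed by at least
    min_pairs read pairs stay inside a contig; other runs are removed
    and act as split points.
    """
    contigs: List[Tuple[int, str]] = []
    piece_start = 0
    for match in re.finditer(r"[Nn]+", consensus):
        gap = (match.start(), match.end())
        if count_spanning_pairs(gap, pairs) >= min_pairs:
            continue  # spanned gap: keep the N's in place
        if match.start() > piece_start:
            contigs.append((piece_start, consensus[piece_start:match.start()]))
        piece_start = match.end()
    if piece_start < len(consensus):
        contigs.append((piece_start, consensus[piece_start:]))
    return contigs
```

The offsets in the result mark where each contig starts on the reference, which also gives the coordinates of the unspanned gaps as candidate sites for outward-directed primers.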
