  • Combine de novo and reference assembly

    Hi

    I'd be interested in anyone who can tell me about software that will/can combine de novo assembled contigs with a reference assembly.

    I have bacterial genomes and reads of between 36 and 72 bp.

    When I align to the reference, large parts of the genome align perfectly, but then I find gaps. If I do a de novo assembly, I can see that some of the contigs span the gaps, but I am doing this by eye using MUMmer, IGV and a few other bits and bobs.

    It seems to me obvious that someone would have written this, but I can't find anything....

    Mick

  • #2
    What assembler are you using for your de novo and reference guided assemblies?

    Have you tried MIRA3?



    • #3
      OK, so I use either SOAPdenovo or Velvet for the de novo stuff, and I currently use Novoalign for the reference assembly.

      The problem I am trying to solve is that from the reference assembly, I have a SAM/BAM file that I can view in IGV, and if I find a gap, I want to know why there is a gap - and many times, one of the de novo contigs will span the gap, indicating that the gap in the reference assembly is due to a real gap in the genome, not just an alignment issue.
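The by-eye check described above could be sketched in code. This is a minimal, hypothetical example rather than an existing tool: it assumes you already have a per-base depth list for the reference (for instance parsed from something like `samtools depth -a`) and a list of reference intervals where de novo contigs align (for instance parsed from MUMmer `show-coords` output). All function names and the 0-based, half-open coordinate convention are assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]   # 0-based, half-open [start, end) on the reference
Hit = Tuple[str, int, int]   # (contig name, ref start, ref end) for one alignment block


def zero_coverage_gaps(depth: List[int], min_len: int = 1) -> List[Interval]:
    """Return runs of zero read depth that are at least min_len bases long."""
    gaps = []
    start = None
    for pos, d in enumerate(depth):
        if d == 0 and start is None:
            start = pos                      # a zero-depth run begins here
        elif d != 0 and start is not None:
            if pos - start >= min_len:
                gaps.append((start, pos))    # the run ended just before pos
            start = None
    if start is not None and len(depth) - start >= min_len:
        gaps.append((start, len(depth)))     # run reaches the end of the reference
    return gaps


def _overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def contigs_spanning(gap: Interval, hits: List[Hit], flank: int = 50) -> List[str]:
    """Names of contigs with alignment blocks touching both flanks of the gap.

    A contig that aligns on each side of the gap (as two blocks or as one
    block covering the gap) is a candidate for explaining the gap.
    """
    g_start, g_end = gap
    left = (max(0, g_start - flank), g_start)
    right = (g_end, g_end + flank)
    left_names = {n for n, s, e in hits if _overlaps(s, e, *left)}
    right_names = {n for n, s, e in hits if _overlaps(s, e, *right)}
    return sorted(left_names & right_names)
```

For each interval returned by `zero_coverage_gaps`, calling `contigs_spanning` gives a shortlist of contigs worth inspecting in IGV or MUMmer. The `flank` width is an arbitrary choice and would need tuning to read length and alignment quality.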



      • #4
        You can also try the 14-day free trial of Geneious Pro.



        • #5
          Originally posted by mwatson:
          OK, so I use either SOAPdenovo or Velvet for the de novo stuff, and I currently use Novoalign for the reference assembly.

          The problem I am trying to solve is that from the reference assembly, I have a SAM/BAM file that I can view in IGV, and if I find a gap, I want to know why there is a gap - and many times, one of the de novo contigs will span the gap, indicating that the gap in the reference assembly is due to a real gap in the genome, not just an alignment issue.
          I don't quite understand what you are describing. Are you saying you have "gaps", meaning areas where nothing maps to the reference, yet there is a de novo contig that could be used to span each "gap"? If so, it sounds to me like a region of divergence between the reference genome and your new genome (not a missing section in the new genome). Perhaps you could try adjusting the settings of your guided assembler? Or try including selected de novo contigs in the guided assembly as well?



          • #6
            MIRA-based pipeline to view scaffolds

            I'm not sure that what you really want is to "combine" a reference and your de novo stuff. Researchers I have been assisting in the same kinds of studies have had good success using the MIRA assembler mentioned earlier, then using bambus to generate a .bnk file. Viewing the results in hawkeye provides a lot of information of the kind you appear to be looking for. It is especially useful when you have a lot of paired-end information and want to see how the scaffolds fit together and where problems may lie. It is definitely useful for seeing whether a piece of the reference that is "missing" in your data is due to assembly quality issues, because you can get an idea of read depth at the ends of the contigs.

            You can then compare the pertinent scaffolds of the assembly to a reference in a variety of ways. A good one that you can easily edit is in the Geneious package advertised in one of the replies, especially because in that single package you can call a variety of aligners (Geneious, ClustalW2, MUSCLE) to see how the choice affects the results.

            Instructions for the MIRA/bambus/hawkeye pipeline can be obtained at the MIRA website; some were written here by my colleague, whose contact information is also on the site. This should help you decide whether your strain has sequence not found in the reference or vice versa. There is also a MIRA discussion group where you can ask specific questions if you have problems.



            • #7
              Apparently mwatson and I are interested in the same things.

              I'm also doing bacterial sequencing. I used Novoalign to align my reads to several reference sequences, extracted the unaligned reads and performed a Velvet assembly on those. Blasting the resulting contigs shows quite a few that have sequence corresponding to my reference sequences at the ends of the contigs, but novel sequence in the middle. So in an effort to combine my alignments and my de novo assemblies, I did a pileup of my Novoalign alignments, dumped the consensus to fastq, then separated the quality data to yield several consensus fasta sequences (one for each of the reference genomes).

              Here's where I get stuck: the pileup fills gaps in the alignment with N's. When I look at my alignment in Tablet, however, I can see that not all gaps are equal. Many are clearly spanned by a lot of paired end reads, whereas others have no spanning pairs and so might be the sites where some of my de novo assembled contigs might fit. They would also be sites where I'd first like to start designing outward directed primers for Sanger sequencing.
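The distinction drawn above between spanned and unspanned gaps could be made programmatically. The sketch below is a hypothetical illustration, not a feature of Tablet or of the pileup step: it assumes the consensus is available as a plain string and that each properly paired read pair has been reduced to a fragment interval (leftmost mate start, rightmost mate end) in 0-based, half-open coordinates. The function names and the `min_pairs` threshold are assumptions.

```python
import re
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def n_runs(seq: str, min_len: int = 1) -> List[Interval]:
    """Locate runs of N (either case) in a consensus sequence."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


def count_spanning_pairs(gap: Interval, fragments: List[Interval]) -> int:
    """Count fragments that start before the gap and end after it."""
    g_start, g_end = gap
    return sum(1 for f_start, f_end in fragments
               if f_start < g_start and f_end > g_end)


def unspanned_gaps(seq: str, fragments: List[Interval],
                   min_pairs: int = 2) -> List[Interval]:
    """Return N runs supported by fewer than min_pairs spanning fragments."""
    return [g for g in n_runs(seq)
            if count_spanning_pairs(g, fragments) < min_pairs]
```

Gaps returned by `unspanned_gaps` are the candidate sites for placing de novo contigs or designing outward-directed primers. Gaps with many spanning pairs are more likely to reflect local divergence or low coverage within otherwise contiguous sequence.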

              My question is: Is there a way to separate my alignment consensus sequences into contigs separated by these unspanned gaps? The way I'm doing it now is scanning through my alignment in Tablet looking for such gaps, then looking for those gaps in my consensus sequence and manually deleting the N's. It seems like there should be a better way.

              Thanks.
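One way to avoid deleting the N's by hand would be to cut the consensus at a list of gap coordinates and write each piece out as its own FASTA record. The following is a minimal sketch under stated assumptions: the gap list is supplied by you (for example, the unspanned gaps found by inspection in Tablet), coordinates are 0-based and half-open, and the record naming scheme `name:start-end` is invented here for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]   # 0-based, half-open [start, end)
Record = Tuple[str, str]     # (record name, sequence)


def split_at_gaps(name: str, seq: str, gaps: List[Interval]) -> List[Record]:
    """Cut seq at each gap, dropping the gap itself and keeping the rest.

    N runs that are not listed in gaps stay inside the pieces untouched.
    """
    pieces = []
    prev = 0
    for g_start, g_end in sorted(gaps):
        if g_start > prev:
            pieces.append((f"{name}:{prev}-{g_start}", seq[prev:g_start]))
        prev = max(prev, g_end)              # tolerate overlapping gap intervals
    if prev < len(seq):
        pieces.append((f"{name}:{prev}-{len(seq)}", seq[prev:]))
    return pieces


def to_fasta(records: List[Record], width: int = 60) -> str:
    """Format records as FASTA text with fixed-width sequence lines."""
    lines = []
    for rec_name, rec_seq in records:
        lines.append(">" + rec_name)
        for i in range(0, len(rec_seq), width):
            lines.append(rec_seq[i:i + width])
    return "\n".join(lines) + "\n"
```

Keeping the original coordinates in each record name makes it straightforward to trace a piece back to its position in the reference-based consensus when comparing it with de novo contigs.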
